New Backends
A backend in DaCe is a code generator that lowers some subset of the SDFG IR
to source code in a specific language and emits the build artifacts needed to
compile that source. The built-in backends live under
dace.codegen.targets (CPU, CUDA, MPI, SVE, MLIR,
Snitch, …). Adding a new backend means plugging into three points of the
codegen pipeline: the user-facing enumerations, the dispatcher, and the
build/link environment.
This page covers the plumbing. For a worked example, see the
Extending the Code Generator
tutorial, which builds a small custom target end to end. The
dace/codegen/targets/cpu.py and dace/codegen/targets/cuda.py files are
the canonical references for any non-trivial backend.
Extensible Enumerations
Most DaCe enumerations - DeviceType,
ScheduleType, StorageType,
InstrumentationType,
DataInstrumentationType, and
Language, among others - are user-extensible.
They subclass dace.attr_enum.ExtensibleAttributeEnum, which is a
drop-in extension of the standard library’s enum.Enum that adds
two capabilities a backend frequently needs:
- New members can be registered at runtime, with either a plain value or a frozen dataclass acting as a template for attributed instances.
- Members can carry attributes (via dataclass templates), which is what lets ScheduleType.GPU_ThreadBlock and friends remain symbolic while richer entries (e.g., a vendor-specific schedule with tunable parameters) carry the parameters on the enum value itself.
Adding plain entries
The simplest case is adding a new constant. Use the
register() classmethod from
your backend’s setup code (typically the package’s __init__.py):
from dace import dtypes
# Auto-assigned value (analogous to `auto()`):
dtypes.DeviceType.register('MyAccelerator')
dtypes.ScheduleType.register('MyAccelerator_Device')
dtypes.ScheduleType.register('MyAccelerator_ThreadBlock')
dtypes.StorageType.register('MyAccelerator_Global')
# Explicit value (pass it as the second argument):
dtypes.StorageType.register('MyAccelerator_Shared', 301)
After registration, the new entries are reachable as
dtypes.ScheduleType.MyAccelerator_Device and serialize cleanly in
saved SDFGs - the SDFG (de)serializer round-trips any
ExtensibleAttributeEnum member through the registry.
Adding attributed entries (templates)
When a backend needs to attach data to an enum member - for example, the parameters of a vendor-specific schedule - register a template. A template is just a frozen dataclass; calling the template produces an instance of the enum that carries the supplied field values:
from dataclasses import dataclass
from dace import dtypes

@dataclass(frozen=True)
class MyAcceleratorKernel:
    block_size: int
    smem_kb: int

dtypes.ScheduleType.register_template('MyAccelerator_Kernel',
                                      MyAcceleratorKernel)

# Use the template like any other enum member:
sched = dtypes.ScheduleType.MyAccelerator_Kernel(block_size=128,
                                                 smem_kb=48)
sched.block_size  # -> 128
Two design points worth noting (mirrored in
tests.utils.attrenum_test):
- The template compares equal to any of its instances, which makes attributed members ergonomic in match/case blocks (case dtypes.ScheduleType.MyAccelerator_Kernel:).
- Two instances created with the same arguments are interned to the same object (sched is dtypes.ScheduleType.MyAccelerator_Kernel(block_size=128, smem_kb=48)), which keeps SDFG hashing and identity comparisons predictable.
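The two properties above can be illustrated without DaCe. The following is a minimal sketch, not DaCe's implementation: Template, Instance, and KernelParams are hypothetical names invented for this example, and the interning table is an assumption about one way such behavior could be achieved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelParams:
    block_size: int
    smem_kb: int


class Template:
    """Hypothetical stand-in for a registered template (not DaCe code)."""

    def __init__(self, name, fields_cls):
        self.name = name
        self.fields_cls = fields_cls
        self._interned = {}  # frozen field values -> the unique instance

    def __call__(self, **kwargs):
        key = self.fields_cls(**kwargs)  # frozen dataclasses are hashable
        if key not in self._interned:
            self._interned[key] = Instance(self, key)
        return self._interned[key]

    def __eq__(self, other):
        # A template equals itself and every instance created from it.
        if isinstance(other, Instance):
            return other.template is self
        return other is self

    def __hash__(self):
        return hash(self.name)


class Instance:
    """Hypothetical attributed member produced by calling a Template."""

    def __init__(self, template, fields):
        self.template = template
        self.fields = fields

    def __eq__(self, other):
        if isinstance(other, Template):
            return other is self.template
        return other is self  # interning makes identity sufficient

    def __hash__(self):
        # Same hash as the template, so equal objects hash equally.
        return hash(self.template.name)


kernel = Template('MyAccelerator_Kernel', KernelParams)
a = kernel(block_size=128, smem_kb=48)
b = kernel(block_size=128, smem_kb=48)
c = kernel(block_size=256, smem_kb=48)
```

Here a and b are the same object, both compare equal to kernel, and c is a distinct instance that still matches the template. Note that this equality is deliberately not transitive: a and c each equal the template but not each other.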
Templates can also be declared inline at class-definition time when you
own the enum class, by writing @dataclass(frozen=True) classes
inside the enum body; this is the form used in the test suite. From a
backend extension’s perspective, register_template is the equivalent
runtime form.
Optional: undefined_safe_enum
The decorator dace.registry.undefined_safe_enum() is an optional
helper that registers a single extra member named Undefined on an
ExtensibleAttributeEnum. The built-in DaCe enums use it so that
deserialization, partially-specified IR, and forward-compatibility paths
have a well-known sentinel to fall back to instead of raising on an
unknown enum value:
from dace.attr_enum import ExtensibleAttributeEnum
from dace.registry import undefined_safe_enum
from enum import auto

@undefined_safe_enum
class MyEnum(ExtensibleAttributeEnum):
    First = auto()
    Second = auto()

MyEnum.Undefined  # added by the decorator
Using the decorator is purely a convenience - if you do not need an
Undefined sentinel for your own enum, omit it. Custom enums that
extend the built-in ones (via register / register_template)
inherit the sentinel from their parent class and do not need to apply
the decorator themselves.
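To show why a sentinel is useful on a deserialization path, here is a minimal sketch built only on the standard library's enum.Enum. It is an analogy, not how undefined_safe_enum is implemented: the Fruit enum and the from_name helper are hypothetical, and the Undefined member is declared by hand rather than added by a decorator.

```python
from enum import Enum, auto


class Fruit(Enum):
    """Hypothetical enum; Undefined is written out explicitly here."""
    Undefined = auto()
    Apple = auto()
    Pear = auto()

    @classmethod
    def _missing_(cls, value):
        # Enum calls this hook when a lookup by value fails.
        return cls.Undefined


def from_name(name):
    """Look up a member by name, falling back to the sentinel."""
    return Fruit.__members__.get(name, Fruit.Undefined)


known = from_name('Apple')
unknown_name = from_name('Mango')
unknown_value = Fruit(999)
```

Both unknown lookups resolve to Fruit.Undefined instead of raising, so a loader can keep going and report the problem later.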
Implementing a Target Code Generator
A backend is a subclass of
TargetCodeGenerator registered through
@dace.registry.autoregister_params(). The class is instantiated once
per SDFG by the framecode driver and is responsible for:
- declaring which schedules, storage types, and node patterns it can handle;
- emitting the source files to be compiled (returning them as CodeObject instances from get_generated_codeobjects());
- providing the per-node, per-state, per-copy, and per-allocation generators that the dispatcher will call back into.
Inside the constructor, the backend registers itself with the
TargetDispatcher for the events it wants
to handle:
from dace import dtypes, registry
# Module path of the base class may differ across DaCe versions.
from dace.codegen import target

@registry.autoregister_params(name='my_accelerator')
class MyTargetCodeGen(target.TargetCodeGenerator):
    def __init__(self, frame, sdfg):
        self._frame = frame
        self._dispatcher = frame.dispatcher

        # Maps with these schedules are routed to this target's
        # `generate_node`/`generate_state`.
        self._dispatcher.register_map_dispatcher(
            [dtypes.ScheduleType.MyAccelerator_Device], self)

        # Arrays in this storage will go through this target's
        # allocation/deallocation hooks.
        self._dispatcher.register_array_dispatcher(
            dtypes.StorageType.MyAccelerator_Global, self)

        # Copies between these storages will be lowered by this target.
        self._dispatcher.register_copy_dispatcher(
            dtypes.StorageType.CPU_Heap,
            dtypes.StorageType.MyAccelerator_Global,
            None,
            self)
The full list of registration entry points is documented on
TargetDispatcher.
Each register_* call accepts an optional predicate so that a target can
opt in or out based on richer criteria than just an enum value.
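The interplay between key-based and predicate-based registration can be sketched as follows. This is a simplified illustration, not DaCe's TargetDispatcher: MiniDispatcher, its register and lookup methods, and the choice to consult predicates before the key table are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Predicate = Callable[[Any], bool]


class MiniDispatcher:
    """Hypothetical, simplified dispatcher (not DaCe's TargetDispatcher)."""

    def __init__(self) -> None:
        self._by_key: Dict[Any, Any] = {}
        self._by_predicate: List[Tuple[Predicate, Any]] = []

    def register(self, key, target, predicate: Optional[Predicate] = None):
        if predicate is None:
            self._by_key[key] = target  # plain enum-value registration
        else:
            self._by_predicate.append((predicate, target))

    def lookup(self, key, node):
        # Predicates are consulted first; the key table is the fallback.
        for predicate, target in self._by_predicate:
            if predicate(node):
                return target
        return self._by_key.get(key)


dispatcher = MiniDispatcher()
dispatcher.register('Device', 'generic_target')
dispatcher.register('Device', 'tuned_target',
                    predicate=lambda node: node.get('size', 0) > 1024)

small = dispatcher.lookup('Device', {'size': 16})
large = dispatcher.lookup('Device', {'size': 4096})
```

In this sketch the small node falls through to generic_target, while the large node is claimed by the predicate and goes to tuned_target.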
The target then implements the corresponding callbacks
(generate_state, generate_node, allocate_array,
deallocate_array, copy_memory, …). Each callback receives the
relevant SDFG, state, node(s), and a pair of CodeIOStream objects
into which to emit code at the local (current scope) and global
(translation-unit-level) positions.
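The split between the local and global streams can be sketched with plain io.StringIO objects standing in for CodeIOStream. The callback signature below is hypothetical and much shorter than a real DaCe callback; it only shows that a single callback may write a declaration at translation-unit level and the matching call at the current scope.

```python
import io


def generate_node(node_label, callsite_stream, global_stream):
    """Hypothetical callback: declares a helper globally, calls it locally."""
    global_stream.write(f"void {node_label}_helper();\n")
    callsite_stream.write(f"{node_label}_helper();\n")


# Stand-ins for the two streams handed to each callback.
callsite, global_scope = io.StringIO(), io.StringIO()
for label in ("copy_in", "compute"):
    generate_node(label, callsite, global_scope)

local_code = callsite.getvalue()
global_code = global_scope.getvalue()
```

After the loop, global_code holds the two declarations and local_code holds the two calls, in the order the nodes were visited.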
Build environments and linked libraries
A backend almost always needs to add headers, runtime support files, and
link-time dependencies to the build. The recommended way to declare these is
via environments: define a @dace.library.environment
that lists the cmake_packages, headers, cmake_libraries, and
runtime state_fields your generated code requires, and add it to the
environments list of the relevant code objects.
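As a rough sketch, an environment for the hypothetical MyAccelerator backend might carry the four fields named above. The decorator is left as a comment so the snippet runs without DaCe installed; every value is a placeholder, and the real decorator may expect further fields that are not shown here.

```python
# In a real backend this class would be decorated with
# @dace.library.environment; it is omitted here on purpose.
class MyAcceleratorEnvironment:
    """Hypothetical environment; all values below are placeholders."""

    # CMake packages to locate before building the generated code.
    cmake_packages = ["MyAcceleratorSDK"]
    # Headers to include in the generated sources.
    headers = ["my_accelerator_runtime.h"]
    # Libraries to link against.
    cmake_libraries = ["my_accelerator_rt"]
    # Declarations added to the runtime state of the generated program.
    state_fields = ["void *my_accelerator_context;"]


declared_fields = sorted(
    name for name in vars(MyAcceleratorEnvironment)
    if not name.startswith("_"))
```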
For more invasive integration (e.g., additional source
files generated alongside the SDFG), the CodeObject returned by the
target can declare extra additional_files and target-specific
additional_compiler_flags. The CMake driver in
dace.codegen.compiler consumes those fields when building the shared
library.
Once registered, the backend is selected automatically whenever an SDFG contains nodes/arrays/maps with the schedules, storage types, or node classes the target claimed. No changes to user-facing APIs are required.