.. _sdfg: Stateful Dataflow multiGraphs (SDFG) ==================================== Philosophy ---------- The central tenet of our approach is that understanding and optimizing data movement is the key to portable, high-performance code. In a data-centric programming paradigm, three governing principles guide execution: 1. Data containers must be separate from computations. 2. Data movement must be explicit, both from data containers to computations and to other data containers. 3. Control flow dependencies must be minimized, and only define execution order if no implicit dataflow is given. As opposed to instruction-driven execution that augments analyses with dataflow (i.e., the *control-centric* view), execution in the *data-centric* view is data-driven. This means that concepts such as parallelism are inherent to the representation, and can be scheduled to run efficiently on a wide variety of platforms. Some of the main differences between SDFGs and other representations are: * Scoping is different: instead of functions and modules, in SDFGs the scopes relate to dataflow: a scope can be a graph, a parallel region (see :ref:`sdfg-map`), or anything that has regions of data passed in/out of it. * There are two kinds of scalar values: **scalar** data containers and **symbols** (see :ref:`sdfg-symbol` for more information). * We make a distinction between three types of data movement: **read**, **write**, and **update** (for which we use a write-conflict resolution function). See :ref:`sdfg-memlet` for more details. The Language ------------ In a nutshell, an SDFG is a hierarchical state machine of acyclic dataflow multigraphs. Here is an example graph: .. raw:: html .. note:: Use the left mouse button to navigate and scroll wheel to zoom. Hover over edges or zoom into certain nodes (e.g., tasklets) to see more information. The cyan rectangles are called **states** and together they form a state machine, executing the code from the starting state and following the blue edge that matches the conditions. In each state, an acyclic multigraph controls execution through dataflow. There are four elements in the above states: * **Access nodes** (ovals) that give access to data containers * **Memlets** (edges/dotted arrows) that represent units of data movement * **Tasklets** (octagons) that contain computations (zoom in on the above example to see the code) * **Map scopes** (trapezoids) representing parametric parallel sections (replicating all the nodes inside them N x N times) Dataflow is captured in the memlets, which connect access nodes and tasklets and pass through scopes (such as maps). As you can see in the example, a memlet can split after going into a map, as the parallel region forms a scope. The state machine shown in the example is a for-loop (``for _ in range(5)``). The init state is where execution starts, the guard state controls the loop, and at the end the result is copied to the special ``__return`` data container, which designates the return value of the function. The state machine is analogous to a control flow graph, where states represent basic blocks. Multiple such basic blocks, such as with the described loop, can be put together to form a **control flow region**. This allows them to be represented with a single graph node in the SDFG's state machine, which is useful for optimization and analysis. The SDFG itself can be thought of as one big control flow region. This means that control flow regions are directed graphs, where nodes are states or other control flow regions, and edges are state transitions. In addition to the elements seen in the example above, there are other kinds of elements in an SDFG, which are detailed below. .. _sdfg-lang: Elements ~~~~~~~~ .. figure:: images/elements.svg :figwidth: 40% :align: right :alt: Elements of the SDFG IR. Elements of the SDFG IR. **Access Node**: A node that points to a named data container. An edge going out of one would read from the container and an edge going into one would write or update the memory. There can be more than one instance of the same container, even in the same state (see above example). An access node will take the form of the data container it is pointing to. For example, if the data container is a stream, the line around it would be dashed. For more information, see :ref:`descriptors`. **Tasklet**: Tasklets are the computational nodes of SDFGs. They can perform arbitrary computations, with one restriction: *they must not access any external memory apart from what is given to them via edges*. In the data-centric paradigm, Tasklets are viewed as black-boxes, which means that only limited analysis is performed on them, and transformations rarely use their contents. Tasklets can be written in any language, as long as the code generator supports it. The recommended language is always Python (even if the source language is different), which allows the limited analysis (e.g., operation count). Other supported languages are C++, MLIR, SystemVerilog, and others (see :class:`~dace.dtypes.Language`). **Nested SDFG**: Nodes that contain an entire SDFG in a state. When invoked, the nested SDFG will be executed in that context, independently from other instances if parallel. The semantics are similar to a Tasklet: connectors specify input and output parameters, and there is acyclic dataflow going in and out of the node. However, as opposed to a Tasklet, a nested SDFG is completely analyzable. Such nodes are useful when control flow is necessary in parallel regions. For example, when there is a loop inside a map, or when two separate components need to each run its own state machine. Several transformations (e.g., :class:`~dace.transformation.interstate.sdfg_nesting.InlineSDFG`, :class:`~dace.transformation.dataflow.map_fission.MapFission`) work directly with nested SDFGs, and the :ref:`simplify` tries to remove/inline them as much as possible. To use the inputs and outputs, the node's connectors have data containers with matching names in the internal SDFG. To pass symbols into the SDFG, the :class:`~dace.sdfg.nodes.NestedSDFG.symbol_mapping` is a dictionary mapping from internal symbol names to symbolic expressions based on external values. Symbols cannot be transferred out of the nested SDFG (as this breaks the assumptions behind symbol values, see :ref:`sdfg-symbol` for more information). .. figure:: images/scope.svg :name: scopefig :figwidth: 30% :width: 100% :align: right :alt: Scope nodes. SDFG scope nodes. **Map**: A scope that denotes parallelism. Maps consist of at least two nodes: entry (trapezoid) and exit (inverted trapezoid) nodes. Those nodes are annotated with parameters and symbolic ranges, which specify the parallel iteration space. The two nodes can wrap an arbitrary subgraph by dominating and post-dominating the contents, which means that every edge that originates from outside the scope must go through one of the entry/exit nodes. In the SDFG language, the subgraph within the map scope is replicated a parametric number of times and can be scheduled to different computational units (e.g., CPU cores). For more information, see :ref:`sdfg-map`. **Consume**: The streaming counterpart to the Map scope, denoting parametric parallelism via multiple consumers in a producer/consumer relationship. There is always a Stream access node connected directly to the consume scope, which will be processed by the number of processing elements (PEs) specified on the nodes. Additionally, an optional quiescence condition can be used to specify early stopping for consuming. By default, consumption will stop when the input stream is empty for the first time. Note that a stream can also be an output of a consume scope, so you can keep producing more tasks for the same scope as you are consuming (useful for unrolling recursion). **Library Node**: A high-level node that represents a specific function (e.g., matrix multiplication). During compilation and optimization, Library Nodes are *expanded* to different implementations, for example to call a fast library (e.g., CUBLAS, MKL), or to a native SDFG representation of it. For more information, see :ref:`libnodes`. **Memlet**: Data movement unit. The memlet contains information about which data is being moved, what are the constraints of the data being moved (the *subset*), how much data is moved (*volume*), and more. If the movement specifies an update, for example when summing a value to existing memory, there is no need to read the original value with an additional memlet. Instead, a *write-conflict resolution* (WCR) function can be specified: the function takes the original value and the new value, and specifies how the update is performed. In the summation example, the WCR is ``lambda old, new: old + new``. For more information, see :ref:`sdfg-memlet`. **State**: Contains any of the above dataflow elements. A state's execution is entirely driven by dataflow, and at the end of each state there is an implicit synchronization point, so it will not finish executing until all the last nodes have been reached (this assumption can be removed in extreme cases, see :class:`~dace.sdfg.state.SDFGState.nosync`). **Control Flow Region**: Forms a directed graph of states and other control flow regions, where edges are state transitions. This allows representing complex control flow in a single graph node, which is useful for analysis and optimization. The SDFG itself is a control flow region, which means that control flow regions are recursive / hierarchical. Similar to the SDFG, each control flow region has a unique starting state, which is the entry point to the region and is executed first. **State Transition**: Transitions, internally referred to as *inter-state edges*, specify how execution proceeds after the end of a State. Inter-state edges optionally contain a symbolic *condition* that is checked at the end of the preceding state. If any of the conditions are true, execution will continue to the destination of this edge (the behavior if more than one edge is true is undefined). If no condition is met (or no more outgoing edges exist), the SDFG's execution is complete. State transitions also specify a dictionary of *assignments*, each of which can set a symbol to a new value (in fact, this is **the only way in which a symbol can change its value**). Both conditions and assignments can depend on values from data containers, but can only set symbols. The condition/assignment properties allow SDFGs to represent control flow constructs, such as for-loops and branches, in a concise manner. .. _descriptors: Data Containers and Access Nodes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. figure:: images/transient.svg :figwidth: 25% :width: 100% :align: right :alt: Transients and globals. Transients and globals. For every access node in an SDFG, there is a matching named **data container**. Data containers are objects that contain accessible data (not necessarily randomly-accessible, however), which can originate from within the SDFG or externally. We call memory that is managed internally in the SDFG *transient*. All data containers, whether transient or global, are registered in ``sdfg.arrays`` along with their descriptors, which descibe their properties, such as shape and layout. Transience is useful for several reasons. First, DaCe can fully analyze those containers and accesses to them, including knowing that they never alias in memory addresses. Second, since they are managed by the SDFG, they could be mutated or removed completely by transformations. On the right-hand side, the figure shows a code and its corresponding SDFG. As ``C`` is generated inside the program, its memory is transient, and a subsequent pass will remove its allocation. Data container types in DaCe are user-extensible, and all extend the :class:`~dace.data.Data` class. The data container types built into DaCe are: * :class:`~dace.data.Array`: Random-access multidimensional arrays with a flexible allocation scheme. See :class:`~dace.data.Array` for how it is allocated and how to customize this behavior. * :class:`~dace.data.Scalar`: Memory allocated for a single value. Can be seen as a "0-dimensional array". * :class:`~dace.data.Stream`: A single or multidimensional array of First-In-First-Out (FIFO) queues. A memlet pointing to a stream would push one or more values to it (depending on the source data volume), whereas a memlet from it would pop elements. For example, a memlet pointing to ``S[5, 2]`` would push to the (5, 2)-th queue in the given stream array. See :class:`~dace.data.Stream` for more properties that define their structure. * :class:`~dace.data.View`: A reinterpretation of an array or sub-array (for example, a slice, or a reshaped array). Must be directly connected to the container it is viewing in every access node. * :class:`~dace.data.Reference`: A pointer to containers of the same description (shape, data type, etc.), which may be set to another container dynamically. **Warning**: inhibits data-centric analyses for optimization. For more information on Views and References, see :ref:`below `. Apart from transience, shape, and data type (``dtype``), there are two important properties in each data descriptor that pertain to how it will be mapped to hardware (and to the generated code): ``storage`` and ``lifetime``. **Storage location** refers to where the container will be allocated --- examples include :class:`~dace.dtypes.StorageType.CPU_Heap` for allocation using ``new[]`` and :class:`~dace.dtypes.StorageType.GPU_Global` for VRAM on the GPU (``{cuda,hip}Malloc``). The full built-in list can be found in the enumeration definition :class:`~dace.dtypes.StorageType`. The enumeration is user-extensible, so adding new entries is easy. **Allocation lifetime** refers to the allocation/deallocation scope of a data container. By default, :class:`~dace.dtypes.AllocationLifetime.Scope` is used, which specifies that the access nodes dictate the allocation lifetime --- the innermost common scope in which the access nodes with the container's name are used create the allocation lifetime. This means that if an access node only exists inside one map scope, the code generator would allocate an array inside it, and deallocate on scope end. However, if another access node with the same name is used somewhere else in the SDFG State, the allocation scope will become the whole state (or first/last executed state if in different states). There are other explicit options, such as :class:`~dace.dtypes.AllocationLifetime.SDFG`, which fix the lifetime, or even lifetime that *outlives a single SDFG execution*: :class:`~dace.dtypes.AllocationLifetime.Persistent` (which can only be used on arrays with sizes that can be determined at call-time) triggers allocation in the initialization function upon loading the SDFG library. Multiple invocations will not re-allocate memory. Lastly, constants from ``sdfg.constants`` can also be used with access nodes. This automatically happens, for example, when using compile-time constant arrays: .. code-block:: python @dace def cst(): return np.array([1., 2., 1.]) sdfg = cst.to_sdfg() print(sdfg.constants) # Prints: {'__tmp0': array([1., 2., 1.])} print(sdfg.node(0).nodes()) # Prints: [AccessNode (__tmp0), AccessNode (__return)] .. _sdfg-symbol: Symbols ~~~~~~~~ Symbols and symbolic expressions are a core part of DaCe. They allow the framework to define arrays with unknown sizes, while still validating them, inferring output shapes, and optimizing their use. They are also extensively used in memlets to analyze memory access patterns, and in maps they define new symbols for use inside the scope. Lastly, in state transitions DaCe uses symbolic analysis to generate structured control flow from an arbitrary state machine (e.g., finding out if a transition is a negation of another, to create ``if/else``). Symbolic expressions are powered by `SymPy `_, but extended by DaCe (:class:`~dace.symbolic.symbol`) to include types and other utilities. Symbols can be used almost anywhere in DaCe --- any object property that is a :class:`~dace.properties.SymbolicProperty` accepts them, and any :class:`~dace.subsets.Subset` is parametric. You can find such properties in data descriptors, memlets, maps, inter-state edges, and others. You can also use them (**read-only**) in Tasklet code directly, as custom properties of your library nodes or transformations, and more. A particular reason that makes symbols useful is the fact they stay constant throughout their defined scope. A symbol defined in a scope (e.g., map parameter) cannot change at all, and symbols that are defined outside an SDFG state cannot be modified inside a state, only in assignments of state transitions. The above read-only property differentiates between a :class:`~dace.symbolic.symbol` and a :class:`~dace.data.Scalar`: Scalars have an assigned storage location (see :ref:`above `) and can be written to at any given point. This means that Scalars cannot be used in symbolic expressions, as their value may change, or not be accessible altogether. In contrast, symbols are always accessible on each device (the code generator ensures this). .. raw:: html

Assigning a Scalar value to a symbol.ΒΆ

In general, using a symbol is always preferable if: (a) its final value is not needed outside the SDFG; (b) its value is necessary to e.g., specify a memlet index; and (c) it is not written to by a computation. The last condition can be worked around if the Scalar is on the host memory using a state transition (see figure on the right, the assignment takes the value of ``scal`` and assigns it to ``sym``, which can be used in subsequent states). During :ref:`simplification `, the :class:`~dace.transformation.passes.scalar_to_symbol.ScalarToSymbolPromotion` pass tries to convert Scalars to symbols, if they fulfill all the constraints of a symbol. Symbols that are defined outside a state can also be given to an SDFG as parameters. This is used when data containers have symbolic sizes. We say that a symbol that does not have a defined value is a *free symbol*. The free symbols of an SDFG have to be added to the symbol store (``sdfg.symbols``) using :func:`~dace.sdfg.sdfg.SDFG.add_symbol`. DaCe also provides a special kind of symbol called :class:`~dace.symbolic.UndefinedSymbol` that represents values which are undefined and deferred to runtime. This allows symbolic analysis to continue even when some data container dimensions or other symbolic values cannot be determined at analysis time. Similar to NaN behavior in floating-point arithmetic, any operation involving an :class:`~dace.symbolic.UndefinedSymbol` results in another :class:`~dace.symbolic.UndefinedSymbol`. Comparisons with undefined symbols yield indeterminate results. During code generation, an informative exception is raised if an :class:`~dace.symbolic.UndefinedSymbol` is encountered, providing clear error messages that point to the undefined symbol. The Python frontend provides syntactic sugar for creating undefined symbols by using the string ``"?"`` in array dimensions: .. code-block:: python import dace import numpy as np @dace.program def example_with_undefined(A: dace.float64["?"], B: dace.float64[20]): # Only access the first 10 elements of A, so the undefined dimension isn't used B[:10] = A[:10] # This compiles successfully since the undefined dimension is not accessed a = np.random.rand(100) # Actual size doesn't matter b = np.random.rand(20) example_with_undefined(a, b) .. _connectors: Connectors ~~~~~~~~~~ .. figure:: images/connectors.svg :figwidth: 40% :width: 100% :align: right :alt: Connector types Connector types. As SDFG states are acyclic multigraphs (where two nodes can have more than one edge between them), every edge needs a port on the source/destination nodes to connect with. In SDFGs, we use *connectors* for this purpose. There are two types of connectors: **view** (colored in cyan) and **passthrough** (colored in transparent turquoise). The former is used to specify an endpoint on nodes, upon which the connector name can be used within that node (e.g., tasklet, nested SDFG, map entry for dynamic map ranges). The latter passes through a scope node (such as a map) and allows DaCe to track the path of memlets through that scope. An example of both is shown on the right. A view connector does not need to define a data container. This is because connectors are references that take on the shape of the memlet connected to it. However, connectors can have types of their own. By default, the type of a connector is ``None``, which means its type is inferred automatically by DaCe in :func:`~dace.sdfg.infer_types.infer_connector_types`. If an type is defined, it acts as a "cast" of the data it is referring to. This is used, among other places, in SIMD vectorization. For example, an access :pycode:`A[4*i : 4*i + 4]` connected to a connector of type :pycode:`dace.vector(dace.float64, 4)` will reinterpret the data as a 4-element vector. View connectors attached to tasklets are collapsed to the underlying array view, similarly to how NumPy behaves. That means that if a memlet pointing to a tasklet is :pycode:`A[i, j, k, l]` and the connector is called :pycode:`a`, :pycode:`a` would be accessed as a scalar. If the memlet has non-scalar ranges (whether they end up being size 1 or larger), e.g., :pycode:`A[i, j:j+k, 0, 0:L]`, the connector in the tasklet must be accessed as a two-dimensional array. In the example, :pycode:`a[m, n]` in the tasklet would correspond to accessing :pycode:`A[i, j+m, 0, n]`. Passthrough connectors are identified only by name: the incoming connector must start with ``IN_`` and outgoing connector must start with ``OUT_``. Passthrough connectors with matching suffixes (e.g., ``IN_arr`` and ``OUT_arr``) are considered part of the same *memlet path* (highlighted in orange in the above figure, see :ref:`below ` for more details). Connectors cannot be dangling (without any connecting edge), and view connectors must only have one connected edge. Other cases will fail validation, with dangling connectors marked in red upon display. .. _sdfg-memlet: Memlets ~~~~~~~ Memlets represent data movement in SDFGs. They can connect access nodes with other access nodes (creating a read *and* a write operation), access nodes with :ref:`view connectors ` (creating a read *or* a write operation, depending on the direction), or directly connect a tasklet to a tasklet or a view connector (creating both a read and a write). Several fields describe the data being moved: * ``data``: The data container being accessed. * ``subset``: A :class:`~dace.subsets.Subset` object that represents the part of the container potentially being moved. * ``volume``: The number of elements moved (as a symbolic expression). The ``dynamic`` boolean property complements this --- if set to True, we say the number of elements moved is *up to* the value of ``volume``. * If ``volume`` is set to ``-1`` and ``dynamic`` is True, the memlet is defined as unbounded, which means there is no way to analyze how much (and when) data will move. * ``wcr`` (default None): An optional lambda function that specifies the Write-Conflict Resolution function. If not None, every movement into the destination container will instead become an update (see below). The first argument of the lambda function is the old value, whereas the second is the incoming value. * ``other_subset``: Typically, only ``subset`` is necessary for memlets that connect access nodes with view connectors. For other cases (for example, access node to access node), the other subset can be used to offset the sub-region of the container *not* named in ``data``. * For example, the copy :pycode:`B[1:21, 0:3] = A[i:i+20, j-1:j+2]` can be represented by :pycode:`dace.Memlet(data='A', subset='i:i+20, j-1:j+2', other_subset='1:21, 0:3')`. *For performance reasons, always prefer constructing range subsets from* :class:`~dace.subsets.Range` *over a string.* * The alias properties ``src_subset`` and ``dst_subset`` specify the source and destination subsets, regardless of the value of ``data``. There are more properties you can set, see :class:`~dace.memlet.Memlet` for a full list. Memlet subsets and volumes are used for analyzing (or estimating, if dynamic) data movement patterns and costs. A memlet's ``subset`` does not necessarily mean that all values in that subset would be read at runtime. It rather acts as a *constraint* on the potential accessed values. This can be used to ensure certain memory is accessible after some optimizations Together with ``volume``, we can define important memory access patterns, such as indirect access: .. code-block:: python # ... for i in dace.map[0:N]: mileage += distances[destinations[i]] # ... In the above, the memlets inside the map are :pycode:`dace.Memlet(data='destinations', subset='i', volume=1)` and :pycode:`dace.Memlet(data='distances', subset='0:20', volume=1)`. Notice that in the latter, we know *one* element would be read in the range ``0:20``, but we do not know which one it is going to be. .. figure:: images/memlet-scopes.svg :name: memtree :figwidth: 40% :width: 100% :align: right :alt: Memlet paths and trees. Memlet paths and trees. **Memlet paths**: When using scopes, such as maps, we can have multiple memlets connected to the same connector, and through multiple levels of scopes. In order to trace and distinguish between them, we use *memlet paths* (green highlighted edges in the figure to the right). A memlet path is a sequence of memlets that connect between two view connectors. We can obtain a memlet path by calling the :func:`~dace.sdfg.state.StateGraphView.memlet_path` on an SDFG state with the edge. The memlet path is actually part of a **memlet tree** (highlighted in orange), since in each scope a memlet can split into multiple memlets. To obtain the tree, use :func:`~dace.sdfg.state.StateGraphView.memlet_tree`. Its *root* is the highest-scope memlet (bottom or top, depending on the direction of the path), and its *leaves* are the lowest-scope memlets (connected to the tasklet in the figure). Changing a data descriptor often requires changing the data names on every memlet in the memlet tree. In code generation, memlets emit reference/copy code at the memlet tree leaf level (i.e., in the innermost scope), because of the parametric replication aspect (in the "expanded" graph, every memlet path is a single edge connecting to a copy of the internal node). **Empty memlets** are memlets that carry no data. They are used to lock a node into a scope, without actually moving data into or out of it. For example, when zeroing an array, the tasklet ``a = 0`` has no inputs, but has to be inside a map. It is thus connected to the map entry node with an empty memlet. You can check if a memlet is empty using the :func:`~dace.memlet.Memlet.is_empty` method. In **write-conflict resolution (WCR) memlets**, the WCR function itself may have properties (such as commutativity). The WCR expression is symbolically analyzed (using SymPy) to determine such properties, and even finds common reduction functions (such as summation or product), replacing it with a built-in operator (e.g., ``a + b`` becomes :class:`dace.dtypes.ReductionType.Sum`). If it can detect a :class:`~dace.dtypes.ReductionType`, it knows more properties about the function and can use fast version in libraries (for example, ``MPI_SUM`` in MPI). Note that Streams cannot use WCR. WCR updates can be implemented in a platform-dependent way, for example using atomic operations, one-sided communication, or different kinds of accumulators on FPGAs. It is thus important to differentiate updates from a simple read+write, since it can yield faster, more analyzable code. Like copies, WCR memlets are applied on the innermost scope of the memlet tree. This requires to exercise caution when using such memlets, as they can lead to excessive atomic operations or critical sections. There are, however, transformations that can be applied on the SDFG to change the schedule and, for example, add a buffer to accumulate locally and save on expensive update operations (see :class:`~dace.transformation.dataflow.stream_transient.AccumulateTransient`). .. _sdfg-map: Parametric Parallelism ~~~~~~~~~~~~~~~~~~~~~~ DaCe supports parametric parallelism, which is a form of parallelism that is defined by symbolic expressions. To represent parallel sections, we use graph scopes, which are defined by *Entry* and *Exit* nodes (as shown in :numref:`scopefig`). The parametric ranges themselves are represented by symbolic integer sets (using the same :class:`~dace.subsets.Subset` class used for memlets). There are several scopes provided in DaCe, the most common one being :class:`~dace.sdfg.nodes.Map` and the second is :class:`~dace.sdfg.nodes.Consume`. **Maps** represent a simple form of replication, where the scope is replicated for each value in the range. For example, ``dace.map[0:N, 1:M-N+1]`` is a parametrically-parallel scope with two dimensions, the first ranging from ``0`` to ``N-1``, and the second ranging from ``1`` to ``M-N`` (inclusive). This means that the scope subgraph will be replicated for this number of times. The ``params`` property of the map specifies the names that parameterize the map range, and must match the given ``range`` property. Maps can be *scheduled* to have their contents executed in different locations. For example, a map can be scheduled to run on a multi-core CPU (with :class:`~dace.dtypes.ScheduleType.CPU_Multicore`), which will generate a ``for`` loop for each instance of the subgraph, using OpenMP to control parallelism. Maps can also be scheduled to run on a GPU (with :class:`~dace.dtypes.ScheduleType.GPU_Device`), or other devices, see :class:`~dace.dtypes.ScheduleType` for all built-in types. The default schedule is :class:`~dace.dtypes.ScheduleType.Default`, which means that the scheduler will decide the schedule for the map based on its context. For example, if the map is already situated inside another multi-core map, the map will be scheduled to a single core using :class:`~dace.dtypes.ScheduleType.Sequential`. Maps have other properties that further control their behavior. For example, their exact schedule (unrolling, loop collapsing) can be controlled with certain attributes, such as ``unroll`` and ``gpu_block_size``. **Consume** scopes are similar to maps, but are used to consume data from a stream. Their processing elements (``num_pes``) is a symbolic expression that governs the degree of parallelism during the scope. ``pe_index`` acts as a "thread index" variable. The quiescence ``condition`` property can be used to control when to end the scope's execution (by default, if ``condition`` is not given, the scope will run until the stream is empty). The ``chunksize`` property specifies the granularity of the stream consumption (how many elements to consume at once). .. figure:: images/scope-tree.svg :figwidth: 30% :width: 100% :align: right :alt: Scope trees. SDFG scopes in :numref:`memtree`. **Navigating in scopes**: Like memlets, scopes in an SDFG state form a tree. The root of the tree is the top-level scope, i.e., the one that is not contained in any other scope. You can navigate the scope tree as a whole using the :func:`~dace.sdfg.state.StateGraphView.scope_tree` method. You can also navigate the scope tree of a specific node by using the :func:`~dace.sdfg.state.StateGraphView.scope_children` method to get the child nodes of the specified scope, or :func:`~dace.sdfg.state.StateGraphView.scope_dict` to get the parent-pointing dictionary of each node to its parent. This is visualized in the figure on the right. To easily get the scope of a node, you can use the :func:`~dace.sdfg.state.StateGraphView.scope_subgraph` method, which will return the subgraph of only the nodes contained in the requested scope. To jump to the entry node of the current scope, use the :func:`~dace.sdfg.state.StateGraphView.entry_node` method. Similarly, to get the exit node of the current entry node, use the :func:`~dace.sdfg.state.StateGraphView.exit_node` method. **Dynamic Map Ranges**: Such ranges can use memlets to define the map ranges directly from data containers, while still retaining the dataflow of a single state. As they are fed into a view connector on the map entry node, their value (described by the connector name) can be used in the symbolic expressions of the map range, and anywhere inside the map scope as a symbol (same as the iteration variables). Only scalar connectors are allowed. In the following example, we use dynamic map ranges to compute a sparse matrix-vector multiplication, where the vector is dense. Every output row has a defined range (standard, symbolic map), whereas the corresponding rows of the input sparse matrix have a dynamic length (dynamic-range.) .. raw:: html
.. code-block:: python @dace.program def spmv(A_row: dace.uint32[H + 1], A_col: dace.uint32[nnz], A_val: dace.float32[nnz], x: dace.float32[W], b: dace.float32[H]): for i in dace.map[0:H]: for j in dace.map[A_row[i]:A_row[i + 1]]: with dace.tasklet: # Explicit dataflow syntax aval << A_val[j] xval << x[A_col[j]] out = aval * xval out >> b(1, lambda a,b: a+b)[i] # WCR output spmv.to_sdfg().view() .. _viewref-lang: Views and References ~~~~~~~~~~~~~~~~~~~~ Similarly to a NumPy view, a *View* data container is created whenever an array is sliced, reshaped, reinterpreted, or by other NumPy functions (e.g., with a ``keepdims`` argument). A view never creates a copy, but simply reinterprets the properties (shape, strides, type, etc.) of the viewed container. In C it would be equivalent to setting a pointer to another pointer with an offset (that is also how the code generator implements it). Views are useful for performance, reducing memory footprint, and to modify sub-arrays in meaningful ways, for example solving a system of equations on matrices made from a long 1-dimensional vector. .. figure:: images/views.svg :figwidth: 50% :align: right :alt: Chained views example. Chained views and the corresponding generated code. A column of ``A`` is turned into a matrix, which is viewed again at a sub-column. Each data descriptor must set the appropriate strides, as the viewed memlet is only used to offset a pointer. A view is always connected to another data container from one side, and on the other it is used just like a normal access node. In case of ambiguity, the ``views`` connector points to the data being viewed. You can even chain multiple views together using this connector (see figure on the right). Another place where views can be found is in Nested SDFG connectors: any data container in a nested SDFG is in fact a *View* of the memlet going in or out of it. As opposed to Views, *References* can be set in one place (using the ``set`` connector), and then be used as normal access nodes later anywhere else. They can be set to different containers (even more than once), as long as the descriptors match (as with views). References are useful for patterns such as multiple-buffering (``a, b = b, a``) and polymorphism. .. warning:: Since references detach the access node from the data container, it inhibits the strength of data-centric analysis for transformations and optimization. Frontends generate them only when absolutely necessary. Therefore, **use references sparingly**, or not at all if possible. An example use of references can be created with the following code: .. raw:: html
.. code-block:: python i = dace.symbol('i') @dace def refs(A, B, out): if i < 5: ref = A else: ref = B out[:] = ref refs.to_sdfg(a, b, out).view() .. _libnodes: Library Nodes ~~~~~~~~~~~~~ For certain functions and methods, there are high-performance or platform-specific versions that can be beneficial. For example, the ``numpy.dot`` function can be implemented using a matrix multiplication, which in turn can be implemented by a fast Basic Linear Algebra Subprogram (BLAS) libraries, such as MKL or CUBLAS. DaCe provides the capability to use such libraries by using *Library Nodes*, which can be used for different purposes: * **Specialization**: A library node can be used to specialize a generic node to a specific implementation. * **Coarse-grained transformations**: A library node can be used to implement a transformation that is not possible with the generic node, i.e., relying on the semantics of the node. For example, fusing a matrix multiplication with a transposition operation. Specializing library nodes to a specific implementation is called **expansion**. Expansions are user-extensible and can be added to a DaCe library or to a specific SDFG. Built-in expansions and library nodes are defined in the ``dace.libraries`` module, and are automatically loaded when the library is imported. Library-specific transformations also exist in the standard set of transformations, for example :class:`~dace.transformation.dataflow.matrix_product_transpose.MatrixProductTranspose`. During compilation or optimization, a library node will be expanded using the ``expand`` method of the chosen implementation. If no implementation is chosen, the ``default_implementation`` field of the library node class will be chosen. One can override default implementations for a library node type, or for an entire library. This can be used, for example, to set :ref:`BLAS ` to default to a certain library: .. code-block:: python from dace.libraries import blas @dace def mult(a, b): return a @ b mult(a, b) # will use the default implementation in the config file ("pure") blas.default_implementation = 'MKL' @dace def mult2(a, b): return a @ b mult2(a, b) # will use Intel MKL Internally, an expansion is a subclass of :class:`~dace.transformation.transformation.ExpandTransformation`. It is responsible for creating a new SDFG that implements the library node, and for connecting the inputs and outputs of the library node to the new SDFG. An example of such an expansion is Einstein summation specialization (`see full file `_): .. code-block:: python from dace import SDFG, SDFGState, library, nodes, properties from dace import transformation as xf # Define the library node itself @library.node class Einsum(nodes.LibraryNode): # Set the default expansion of the node to 'specialize' (registered below) implementations = { ... } default_implementation = 'specialize' # Configurable properties of the einsum node go here ... ... # Define the expansion, which specializes the einsum by lowering it to either # a BLAS operation or a direct contraction @library.register_expansion(Einsum, 'specialize') class SpecializeEinsum(xf.ExpandTransformation): # Define environments necessary for this expansion (optional, can be an empty list) environments = [] # The following method returns the SDFG that results from expanding the library node. # Upon expansion, DaCe will insert the returned SDFG into the graph as a nested SDFG # node (which can be inlined). @staticmethod def expansion(node: Einsum, parent_state: SDFGState, parent_sdfg: SDFG) -> SDFG: # Make an SDFG for the expansion sdfg = SDFG('einsum') ... return sdfg .. _memprop: Memlet Propagation ------------------ In order to perform transformations, DaCe needs to know the shape and type of memlets at all scopes. For example, if only a single column of a matrix is accessed in a map, this could be used to optimize memory transfers. Therefore, DaCe performs *memlet propagation* to infer the shape and type of memlets at all scopes. It is a process that starts from the memlets on the innermost scopes, and propagates outwards through scope entry/exit nodes. Formally, the memlet that goes into a scope contains the union of the subsets of all the internal memlets, and its volume is the sum of the volumes of all the internal memlets. This means that we must project the space of the parametric scope onto the internal, parametric memlet. For example, the memlet ``A[i, j:j+10:2]`` inside a map ranged ``[i=1:N, j=0:25]`` would be propagated to ``A[1:N, 0:35:2]``, and the memlet ``B[2*i + 1]`` would be propagated to ``B[3:2*N:2]`` within the same map. Memlet propagation also works through nested SDFGs, by propagating the potential value ranges each symbol may take. Such propagation is the equivalent of scope propagation for ``for`` loops and general symbolic values. The following figure shows an example of this: .. figure:: images/memprop.png :align: center :width: 70% Memlet propagation through a nested SDFG with a loop. DaCe uses the SymPy symbolic engine to perform this projection, as well as to union multiple internal memlets for propagation. The projection is performed in the :mod:`~dace.sdfg.propagation` module, using pattern matching on symbolic expressions. The :class:`~dace.sdfg.propagation.MemletPattern` class is used to define such patterns, and the function :func:`~dace.sdfg.propagation.propagate_subset` performs the propagation itself. The process is triggered automatically by the Python frontend. If you want to trigger it manually on an entire SDFG, call :func:`~dace.sdfg.propagation.propagate_memlets_sdfg`. Note, however, that this may be a slow process depending on the size of the SDFG. To propagate a single memlet, use :func:`~dace.sdfg.propagation.propagate_memlet`, and for a single scope use :func:`~dace.sdfg.propagation.propagate_memlets_scope`. If you only want to trigger the part that propagates symbol values across the SDFG state machine, call :func:`~dace.sdfg.propagation.propagate_states`. .. _sdfg-api: SDFG Builder API ---------------- When writing frontends for DaCe and when transforming SDFGs, it is necessary to interact with the SDFG directly. The SDFG builder API provides a set of functions that allow the user to create SDFGs programmatically. The API is similar to graph manipulation APIs, such as `NetworkX `_ (in fact, the SDFG is internally represented as an ordered NetworkX graph). The entire API is organized around methods of the :class:`~dace.sdfg.sdfg.SDFG` and :class:`~dace.sdfg.state.SDFGState` classes. Additionally, helper functions in :mod:`dace.sdfg.utils` and :mod:`dace.transformation.helpers` can be used to perform common operations, such as traversal, querying, analysis, graph nesting, and many others. **Adding/removing elements**: The two classes contain ``add_*`` methods for adding nodes and edges, and ``remove_*`` methods for removing them. Examples include :meth:`~dace.sdfg.state.SDFGState.add_tasklet` and :meth:`~dace.sdfg.sdfg.SDFG.add_edge`. Other helper methods can compose these operations, such as :meth:`~dace.sdfg.state.SDFGState.add_mapped_tasklet`, which adds a tasklet inside a map; and :meth:`~dace.sdfg.state.SDFGState.add_memlet_path` which adds a path of memlets between an arbitrary number of scopes, :ref:`propagating ` them along the way. **Connectors and edges**: The methods ``add_{in,out}_connector`` on subclasses of :class:`~dace.sdfg.nodes.Node` add appropriate connectors (view or passthrough) to the node. The :meth:`~dace.sdfg.state.SDFGState.add_edge` method of :class:`~dace.sdfg.state.SDFGState` receives two nodes and two connector names, and creates an edge between them. :meth:`~dace.sdfg.graph.OrderedMultiDiConnectorGraph.add_nedge` (no-connector edge) acts as a shorthand for edges with ``None`` connectors on either side. **Enumeration**: The API can enumerate the specific nodes and edges of a graph via the ``nodes()`` and ``edges()`` methods. The interface also contains methods for querying the graph, such as :meth:`~dace.sdfg.sdfg.SDFG.all_nodes_recursive`, which is useful to obtain all nodes, including states and nodes of nested SDFGs (similarly, :meth:`~dace.sdfg.sdfg.SDFG.all_edges_recursive` and :meth:`~dace.sdfg.sdfg.SDFG.all_sdfgs_recursive` query edges and SDFGs, respectively). Navigating within states can be performed using the :ref:`scope methods `. **Subgraphs**: The API allows the user to obtain subgraphs of the SDFG, such as the :meth:`~dace.sdfg.state.StateGraphView.scope_subgraph` method, which returns a subgraph of the state that contains all nodes within a given scope. In general, the :class:`~dace.sdfg.graph.SubgraphView` class can be used to obtain a view of a subgraph. It filters out nodes and edges that are not part of the subgraph, and can be used to perform operations on the subgraph, such as traversing it. **Data descriptors**: Methods such as :meth:`~dace.sdfg.sdfg.SDFG.add_array` and :meth:`~dace.sdfg.sdfg.SDFG.add_stream` can be used to add data descriptors to the SDFG. Any subclass of the :class:`~dace.data.Data` base class can be added to the SDFG using the :meth:`~dace.sdfg.sdfg.SDFG.add_datadesc` method. **Traversal**: Since nodes and edges are stored in arbitrary order, the API provides methods for traversing the graph by topological order. The method :func:`~dace.sdfg.utils.dfs_topological_sort` returns a list of nodes in a state, and :func:`~dace.sdfg.analysis.cfg.blockorder_topological_sort` traverses the state machine in approximate order of execution (i.e., preserving order and entering if/for scopes before continuing). What to Avoid ------------- While SDFGs are Turing complete, they are not meant to be used to represent any program concisely. Concepts such as parametric-depth recursion are control-centric (and not very portable). In order to represent them, a stack would need to be simulated in the representation, which will be slow. The frontends avoid parametric-depth recursion by creating a callback to the original runtime (e.g., the Python interpreter) instead of parsing it into an SDFG. Another example is dynamic memory. Dynamically-allocated arrays are supported, so long as their sizes can be expressed as symbolic expressions. However, this would create another state in the state machine (see the example in :ref:`sdfg-symbol`), which might hamper dataflow analysis. In general, it is better to use constant or symbolically-sized memory allocation. Lastly, complex pointer arithmetic and referencing different types of arrays with the same Reference are not supported. Frontends will try to mitigate this by creating multiple Reference data descriptors, but references in general inhibit provenance tracking of arrays, and many optimizations would not apply. In many of the above cases, the frontends will emit a warning to the user, and unsupported behavior will be encapsulated as callbacks. It is important to pay attention to these warnings, as they might contribute to performance issues later on. .. _format: ``.sdfg`` File Format --------------------- An SDFG file is a JSON file that contains all the properties of the graph's elements. See :ref:`properties` for more information about how those are saved. You can save an SDFG to a file in the SDFG API with the :func:`~dace.sdfg.sdfg.SDFG.save` method. Loading an SDFG from a file uses the :func:`~dace.sdfg.sdfg.SDFG.from_file` static method. For example, in the following save/load roundtrip: .. code-block:: python @dace.program def example(a: dace.float64[20]): return a + 1 sdfg = example.to_sdfg() # Create an SDFG out of the DaCe program sdfg.save('myfile.sdfg') # Save new_sdfg = dace.SDFG.from_file('myfile.sdfg') # Reload assert sdfg.hash_sdfg() == new_sdfg.hash_sdfg() # OK, SDFGs are the same The ``compress`` argument can be used to save a smaller (``gzip`` compressed) file. It can keep the same extension, but it is customary to use ``.sdfg.gz`` or ``.sdfgz`` to let others know it is compressed. It is recommended to use this option for large SDFGs, as it not only saves space, but also speeds up loading and editing of the SDFG in visualization tools and the VSCode extension.