Working with Symbolic Expressions

DaCe relies on SymPy to reason about array shapes, memlet ranges, schedules, and any other quantity that may depend on a runtime parameter. Almost every numeric attribute on the SDFG IR - shape entries, ranges, parameter bounds, transient sizes - is either a Python integer or a SymPy expression. This page is a tour of the helpers in dace.symbolic that the rest of DaCe (and most extensions) use when they need to manipulate those expressions.

Symbols

DaCe symbols are SymPy symbols with an optional fixed data type. When that type is an integer type, the matching SymPy integer assumption is attached automatically:

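import dace

# Symbol with the default type and an extra SymPy assumption (N > 0)
N = dace.symbol('N', positive=True)
# Symbol with an explicit 64-bit integer type
M = dace.symbol('M', dtype=dace.int64)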

The class dace.symbolic.symbol extends sympy.Symbol so that any DaCe symbol can be used as-is in SymPy expressions, while still carrying the metadata DaCe needs (a name, a data type, and a set of SymPy assumptions). The convenience type alias SymbolicType stands for Union[sympy.Basic, SymExpr] and is the type hint to use whenever a function accepts a symbol or a symbolic expression.

The richer container SymExpr carries both an exact expression and an approximate expression. Most code only needs the exact one; over-approximations come into play when the IR has to guarantee an upper or lower bound (see below).

Indeterminate comparisons

Comparing symbolic expressions returns one of three answers:

  • (N > 0) == True - the inequality is implied by the assumptions on the symbols.

  • (N > 0) == False - the inequality is provably false.

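  • (N > 0) is an unevaluated SymPy expression - SymPy could neither prove nor refute the inequality. This is a frequent source of subtle bugs in transformations: never use such an expression in a Python if, because SymPy raises a TypeError when asked for the truth value of an undecided relational. Call simplify() (or supply more assumptions) first, and branch only once the result has evaluated to True or False.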
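
The three outcomes can be told apart explicitly instead of relying on Python truthiness. The sketch below uses plain SymPy symbols; since dace.symbol extends sympy.Symbol, the same behavior is assumed to carry over to DaCe symbols. The helper name prove is hypothetical and is not part of dace.symbolic.

```python
import sympy


def prove(cond):
    """Return True/False when SymPy decided the condition, None otherwise."""
    if cond is sympy.true or cond is True:
        return True
    if cond is sympy.false or cond is False:
        return False
    return None  # unevaluated relational: indeterminate


N = sympy.Symbol('N', positive=True)  # assumption: N > 0
K = sympy.Symbol('K', integer=True)   # sign unknown

proven = prove(N > 0)    # implied by the assumption on N
refuted = prove(N < 0)   # contradicts the assumption on N
unknown = prove(K > 0)   # SymPy cannot decide this one

# Branching directly on an undecided relational raises TypeError.
try:
    if K > 0:
        pass
    raised = False
except TypeError:
    raised = True
```

A transformation would typically treat the None case as "cannot apply" rather than guessing either branch.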

Integer arithmetic caveats

SymPy’s default rational arithmetic does not match the C/C++ semantics that the code generators eventually emit. DaCe ships two SymPy functions, int_floor and int_ceil, that correspond, for positive integers, to a // b and ceil(a / b) respectively. Use them whenever the symbolic result has to stay in integer arithmetic; the simplifier and the code generator both know about them.
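
The mismatch is easy to reproduce. The following sketch contrasts SymPy's exact rational arithmetic with integer division on plain Python integers; floor_div and ceil_div are hypothetical stand-ins for the semantics described above and do not call int_floor or int_ceil themselves.

```python
import sympy

N = sympy.Symbol('N', integer=True, positive=True)

# Rational arithmetic: SymPy cancels the factor 2, so the result is exactly N.
rational_roundtrip = (N / 2) * 2


def floor_div(a: int, b: int) -> int:
    # a // b for positive integers: the remainder is dropped
    return a // b


def ceil_div(a: int, b: int) -> int:
    # ceil(a / b) for positive integers, without going through floats
    return (a + b - 1) // b


n = 7
int_roundtrip = floor_div(n, 2) * 2   # loses the remainder for odd n
tiles = ceil_div(n, 2)                # number of size-2 tiles covering n elements
exact_half = sympy.Integer(n) / 2     # an exact Rational, not an integer
```

An expression such as (N / 2) * 2 therefore simplifies to N symbolically, while the generated integer code would compute a different value for odd N.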

Analysis

  • issymbolic() checks whether a value is a SymPy expression that depends on at least one symbol (treating literal Integer/Float as non-symbolic).

  • free_symbols_and_functions() returns the names of every free symbol and every undefined function appearing in the expression. This is the right helper to use when computing the symbol set that must be present in an SDFG before an expression can be evaluated.

  • swalk() is a small visitor that yields every sub-expression in pre-order traversal, optionally descending into function arguments. Use it to look for specific patterns or to gather all occurrences of a kind of node.

  • For sub-expression matching, SymPy’s expr.match(pattern) and expr.find(pattern) are usually sufficient; the SymPy documentation has examples.
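
As a minimal illustration of the SymPy-level matching mentioned in the last item, the sketch below extracts the coefficients of an affine expression with match() and collects symbols with find(). It uses only SymPy calls; the variable names (stride, offset, range_end) are illustrative and not DaCe terminology.

```python
import sympy

N, M = sympy.symbols('N M', integer=True, positive=True)
expr = 3 * N + 5

# Wildcards that are not allowed to absorb N themselves
a = sympy.Wild('a', exclude=[N])
b = sympy.Wild('b', exclude=[N])

# match() returns a dict of wildcard bindings, or None if nothing fits
bindings = expr.match(a * N + b)
stride, offset = bindings[a], bindings[b]

# find() collects every sub-expression of a given kind
range_end = N * M + N - 1
symbols_used = range_end.find(sympy.Symbol)

# Names of the free symbols, e.g. to check what must be defined beforehand
names = sorted(str(s) for s in range_end.free_symbols)
```

Always check the result of match() for None before indexing into it when the shape of the expression is not known in advance.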

Conversion

  • pystr_to_symbolic() parses a Python-style string ("N + 2*M") into a SymPy expression while honoring DaCe’s conventions (e.g., int_floor for //).

  • symstr() renders a SymPy expression back to a Python-style string for generated code and Python-facing utilities.

  • dace.codegen.common.sym2cpp() emits a C/C++-friendly string from the same expression. Code generators should use sym2cpp instead of str(expr) so that integer division, min/max, and the DaCe helper functions produce valid C++.
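
To see why str(expr) is not a safe way to emit C/C++, compare the default string form of an expression with the output of a C printer. The sketch below uses SymPy's generic sympy.ccode purely as a stand-in; it is not sym2cpp and does not know about DaCe's helper functions.

```python
import sympy

N = sympy.Symbol('N', integer=True, positive=True)
expr = N**2 + 1

# Python-style rendering: uses the ** operator, which is not valid C/C++
python_text = str(expr)

# C-style rendering from SymPy's generic C printer (stand-in for sym2cpp)
c_text = sympy.ccode(expr)
```

The same kind of divergence applies to integer division and min/max, which is why a code generator should go through a printer that was written for the target language.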

Serialized symbolic form

JSON serialization uses serialize_symbolic() and deserialize_symbolic() rather than pystr_to_symbolic. The serialized strings are type-accurate and bijective (deserializing a serialized expression yields the original expression, types included):

  • symbols are emitted as $name;

  • symbols with non-default type/assumptions use symbol($name, dtype=dace.uint64, nonnegative=True);

  • constants carry an explicit suffix such as 2i16 or 8.0f64;

  • SymExpr values serialize as SymExpr(expr, overapprox).

Mutation and simplification

  • simplify() is the recommended simplifier throughout the DaCe codebase. Unlike sympy.simplify it preserves integer semantics and runs efficiently on the kinds of expressions that show up in memlet ranges.

  • safe_replace() performs a substitution that is safe under aliasing - applying a -> b and b -> c simultaneously maps every original a to b and every original b to c, rather than chaining the two rules and sending a to c. Use it whenever you build a substitution dictionary from a mapping whose keys and values could overlap.

  • overapproximate() returns a syntactic over-approximation of an expression (for instance, replacing a data-dependent Min with one of its arguments). Together with the approx field of SymExpr this is what allows memlet propagation to compute conservative ranges when the exact range is data-dependent.
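
The aliasing problem that safe_replace() guards against can be reproduced with plain SymPy. The sketch below contrasts chained and simultaneous substitution and shows one way to get the simultaneous result by routing every key through a temporary; two_phase_replace is a hypothetical helper written for this illustration and is not claimed to be how safe_replace() is implemented.

```python
import sympy

a, b, c = sympy.symbols('a b c')
expr = a + 2 * b

# Sequential substitution chains the rules: a -> b first, then every b -> c
chained = expr.subs([(a, b), (b, c)])

# Simultaneous substitution applies both rules to the original expression
simultaneous = expr.subs({a: b, b: c}, simultaneous=True)


def two_phase_replace(expression, mapping):
    """Hypothetical helper: route every key through a fresh temporary."""
    temps = {key: sympy.Dummy() for key in mapping}
    staged = expression.xreplace(temps)  # key -> temporary
    return staged.xreplace({temps[k]: v for k, v in mapping.items()})


two_phase = two_phase_replace(expr, {a: b, b: c})
```

The chained form collapses both symbols into c, whereas the other two keep the rules independent.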

Symbolic types vs. scalars

When extending DaCe, a recurring decision is whether a quantity should be modeled as a symbol or as a scalar transient. The rule of thumb is:

  • Use a symbol for quantities that are constant over the lifetime of the SDFG (typically loop bounds and array shapes provided by the caller) or over the lifetime of a state (e.g., indices used in memlets). Symbols participate in the symbolic propagation system.

  • Use a scalar transient for quantities that may change within states (counters, accumulators, intermediate results). They live in arrays and are written by tasklets like any other data.

When in doubt, prefer a symbol if the value is set once or consumed in ranges or schedules; prefer a scalar otherwise. See the FAQ entry on this question for a longer discussion.