Parsing Python Programs to SDFGs
This document describes DaCe’s core Python language parser, implemented by the ProgramVisitor class.
ProgramVisitor supports a restricted subset of Python’s features that can be expressed directly as SDFG elements.
A larger subset of the Python language is supported through code preprocessing, in JIT mode, or both.
Supported Python Versions
ProgramVisitor supports exactly the same Python versions as the Data-Centric framework overall: 3.7-3.10.
To add support for newer Python versions, developers should amend ProgramVisitor
so that it appropriately handles any changes to the Python AST (Abstract Syntax Tree) module. More details can be found in the
official Python documentation.
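One concrete example of such an AST change: since Python 3.8 the ast module represents all literals with ast.Constant, where older versions used separate classes such as ast.Num and ast.Str. The sketch below uses only the standard library (no DaCe), and the helper name literal_node_types is hypothetical. It simply reports which literal node classes the running interpreter produces.

```python
import ast
import sys


def literal_node_types(source):
    """Return the AST class name of every literal found in `source`."""
    # Class names used for literals before and after Python 3.8.
    literal_names = {"Constant", "Num", "Str", "Bytes", "NameConstant"}
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)
            if type(node).__name__ in literal_names]


kinds = literal_node_types("x = 5\ny = 'text'\nz = None")
if sys.version_info >= (3, 8):
    # On 3.8 and newer, every literal is reported as ast.Constant.
    assert set(kinds) == {"Constant"}
print(sys.version_info[:2], kinds)
```

A visitor that only defines handlers for the older classes would miss these nodes on newer interpreters, which is the kind of change a version update has to account for.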
Classes and object-oriented programming are only supported in JIT mode.
Python native containers (tuples, lists, sets, and dictionaries) are not supported directly as
Data. Specific instances of them may be indirectly supported through code preprocessing. There is also limited support for specific uses, e.g., as arguments to some methods.
Only the range, parrange(), and
map() iterators are directly supported. Other iterators, e.g., zip, may be indirectly supported through code preprocessing.
Recursion is not supported.
Using NumPy arrays with negative indices (at runtime) to wrap around the array is not allowed. Compile-time negative values (such as -1) are supported.
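The distinction between compile-time and runtime negative indices can be illustrated at the AST level. The following is a minimal standard-library sketch, not DaCe's actual check: the helper classify_index is hypothetical and only handles one-dimensional subscripts whose index is either a literal or something else.

```python
import ast


def classify_index(expr):
    """Classify the index of a one-dimensional subscript such as 'A[-1]'."""
    node = ast.parse(expr, mode="eval").body
    if not isinstance(node, ast.Subscript):
        raise ValueError("expected a subscript expression")
    index = node.slice
    # Python 3.8 and older wrap the index in an ast.Index node.
    if type(index).__name__ == "Index":
        index = index.value
    try:
        # literal_eval succeeds only for literal expressions such as -1 or 3.
        value = ast.literal_eval(index)
    except (ValueError, TypeError, SyntaxError):
        return "runtime"
    if not isinstance(value, int):
        return "runtime"
    return "compile-time negative" if value < 0 else "compile-time non-negative"


print(classify_index("A[-1]"))  # compile-time negative
print(classify_index("A[i]"))   # runtime
```

In this sketch, A[-1] is known to be negative before execution, whereas the sign of i in A[i] is only known at runtime, which is the case described above as not allowed.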
The entry point for parsing a Python program with the
ProgramVisitor is the
The Python call tree when calling or compiling a Data-Centric Python program is as follows:
The ProgramVisitor Class
ProgramVisitor traverses a Data-Centric Python program’s AST and constructs
ProgramVisitor inherits from Python’s ast.NodeVisitor
class and, therefore, follows the visitor design pattern. Developers are encouraged to familiarize themselves with this
programming pattern (for example, see Wikipedia and Wikibooks); the basic functionality is nevertheless described below.
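For readers new to the pattern, here is a minimal ast.NodeVisitor subclass that is unrelated to DaCe. The class name AssignedNameCollector is made up for this illustration; it records the names assigned in a function body.

```python
import ast


class AssignedNameCollector(ast.NodeVisitor):
    """Collects the names that appear as assignment targets."""

    def __init__(self):
        self.assigned = []

    def visit_Assign(self, node):
        # Called automatically for every ast.Assign node.
        for target in node.targets:
            if isinstance(target, ast.Name):
                self.assigned.append(target.id)
        # Continue into the children of this node.
        self.generic_visit(node)


source = """
def program(x):
    a = x + 1
    b = a * 2
    return b
"""
collector = AssignedNameCollector()
collector.visit(ast.parse(source))
print(collector.assigned)  # ['a', 'b']
```

The visitor never inspects node types itself; ast.NodeVisitor.visit() looks up a method named visit_<ClassName> and falls back to generic_visit() when none exists.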
An object of the
ProgramVisitor class is responsible for a single
object. While traversing the Python program’s AST, if the need for a
NestedSDFG arises (see Nested ProgramVisitors), a new
ProgramVisitor object will be created to handle the corresponding Python
Abstract Syntax sub-Tree. The
ProgramVisitor has the following attributes:
filename: The name of the file containing the Data-Centric Python program.
src_line: The line (in the file) where the Data-Centric Python program is called.
src_col: The column (in the line) where the Data-Centric Python program is called.
orig_name: The name of the Data-Centric Python program.
name: The name of the generated
SDFG object. name and orig_name differ when generating a
globals: The variables defined in the global scope. Typically, these are modules imported and global variables defined in the file containing the Data-Centric Python program.
closure: The closure of the Data-Centric Python program.
nested: True if generating a
simplify: True if the
simplify() should be called on the generated
scope_arrays: The Data-Centric Data (see
data) defined in the parent
scope_vars: The variables defined in the parent
variables: The variables defined in the current
accesses: A dictionary of the accesses to Data defined in a parent
SDFG scope. Used to avoid generating duplicate
NestedSDFG connectors for the same Data subsets accessed.
views: A dictionary of Views and the Data subsets viewed. Used to generate Views for Array slices.
nested_closure_arrays: The closure of nested Data-Centric Python programs.
annotated_types: A dictionary from Python variables to Data-Centric datatypes. Used when variables are explicitly type-annotated in the Python code.
Map symbols defined in the
SDFG. Useful when deciding when an augmented assignment should be implemented with WCR or not.
sdfg: The generated
last_state: The (current) last
SDFGState object created and added to the
inputs: The input connectors of the generated
Memlet-like representation of the corresponding Data subsets read.
outputs: The output connectors of the generated
Memlet-like representation of the corresponding Data subsets written.
current_lineinfo: The current
DebugInfo. Used for debugging.
modules: The modules imported in the file of the top-level Data-Centric Python program. Produced by filtering globals.
loop_idx: The current scope-depth in a nested loop construct.
continue_states: The generated
SDFGStateobjects corresponding to Python continue statements. Useful for generating proper nested loop control-flow.
break_states: The generated
SDFGStateobjects corresponding to Python break statements. Useful for generating proper nested loop control-flow.
symbols: The loop symbols defined in the
SDFG object. Useful for memlet/state propagation when multiple loops use the same iteration variable but with different ranges.
indirections: A dictionary from Python code indirection expressions to Data-Centric symbols.
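The one-visitor-per-SDFG arrangement, with a fresh visitor for each nested sub-tree, can be pictured with a toy model. Everything in the sketch below is hypothetical (ToyScopeVisitor and its child-naming scheme are not DaCe code); it only borrows the attribute names name, nested, scope_vars, and variables from the list above and treats a loop over dace.map[...] as the construct that needs a nested visitor.

```python
import ast


def _is_map(iter_node):
    """True for iterators written as <something>.map[...]."""
    return (isinstance(iter_node, ast.Subscript)
            and isinstance(iter_node.value, ast.Attribute)
            and iter_node.value.attr == "map")


class ToyScopeVisitor(ast.NodeVisitor):
    """Toy stand-in: one visitor instance per (nested) scope."""

    def __init__(self, name, scope_vars=None, nested=False):
        self.name = name
        self.nested = nested
        self.scope_vars = dict(scope_vars or {})  # defined in the parent
        self.variables = {}                       # defined in this scope
        self.children = []

    def visit_Assign(self, node):
        for target in node.targets:
            if isinstance(target, ast.Name):
                self.variables[target.id] = node.lineno

    def visit_For(self, node):
        if not _is_map(node.iter):
            # Ordinary loop: keep visiting the body in this scope.
            self.generic_visit(node)
            return
        # Map body: hand the sub-tree to a new, nested visitor.
        child = ToyScopeVisitor(
            name=f"{self.name}_{len(self.children)}",
            scope_vars={**self.scope_vars, **self.variables},
            nested=True)
        for stmt in node.body:
            child.visit(stmt)
        self.children.append(child)


source = "a = 0\nfor i in dace.map[0:10]:\n    b = i\nfor j in range(3):\n    c = j\n"
top = ToyScopeVisitor("main")
top.visit(ast.parse(source))
child = top.children[0]
print(sorted(top.variables), child.name, sorted(child.variables), sorted(child.scope_vars))
```

The parent's variables reach the child only through scope_vars, while names assigned inside the map body stay in the child's own variables dictionary.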
The ProgramVisitor and the Visitor Design Pattern
takes as input a Data-Centric Python program’s AST (ast.FunctionDef object).
It then iterates over and visits the statements in the program’s body. The Python call tree when visiting a statement is approximately as follows:
In the fourth call above, Class in visit_Class is a placeholder for the name
of one of the Python AST module classes supported by the ProgramVisitor.
For example, if the statement is an object of the ast.Assign class, the
visit_Assign() method will be invoked.
Each object of a Python AST module class (called henceforth AST node) typically
has as attributes other AST nodes, generating tree-structures. Accordingly, the
corresponding ProgramVisitor methods perform some action for the parent AST node
and then recursively call other methods to handle the child AST nodes until
the whole tree has been processed. Apart from the
class-specific visitor methods, the following may also appear in the Python call tree:
generic_visit(): A generic visitor method. Useful for automatically calling the required class-specific methods when no special handling is required.
TaskletTransformer: A ProgramVisitor that is specialized to handle the explicit dataflow mode syntax.
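The interplay between class-specific methods and generic_visit() can be traced with a small standard-library sketch. TracingVisitor is a hypothetical class, not part of DaCe; it iterates over the statements of a function body, as described above, and records which handler each statement or expression node is routed to.

```python
import ast


class TracingVisitor(ast.NodeVisitor):
    """Records (node class, handler) pairs while walking a tree."""

    def __init__(self):
        self.trace = []

    def visit(self, node):
        # Only trace statements and expressions (skip Load/Store/Add, ...).
        if isinstance(node, (ast.stmt, ast.expr)):
            method = "visit_" + type(node).__name__
            # Handlers defined by this class, otherwise the generic fallback.
            handler = method if method in type(self).__dict__ else "generic_visit"
            self.trace.append((type(node).__name__, handler))
        return super().visit(node)

    def visit_Assign(self, node):
        self.generic_visit(node)  # recurse into targets and value

    def visit_BinOp(self, node):
        self.generic_visit(node)  # recurse into both operands


func = ast.parse("def program(x):\n    y = x + x\n    return y").body[0]
visitor = TracingVisitor()
for stmt in func.body:  # visit the statements of the program body in order
    visitor.visit(stmt)
for entry in visitor.trace:
    print(entry)
```

The return statement and the names have no dedicated method in this sketch, so they are handled by the generic fallback, which still descends into their children.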
Nested ProgramVisitors and NestedSDFGs
ProgramVisitor will trigger the generation of a
NestedSDFG (through a nested
ProgramVisitor) in the following cases:
When parsing the body of a
map(). This will occur even when a
NestedSDFG is not necessary. Simplification of the resulting subgraph is left to
When parsing a call (see ast.Call) to another Data-Centric Python program or an
SDFG object. It should be noted that calls to, e.g., supported NumPy methods (see
replacements), may also (eventually) trigger the generation of a
NestedSDFG. However, this mostly occurs through Library Nodes.
When parsing explicit dataflow mode syntax. The whole Abstract Syntax sub-Tree of such statements is passed to a
Below follows a list of all AST class-specific
ProgramVisitor methods, with a short description
of which Python language features they support and how:
Parses functions decorated with one of the following decorators:
The Data-Centric Python frontend does not allow definition of Data-Centric Python programs inside another one.
This visitor will catch such cases and raise
Parses for statements using one of the following iterators:
range: Results in a (sequential) for-loop.
parrange(): Results in uni-dimensional
    import dace

    @dace.program
    def for_loop(A: dace.int32[10]):
        # A must be an array (here of length 10) for A[i] to be valid.
        for i in range(0, 10, 2):
            A[i] = i
Parses while statements. Example:
    import dace

    @dace.program
    def while_loop():
        i = 10
        while i > 0:
            i -= 3
Parses break statements. In the following example, the for-loop behaves as an if-else statement. This is also evident from the generated dataflow:
    import dace

    @dace.program
    def for_break_loop(A: dace.int32[10]):
        for i in range(0, 10, 2):
            A[i] = i
            break  # leaves the loop after the first iteration
Parses continue statements. In the following example, the use
of continue makes the
A[i] = i statement unreachable. This is also evident from the generated dataflow:
    import dace

    @dace.program
    def for_continue_loop(A: dace.int32[10]):
        for i in range(0, 10, 2):
            continue
            A[i] = i  # unreachable
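The control-flow effect described for the break and continue examples can be checked in plain Python. The sketch below deliberately omits the @dace.program decorator and uses lists in place of arrays, so it says nothing about the generated dataflow; it only confirms the Python semantics that the parser has to reproduce.

```python
def for_break_loop(A):
    for i in range(0, 10, 2):
        A[i] = i
        break  # only the first iteration (i == 0) ever runs


def for_continue_loop(A):
    for i in range(0, 10, 2):
        continue
        A[i] = i  # never executed


a = [-1] * 10
for_break_loop(a)
b = [-1] * 10
for_continue_loop(b)
print(a)  # [0, -1, -1, -1, -1, -1, -1, -1, -1, -1]
print(b)  # [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1]
```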
Parses if statements. Example:
    import dace

    @dace.program
    def if_stmt(a: dace.int32):
        if a < 0:
            return -1
        elif a > 0:
            return 1
        else:
            return 0
Allows parsing of PEP 572 assignment expressions (walrus operator), e.g.,
n := 5.
However, such expressions are currently treated by the
ProgramVisitor as simple assignments.
In Python, assignment expressions allow assignments within comprehensions. Therefore, whether an assignment expression
will have the Python-equivalent effect in a Data-Centric Python program depends on the
support for those comprehensions.
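To make the comprehension case concrete, the plain-Python sketch below (Python 3.8 or newer, no DaCe involved) shows how an assignment expression appears in the AST and how, inside a comprehension, it updates a variable of the enclosing function. The function name running_sums is chosen for this illustration.

```python
import ast

# An assignment expression is parsed into an ast.NamedExpr node.
node = ast.parse("(n := 5)", mode="eval").body
print(type(node).__name__)  # NamedExpr


def running_sums(values):
    total = 0
    # The walrus target `total` is bound in the enclosing function scope,
    # so each iteration sees the value written by the previous one.
    sums = [total := total + v for v in values]
    return sums, total


print(running_sums([1, 2, 3]))  # ([1, 3, 6], 6)
```

Treating n := 5 as a simple assignment matches Python outside comprehensions; the running-sum behavior above is the part that depends on how comprehensions themselves are handled.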
Parses assignment statements. Example:
    import dace

    @dace.program
    def assign_stmt():
        a = 5
Parses annotated assignment statements. The ProgramVisitor
respects these type annotations, and the assigned variables will have the same (DaCe-compatible) datatype as if the code
was executed through the CPython interpreter.
Parses augmented assignment statements. The ProgramVisitor
will try to infer whether the assigned memory location is read and written by a single thread. In such cases, the
assigned memory location will appear as both input and output in the generated subgraph. Otherwise, it will appear only as
output and the corresponding edge will have write-conflict resolution (WCR). Example:
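    import dace

    @dace.program
    def augassign_stmt():
        a = 0
        # Sequential loop: a is read and written by a single thread.
        for i in range(10):
            a += 1
        # Parallel map: concurrent updates to a require WCR.
        for i in dace.map[0:10]:
            a += 1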
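Why the two loops need different treatment can be seen with a plain-Python analogy. The sketch below is not how DaCe implements WCR; the three hypothetical functions merely model a sequential update, a parallel update in which every iteration reads the same stale value, and a parallel update whose contributions are merged with a combining function.

```python
from functools import reduce
import operator


def sequential_update(start, n):
    """Each iteration reads the value written by the previous one."""
    a = start
    for _ in range(n):
        a += 1
    return a


def conflicting_update(start, n):
    """All iterations read the same initial value; the last write wins."""
    writes = [start + 1 for _ in range(n)]
    return writes[-1]


def combined_update(start, n, combine=operator.add):
    """Each iteration only contributes its increment, merged by `combine`."""
    contributions = [1 for _ in range(n)]
    return reduce(combine, contributions, start)


print(sequential_update(0, 10))   # 10
print(conflicting_update(0, 10))  # 1
print(combined_update(0, 10))     # 10
```

Merging contributions with an associative function gives the same result as the sequential loop regardless of the order in which iterations finish, which is the role a conflict-resolution function plays for the parallel case.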
Parses function call statements. These statements may call any of the following:
Another Data-Centric Python program: Execution is transferred to a nested
An (already parsed)
SDFG object: Generates directly a
A supported Python builtin or module (e.g., NumPy) method: Execution is transferred to the corresponding replacement method (see
An unsupported method: Generates a callback to the CPython interpreter.
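The four cases above amount to a dispatch on what the callee is. The sketch below illustrates that idea with the standard library only. The registries passed to classify_call and the returned labels are hypothetical, and the order in which the cases are tested simply follows the list above rather than any documented precedence in DaCe.

```python
import ast


def callee_name(call_src):
    """Return the dotted name of the function called in `call_src`."""
    call = ast.parse(call_src, mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a call expression")
    parts = []
    node = call.func
    while isinstance(node, ast.Attribute):  # e.g. numpy.sum -> 'sum', then 'numpy'
        parts.append(node.attr)
        node = node.value
    if not isinstance(node, ast.Name):
        raise ValueError("unsupported callee expression")
    parts.append(node.id)
    return ".".join(reversed(parts))


def classify_call(call_src, programs, sdfgs, replacements):
    """Map a call to one of the four cases (hypothetical labels)."""
    name = callee_name(call_src)
    if name in programs:
        return "nested program"
    if name in sdfgs:
        return "nested SDFG"
    if name in replacements:
        return "replacement"
    return "callback"


registries = dict(programs={"inner"}, sdfgs={"prebuilt"}, replacements={"numpy.sum"})
print(classify_call("inner(A)", **registries))      # nested program
print(classify_call("numpy.sum(A)", **registries))  # replacement
print(classify_call("print(A)", **registries))      # callback
```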
Parses return statements.
Parses with statements. Supports only explicit dataflow mode syntax.
Parses async with statements. However, these statements are treated as simple with statements.
Parses string constants. DEPRECATED in Python 3.8 and newer versions.
Parses numerical constants. DEPRECATED in Python 3.8 and newer versions.
Parses all constant values.
Parses names, e.g., variable names.
Parses name constants. DEPRECATED in Python 3.8 and newer versions.
Parses attributes. Allows accessing attributes of supported
objects. Typically, these are
Visits each list element and returns a list with the results.
Does not support Python lists as
Visits each tuple element and returns a tuple with the results.
Does not support Python tuples as
Visits each set element and returns a set with the results.
Does not support Python sets as
Visits each dictionary key-value pair and returns a dictionary with the results.
Does not support Python dictionaries as
Generates a string representation of a lambda function.
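As an illustration of what a string representation of a lambda AST node looks like, the sketch below uses ast.unparse from the standard library (available since Python 3.9). This is an assumption made for the example; the source does not say which unparsing routine the ProgramVisitor uses, and lambda_to_string is a hypothetical helper.

```python
import ast


def lambda_to_string(source):
    """Parse a lambda expression and turn its AST node back into text."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Lambda):
        raise ValueError("expected a lambda expression")
    # ast.unparse regenerates source code from the node (Python 3.9+).
    return ast.unparse(node)


print(lambda_to_string("lambda a,b:a+b"))  # lambda a, b: a + b
```

The round trip normalizes spacing, since the text is regenerated from the tree rather than copied from the input.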
Parses unary operations.
Parses binary operations.
Parses boolean operations.
Parses subscripts. This visitor also parses the subscript’s slice expressions.
Parses index expressions in subscripts. DEPRECATED.
Parses slice expressions in subscripts. DEPRECATED.
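The deprecation of the index and slice node classes follows from a change in Python 3.9, where Subscript.slice holds the index expression directly instead of wrapping it in ast.Index or ast.ExtSlice. The standard-library sketch below shows one way to normalize both layouts. The helper subscript_dimensions is hypothetical and is not taken from DaCe.

```python
import ast


def subscript_dimensions(expr):
    """Return the AST class name of each dimension in a subscript."""
    node = ast.parse(expr, mode="eval").body
    if not isinstance(node, ast.Subscript):
        raise ValueError("expected a subscript expression")
    index = node.slice
    kind = type(index).__name__
    if kind == "ExtSlice":    # Python 3.8 and older: mixed indices and slices
        dims = index.dims
    elif kind == "Index":     # Python 3.8 and older: plain index or tuple
        inner = index.value
        dims = inner.elts if isinstance(inner, ast.Tuple) else [inner]
    elif kind == "Tuple":     # Python 3.9 and newer: several dimensions
        dims = index.elts
    else:                     # Python 3.9 and newer: a single dimension
        dims = [index]
    names = []
    for dim in dims:
        if type(dim).__name__ == "Index":  # unwrap old-style elements
            dim = dim.value
        names.append(type(dim).__name__)
    return names


print(subscript_dimensions("A[i, 1:3]"))  # ['Name', 'Slice']
print(subscript_dimensions("A[1:5]"))     # ['Slice']
```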