Parsing Python Programs to SDFGs
Prerequisites
Scope
This document describes DaCe’s core Python language parser, implemented by the ProgramVisitor class.
The ProgramVisitor supports a restricted subset of Python’s features that can be expressed directly as SDFG elements.
A larger subset of the Python language is supported through code preprocessing and/or in JIT mode.
Supported Python Versions
The ProgramVisitor supports exactly the same Python versions as the Data-Centric framework overall: 3.7-3.10.
To add support for newer Python versions, the developer should amend the ProgramVisitor to handle any changes to the Python AST (Abstract Syntax Tree) module appropriately. More details can be found in the official Python documentation.
Main Limitations
Classes and object-oriented programming are only supported in JIT mode.
Python native containers (tuples, lists, sets, and dictionaries) are not supported directly as Data. Specific instances of them may be indirectly supported through code preprocessing. There is also limited support for specific uses, e.g., as arguments to some methods.
Only the range, parrange(), and map() iterators are directly supported. Other iterators, e.g., zip, may be indirectly supported through code preprocessing.
Recursion is not supported.
Using NumPy arrays with negative indices (at runtime) to wrap around the array is not allowed. Compile-time negative values (such as -1) are supported.
Parsing Flow
The entry point for parsing a Python program with the ProgramVisitor is the parse_dace_program() method.
The Python call tree when calling or compiling a Data-Centric Python program is as follows:
dace.frontend.python.parser.DaceProgram.__call__(), or dace.frontend.python.parser.DaceProgram.compile(), or dace.frontend.python.parser.DaceProgram.to_sdfg()
dace.frontend.python.parser.DaceProgram._parse()
dace.frontend.python.parser.DaceProgram._generated_pdp()
The ProgramVisitor Class
The ProgramVisitor traverses a Data-Centric Python program’s AST and constructs the corresponding SDFG. The ProgramVisitor inherits from Python’s ast.NodeVisitor class and, therefore, follows the visitor design pattern. Developers are encouraged to familiarize themselves with this programming pattern (for example, see Wikipedia and Wikibooks); however, the basic functionality is described below.
An object of the ProgramVisitor class is responsible for a single SDFG object. While traversing the Python program’s AST, if the need for a NestedSDFG arises (see Nested ProgramVisitors), a new (nested) ProgramVisitor object will be created to handle the corresponding Python Abstract Syntax sub-Tree. The ProgramVisitor has the following attributes:
filename: The name of the file containing the Data-Centric Python program.
src_line: The line (in the file) where the Data-Centric Python program is called.
src_col: The column (in the line) where the Data-Centric Python program is called.
orig_name: The name of the Data-Centric Python program.
name: The name of the generated SDFG object. name and orig_name differ when generating a NestedSDFG.
globals: The variables defined in the global scope. Typically, these are modules imported and global variables defined in the file containing the Data-Centric Python program.
closure: The closure of the Data-Centric Python program.
nested: True if generating a NestedSDFG.
simplify: True if simplify() should be called on the generated SDFG object.
scope_arrays: The Data-Centric Data (see data) defined in the parent SDFG scope.
scope_vars: The variables defined in the parent ProgramVisitor scope.
numbers: DEPRECATED
variables: The variables defined in the current ProgramVisitor scope.
accesses: A dictionary of the accesses to Data defined in a parent SDFG scope. Used to avoid generating duplicate NestedSDFG connectors for the same Data subsets accessed.
views: A dictionary of Views and the Data subsets viewed. Used to generate Views for Array slices.
nested_closure_arrays: The closure of nested Data-Centric Python programs.
annotated_types: A dictionary from Python variables to Data-Centric datatypes. Used when variables are explicitly type-annotated in the Python code.
map_symbols: The Map symbols defined in the SDFG. Useful when deciding whether an augmented assignment should be implemented with WCR.
sdfg: The generated SDFG object.
last_block: The (current) last ControlFlowBlock object created and added to the current ControlFlowRegion.
current_state: The (current) last SDFGState object created and added to the current ControlFlowRegion, similar to last_block, but only tracking states.
sdfg: The current SDFG being worked on.
cfg_target: The current ControlFlowRegion being worked on (may be the current SDFG or a sub-region, such as a LoopRegion).
last_cfg_target: The previous ControlFlowRegion that blocks were being added to.
inputs: The input connectors of the generated NestedSDFG and a Memlet-like representation of the corresponding Data subsets read.
outputs: The output connectors of the generated NestedSDFG and a Memlet-like representation of the corresponding Data subsets written.
current_lineinfo: The current DebugInfo. Used for debugging.
modules: The modules imported in the file of the top-level Data-Centric Python program. Produced by filtering globals.
symbols: The loop symbols defined in the SDFG object. Useful for memlet/state propagation when multiple loops use the same iteration variable but with different ranges.
indirections: A dictionary from Python code indirection expressions to Data-Centric symbols.
The ProgramVisitor and the Visitor Design Pattern
The ProgramVisitor’s parse_program() method takes as input a Data-Centric Python program’s AST (ast.FunctionDef object).
It then iterates over and visits the statements in the program’s body. The Python call tree when visiting a statement is approximately as follows:
dace.frontend.python.astutils.ExtNodeVisitor.visit_TopLevel()
dace.frontend.python.newast.ProgramVisitor.visit_Class()
In the last call above, Class in visit_Class is a placeholder for the name of one of the Python AST module classes supported by the ProgramVisitor.
For example, if the statement is an object of the ast.Assign class, the visit_Assign() method will be invoked.
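The dispatch described above can be reproduced with Python's standard ast.NodeVisitor alone. The following is a minimal sketch, not DaCe code: the StatementLogger class, its log attribute, and the sample program are hypothetical and only illustrate how visit() selects a visit_Class method from the class of the AST node.

```python
import ast

SOURCE = """
def program(A):
    a = 5
    for i in range(10):
        A[i] = a
"""


class StatementLogger(ast.NodeVisitor):
    """Hypothetical visitor that records which visit_* method handled a node."""

    def __init__(self):
        self.log = []

    def visit_Assign(self, node):
        self.log.append("visit_Assign")

    def visit_For(self, node):
        self.log.append("visit_For")
        # Children are only reached if the visitor descends explicitly.
        for stmt in node.body:
            self.visit(stmt)


# The function definition is the first statement of the parsed module.
function_def = ast.parse(SOURCE).body[0]

visitor = StatementLogger()
for statement in function_def.body:
    # visit() looks up visit_<ClassName> based on the class of the node.
    visitor.visit(statement)

print(visitor.log)
```

The loop over function_def.body plays the role of iterating over the statements of the program's body; each statement ends up in the method named after its AST class.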
Each object of a Python AST module class (henceforth called AST node) typically
has other AST nodes as attributes, generating tree structures. Accordingly, the
corresponding ProgramVisitor methods perform some action for the parent AST node
and then recursively call other methods to handle the child AST nodes until
the whole tree has been processed. Apart from the
class-specific visitor methods, the following may also appear in the Python call tree:
generic_visit(): A generic visitor method. Useful for automatically calling the required class-specific methods when no special handling is required.
TaskletTransformer: A ProgramVisitor that is specialized to handle the explicit dataflow mode syntax.
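The role of generic_visit() can be seen in a small experiment with the standard library. This is a sketch under stated assumptions: NameCollector and ShallowAssign are hypothetical classes, unrelated to the ProgramVisitor, that show how nodes without a class-specific method fall back to generic_visit(), and how overriding a method without descending stops the traversal of that subtree.

```python
import ast


class NameCollector(ast.NodeVisitor):
    """Hypothetical visitor that collects every variable name it reaches."""

    def __init__(self):
        self.names = []

    def visit_Name(self, node):
        self.names.append(node.id)


class ShallowAssign(NameCollector):
    """Same collector, but visit_Assign does not descend into its children."""

    def visit_Assign(self, node):
        pass  # no generic_visit() call: the subtree is skipped


tree = ast.parse("c = a + b")

deep = NameCollector()
deep.visit(tree)  # Module, Assign, and BinOp fall back to generic_visit()

shallow = ShallowAssign()
shallow.visit(tree)  # the overridden visit_Assign stops the traversal

print(deep.names, shallow.names)
```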
Nested ProgramVisitors and NestedSDFGs
The ProgramVisitor will trigger the generation of a NestedSDFG (through a nested ProgramVisitor) in the following cases:
When parsing the body of a map(). This will occur even when a NestedSDFG is not necessary. Simplification of the resulting subgraph is left to InlineSDFG.
When parsing a call (see ast.Call) to another Data-Centric Python program or an SDFG object. It should be noted that calls to, e.g., supported NumPy methods (see replacements), may also (eventually) trigger the generation of a NestedSDFG. However, this mostly occurs through Library Nodes.
When parsing explicit dataflow mode syntax. The whole Abstract Syntax sub-Tree of such statements is passed to a TaskletTransformer.
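The idea of one visitor per graph, with nested visitors handling sub-trees, can be sketched in a few lines. Everything below is a toy model and an assumption: ToyVisitor, its graph list, and the use of a with block as the construct that triggers nesting are hypothetical stand-ins and do not reflect how the ProgramVisitor builds SDFGs.

```python
import ast


class ToyVisitor(ast.NodeVisitor):
    """Hypothetical visitor owning one 'graph' (here: a list of entries)."""

    def __init__(self, name, nested=False):
        self.name = name
        self.nested = nested
        self.graph = []  # stands in for the SDFG under construction
        self.children = []  # nested visitors created by this one

    def visit_Assign(self, node):
        self.graph.append("assign")

    def visit_With(self, node):
        # Hand the sub-tree to a fresh visitor that owns its own graph.
        child = ToyVisitor(f"{self.name}_nested_{len(self.children)}", nested=True)
        for stmt in node.body:
            child.visit(stmt)
        self.children.append(child)
        # The parent records a single entry for the whole nested graph.
        self.graph.append(f"nested:{child.name}")


SOURCE = """
a = 1
with scope:
    b = 2
    c = 3
d = 4
"""

top = ToyVisitor("prog")
for stmt in ast.parse(SOURCE).body:
    top.visit(stmt)

print(top.graph, top.children[0].graph)
```

The parent graph holds one entry per nested visitor, while the statements of the sub-tree live only in the child's graph, which mirrors the single-SDFG responsibility described earlier.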
Visitor Methods
Below is a list of all AST class-specific methods of the ProgramVisitor, with a short description
of which Python language features they support and how:
visit_FunctionDef()
Parses functions decorated with one of the following decorators:
The Data-Centric Python frontend does not allow definition of Data-Centric Python programs inside another one.
This visitor will catch such cases and raise DaceSyntaxError.
visit_For()
Parses for statements using one of the following iterators:
Example:
@dace.program
def for_loop(A: dace.int32[10]):
    for i in range(0, 10, 2):
        A[i] = i
If the DaceProgram’s use_experimental_cfg_blocks attribute is set to true, this will utilize LoopRegion objects (dace.sdfg.state.LoopRegion) instead of an explicit state machine.
visit_While()
Parses while statements. Example:
@dace.program
def while_loop():
    i = 10
    while i > 0:
        i -= 3
If the DaceProgram’s use_experimental_cfg_blocks attribute is set to true, this will utilize LoopRegion objects (dace.sdfg.state.LoopRegion) instead of an explicit state machine.
visit_Break()
Parses break statements. In the following example, the for-loop behaves as an if-else statement. This is also evident from the generated dataflow:
@dace.program
def for_break_loop(A: dace.int32[10]):
    for i in range(0, 10, 2):
        A[i] = i
        break
If the DaceProgram’s use_experimental_cfg_blocks attribute is set to true, loops are represented with LoopRegion objects and the break statement is represented with a BreakState.
visit_Continue()
Parses continue statements. In the following example, the use of continue makes the A[i] = i statement unreachable. This is also evident from the generated dataflow:
@dace.program
def for_continue_loop(A: dace.int32[10]):
    for i in range(0, 10, 2):
        continue
        A[i] = i
If the DaceProgram’s use_experimental_cfg_blocks attribute is set to true, loops are represented with LoopRegion objects and the continue statement is represented with a ContinueState.
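Why statements after break or continue are unreachable can be checked on the AST itself. The helper below is a hypothetical illustration (unreachable_after_jump is not a DaCe function, and it is not claimed that the ProgramVisitor performs this check); it simply returns the statements that follow a jump within the same block.

```python
import ast


def unreachable_after_jump(body):
    """Return the statements in `body` that follow a break or continue."""
    for position, stmt in enumerate(body):
        if isinstance(stmt, (ast.Break, ast.Continue)):
            return body[position + 1:]
    return []


# The string starts with a newline, so the loop header is on line 2.
SOURCE = """
for i in range(0, 10, 2):
    continue
    A[i] = i
"""

loop = ast.parse(SOURCE).body[0]
dead = unreachable_after_jump(loop.body)
dead_lines = [stmt.lineno for stmt in dead]

print(dead_lines)
```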
visit_If()
Parses if statements. Example:
@dace.program
def if_stmt(a: dace.int32):
    if a < 0:
        return -1
    elif a > 0:
        return 1
    else:
        return 0
visit_NamedExpr()
Allows parsing of PEP 572 assignment expressions (walrus operator), e.g., n := 5.
However, such expressions are currently treated by the ProgramVisitor as simple assignments.
In Python, assignment expressions allow assignments within comprehensions. Therefore, whether an assignment expression
will have the Python-equivalent effect in a Data-Centric Python program depends on the ProgramVisitor’s
support for those comprehensions.
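The similarity between an assignment expression and a simple assignment is visible in the AST. The snippet below is a small sketch using only the standard library on Python 3.8 or newer; the variable names are illustrative.

```python
import ast

# Parentheses are required around a top-level assignment expression.
expr = ast.parse("(n := 5)", mode="eval").body

target_name = expr.target.id  # the variable being bound
bound_value = expr.value.value  # the constant on the right-hand side

# A plain assignment statement carries the same two pieces of information.
stmt = ast.parse("n = 5").body[0]

print(type(expr).__name__, target_name, bound_value)
```

The difference is that ast.NamedExpr is an expression with a single target, whereas ast.Assign is a statement with a list of targets.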
visit_Assign()
Parses assignment statements. Example:
@dace.program
def assign_stmt():
    a = 5
visit_AnnAssign()
Parses annotated assignment statements. The ProgramVisitor respects these type annotations and the assigned variables will have the same (DaCe-compatible) datatype as if the code was executed through the CPython interpreter.
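A dictionary in the spirit of the annotated_types attribute can be gathered from ast.AnnAssign nodes. This is a simplified sketch: AnnotationCollector and annotation_name are hypothetical helpers, and the dictionary stores annotation strings rather than actual Data-Centric datatypes, which is an assumption made to keep the example free of dependencies.

```python
import ast


def annotation_name(node):
    """Render a simple annotation (Name or dotted Attribute) as a string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return f"{annotation_name(node.value)}.{node.attr}"
    raise NotImplementedError(type(node).__name__)


class AnnotationCollector(ast.NodeVisitor):
    """Hypothetical visitor mapping annotated variable names to annotations."""

    def __init__(self):
        self.annotated_types = {}

    def visit_AnnAssign(self, node):
        if isinstance(node.target, ast.Name):
            self.annotated_types[node.target.id] = annotation_name(node.annotation)


# Parsing does not resolve names, so `dace` need not be importable here.
SOURCE = """
a: dace.float32 = 5
b: int = 2
c = 3
"""

collector = AnnotationCollector()
collector.visit(ast.parse(SOURCE))

print(collector.annotated_types)
```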
visit_AugAssign()
Parses augmented assignment statements. The ProgramVisitor will try to infer whether the assigned memory location is read and written by a single thread. In such cases, the
assigned memory location will appear as both input and output in the generated subgraph. Otherwise, it will appear only as
output and the corresponding edge will have write-conflict resolution (WCR). Example:
@dace.program
def augassign_stmt():
    a = 0
    for i in range(10):
        a += 1
    for i in dace.map[0:10]:
        a += 1
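The distinction between the two loops in the example can be approximated with a deliberately simplified rule. The function below is an assumption for illustration only and is not the inference performed by the ProgramVisitor: needs_wcr is a hypothetical helper that flags a write conflict whenever some enclosing map symbol does not appear in the assignment target, since several map iterations could then update the same location.

```python
import ast


def needs_wcr(target_src, map_symbols):
    """Toy rule: a conflict is possible if a map symbol is absent from the target."""
    target = ast.parse(target_src, mode="eval").body
    used = {node.id for node in ast.walk(target) if isinstance(node, ast.Name)}
    return any(symbol not in used for symbol in map_symbols)


# Sequential for loop: no enclosing map, a single thread reads and writes `a`.
sequential = needs_wcr("a", set())

# Inside dace.map[0:10] over i: every iteration updates the same scalar `a`.
scalar_in_map = needs_wcr("a", {"i"})

# Each iteration i updates its own element A[i].
indexed_in_map = needs_wcr("A[i]", {"i"})

# A two-dimensional map where the target only depends on i.
partial_index = needs_wcr("A[i]", {"i", "j"})

print(sequential, scalar_in_map, indexed_in_map, partial_index)
```

Under this toy rule, the first loop of the example would produce a plain read-modify-write and the second would produce an output edge with WCR.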
visit_Call()
Parses function call statements. These statements may call any of the following:
Another Data-Centric Python program: Execution is transferred to a nested ProgramVisitor.
An (already parsed) SDFG object: Directly generates a NestedSDFG.
A supported Python builtin or module (e.g., NumPy) method: Execution is transferred to the corresponding replacement method (see replacements).
An unsupported method: Generates a callback to the CPython interpreter.
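The four cases above amount to a dispatch on the callee. The sketch below mimics that decision with plain sets; KNOWN_PROGRAMS, KNOWN_SDFGS, REPLACEMENTS, callee_name, and classify are all hypothetical names, and the real ProgramVisitor resolves callees through its own data structures rather than by name lookup in sets.

```python
import ast

# Hypothetical registries standing in for what the parser knows about.
KNOWN_PROGRAMS = {"other_program"}
KNOWN_SDFGS = {"my_sdfg"}
REPLACEMENTS = {"numpy.sum", "len"}


def callee_name(call):
    """Return the dotted name of the called object, e.g. 'numpy.sum'."""
    parts = []
    node = call.func
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
    return ".".join(reversed(parts))


def classify(source):
    """Decide which of the four documented cases a call expression falls into."""
    call = ast.parse(source, mode="eval").body
    name = callee_name(call)
    if name in KNOWN_PROGRAMS:
        return "nested ProgramVisitor"
    if name in KNOWN_SDFGS:
        return "NestedSDFG"
    if name in REPLACEMENTS:
        return "replacement"
    return "callback"


calls = ["other_program(A)", "my_sdfg(A=A)", "numpy.sum(A)", "print(A)"]
kinds = [classify(source) for source in calls]

print(kinds)
```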
visit_Return()
Parses return statements.
visit_With()
Parses with statements. Supports only explicit dataflow mode syntax.
visit_AsyncWith()
Parses async with statements. However, these statements are treated as simple with statements.
visit_Str()
Parses string constants. DEPRECATED in Python 3.8 and newer versions.
visit_Num()
Parses numerical constants. DEPRECATED in Python 3.8 and newer versions.
visit_Constant()
Parses all constant values.
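On Python 3.8 and newer, string, numerical, and name constants all arrive as ast.Constant nodes, which is why a single visitor method can cover them. The short check below uses only the standard library; constant_of is a hypothetical helper.

```python
import ast


def constant_of(source):
    """Parse a literal and return its AST class name and Python value."""
    node = ast.parse(source, mode="eval").body
    return type(node).__name__, node.value


literals = ("5", "2.5", "'text'", "True", "None")
results = [constant_of(source) for source in literals]

for literal, (kind, value) in zip(literals, results):
    print(literal, kind, repr(value))
```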
visit_Name()
Parses names, e.g., variable names.
visit_NameConstant()
Parses name constants. DEPRECATED in Python 3.8 and newer versions.
visit_Attribute()
Parses attributes. Allows accessing attributes of supported objects. Typically, these are Data objects.
visit_List()
Visits each list element and returns a list with the results.
Does not support Python lists as Data.
visit_Tuple()
Visits each tuple element and returns a tuple with the results.
Does not support Python tuples as Data.
visit_Set()
Visits each set element and returns a set with the results.
Does not support Python sets as Data.
visit_Dict()
Visits each dictionary key-value pair and returns a dictionary with the results.
Does not support Python dictionaries as Data.
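The pattern shared by these four container visitors (visit every element, return a container of the same kind holding the results) can be written as follows. LiteralVisitor and evaluate are hypothetical names; the sketch only handles constants as leaves and ignores dictionary unpacking, so it is far narrower than the ProgramVisitor.

```python
import ast


class LiteralVisitor(ast.NodeVisitor):
    """Hypothetical visitor that rebuilds Python containers from AST nodes."""

    def visit_Constant(self, node):
        return node.value

    def visit_List(self, node):
        return [self.visit(element) for element in node.elts]

    def visit_Tuple(self, node):
        return tuple(self.visit(element) for element in node.elts)

    def visit_Set(self, node):
        return {self.visit(element) for element in node.elts}

    def visit_Dict(self, node):
        # Keys and values are stored as two parallel lists on ast.Dict.
        return {self.visit(k): self.visit(v) for k, v in zip(node.keys, node.values)}


def evaluate(source):
    """Parse a literal expression and rebuild its value with the visitor."""
    return LiteralVisitor().visit(ast.parse(source, mode="eval").body)


value = evaluate("[1, (2, 3), {4}, {'k': 5}]")
print(value)
```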
visit_Lambda()
Generates a string representation of a lambda function.
visit_UnaryOp()
Parses unary operations.
visit_BinOp()
Parses binary operations.
visit_BoolOp()
Parses boolean operations.
visit_Compare()
Parses comparisons.
visit_Subscript()
Parses subscripts. This visitor also parses the subscript’s slice expressions.
visit_Index()
Parses index expressions in subscripts. DEPRECATED.
visit_ExtSlice()
Parses slice expressions in subscripts. DEPRECATED.
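The deprecation of the Index and ExtSlice visitors follows from a change in the Python AST: on Python 3.9 and newer, the slice of an ast.Subscript holds the index expression directly, and a multi-dimensional subscript is a plain ast.Tuple. The snippet below is a small sketch assuming Python 3.9 or newer; the variable names are illustrative.

```python
import ast

subscript = ast.parse("A[1:5, i]", mode="eval").body

# A multi-dimensional subscript is a Tuple; a single index is the node itself.
if isinstance(subscript.slice, ast.Tuple):
    dims = subscript.slice.elts
else:
    dims = [subscript.slice]

kinds = [type(dim).__name__ for dim in dims]
first = dims[0]  # the 1:5 range

print(kinds, first.lower.value, first.upper.value, first.step)
```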