Parsing Python Programs to SDFGs

Prerequisites

Scope

This document describes DaCe’s core Python language parser, implemented by the ProgramVisitor class. The ProgramVisitor supports a restricted subset of Python’s features that can be expressed directly as SDFG elements. A larger subset of the Python language is supported through code preprocessing and/or in JIT mode.

Supported Python Versions

The ProgramVisitor supports exactly the same Python versions as the Data-Centric framework overall: 3.7-3.10. To add support for newer Python versions, developers should amend the ProgramVisitor so that it appropriately handles any changes to the Python AST (Abstract Syntax Tree) module. More details can be found in the official Python documentation.
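When porting a visitor to a newer interpreter, it can help to list the AST node classes that the running ast module knows about but the visitor has no visit_ method for. The following is a minimal sketch using only the standard library; ToyVisitor, all_node_types, and unhandled_node_types are hypothetical names chosen for illustration and are not part of DaCe.

```python
import ast


def all_node_types(base=ast.AST):
    """Recursively collect the names of all AST node classes known to this interpreter.

    The result also contains abstract categories such as `stmt` and `expr`.
    """
    names = set()
    for sub in base.__subclasses__():
        names.add(sub.__name__)
        names |= all_node_types(sub)
    return names


def unhandled_node_types(visitor_cls):
    """Return the node class names for which visitor_cls defines no visit_<Name> method."""
    return {name for name in all_node_types() if not hasattr(visitor_cls, "visit_" + name)}


class ToyVisitor(ast.NodeVisitor):
    """Hypothetical visitor that only handles plain assignments."""

    def visit_Assign(self, node):
        self.generic_visit(node)


missing = unhandled_node_types(ToyVisitor)
print("Assign" in missing, "For" in missing)
```

Running the same check against a real visitor on two interpreter versions and comparing the results gives a rough list of node classes that need attention.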

Main Limitations

  • Classes and object-oriented programming are only supported in JIT mode.

  • Python native containers (tuples, lists, sets, and dictionaries) are not supported directly as Data. Specific instances of them may be indirectly supported through code preprocessing. There is also limited support for specific uses, e.g., as arguments to some methods.

  • Only the range, parrange(), and map() iterators are directly supported. Other iterators, e.g., zip, may be indirectly supported through code preprocessing.

  • Recursion is not supported.

  • Using NumPy arrays with negative indices (at runtime) to wrap around the array is not allowed. Compile-time negative values (such as -1) are supported.
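To illustrate the last limitation: a negative index that is known at compile time can be rewritten into an equivalent non-negative index before any code is generated, which is not possible for a value that only becomes known at runtime. The sketch below shows that rewriting in plain Python; normalize_index is a hypothetical helper for illustration, not a DaCe function.

```python
def normalize_index(index: int, size: int) -> int:
    """Map a possibly negative index onto the range [0, size), like NumPy wrap-around."""
    if index < 0:
        index += size
    if not 0 <= index < size:
        raise IndexError(f"index out of bounds for size {size}")
    return index


print(normalize_index(-1, 10))  # 9
```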

Parsing Flow

The entry point for parsing a Python program with the ProgramVisitor is the parse_dace_program() function. The Python call tree when calling or compiling a Data-Centric Python program is as follows:

  1. dace.frontend.python.parser.DaceProgram

  2. dace.frontend.python.parser.DaceProgram.__call__(), or dace.frontend.python.parser.DaceProgram.compile(), or dace.frontend.python.parser.DaceProgram.to_sdfg()

  3. dace.frontend.python.parser.DaceProgram._parse()

  4. dace.frontend.python.parser.DaceProgram._generated_pdp()

  5. dace.frontend.python.newast.parse_dace_program()

  6. dace.frontend.python.newast.ProgramVisitor.parse_program()

The ProgramVisitor Class

The ProgramVisitor traverses a Data-Centric Python program’s AST and constructs the corresponding SDFG. The ProgramVisitor inherits from Python’s ast.NodeVisitor class and, therefore, follows the visitor design pattern. Developers are encouraged to familiarize themselves with this programming pattern (for example, see Wikipedia and Wikibooks); however, the basic functionality is described below. An object of the ProgramVisitor class is responsible for a single SDFG object. While traversing the Python program’s AST, if the need for a NestedSDFG arises (see Nested ProgramVisitors), a new (nested) ProgramVisitor object will be created to handle the corresponding Python Abstract Syntax sub-Tree. The ProgramVisitor has the following attributes:

  • filename: The name of the file containing the Data-Centric Python program.

  • src_line: The line (in the file) where the Data-Centric Python program is called.

  • src_col: The column (in the line) where the Data-Centric Python program is called.

  • orig_name: The name of the Data-Centric Python program.

  • name: The name of the generated SDFG object. name and orig_name differ when generating a NestedSDFG.

  • globals: The variables defined in the global scope. Typically, these are modules imported and global variables defined in the file containing the Data-Centric Python program.

  • closure: The closure of the Data-Centric Python program.

  • nested: True if generating a NestedSDFG.

  • simplify: True if simplify() should be called on the generated SDFG object.

  • scope_arrays: The Data-Centric Data (see data) defined in the parent SDFG scope.

  • scope_vars: The variables defined in the parent ProgramVisitor scope.

  • numbers: DEPRECATED

  • variables: The variables defined in the current ProgramVisitor scope.

  • accesses: A dictionary of the accesses to Data defined in a parent SDFG scope. Used to avoid generating duplicate NestedSDFG connectors for the same Data subsets accessed.

  • views: A dictionary of Views and the Data subsets viewed. Used to generate Views for Array slices.

  • nested_closure_arrays: The closure of nested Data-Centric Python programs.

  • annotated_types: A dictionary from Python variables to Data-Centric datatypes. Used when variables are explicitly type-annotated in the Python code.

  • map_symbols: The Map symbols defined in the SDFG. Useful when deciding when an augmented assignment should be implemented with WCR or not.

  • sdfg: The generated SDFG object.

  • last_state: The (current) last SDFGState object created and added to the SDFG.

  • inputs: The input connectors of the generated NestedSDFG and a Memlet-like representation of the corresponding Data subsets read.

  • outputs: The output connectors of the generated NestedSDFG and a Memlet-like representation of the corresponding Data subsets written.

  • current_lineinfo: The current DebugInfo. Used for debugging.

  • modules: The modules imported in the file of the top-level Data-Centric Python program. Produced by filtering globals.

  • loop_idx: The current scope-depth in a nested loop construct.

  • continue_states: The generated SDFGState objects corresponding to Python continue statements. Useful for generating proper nested loop control-flow.

  • break_states: The generated SDFGState objects corresponding to Python break statements. Useful for generating proper nested loop control-flow.

  • symbols: The loop symbols defined in the SDFG object. Useful for memlet/state propagation when multiple loops use the same iteration variable but with different ranges.

  • indirections: A dictionary from Python code indirection expressions to Data-Centric symbols.

The ProgramVisitor and the Visitor Design Pattern

The ProgramVisitor’s parse_program() method takes as input a Data-Centric Python program’s AST (ast.FunctionDef object). It then iterates over and visits the statements in the program’s body. The Python call tree when visiting a statement is approximately as follows:

  1. dace.frontend.python.newast.ProgramVisitor.parse_program()

  2. dace.frontend.python.astutils.ExtNodeVisitor.visit_TopLevel()

  3. dace.frontend.python.newast.ProgramVisitor.visit()

  4. dace.frontend.python.newast.ProgramVisitor.visit_Class()
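The dispatch in the last step of the call tree above follows the convention of ast.NodeVisitor: visit() looks up a method named after the class of the AST node. The following is a minimal sketch of that mechanism using only the standard library; StatementLogger is a hypothetical visitor for illustration and does not reflect the ProgramVisitor’s actual implementation.

```python
import ast

source = """
def program(a):
    b = a + 1
    for i in range(3):
        b += i
    return b
"""


class StatementLogger(ast.NodeVisitor):
    """Hypothetical visitor that records which visit_* method handled each statement."""

    def __init__(self):
        self.log = []

    def visit_Assign(self, node):
        self.log.append("visit_Assign")

    def visit_For(self, node):
        self.log.append("visit_For")
        # Recurse manually into the loop body, as a parent visitor method would.
        for stmt in node.body:
            self.visit(stmt)

    def visit_AugAssign(self, node):
        self.log.append("visit_AugAssign")

    def visit_Return(self, node):
        self.log.append("visit_Return")


func = ast.parse(source).body[0]  # the ast.FunctionDef of `program`
logger = StatementLogger()
for stmt in func.body:  # visit each top-level statement in order
    logger.visit(stmt)
print(logger.log)
# ['visit_Assign', 'visit_For', 'visit_AugAssign', 'visit_Return']
```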

In the fourth call above, Class in visit_Class is a placeholder for the name of one of the Python AST module classes supported by the ProgramVisitor. For example, if the statement is an object of the ast.Assign class, the visit_Assign() method will be invoked. Each object of a Python AST module class (henceforth called an AST node) typically has other AST nodes as attributes, which yields a tree structure. Accordingly, the corresponding ProgramVisitor methods perform some action for the parent AST node and then recursively call other methods to handle the child AST nodes until the whole tree has been processed. It should be mentioned that, apart from the class-specific visitor methods, the following may also appear in the Python call tree:

Nested ProgramVisitors and NestedSDFGs

The ProgramVisitor will trigger the generation of a NestedSDFG (through a nested ProgramVisitor) in the following cases:

  • When parsing the body of a map(). This will occur even when a NestedSDFG is not necessary. Simplification of the resulting subgraph is left to InlineSDFG.

  • When parsing a call (see ast.Call) to another Data-Centric Python program or an SDFG object. It should be noted that calls to, e.g., supported NumPy methods (see replacements), may also (eventually) trigger the generation of a NestedSDFG. However, this mostly occurs through Library Nodes.

  • When parsing explicit dataflow mode syntax. The whole Abstract Syntax sub-Tree of such statements is passed to a TaskletTransformer.

Visitor Methods

Below is a list of all AST class-specific ProgramVisitor methods, with a short description of which Python language features they support and how:

visit_FunctionDef()

Parses functions decorated with one of the following decorators:

The Data-Centric Python frontend does not allow definition of Data-Centric Python programs inside another one. This visitor will catch such cases and raise DaceSyntaxError.

visit_For()

Parses for statements using one of the following iterators:

  • range: Results in a (sequential) for-loop.

  • parrange(): Results in a one-dimensional Map.

  • map(): Results in a multi-dimensional Map.

Example:

import dace

@dace.program
def for_loop(A: dace.int32[10]):
    # Sequential loop over the even indices 0, 2, 4, 6, 8
    for i in range(0, 10, 2):
        A[i] = i

Generated SDFG for-loop for the above Data-Centric Python program
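Which of the three cases applies depends on the expression that the for statement iterates over. The sketch below shows, with the standard library only, how the iterator’s name can be read off the ast.For node; iterator_name is a hypothetical helper for illustration, and the ProgramVisitor’s own logic may differ.

```python
import ast


def iterator_name(for_node):
    """Return the dotted name of the iterator used by a for-loop, or None."""
    it = for_node.iter
    # Strip a call, e.g. range(...), or a subscript, e.g. dace.map[0:10].
    if isinstance(it, ast.Call):
        it = it.func
    elif isinstance(it, ast.Subscript):
        it = it.value
    parts = []
    while isinstance(it, ast.Attribute):
        parts.append(it.attr)
        it = it.value
    if isinstance(it, ast.Name):
        parts.append(it.id)
        return ".".join(reversed(parts))
    return None


sources = (
    "for i in range(0, 10, 2): pass",
    "for i, j in dace.map[0:10, 0:5]: pass",
    "for a, b in zip(x, y): pass",
)
names = [iterator_name(ast.parse(src).body[0]) for src in sources]
print(names)  # ['range', 'dace.map', 'zip']
```

The code is only parsed, never executed, so the names dace, x, and y do not need to exist.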

visit_While()

Parses while statements. Example:

import dace

@dace.program
def while_loop():
    i = 10
    while i > 0:
        # i takes the values 10, 7, 4, 1, -2
        i -= 3

Generated SDFG while-loop for the above Data-Centric Python program

visit_Break()

Parses break statements. In the following example, the for-loop behaves as an if-else statement. This is also evident from the generated dataflow:

import dace

@dace.program
def for_break_loop(A: dace.int32[10]):
    for i in range(0, 10, 2):
        A[i] = i
        # Exits the loop after the first iteration (i = 0)
        break

Generated SDFG for-loop with a break statement for the above Data-Centric Python program

visit_Continue()

Parses continue statements. In the following example, the use of continue makes the A[i] = i statement unreachable. This is also evident from the generated dataflow:

import dace

@dace.program
def for_continue_loop(A: dace.int32[10]):
    for i in range(0, 10, 2):
        # Skips the rest of the body, so the assignment below never executes
        continue
        A[i] = i

Generated SDFG for-loop with a continue statement for the above Data-Centric Python program

visit_If()

Parses if statements. Example:

import dace

@dace.program
def if_stmt(a: dace.int32):
    if a < 0:
        return -1
    elif a > 0:
        return 1
    else:
        return 0

Generated SDFG if statement for the above Data-Centric Python program

visit_NamedExpr()

Allows parsing of PEP 572 assignment expressions (walrus operator), e.g., n := 5. However, such expressions are currently treated by the ProgramVisitor as simple assignments. In Python, assignment expressions allow assignments within comprehensions. Therefore, whether an assignment expression will have the Python-equivalent effect in a Data-Centric Python program depends on the ProgramVisitor’s support for those comprehensions.
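For reference, the sketch below shows the shape of the AST node this visitor receives, using only the standard library on Python 3.8 or newer. It illustrates the input of the visitor, not what the ProgramVisitor does with it.

```python
import ast

# `(n := 5)` is an expression, so it can appear inside a larger expression.
tree = ast.parse("y = (n := 5) + 1")
assign = tree.body[0]
named = assign.value.left  # left operand of the addition

# An ast.NamedExpr carries a target name and a value, like a simple assignment.
print(type(named).__name__, named.target.id, named.value.value)  # NamedExpr n 5
```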

visit_Assign()

Parses assignment statements. Example:

import dace

@dace.program
def assign_stmt():
    a = 5

Generated SDFG assignment statement for the above Data-Centric Python program

visit_AnnAssign()

Parses annotated assignment statements. The ProgramVisitor respects these type annotations and the assigned variables will have the same (DaCe-compatible) datatype as if the code was executed through the CPython interpreter.

visit_AugAssign()

Parses augmented assignment statements. The ProgramVisitor will try to infer whether the assigned memory location is read and written by a single thread. In such cases, the assigned memory location will appear as both input and output in the generated subgraph. Otherwise, it will appear only as output and the corresponding edge will have write-conflict resolution (WCR). Example:

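import dace

@dace.program
def augassign_stmt():
    a = 0
    # Sequential loop: a is read and written by a single thread
    for i in range(10):
        a += 1
    # Parallel map: a is written by multiple iterations, so WCR is expected
    for i in dace.map[0:10]:
        a += 1

Generated SDFG augmented assignment statements for the above Data-Centric Python program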
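The difference between the two cases can be modeled in plain Python. The sketch below is a conceptual model only, with hypothetical function names: in the sequential case every iteration reads the location and writes it back, while in the parallel case the iterations only produce values to write and a conflict-resolution function combines them.

```python
from functools import reduce


def sequential_augassign(n):
    """Model of `a += 1` in a sequential loop: each iteration reads a, then writes a."""
    a = 0
    for _ in range(n):
        a = a + 1
    return a


def wcr_augassign(n, wcr=lambda old, new: old + new):
    """Model of `a += 1` in a parallel map: iterations only write, conflicts are combined."""
    a = 0
    writes = [1] * n  # one value written per map iteration, in no particular order
    return reduce(wcr, writes, a)


print(sequential_augassign(10), wcr_augassign(10))  # 10 10
```

Because addition is associative and commutative, the combined result does not depend on the order in which the writes arrive.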

visit_Call()

Parses function call statements. These statements may call any of the following:

  • Another Data-Centric Python program: Execution is transferred to a nested ProgramVisitor.

  • An (already parsed) SDFG object: Directly generates a NestedSDFG.

  • A supported Python builtin or module (e.g., NumPy) method: Execution is transferred to the corresponding replacement method (see replacements).

  • An unsupported method: Generates a callback to the CPython interpreter.
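Choosing among these cases requires knowing which Python object a call refers to. The following sketch shows one way to resolve the callee of an ast.Call node against a dictionary of global variables, using only the standard library. resolve_callee is a hypothetical helper for illustration, and the ProgramVisitor’s actual lookup may work differently.

```python
import ast
import math


def resolve_callee(call_node, global_vars):
    """Return the Python object that a call refers to, or None if it cannot be resolved."""
    node = call_node.func
    attrs = []
    while isinstance(node, ast.Attribute):  # peel off e.g. `.sqrt` from `math.sqrt`
        attrs.append(node.attr)
        node = node.value
    if not isinstance(node, ast.Name) or node.id not in global_vars:
        return None
    obj = global_vars[node.id]
    for attr in reversed(attrs):
        obj = getattr(obj, attr, None)
        if obj is None:
            return None
    return obj


call = ast.parse("math.sqrt(x)", mode="eval").body
print(resolve_callee(call, {"math": math}) is math.sqrt)  # True
```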

visit_Return()

Parses return statements.

visit_With()

Parses with statements. Supports only explicit dataflow mode syntax.

visit_AsyncWith()

Parses async with statements. However, these statements are treated as simple with statements.

visit_Str()

Parses string constants. DEPRECATED in Python 3.8 and newer versions.

visit_Num()

Parses numerical constants. DEPRECATED in Python 3.8 and newer versions.

visit_Constant()

Parses all constant values.
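On Python 3.8 and newer, the standard parser produces ast.Constant for strings, numbers, booleans, and None alike, which is why a single visitor method can cover all of them. A small check with the standard library:

```python
import ast

literals = ("'text'", "42", "3.14", "True", "None")
# Parse each literal as an expression and record the class of the resulting node.
kinds = [type(ast.parse(src, mode="eval").body).__name__ for src in literals]
print(kinds)  # ['Constant', 'Constant', 'Constant', 'Constant', 'Constant']
```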

visit_Name()

Parses names, e.g., variable names.

visit_NameConstant()

Parses name constants. DEPRECATED in Python 3.8 and newer versions.

visit_Attribute()

Parses attributes. Allows accessing attributes of supported objects. Typically, these are Data objects.

visit_List()

Visits each list element and returns a list with the results. Does not support Python lists as Data.

visit_Tuple()

Visits each tuple element and returns a tuple with the results. Does not support Python tuples as Data.

visit_Set()

Visits each set element and returns a set with the results. Does not support Python sets as Data.

visit_Dict()

Visits each dictionary key-value pair and returns a dictionary with the results. Does not support Python dictionaries as Data.

visit_Lambda()

Generates a string representation of a lambda function.
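As an illustration of what such a string representation looks like, the sketch below uses ast.unparse from the standard library (Python 3.9 or newer). This is a stand-in for illustration; it is an assumption that the output resembles what the ProgramVisitor produces with its own unparsing utilities.

```python
import ast

node = ast.parse("lambda a, b: a + b", mode="eval").body  # an ast.Lambda node
text = ast.unparse(node)  # requires Python 3.9 or newer
print(text)  # lambda a, b: a + b
```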

visit_UnaryOp()

Parses unary operations.

visit_BinOp()

Parses binary operations.

visit_BoolOp()

Parses boolean operations.

visit_Compare()

Parses comparisons.

visit_Subscript()

Parses subscripts. This visitor also parses the subscript’s slice expressions.

visit_Index()

Parses index expressions in subscripts. DEPRECATED.

visit_ExtSlice()

Parses slice expressions in subscripts. DEPRECATED.
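The Index and ExtSlice node classes are deprecated because, starting with Python 3.9, the standard parser stores the index expression directly in the slice field of ast.Subscript and represents multi-dimensional subscripts as an ast.Tuple. The sketch below normalizes both layouts using only the standard library; slice_dims is a hypothetical helper for illustration, and it compares class names as strings to avoid touching the deprecated classes.

```python
import ast


def _unwrap(node):
    """Strip the ast.Index wrapper that Python 3.8 and older put around plain indices."""
    return node.value if type(node).__name__ == "Index" else node


def slice_dims(subscript):
    """Return the node class name of each dimension of a subscript expression."""
    sl = _unwrap(subscript.slice)
    kind = type(sl).__name__
    if kind == "ExtSlice":  # Python 3.8 and older, e.g. A[1:5, i]
        dims = sl.dims
    elif kind == "Tuple":  # Python 3.9 and newer, e.g. A[1:5, i]
        dims = sl.elts
    else:  # a single index or slice
        dims = [sl]
    return [type(_unwrap(d)).__name__ for d in dims]


print(slice_dims(ast.parse("A[1:5, i]", mode="eval").body))  # ['Slice', 'Name']
print(slice_dims(ast.parse("A[-1]", mode="eval").body))  # ['UnaryOp']
```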