dace.transformation.auto package

Submodules

dace.transformation.auto.auto_optimize module

Automatic optimization routines for SDFGs.

dace.transformation.auto.auto_optimize.auto_optimize(sdfg: dace.sdfg.sdfg.SDFG, device: dace.dtypes.DeviceType, validate: bool = True, validate_all: bool = False, symbols: Dict[str, int] = None) → dace.sdfg.sdfg.SDFG

Runs a basic sequence of transformations to optimize a given SDFG to decent performance. In particular, performs the following:

Simplify

Auto-parallelization (loop-to-map)

Greedy application of SubgraphFusion

Tiled write-conflict resolution (MapTiling -> AccumulateTransient)

Tiled stream accumulation (MapTiling -> AccumulateTransient)

Collapse all maps to parallelize across all dimensions

Set all library nodes to expand to fast expansion, which calls the fastest library on the target device

Parameters:
    sdfg – The SDFG to optimize.
    device – The device to optimize for.
    validate – If True, validates the SDFG after all transformations have been applied.
    validate_all – If True, validates the SDFG after every step.
    symbols – Optional dict that maps symbols (str/symbolic) to int/float.
Returns:	The optimized SDFG.
Note:	Operates in-place on the given SDFG.
Note:	This function is still experimental and may harm correctness in certain cases. Please report an issue if it does.
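
A minimal usage sketch, assuming DaCe is installed. The wrapper name optimize_for_cpu and its validate_every_step flag are hypothetical and not part of the package; only auto_optimize and dace.DeviceType.CPU come from DaCe. The imports are deferred into the function body so that defining the wrapper does not itself require DaCe.

```python
def optimize_for_cpu(sdfg, validate_every_step=False):
    """Hypothetical convenience wrapper around auto_optimize for CPU targets."""
    # Deferred imports: DaCe is only needed when the wrapper is called.
    import dace
    from dace.transformation.auto.auto_optimize import auto_optimize

    # auto_optimize operates in-place on `sdfg` and returns the optimized SDFG.
    return auto_optimize(
        sdfg,
        dace.DeviceType.CPU,
        validate=True,                     # validate once, after all steps
        validate_all=validate_every_step,  # slower: validate after every step
    )
```

Because the function operates in-place, a caller that needs the unoptimized SDFG afterwards would have to copy it before calling the wrapper. Enabling validate_every_step is mainly useful when tracking down which step produced an invalid SDFG.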

dace.transformation.auto.auto_optimize.find_fast_library(device: dace.dtypes.DeviceType) → List[str]

dace.transformation.auto.auto_optimize.greedy_fuse(graph_or_subgraph: Union[dace.sdfg.sdfg.SDFG, dace.sdfg.state.SDFGState, dace.sdfg.graph.SubgraphView], validate_all: bool, device: dace.dtypes.DeviceType = <DeviceType.CPU: 1>, recursive: bool = True, stencil: bool = False, stencil_tile=None, permutations_only: bool = True, expand_reductions: bool = False) → None

Greedily fuses maps of an SDFG or graph, operating in-place.

Parameters:
    graph_or_subgraph – The SDFG, SDFGState, or SubgraphView to fuse maps in.
    validate_all – Validate the SDFG or graph at each fusion step.
    device – Device type to specialize for.
    recursive – Fuse recursively within (fused and unfused) scopes.
    stencil – Perform stencil fusion instead of regular fusion.
    stencil_tile – StencilTiling tile size; the default is used if None.
    permutations_only – Disallow splitting of maps during the MultiExpansion stage.
    expand_reductions – Expand all reduce nodes before fusion.

dace.transformation.auto.auto_optimize.make_transients_persistent(sdfg: dace.sdfg.sdfg.SDFG, device: dace.dtypes.DeviceType, toplevel_only: bool = True) → Dict[int, Set[str]]

Helper function to change several storage and scheduling properties:

Makes non-view array lifetimes persistent, with some restrictions depending on the device

Resets nonatomic WCR edges on GPU

By default, an array is made persistent only if it does not appear inside a scope (arrays inside a scope may be allocated multiple times) and its symbols are always given as parameters to the SDFG (so that it can be allocated in a persistent manner).

Parameters:
    sdfg – The SDFG to operate on.
    device – The device type.
    toplevel_only – If True, only converts access nodes that do not appear in any scope.
Returns:	A dictionary mapping SDFG IDs to a set of transient arrays that were made persistent.

dace.transformation.auto.auto_optimize.move_small_arrays_to_stack(sdfg: dace.sdfg.sdfg.SDFG) → None

Moves all arrays with Default storage that are constant-sized and smaller than the auto-tile size to the stack (as StorageType.Register).

Parameters:
    sdfg – The SDFG to operate on.
Note:	Operates in-place on the SDFG.
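
The selection rule described for move_small_arrays_to_stack can be pictured as a simple predicate. The following is a plain-Python sketch of that rule as worded here, not the library's code: ArrayInfo, should_move_to_stack, and the threshold value of 128 are hypothetical, and a total_size of None stands in for a size that depends on a symbol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArrayInfo:
    """Hypothetical stand-in for an array descriptor (not a DaCe class)."""
    storage: str               # e.g. "Default" or "Register"
    total_size: Optional[int]  # None when the size depends on a symbol


def should_move_to_stack(array: ArrayInfo, autotile_size: int) -> bool:
    """True if the array has Default storage, a constant size, and is small."""
    return (array.storage == "Default"
            and array.total_size is not None
            and array.total_size < autotile_size)


# Illustrative inputs; 128 is an assumed threshold, not a DaCe default.
small = ArrayInfo(storage="Default", total_size=16)
symbolic = ArrayInfo(storage="Default", total_size=None)
large = ArrayInfo(storage="Default", total_size=4096)

candidates = [a for a in (small, symbolic, large) if should_move_to_stack(a, 128)]
```

Only the first array qualifies in this sketch: the second has no constant size, and the third exceeds the assumed threshold.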

dace.transformation.auto.auto_optimize.set_fast_implementations(sdfg: dace.sdfg.sdfg.SDFG, device: dace.dtypes.DeviceType, blocklist: List[str] = None)

Set fast library node implementations for the given device

Parameters:
    sdfg – The SDFG to optimize.
    device – The device to optimize for.
    blocklist – List of disallowed implementations.
Note:	Operates in-place on the given SDFG.
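
One way to picture how a per-device priority list (such as the List[str] returned by find_fast_library) and a blocklist could interact is a first-match selection. The sketch below is plain Python and an assumption about the selection logic, not the library's code; pick_implementation and the implementation names are hypothetical.

```python
from typing import List, Optional


def pick_implementation(priority: List[str],
                        available: List[str],
                        blocklist: Optional[List[str]] = None) -> Optional[str]:
    """Return the first entry of `priority` that is available and not blocklisted."""
    blocked = set(blocklist or [])
    for name in priority:
        if name in available and name not in blocked:
            return name
    return None  # nothing suitable was found


# Hypothetical priority list and node capabilities, for illustration only.
priority = ["vendor_blas", "portable_blas", "pure"]
available = ["vendor_blas", "pure"]

default_choice = pick_implementation(priority, available)
restricted_choice = pick_implementation(priority, available,
                                        blocklist=["vendor_blas"])
```

In this sketch, blocklisting the preferred entry makes the selection fall through to the next implementation that the node supports.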

dace.transformation.auto.auto_optimize.tile_wcrs(graph_or_subgraph: Union[dace.sdfg.sdfg.SDFG, dace.sdfg.state.SDFGState, dace.sdfg.graph.SubgraphView], validate_all: bool, prefer_partial_parallelism: bool = None) → None

Tiles parallel write-conflict resolution maps in an SDFG, state, or subgraphs thereof. Reduces the number of atomic operations by tiling and introducing transient arrays to accumulate atomics on.

Parameters:
    graph_or_subgraph – The SDFG/state/subgraph to optimize within.
    validate_all – If True, runs SDFG validation after every tiling.
    prefer_partial_parallelism – If set, prefers extracting non-conflicted map dimensions over tiling WCR map (may not perform well if parallel dimensions are small).

Note:	This function operates in-place.
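
The effect of tiling on the number of conflicting updates can be illustrated without DaCe. The following plain-Python sketch is a conceptual model only: the function names are hypothetical, a counter stands in for atomic operations, and the loops run sequentially rather than in parallel.

```python
def sum_with_direct_wcr(values):
    """Every element updates the shared result: one conflicting write each."""
    shared = 0
    conflicting_writes = 0
    for v in values:
        shared += v            # stands in for an atomic add
        conflicting_writes += 1
    return shared, conflicting_writes


def sum_with_tiled_wcr(values, tile_size):
    """Each tile accumulates privately, then commits once to the shared result."""
    shared = 0
    conflicting_writes = 0
    for start in range(0, len(values), tile_size):
        local = 0              # transient accumulator, private to this tile
        for v in values[start:start + tile_size]:
            local += v         # no conflict: only this tile touches `local`
        shared += local        # stands in for a single atomic add per tile
        conflicting_writes += 1
    return shared, conflicting_writes


data = list(range(1000))
direct_total, direct_writes = sum_with_direct_wcr(data)
tiled_total, tiled_writes = sum_with_tiled_wcr(data, tile_size=128)
```

Both variants compute the same total, but the tiled variant commits to the shared result once per tile instead of once per element. The number of conflicting writes therefore drops from len(values) to the number of tiles.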

Module contents

This module initializes the auto-optimization transformations package.