dace.transformation.auto package

Submodules

dace.transformation.auto.auto_optimize module

Automatic optimization routines for SDFGs.

dace.transformation.auto.auto_optimize.auto_optimize(sdfg: dace.sdfg.sdfg.SDFG, device: dace.dtypes.DeviceType, validate: bool = True, validate_all: bool = False, symbols: Dict[str, int] = None) → dace.sdfg.sdfg.SDFG

Runs a basic sequence of transformations to optimize a given SDFG to decent performance. In particular, performs the following:

  • Simplify
  • Auto-parallelization (loop-to-map)
  • Greedy application of SubgraphFusion
  • Tiled write-conflict resolution (MapTiling -> AccumulateTransient)
  • Tiled stream accumulation (MapTiling -> AccumulateTransient)
  • Collapse all maps to parallelize across all dimensions
  • Set all library nodes to expand to fast expansion, which calls the fastest library on the target device
Parameters:
  • sdfg – The SDFG to optimize.
  • device – the device to optimize for.
  • validate – If True, validates the SDFG after all transformations have been applied.
  • validate_all – If True, validates the SDFG after every step.
  • symbols – Optional dict that maps symbols (str/symbolic) to int/float
Returns:

The optimized SDFG.

Note:

Operates in-place on the given SDFG.

Note:

This function is still experimental and may harm correctness in certain cases. Please report an issue if it does.
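The difference between validate and validate_all can be pictured with a toy pass pipeline. The sketch below is plain Python and does not call DaCe; `run_pipeline`, the two pass functions, and the `check` callback are hypothetical names used only for illustration, and the real function may order its validation calls differently.

```python
from typing import Callable, List


def run_pipeline(data: list,
                 passes: List[Callable[[list], None]],
                 check: Callable[[list], None],
                 validate: bool = True,
                 validate_all: bool = False) -> list:
    """Apply each pass in place, checking after every pass or once at the end."""
    for p in passes:
        p(data)                      # each pass mutates `data` in place
        if validate_all:
            check(data)              # check after every single step
    if validate and not validate_all:
        check(data)                  # single check after all passes
    return data                      # the same object that was passed in


calls = []


def check(d: list) -> None:
    calls.append(len(d))             # record that a check happened


def drop_zeros(d: list) -> None:
    d[:] = [x for x in d if x != 0]


def double(d: list) -> None:
    d[:] = [2 * x for x in d]


work = [0, 1, 2, 0, 3]
result = run_pipeline(work, [drop_zeros, double], check,
                      validate=True, validate_all=True)
```

Because the pipeline mutates its input, `result` and `work` refer to the same list, mirroring the in-place behaviour described in the note above.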

dace.transformation.auto.auto_optimize.find_fast_library(device: dace.dtypes.DeviceType) → List[str]
dace.transformation.auto.auto_optimize.greedy_fuse(graph_or_subgraph: Union[dace.sdfg.sdfg.SDFG, dace.sdfg.state.SDFGState, dace.sdfg.graph.SubgraphView], validate_all: bool, device: dace.dtypes.DeviceType = <DeviceType.CPU: 1>, recursive: bool = True, stencil: bool = False, stencil_tile=None, permutations_only: bool = True, expand_reductions: bool = False) → None

Greedily fuses maps of an SDFG or graph, operating in-place.

Parameters:
  • graph_or_subgraph – SDFG, SDFGState or Subgraph
  • validate_all – Validate SDFG or graph at each fusion step
  • device – Device type to specialize for
  • recursive – Fuse recursively within (fused and unfused) scopes
  • stencil – Perform stencil fusion instead of regular fusion
  • stencil_tile – StencilTiling Tile size, default if None
  • permutations_only – Disallow splitting of maps during MultiExpansion stage
  • expand_reductions – Expand all reduce nodes before fusion
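As a rough intuition for greedy, in-place fusion, the toy below merges neighbouring "maps" whenever their iteration ranges match. It is a simplified model, not the algorithm DaCe uses: the `Map` tuple and `greedy_fuse_toy` are hypothetical, and real fusion must also check data dependencies between the maps.

```python
from typing import List, Tuple

# A toy "map": (iteration range, labels of the statements in its body)
Map = Tuple[range, List[str]]


def greedy_fuse_toy(maps: List[Map]) -> None:
    """Fuse neighbouring maps with identical ranges, modifying `maps` in place."""
    i = 0
    while i + 1 < len(maps):
        (r0, body0), (r1, body1) = maps[i], maps[i + 1]
        if r0 == r1:
            maps[i] = (r0, body0 + body1)   # merge both bodies into one map
            del maps[i + 1]                  # stay at i and try to fuse again
        else:
            i += 1                           # ranges differ, move on


maps = [
    (range(10), ["a"]),
    (range(10), ["b"]),
    (range(5), ["c"]),
    (range(10), ["d"]),
]
greedy_fuse_toy(maps)
```

The first two maps share a range and are fused; the last one is not merged with them because a map with a different range sits in between.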

dace.transformation.auto.auto_optimize.make_transients_persistent(sdfg: dace.sdfg.sdfg.SDFG, device: dace.dtypes.DeviceType, toplevel_only: bool = True) → Dict[int, Set[str]]

Helper function to change several storage and scheduling properties:

  • Makes non-view array lifetimes persistent, with some restrictions depending on the device
  • Resets nonatomic WCR edges on GPU

The only arrays that are made persistent by default are those that do not exist inside a scope (where they could be allocated multiple times) and whose symbols are always given as parameters to the SDFG (so that they can be allocated in a persistent manner).

Parameters:
  • sdfg – SDFG
  • device – Device type
  • toplevel_only – If True, only converts access nodes that do not appear in any scope.
Returns:

A dictionary mapping SDFG IDs to a set of transient arrays that were made persistent.
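The selection rule and the shape of the return value can be sketched in plain Python. This is a simplified model under stated assumptions: `ToyArray`, its fields, and `persistent_candidates` are hypothetical stand-ins, and device-specific restrictions are ignored.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass
class ToyArray:
    name: str
    transient: bool
    in_scope: bool                               # accessed inside a scope?
    shape_symbols: FrozenSet[str] = frozenset()  # symbols used in the shape


def persistent_candidates(sdfg_id: int,
                          arrays: List[ToyArray],
                          sdfg_params: Set[str]) -> Dict[int, Set[str]]:
    """Pick transients outside any scope whose shape symbols are SDFG parameters."""
    chosen = {
        a.name for a in arrays
        if a.transient
        and not a.in_scope                       # top-level access only
        and a.shape_symbols <= sdfg_params       # every symbol is a parameter
    }
    return {sdfg_id: chosen}


arrays = [
    ToyArray("tmp", transient=True, in_scope=False, shape_symbols=frozenset({"N"})),
    ToyArray("local", transient=True, in_scope=True),
    ToyArray("sized", transient=True, in_scope=False, shape_symbols=frozenset({"K"})),
    ToyArray("A", transient=False, in_scope=False),
]
selected = persistent_candidates(0, arrays, sdfg_params={"N"})
```

Only `tmp` qualifies here: `local` lives inside a scope, `sized` depends on a symbol that is not a parameter, and `A` is not transient.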

dace.transformation.auto.auto_optimize.move_small_arrays_to_stack(sdfg: dace.sdfg.sdfg.SDFG) → None

Set all Default storage types that are constant sized and less than the auto-tile size to the stack (as StorageType.Register).

Parameters:
  • sdfg – The SDFG to operate on.
Note:

Operates in-place on the SDFG.
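The "constant sized and smaller than a threshold" criterion can be illustrated with a small stand-alone filter. The function name, the `max_elements` parameter, and the use of strings for symbolic dimensions are assumptions made for this sketch; the actual threshold is whatever the auto-tile size is configured to be.

```python
from math import prod
from typing import Dict, List, Tuple, Union

# A dimension is either a concrete int or a symbol name such as "N"
Shape = Tuple[Union[int, str], ...]


def small_constant_arrays(shapes: Dict[str, Shape], max_elements: int) -> List[str]:
    """Return names of arrays with fully constant shapes below the size limit."""
    picked = []
    for name, shape in shapes.items():
        is_constant = all(isinstance(d, int) for d in shape)
        if is_constant and prod(shape) < max_elements:
            picked.append(name)      # candidate for register/stack storage
    return picked


shapes = {"a": (4, 4), "b": ("N", 4), "c": (64, 64)}
stack_candidates = small_constant_arrays(shapes, max_elements=32)
```

Array `b` is skipped because one dimension is symbolic, and `c` is skipped because it is too large.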

dace.transformation.auto.auto_optimize.set_fast_implementations(sdfg: dace.sdfg.sdfg.SDFG, device: dace.dtypes.DeviceType, blocklist: List[str] = None)

Sets fast library node implementations for the given device.

Parameters:
  • sdfg – The SDFG to optimize.
  • device – the device to optimize for.
  • blocklist – list of disallowed implementations.
Note:

Operates in-place on the given SDFG.
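One way to picture the role of blocklist is as a filter over a priority-ordered list of implementation names, such as the list returned by find_fast_library. The sketch below is an assumption about that behaviour, not the library's code; `pick_implementation` and the implementation names are placeholders.

```python
from typing import Iterable, List, Optional


def pick_implementation(priorities: List[str],
                        available: Iterable[str],
                        blocklist: Optional[List[str]] = None) -> Optional[str]:
    """Return the first preferred implementation that is available and not blocked."""
    blocked = set(blocklist or [])
    offered = set(available)
    for impl in priorities:                  # highest priority first
        if impl in offered and impl not in blocked:
            return impl
    return None                              # nothing suitable; leave node unchanged


priorities = ["libA", "libB", "generic"]     # placeholder names
default_choice = pick_implementation(priorities, {"libB", "generic"})
blocked_choice = pick_implementation(priorities, {"libB", "generic"}, blocklist=["libB"])
```

With no blocklist the highest-priority available entry wins; blocking it makes the selection fall through to the next candidate.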

dace.transformation.auto.auto_optimize.tile_wcrs(graph_or_subgraph: Union[dace.sdfg.sdfg.SDFG, dace.sdfg.state.SDFGState, dace.sdfg.graph.SubgraphView], validate_all: bool, prefer_partial_parallelism: bool = None) → None

Tiles parallel write-conflict resolution maps in an SDFG, state, or subgraphs thereof. Reduces the number of atomic operations by tiling and introducing transient arrays to accumulate atomics on.

Parameters:
  • graph_or_subgraph – The SDFG/state/subgraph to optimize within.
  • validate_all – If True, runs SDFG validation after every tiling.
  • prefer_partial_parallelism – If set, prefers extracting non-conflicted map dimensions over tiling WCR map (may not perform well if parallel dimensions are small).
Note:

This function operates in-place.
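The effect of tiling on the number of conflicting updates can be seen in a sequential toy model. The code below is not DaCe code: both functions are hypothetical, and the `atomics` counter simply counts how often the shared result is updated, standing in for atomic operations in a parallel run.

```python
from typing import List, Tuple


def untiled_sum(values: List[int]) -> Tuple[int, int]:
    """Every element updates the shared result directly."""
    total, atomics = 0, 0
    for v in values:
        total += v                   # one conflicting update per element
        atomics += 1
    return total, atomics


def tiled_sum(values: List[int], tile: int) -> Tuple[int, int]:
    """Accumulate each tile into a local transient, then update the result once."""
    total, atomics = 0, 0
    for start in range(0, len(values), tile):
        partial = 0                  # transient accumulator for this tile
        for v in values[start:start + tile]:
            partial += v             # local update, no conflict
        total += partial             # one conflicting update per tile
        atomics += 1
    return total, atomics


values = list(range(10))
plain = untiled_sum(values)
tiled = tiled_sum(values, tile=4)
```

Both variants compute the same sum, but the tiled version touches the shared result once per tile instead of once per element.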

Module contents

This module initializes the auto-optimization transformations package.