dace.transformation.auto package

Submodules

dace.transformation.auto.auto_optimize module

Automatic optimization routines for SDFGs.

dace.transformation.auto.auto_optimize.apply_gpu_storage(sdfg)

Changes the storage of the SDFG’s input and output data to GPU global memory.

Return type:

None

dace.transformation.auto.auto_optimize.auto_optimize(sdfg, device, validate=True, validate_all=False, symbols=None, use_gpu_storage=False, find_fast_library_fn=None)

Runs a basic sequence of transformations to optimize a given SDFG to decent performance. In particular, performs the following:

Simplify

Auto-parallelization (loop-to-map)

Greedy application of SubgraphFusion

Tiled write-conflict resolution (MapTiling -> AccumulateTransient)

Tiled stream accumulation (MapTiling -> AccumulateTransient)

Collapse all maps to parallelize across all dimensions

Set all library nodes to use the fast expansion, which calls the fastest library on the target device

Parameters:

sdfg (SDFG) – The SDFG to optimize.
device (DeviceType) – The device to optimize for.
validate (bool) – If True, validates the SDFG after all transformations have been applied.
validate_all (bool) – If True, validates the SDFG after every step.
symbols (Dict[str, int]) – Optional dictionary that maps symbols (str/symbolic) to their int/float values.
use_gpu_storage (bool) – If True, changes the storage of non-transient data to GPU global memory.
find_fast_library_fn (Callable[[DeviceType], List[str]]) – Optional function that returns the prioritized list of implementations for the given device, which will take priority over the existing set of fast libraries found using auto-optimize.

Return type:

SDFG

Returns:

The optimized SDFG.

Note:

Operates in-place on the given SDFG.

Note:

This function is still experimental and may harm correctness in certain cases. Please report an issue if it does.
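As a rough usage sketch of the find_fast_library_fn parameter, the snippet below defines a callback with the documented Callable[[DeviceType], List[str]] shape and passes it to auto_optimize. The names prefer_pure and optimize_for_cpu are hypothetical, and 'pure' is assumed to be an implementation name that the library nodes in your SDFG provide.

```python
def prefer_pure(device):
    """Hypothetical priority callback: always prefer the 'pure' expansion.

    The device argument is accepted to match the documented callback
    shape, but this sketch ignores it.
    """
    return ['pure']


def optimize_for_cpu(sdfg):
    """Run auto_optimize on an existing SDFG using the callback above."""
    # Imported lazily so that defining this helper does not require DaCe
    # to be importable; calling it does.
    import dace
    from dace.transformation.auto.auto_optimize import auto_optimize

    # auto_optimize operates in-place and also returns the SDFG.
    return auto_optimize(sdfg,
                         dace.DeviceType.CPU,
                         validate=True,
                         find_fast_library_fn=prefer_pure)
```

Because the function is documented as experimental, it may be worth comparing results of the optimized SDFG against the unoptimized one on a small input.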

dace.transformation.auto.auto_optimize.find_fast_library(device)

Return type:

List[str]

dace.transformation.auto.auto_optimize.greedy_fuse(graph_or_subgraph, validate_all, device=<DeviceType.CPU: 1>, recursive=True, stencil=False, stencil_tile=None, permutations_only=True, expand_reductions=False)

Greedily fuses maps of an SDFG or graph, operating in-place.

Parameters:

graph_or_subgraph (SDFG | SDFGState | SubgraphView | ControlFlowRegion) – SDFG, SDFGState or Subgraph
validate_all (bool) – Validate SDFG or graph at each fusion step
device (DeviceType) – Device type to specialize for
recursive (bool) – Fuse recursively within (fused and unfused) scopes
stencil (bool) – Perform stencil fusion instead of regular fusion
stencil_tile – Tile size for StencilTiling; the default is used if None
permutations_only (bool) – Disallow splitting of maps during MultiExpansion stage
expand_reductions (bool) – Expand all reduce nodes before fusion

Return type:

None

dace.transformation.auto.auto_optimize.make_transients_persistent(sdfg, device, toplevel_only=True)

Helper function that changes several storage and scheduling properties:

Makes non-view array lifetimes persistent, with some restrictions depending on the device

Resets nonatomic WCR edges on GPU

By default, the only arrays made persistent are those that do not exist inside a scope (arrays inside a scope may be allocated multiple times) and whose symbols are always given as parameters to the SDFG (so that they can be allocated in a persistent manner).

Parameters:

sdfg (SDFG) – SDFG
device (DeviceType) – Device type
toplevel_only (bool) – If True, only converts access nodes that do not appear in any scope.

Return type:

Dict[int, Set[str]]

Returns:

A dictionary mapping SDFG IDs to a set of transient arrays that were made persistent.
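The default eligibility rule described above can be restated as a small predicate. This is a conceptual sketch only, not DaCe's implementation: the function name and its three arguments are hypothetical, and device-specific restrictions are left out.

```python
from typing import Set


def may_be_persistent(inside_scope: bool,
                      size_symbols: Set[str],
                      sdfg_parameters: Set[str]) -> bool:
    """Return True if an array satisfies both default conditions."""
    # Condition 1: the array must not live inside a scope, where it
    # could be allocated multiple times.
    if inside_scope:
        return False
    # Condition 2: every symbol the array depends on must be given as
    # a parameter to the SDFG.
    return size_symbols <= sdfg_parameters


print(may_be_persistent(False, {'N'}, {'N', 'M'}))  # True
print(may_be_persistent(True, {'N'}, {'N', 'M'}))   # False
print(may_be_persistent(False, {'K'}, {'N', 'M'}))  # False
```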

dace.transformation.auto.auto_optimize.move_small_arrays_to_stack(sdfg)

Moves all data with Default storage that is constant-sized and smaller than the auto-tile size to the stack (as StorageType.Register).

Parameters:

sdfg (SDFG) – The SDFG to operate on.

Note:

Operates in-place on the SDFG.

Return type:

None

dace.transformation.auto.auto_optimize.set_fast_implementations(sdfg, device, blocklist=None, find_fast_library_fn=None)

Sets fast library node implementations for the given device.

Parameters:

sdfg (SDFG) – The SDFG to optimize.
device (DeviceType) – The device to optimize for.
blocklist (List[str]) – List of disallowed implementations.
find_fast_library_fn (Callable[[DeviceType], List[str]]) – Function that returns the prioritized list of implementations for the given device, which will take priority over the built-in find_fast_library function.

Note:

Operates in-place on the given SDFG.

Return type:

None
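The interaction between a prioritized implementation list and a blocklist can be illustrated in plain Python. The sketch below is an assumption about the general selection idea, not DaCe's code: pick_implementation is a hypothetical helper, and the implementation names are illustrative.

```python
from typing import Iterable, List, Optional


def pick_implementation(available: Iterable[str],
                        priority: List[str],
                        blocklist: Optional[List[str]] = None) -> Optional[str]:
    """Return the first prioritized implementation that a node offers
    and that is not blocklisted, or None if there is no such entry."""
    blocked = set(blocklist or [])
    offered = set(available)
    for impl in priority:
        if impl in blocked:
            continue  # explicitly disallowed by the caller
        if impl in offered:
            return impl
    return None


priority = ['MKL', 'OpenBLAS', 'pure']  # illustrative priority order
print(pick_implementation({'MKL', 'pure'}, priority))                     # MKL
print(pick_implementation({'MKL', 'pure'}, priority, blocklist=['MKL']))  # pure
```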

dace.transformation.auto.auto_optimize.tile_wcrs(graph_or_subgraph, validate_all, prefer_partial_parallelism=None)

Tiles parallel write-conflict resolution maps in an SDFG, state, or subgraphs thereof. Reduces the number of atomic operations by tiling and introducing transient arrays in which partial results are accumulated.

Parameters:

graph_or_subgraph (SDFG | SDFGState | SubgraphView | ControlFlowRegion) – The SDFG/state/subgraph to optimize within.
validate_all (bool) – If True, runs SDFG validation after every tiling.
prefer_partial_parallelism (bool) – If set, prefers extracting non-conflicted map dimensions over tiling the WCR map (may not perform well if parallel dimensions are small).

Note:

This function operates in-place.

Return type:

None
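The effect of tiling on conflicting updates can be shown with a plain-Python sum. This is a conceptual sketch under simplifying assumptions (sequential execution, a scalar output), not the transformation itself; tiled_sum and its counter are hypothetical. Each tile accumulates into a local partial result, which stands in for the transient, and updates the shared output once.

```python
def tiled_sum(values, tile_size):
    """Sum values tile by tile and count updates to the shared output."""
    total = 0                 # shared output (stands in for the WCR target)
    conflicting_updates = 0   # updates that would need conflict resolution
    for start in range(0, len(values), tile_size):
        partial = 0           # tile-local accumulator (the "transient")
        for v in values[start:start + tile_size]:
            partial += v      # private to the tile, so no conflict
        total += partial      # one conflicting update per tile
        conflicting_updates += 1
    return total, conflicting_updates


data = list(range(10))
print(tiled_sum(data, 4))  # (45, 3)
print(tiled_sum(data, 1))  # (45, 10)
```

With a tile size of 1 every element updates the shared output directly, which corresponds to the untiled case.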

Module contents

This module initializes the auto-optimization transformations package.