dace.transformation.auto package
Submodules
dace.transformation.auto.auto_optimize module
Automatic optimization routines for SDFGs.
- dace.transformation.auto.auto_optimize.apply_gpu_storage(sdfg)
Changes the storage of the SDFG’s input and output data to GPU global memory.
- Return type:
None
- dace.transformation.auto.auto_optimize.auto_optimize(sdfg, device, validate=True, validate_all=False, symbols=None, use_gpu_storage=False, find_fast_library_fn=None)
Runs a basic sequence of transformations to optimize a given SDFG to decent performance. In particular, performs the following:
Simplify
Auto-parallelization (loop-to-map)
Greedy application of SubgraphFusion
Tiled write-conflict resolution (MapTiling -> AccumulateTransient)
Tiled stream accumulation (MapTiling -> AccumulateTransient)
Collapse all maps to parallelize across all dimensions
Set all library nodes to expand to fast expansion, which calls the fastest library on the target device
- Parameters:
sdfg (SDFG) – The SDFG to optimize.
device (DeviceType) – The device to optimize for.
validate (bool) – If True, validates the SDFG after all transformations have been applied.
validate_all (bool) – If True, validates the SDFG after every step.
symbols (Dict[str,int]) – Optional dict that maps symbols (str/symbolic) to int/float.
use_gpu_storage (bool) – If True, changes the storage of non-transient data to GPU global memory.
find_fast_library_fn (Callable[[DeviceType],List[str]]) – Optional function that returns the prioritized list of implementations for the given device, which will take priority over the existing set of fast libraries found using auto-optimize.
- Return type:
SDFG
- Returns:
The optimized SDFG.
- Note:
Operates in-place on the given SDFG.
- Note:
This function is still experimental and may harm correctness in certain cases. Please report an issue if it does.
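A minimal usage sketch for the entry above, assuming an SDFG has already been built elsewhere. The helper name `optimize_for_cpu` is hypothetical; only `auto_optimize`, `DeviceType`, and the keyword arguments come from the signature documented here. The DaCe imports are deferred into the function body so the helper can be defined without DaCe installed.

```python
def optimize_for_cpu(sdfg, symbols=None):
    """Auto-optimize `sdfg` for CPU and return it.

    `optimize_for_cpu` is a hypothetical wrapper used for illustration.
    The SDFG is modified in place, as noted in the documentation above.
    """
    # Deferred imports: DaCe is only required when the helper is called.
    from dace import DeviceType
    from dace.transformation.auto.auto_optimize import auto_optimize

    return auto_optimize(
        sdfg,
        DeviceType.CPU,
        validate=True,       # validate once, after all transformations
        validate_all=False,  # set to True to validate after every step
        symbols=symbols,     # optional mapping of symbol names to values
    )
```

Because the routine is documented as experimental, it may be worth comparing the results of the optimized SDFG against an unoptimized copy on a small input.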
- dace.transformation.auto.auto_optimize.find_fast_library(device)
- Return type:
List[str]
- dace.transformation.auto.auto_optimize.greedy_fuse(graph_or_subgraph, validate_all, device=<DeviceType.CPU: 1>, recursive=True, stencil=False, stencil_tile=None, permutations_only=True, expand_reductions=False)
Greedily fuses maps of an SDFG or graph, operating in-place.
- Parameters:
graph_or_subgraph (SDFG|SDFGState|SubgraphView|ControlFlowRegion) – SDFG, SDFGState or Subgraph
validate_all (bool) – Validate SDFG or graph at each fusion step
device (DeviceType) – Device type to specialize for
recursive (bool) – Fuse recursively within (fused and unfused) scopes
stencil (bool) – Perform stencil fusion instead of regular fusion
stencil_tile – StencilTiling tile size; the default is used if None
permutations_only (bool) – Disallow splitting of maps during MultiExpansion stage
expand_reductions (bool) – Expand all reduce nodes before fusion
- Return type:
None
- dace.transformation.auto.auto_optimize.make_transients_persistent(sdfg, device, toplevel_only=True)
Helper function to change several storage and scheduling properties:
Makes non-view array lifetimes persistent, with some restrictions depending on the device
Resets non-atomic WCR edges on GPU
By default, the only arrays made persistent are those that do not reside inside a scope (where they might be allocated multiple times) and whose symbols are always given as parameters to the SDFG (so that they can be allocated in a persistent manner).
- Parameters:
sdfg (SDFG) – SDFG
device (DeviceType) – Device type
toplevel_only (bool) – If True, only converts access nodes that do not appear in any scope.
- Return type:
Dict[int,Set[str]]
- Returns:
A dictionary mapping SDFG IDs to a set of transient arrays that were made persistent.
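The default eligibility rule and the shape of the return value can be modeled in a few lines of plain Python. This is a conceptual sketch only, not DaCe's implementation: `is_persistent_candidate` and `count_persistent` are hypothetical names, arrays are reduced to a scope flag plus the set of symbols in their shape, and device-specific restrictions are ignored.

```python
from typing import Dict, Set


def is_persistent_candidate(in_scope: bool,
                            shape_symbols: Set[str],
                            sdfg_parameters: Set[str]) -> bool:
    """Model of the default rule: the array is outside any scope and every
    symbol in its shape is a parameter of the SDFG."""
    return (not in_scope) and shape_symbols <= sdfg_parameters


def count_persistent(result: Dict[int, Set[str]]) -> int:
    """Count array names in a result shaped like the documented return value
    (SDFG ID mapped to a set of transient array names)."""
    return sum(len(names) for names in result.values())
```

For instance, an array of shape `N` outside any scope qualifies when `N` is an SDFG parameter, while the same array inside a map scope does not.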
- dace.transformation.auto.auto_optimize.move_small_arrays_to_stack(sdfg)
Moves all arrays with Default storage that are constant-sized and smaller than the auto-tile size to the stack (as StorageType.Register).
- Parameters:
sdfg (SDFG) – The SDFG to operate on.
- Note:
Operates in-place on the SDFG.
- Return type:
None
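The size criterion described for this function can be illustrated with a small predicate. This is a sketch under stated assumptions rather than DaCe's code: `fits_on_stack` is a hypothetical name, shapes are given as tuples whose symbolic dimensions are represented by strings, and the auto-tile size is passed in as `threshold` instead of being read from DaCe's configuration.

```python
from math import prod
from typing import Sequence, Union


def fits_on_stack(shape: Sequence[Union[int, str]], threshold: int) -> bool:
    """Return True if the shape is constant-sized and its total number of
    elements is strictly less than `threshold`."""
    # Any symbolic (non-integer) dimension means the size is not constant.
    if not all(isinstance(dim, int) for dim in shape):
        return False
    return prod(shape) < threshold
```

With a threshold of 128, a 4x4 array qualifies, whereas a 16x16 array or an array with a symbolic dimension such as `N` does not.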
- dace.transformation.auto.auto_optimize.set_fast_implementations(sdfg, device, blocklist=None, find_fast_library_fn=None)
Set fast library node implementations for the given device
- Parameters:
sdfg (SDFG) – The SDFG to optimize.
device (DeviceType) – The device to optimize for.
blocklist (List[str]) – List of disallowed implementations.
find_fast_library_fn (Callable[[DeviceType],List[str]]) – Function that returns the prioritized list of implementations for the given device, which will take priority over the built-in find_fast_library function.
- Note:
Operates in-place on the given SDFG.
- Return type:
None
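To make the roles of the prioritized list and the blocklist concrete, the following sketch models the selection in plain Python. It is an illustration under assumptions, not DaCe's implementation: `Device` stands in for `DeviceType` so the code runs without DaCe, `my_fast_libraries` is a hypothetical callback with the shape expected of `find_fast_library_fn`, `pick_implementation` is a hypothetical helper, and the implementation names are examples only.

```python
from enum import Enum
from typing import Iterable, List, Optional


class Device(Enum):
    """Stand-in for dace.DeviceType so the sketch runs without DaCe."""
    CPU = 1
    GPU = 2


def my_fast_libraries(device) -> List[str]:
    """Hypothetical callback: prioritized implementation names per device."""
    if device.name == "GPU":
        return ["cuBLAS", "pure"]
    return ["MKL", "OpenBLAS", "pure"]


def pick_implementation(available: Iterable[str],
                        priorities: List[str],
                        blocklist: Optional[List[str]] = None) -> Optional[str]:
    """Return the first prioritized name that is offered and not blocked."""
    blocked = set(blocklist or [])
    offered = set(available)
    for name in priorities:
        if name in offered and name not in blocked:
            return name
    return None  # no acceptable implementation found
```

A callback written against the real `DeviceType` would be passed as `find_fast_library_fn=my_fast_libraries`; since it only inspects `device.name`, the same function body should work with either enum.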
- dace.transformation.auto.auto_optimize.tile_wcrs(graph_or_subgraph, validate_all, prefer_partial_parallelism=None)
Tiles parallel write-conflict resolution maps in an SDFG, state, or subgraphs thereof. Reduces the number of atomic operations by tiling and introducing transient arrays to accumulate atomics on.
- Parameters:
graph_or_subgraph (SDFG|SDFGState|SubgraphView|ControlFlowRegion) – The SDFG/state/subgraph to optimize within.
validate_all (bool) – If True, runs SDFG validation after every tiling.
prefer_partial_parallelism (bool) – If set, prefers extracting non-conflicted map dimensions over tiling WCR map (may not perform well if parallel dimensions are small).
- Note:
This function operates in-place.
- Return type:
None
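The effect of tiling a write-conflicted accumulation can be sketched in plain Python. This is a sequential model under assumptions, not the transformation itself: `untiled_sum` and `tiled_sum` are hypothetical names, and each update to the shared total stands in for one atomic operation in a parallel map.

```python
from typing import List, Tuple


def untiled_sum(values: List[float]) -> Tuple[float, int]:
    """Every element updates the shared total (one 'atomic' per element)."""
    total = 0.0
    atomic_updates = 0
    for v in values:
        total += v
        atomic_updates += 1
    return total, atomic_updates


def tiled_sum(values: List[float], tile_size: int) -> Tuple[float, int]:
    """Each tile accumulates into a local transient, then updates the shared
    total once (one 'atomic' per tile)."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    total = 0.0
    atomic_updates = 0
    for start in range(0, len(values), tile_size):
        local = 0.0  # transient accumulator private to this tile
        for v in values[start:start + tile_size]:
            local += v
        total += local
        atomic_updates += 1
    return total, atomic_updates
```

For ten elements and a tile size of four, the shared total is updated three times instead of ten while the result is unchanged.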
Module contents
This module initializes the auto-optimization transformations package.