dace.transformation.auto package
Submodules
dace.transformation.auto.auto_optimize module
Automatic optimization routines for SDFGs.
-
dace.transformation.auto.auto_optimize.
auto_optimize
(sdfg: dace.sdfg.sdfg.SDFG, device: dace.dtypes.DeviceType, validate: bool = True, validate_all: bool = False, symbols: Dict[str, int] = None) → dace.sdfg.sdfg.SDFG
Runs a basic sequence of transformations to optimize a given SDFG to decent performance. In particular, it performs the following:
- Simplify
- Auto-parallelization (loop-to-map)
- Greedy application of SubgraphFusion
- Tiled write-conflict resolution (MapTiling -> AccumulateTransient)
- Tiled stream accumulation (MapTiling -> AccumulateTransient)
- Collapse all maps to parallelize across all dimensions
- Set all library nodes to expand to the fast expansion, which calls the fastest library on the target device
Parameters: - sdfg – The SDFG to optimize.
- device – The device to optimize for.
- validate – If True, validates the SDFG after all transformations have been applied.
- validate_all – If True, validates the SDFG after every step.
- symbols – Optional dict that maps symbols (str/symbolic) to int/float values.
Returns: The optimized SDFG.
Note: Operates in-place on the given SDFG.
Note: This function is still experimental and may harm correctness in certain cases. Please report an issue if it does.
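A minimal usage sketch based on the signature above. The helper name optimize_for_cpu is hypothetical, and the choice of a CPU target is an assumption made for illustration. The DaCe imports are deferred into the function body so that the snippet can be loaded even where DaCe is not installed.

```python
def optimize_for_cpu(sdfg, symbols=None):
    """Run auto_optimize on `sdfg` for a CPU target (hypothetical helper).

    The SDFG is modified in-place; the optimized SDFG is also returned.
    """
    # Deferred imports: DaCe is only required when the helper is called.
    from dace.dtypes import DeviceType
    from dace.transformation.auto.auto_optimize import auto_optimize

    return auto_optimize(
        sdfg,
        DeviceType.CPU,
        validate=True,       # validate once, after all transformations
        validate_all=False,  # set to True to validate after every step
        symbols=symbols,     # e.g. {"N": 1024}; None passes no symbol values
    )
```

Given an SDFG obtained elsewhere (for example from a @dace.program via to_sdfg()), calling optimize_for_cpu(sdfg, symbols={"N": 1024}) would apply the sequence listed above. Since the function is marked experimental, comparing results against the unoptimized SDFG is a sensible precaution.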
-
dace.transformation.auto.auto_optimize.
find_fast_library
(device: dace.dtypes.DeviceType) → List[str]
-
dace.transformation.auto.auto_optimize.
greedy_fuse
(graph_or_subgraph: Union[dace.sdfg.sdfg.SDFG, dace.sdfg.state.SDFGState, dace.sdfg.graph.SubgraphView], validate_all: bool, device: dace.dtypes.DeviceType = <DeviceType.CPU: 1>, recursive: bool = True, stencil: bool = False, stencil_tile=None, permutations_only: bool = True, expand_reductions: bool = False) → None
Greedily fuses maps of an SDFG or graph, operating in-place.
Parameters: - graph_or_subgraph – SDFG, SDFGState or Subgraph.
- validate_all – Validate the SDFG or graph at each fusion step.
- device – Device type to specialize for.
- recursive – Fuse recursively within (fused and unfused) scopes.
- stencil – Perform stencil fusion instead of regular fusion.
- stencil_tile – StencilTiling tile size; the default is used if None.
- permutations_only – Disallow splitting of maps during the MultiExpansion stage.
- expand_reductions – Expand all reduce nodes before fusion.
-
dace.transformation.auto.auto_optimize.
make_transients_persistent
(sdfg: dace.sdfg.sdfg.SDFG, device: dace.dtypes.DeviceType, toplevel_only: bool = True) → None
Helper function to change several storage and scheduling properties:
- Makes non-view array lifetimes persistent, with some restrictions depending on the device
- Resets nonatomic WCR edges on GPU
By default, the only arrays made persistent are those that do not appear inside any scope (arrays inside a scope may be allocated multiple times) and whose symbols are always given as parameters to the SDFG (so that they can be allocated in a persistent manner).
Parameters: - sdfg – SDFG
- device – Device type
- toplevel_only – If True, only converts access nodes that do not appear in any scope.
-
dace.transformation.auto.auto_optimize.
move_small_arrays_to_stack
(sdfg: dace.sdfg.sdfg.SDFG) → None
Moves all arrays with Default storage type that are constant-sized and smaller than the auto-tile size to the stack (as StorageType.Register).
Parameters: - sdfg – The SDFG to operate on.
Note: Operates in-place on the SDFG.
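The selection rule described above can be illustrated with a toy model. This is only a sketch of the stated criteria (Default storage, constant size, below a size limit), not DaCe's implementation: ArrayDesc, move_small_to_register, and the size_limit value of 128 are hypothetical, and a string dimension stands in for a symbolic size.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, Tuple, Union

Dim = Union[int, str]  # a str stands in for a symbolic (non-constant) size


@dataclass
class ArrayDesc:
    """Toy stand-in for an array descriptor (not DaCe's data class)."""
    shape: Tuple[Dim, ...]
    storage: str = "Default"


def move_small_to_register(arrays: Dict[str, ArrayDesc], size_limit: int) -> None:
    """Switch small, constant-sized Default arrays to Register, in-place."""
    for desc in arrays.values():
        if desc.storage != "Default":
            continue  # only Default storage is reconsidered
        if not all(isinstance(d, int) for d in desc.shape):
            continue  # a symbolic dimension means the size is not constant
        if prod(desc.shape) < size_limit:
            desc.storage = "Register"


arrays = {
    "tmp": ArrayDesc(shape=(4, 4)),                      # 16 elements, constant
    "big": ArrayDesc(shape=(1024, 1024)),                # constant but too large
    "sym": ArrayDesc(shape=("N",)),                      # symbolic size
    "gpu": ArrayDesc(shape=(2,), storage="GPU_Global"),  # not Default storage
}
move_small_to_register(arrays, size_limit=128)  # 128 is an arbitrary threshold
```

In this toy run only "tmp" satisfies all three criteria, so it is the only entry whose storage changes.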
-
dace.transformation.auto.auto_optimize.
set_fast_implementations
(sdfg: dace.sdfg.sdfg.SDFG, device: dace.dtypes.DeviceType, blocklist: List[str] = None)
Sets fast library node implementations for the given device.
Parameters: - sdfg – The SDFG to optimize.
- device – The device to optimize for.
- blocklist – List of disallowed implementations.
Note: Operates in-place on the given SDFG.
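One plausible reading of how a priority list and a blocklist interact is sketched below in plain Python. It assumes that implementations are tried in priority order and that blocklisted names are skipped; the function pick_implementations, the node names, and the implementation names are all illustrative and are not taken from find_fast_library.

```python
from typing import Dict, List, Optional


def pick_implementations(
    available: Dict[str, List[str]],
    priority: List[str],
    blocklist: Optional[List[str]] = None,
) -> Dict[str, Optional[str]]:
    """Pick, per node, the first allowed implementation in priority order."""
    blocked = set(blocklist or [])
    allowed = [impl for impl in priority if impl not in blocked]
    chosen: Dict[str, Optional[str]] = {}
    for node, impls in available.items():
        # None means no allowed implementation is available for this node.
        chosen[node] = next((impl for impl in allowed if impl in impls), None)
    return chosen


# Hypothetical nodes and implementation names, for illustration only.
available = {"matmul": ["pure", "MKL", "OpenBLAS"], "reduce": ["pure"]}
priority = ["MKL", "OpenBLAS", "pure"]

default_choice = pick_implementations(available, priority)
without_mkl = pick_implementations(available, priority, blocklist=["MKL"])
```

Blocking an implementation does not leave a node without a choice as long as a lower-priority implementation remains available, which is why "matmul" falls back to the next entry in the second call.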
-
dace.transformation.auto.auto_optimize.
tile_wcrs
(graph_or_subgraph: Union[dace.sdfg.sdfg.SDFG, dace.sdfg.state.SDFGState, dace.sdfg.graph.SubgraphView], validate_all: bool, prefer_partial_parallelism: bool = None) → None
Tiles parallel write-conflict resolution maps in an SDFG, state, or subgraphs thereof. Reduces the number of atomic operations by tiling and introducing transient arrays to accumulate atomics on.
Parameters: - graph_or_subgraph – The SDFG/state/subgraph to optimize within.
- validate_all – If True, runs SDFG validation after every tiling.
- prefer_partial_parallelism – If set, prefers extracting non-conflicted map dimensions over tiling the WCR map (may not perform well if parallel dimensions are small).
Note: This function operates in-place.
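The effect of tiling on the number of conflicting updates can be shown with a sequential toy model of a sum reduction. This is a conceptual sketch only: it does not use DaCe, the function names are hypothetical, and a counter stands in for atomic operations on the shared output.

```python
from typing import List, Tuple


def sum_untiled(values: List[float]) -> Tuple[float, int]:
    """Untiled: every element is one conflicting update to the shared total."""
    total, shared_updates = 0.0, 0
    for v in values:
        total += v  # stands in for one atomic update
        shared_updates += 1
    return total, shared_updates


def sum_tiled(values: List[float], tile_size: int) -> Tuple[float, int]:
    """Tiled: accumulate into a per-tile transient, then update once per tile."""
    total, shared_updates = 0.0, 0
    for start in range(0, len(values), tile_size):
        local = 0.0  # transient accumulator, private to this tile
        for v in values[start:start + tile_size]:
            local += v  # no conflict: only this tile writes to `local`
        total += local  # one shared (atomic) update per tile
        shared_updates += 1
    return total, shared_updates


data = [float(i) for i in range(1000)]
untiled = sum_untiled(data)
tiled = sum_tiled(data, tile_size=64)
```

With 1000 elements and a tile size of 64, the shared total is updated 16 times instead of 1000 while the result is unchanged. The tile size here is arbitrary.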
Module contents
This module initializes the auto-optimization transformations package.