dace.transformation.auto package
Submodules
dace.transformation.auto.auto_optimize module
Automatic optimization routines for SDFGs.
- dace.transformation.auto.auto_optimize.apply_gpu_storage(sdfg)
Changes the storage of the SDFG’s input and output data to GPU global memory.
- Return type:
None
- dace.transformation.auto.auto_optimize.auto_optimize(sdfg, device, validate=True, validate_all=False, symbols=None, use_gpu_storage=False)
Runs a basic sequence of transformations to optimize a given SDFG to decent performance. In particular, performs the following:
Simplify
Auto-parallelization (loop-to-map)
Greedy application of SubgraphFusion
Tiled write-conflict resolution (MapTiling -> AccumulateTransient)
Tiled stream accumulation (MapTiling -> AccumulateTransient)
Collapse all maps to parallelize across all dimensions
Set all library nodes to expand to fast expansion, which calls the fastest library on the target device
- Parameters:
sdfg (SDFG) – The SDFG to optimize.
device (DeviceType) – The device to optimize for.
validate (bool) – If True, validates the SDFG after all transformations have been applied.
validate_all (bool) – If True, validates the SDFG after every step.
symbols (Optional[Dict[str, int]]) – Optional dict that maps symbols (str/symbolic) to int/float.
use_gpu_storage (bool) – If True, changes the storage of non-transient data to GPU global memory.
- Return type:
SDFG
- Returns:
The optimized SDFG.
- Note:
Operates in-place on the given SDFG.
- Note:
This function is still experimental and may harm correctness in certain cases. Please report an issue if it does.
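A minimal usage sketch follows, assuming DaCe is installed. The SDFG name, array names, and the helper `build_scale_sdfg` are hypothetical and exist only for illustration; the import is guarded so the snippet is skipped when DaCe is unavailable. It builds a one-map SDFG through the SDFG API and passes it to `auto_optimize` for the CPU device.

```python
# Guarded import: the sketch is skipped when DaCe is not installed.
try:
    import dace
    from dace.transformation.auto.auto_optimize import auto_optimize
    HAVE_DACE = True
except ImportError:
    HAVE_DACE = False


def build_scale_sdfg():
    """Build a tiny SDFG computing y[i] = 2 * x[i] (illustrative only)."""
    N = dace.symbol('N')
    sdfg = dace.SDFG('scale_example')
    sdfg.add_array('x', [N], dace.float64)
    sdfg.add_array('y', [N], dace.float64)
    state = sdfg.add_state('compute')
    state.add_mapped_tasklet(
        'scale',
        dict(i='0:N'),                # map range
        dict(a=dace.Memlet('x[i]')),  # tasklet input
        'b = 2 * a',                  # tasklet code
        dict(b=dace.Memlet('y[i]')),  # tasklet output
        external_edges=True,
    )
    return sdfg


optimized = None
if HAVE_DACE:
    sdfg = build_scale_sdfg()
    # The SDFG is modified in place; the return value is the optimized SDFG.
    optimized = auto_optimize(sdfg, dace.DeviceType.CPU, validate=True)
```

Because the function is documented as experimental, keeping `validate=True` (or enabling `validate_all` while debugging) is a reasonable default in a sketch like this.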
- dace.transformation.auto.auto_optimize.find_fast_library(device)
- Return type:
List[str]
- dace.transformation.auto.auto_optimize.greedy_fuse(graph_or_subgraph, validate_all, device=DeviceType.CPU, recursive=True, stencil=False, stencil_tile=None, permutations_only=True, expand_reductions=False)
Greedily fuses maps of an SDFG or graph, operating in-place.
- Parameters:
graph_or_subgraph (Union[SDFG, SDFGState, SubgraphView]) – SDFG, SDFGState or Subgraph
validate_all (bool) – Validate SDFG or graph at each fusion step
device (DeviceType) – Device type to specialize for
recursive (bool) – Fuse recursively within (fused and unfused) scopes
stencil (bool) – Perform stencil fusion instead of regular fusion
stencil_tile – StencilTiling tile size, default if None
permutations_only (bool) – Disallow splitting of maps during MultiExpansion stage
expand_reductions (bool) – Expand all reduce nodes before fusion
- Return type:
None
- dace.transformation.auto.auto_optimize.make_transients_persistent(sdfg, device, toplevel_only=True)
Helper function to change several storage and scheduling properties:
Makes non-view array lifetimes persistent, with some restrictions depending on the device.
Resets nonatomic WCR edges on GPU.
By default, the only arrays made persistent are those that do not exist inside a scope (arrays inside a scope may be allocated multiple times) and whose symbols are always given as parameters to the SDFG (so that they can be allocated in a persistent manner).
- Parameters:
sdfg (SDFG) – SDFG
device (DeviceType) – Device type
toplevel_only (bool) – If True, only converts access nodes that do not appear in any scope.
- Return type:
Dict[int, Set[str]]
- Returns:
A dictionary mapping SDFG IDs to a set of transient arrays that were made persistent.
- dace.transformation.auto.auto_optimize.move_small_arrays_to_stack(sdfg)
Moves all data with Default storage type that is constant-sized and smaller than the auto-tile size to the stack (as StorageType.Register).
- Parameters:
sdfg (SDFG) – The SDFG to operate on.
- Note:
Operates in-place on the SDFG.
- Return type:
None
- dace.transformation.auto.auto_optimize.set_fast_implementations(sdfg, device, blocklist=None)
Sets fast library node implementations for the given device.
- Parameters:
sdfg (SDFG) – The SDFG to optimize.
device (DeviceType) – The device to optimize for.
blocklist (Optional[List[str]]) – List of disallowed implementations.
- Note:
Operates in-place on the given SDFG.
- dace.transformation.auto.auto_optimize.tile_wcrs(graph_or_subgraph, validate_all, prefer_partial_parallelism=None)
Tiles parallel write-conflict resolution maps in an SDFG, state, or subgraphs thereof. Reduces the number of atomic operations by tiling and introducing transient arrays to accumulate atomics on.
- Parameters:
graph_or_subgraph (Union[SDFG, SDFGState, SubgraphView]) – The SDFG/state/subgraph to optimize within.
validate_all (bool) – If True, runs SDFG validation after every tiling.
prefer_partial_parallelism (Optional[bool]) – If set, prefers extracting non-conflicted map dimensions over tiling WCR map (may not perform well if parallel dimensions are small).
- Note:
This function operates in-place.
- Return type:
None
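The effect of tiling a write-conflicted map can be illustrated with a plain-Python model. This is a conceptual sketch only: it uses no DaCe API and does not reproduce the code DaCe generates. The function names and the tile size of 64 are assumptions chosen for illustration. Each update to the shared total stands in for an atomic operation, and the tile-local variable plays the role of the transient accumulator.

```python
def sum_untiled(values):
    """Every element updates the shared total (one conflicting update each)."""
    total = 0
    shared_updates = 0
    for v in values:
        total += v           # stands in for an atomic update
        shared_updates += 1
    return total, shared_updates


def sum_tiled(values, tile_size):
    """Accumulate per tile in a local variable, then update the total once."""
    total = 0
    shared_updates = 0
    for start in range(0, len(values), tile_size):
        local = 0            # transient accumulator, private to the tile
        for v in values[start:start + tile_size]:
            local += v
        total += local       # one shared (atomic) update per tile
        shared_updates += 1
    return total, shared_updates


values = list(range(1000))
untiled = sum_untiled(values)
tiled = sum_tiled(values, 64)
print(untiled)  # (499500, 1000)
print(tiled)    # (499500, 16)
```

Both variants compute the same result, but the tiled one touches the shared location once per tile rather than once per element.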
Module contents
This module initializes the auto-optimization transformations package.
dace.transformation.auto.fpga module
FPGA-Oriented Automatic optimization routines for SDFGs.
- dace.transformation.auto.fpga.fpga_global_to_local(sdfg, max_size=1048576)
Takes an entire SDFG and changes the storage type of a global FPGA data container to Local in the following situation:
the data is transient,
the data is not a transient shared with other states, and
the data has a compile-time known size.
- Parameters:
sdfg (SDFG) – The SDFG to operate on. It must be a top-level SDFG.
max_size (int) – Maximum size (in bytes) that a container can have to be considered for storage type change.
- Note:
Operates in-place on the SDFG.
- Return type:
None
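The selection rule described for `fpga_global_to_local` can be sketched as a simple predicate. This is an illustrative model, not DaCe code: the `Container` class and `eligible_for_local` function are hypothetical, and treating `max_size` as an inclusive bound is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Container:
    """Hypothetical stand-in for a data container; not a DaCe class."""
    name: str
    transient: bool
    shared_across_states: bool
    size_bytes: Optional[int]  # None models a size unknown at compile time


def eligible_for_local(c: Container, max_size: int = 1048576) -> bool:
    """Return True if the container meets all documented conditions."""
    return (
        c.transient
        and not c.shared_across_states
        and c.size_bytes is not None
        and c.size_bytes <= max_size  # inclusive bound is an assumption
    )


containers = [
    Container("tmp", True, False, 4096),
    Container("inp", False, False, 4096),        # not transient
    Container("shared", True, True, 4096),       # shared with other states
    Container("dyn", True, False, None),         # size unknown at compile time
    Container("big", True, False, 8 * 1048576),  # exceeds max_size
]
selected = [c.name for c in containers if eligible_for_local(c)]
print(selected)  # ['tmp']
```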
- dace.transformation.auto.fpga.fpga_rr_interleave_containers_to_banks(sdfg, num_banks=4, memory_type='DDR')
Allocates the (global) arrays to FPGA off-chip memory banks, interleaving them in a Round-Robin (RR) fashion. This applies to all the arrays in the SDFG hierarchy.
- Parameters:
sdfg (SDFG) – The SDFG to operate on.
num_banks (int) – Number of off-chip memory banks to consider.
memory_type (str) – Type of off-chip memory, either “DDR” or “HBM” (if the target FPGA supports it).
- Returns:
a list containing the number of (transient) arrays allocated to each bank
- Note:
Operates in-place on the SDFG.
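Round-robin interleaving can be pictured with a short plain-Python sketch. It does not call DaCe, and the function name, array names, and visiting order are assumptions made for illustration; the order in which DaCe actually traverses arrays may differ. The second return value mirrors the documented result, a list of per-bank counts.

```python
def round_robin_banks(array_names, num_banks=4):
    """Assign arrays to banks cyclically and count arrays per bank."""
    assignment = {}
    per_bank = [0] * num_banks
    for idx, name in enumerate(array_names):
        bank = idx % num_banks  # cycle through banks 0..num_banks-1
        assignment[name] = bank
        per_bank[bank] += 1
    return assignment, per_bank


assignment, per_bank = round_robin_banks(["A", "B", "C", "D", "E", "F"])
print(assignment)  # {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 0, 'F': 1}
print(per_bank)    # [2, 2, 1, 1]
```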