dace.transformation.auto package

Submodules

dace.transformation.auto.auto_optimize module

Automatic optimization routines for SDFGs.

dace.transformation.auto.auto_optimize.apply_gpu_storage(sdfg)

Changes the storage of the SDFG’s input and output data to GPU global memory.

Return type:

None

dace.transformation.auto.auto_optimize.auto_optimize(sdfg, device, validate=True, validate_all=False, symbols=None, use_gpu_storage=False)

Runs a basic sequence of transformations to optimize a given SDFG to decent performance. In particular, performs the following:

  • Simplify

  • Auto-parallelization (loop-to-map)

  • Greedy application of SubgraphFusion

  • Tiled write-conflict resolution (MapTiling -> AccumulateTransient)

  • Tiled stream accumulation (MapTiling -> AccumulateTransient)

  • Collapse all maps to parallelize across all dimensions

  • Set all library nodes to expand to fast expansion, which calls the fastest library on the target device

Parameters:
  • sdfg (SDFG) – The SDFG to optimize.

  • device (DeviceType) – the device to optimize for.

  • validate (bool) – If True, validates the SDFG after all transformations have been applied.

  • validate_all (bool) – If True, validates the SDFG after every step.

  • symbols (Optional[Dict[str, int]]) – Optional dict that maps symbols (str/symbolic) to their values (int/float).

  • use_gpu_storage (bool) – If True, changes the storage of non-transient data to GPU global memory.

Return type:

SDFG

Returns:

The optimized SDFG.

Note:

Operates in-place on the given SDFG.

Note:

This function is still experimental and may harm correctness in certain cases. Please report an issue if it does.
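A minimal usage sketch for the signature above, assuming DaCe is installed. The wrapper name optimize_for_cpu is hypothetical; only auto_optimize and its documented parameters come from this page.

```python
def optimize_for_cpu(sdfg):
    """Run auto_optimize on an existing SDFG, targeting the CPU.

    optimize_for_cpu is a hypothetical helper name used for illustration.
    """
    # Imported lazily so this module can be loaded where DaCe is not installed.
    import dace
    from dace.transformation.auto.auto_optimize import auto_optimize

    # auto_optimize modifies the SDFG in-place and also returns it.
    # validate=True checks the SDFG once, after all transformations.
    return auto_optimize(sdfg, dace.DeviceType.CPU, validate=True, validate_all=False)
```

An SDFG to pass in is typically obtained from a @dace.program function via its to_sdfg() method. Because the function is documented as experimental, it is worth comparing results against the unoptimized SDFG.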

dace.transformation.auto.auto_optimize.find_fast_library(device)

Returns the names of the optimized library node implementations for the given target device.

Return type:

List[str]

dace.transformation.auto.auto_optimize.greedy_fuse(graph_or_subgraph, validate_all, device=DeviceType.CPU, recursive=True, stencil=False, stencil_tile=None, permutations_only=True, expand_reductions=False)

Greedily fuses maps of an SDFG or graph, operating in-place.

Parameters:
  • graph_or_subgraph (Union[SDFG, SDFGState, SubgraphView]) – SDFG, SDFGState or Subgraph

  • validate_all (bool) – Validate SDFG or graph at each fusion step

  • device (DeviceType) – Device type to specialize for

  • recursive (bool) – Fuse recursively within (fused and unfused) scopes

  • stencil (bool) – Perform stencil fusion instead of regular fusion

  • stencil_tile – StencilTiling Tile size, default if None

  • permutations_only (bool) – Disallow splitting of maps during MultiExpansion stage

  • expand_reductions (bool) – Expand all reduce nodes before fusion

Return type:

None

dace.transformation.auto.auto_optimize.make_transients_persistent(sdfg, device, toplevel_only=True)

Helper function that changes several storage and scheduling properties:

  • Makes non-view array lifetimes persistent, with some restrictions depending on the device

  • Resets nonatomic WCR edges on GPU

By default, the only arrays made persistent are those that do not reside inside a scope (where they may be allocated multiple times) and whose symbols are always given as parameters to the SDFG (so that they can be allocated in a persistent manner).

Parameters:
  • sdfg (SDFG) – SDFG

  • device (DeviceType) – Device type

  • toplevel_only (bool) – If True, only converts access nodes that do not appear in any scope.

Return type:

Dict[int, Set[str]]

Returns:

A dictionary mapping SDFG IDs to a set of transient arrays that were made persistent.
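The return value has the shape Dict[int, Set[str]], so it can be inspected with ordinary Python. The sketch below formats such a dictionary for logging; summarize_persistent and the sample data are hypothetical and not part of DaCe.

```python
from typing import Dict, List, Set


def summarize_persistent(result: Dict[int, Set[str]]) -> List[str]:
    """Format a make_transients_persistent-style result, one line per SDFG ID."""
    lines = []
    for sdfg_id in sorted(result):
        names = ", ".join(sorted(result[sdfg_id]))
        lines.append(f"SDFG {sdfg_id}: {names if names else '(none)'}")
    return lines


# Hand-written sample with the documented shape (not produced by DaCe).
example = {1: {"tmp_b", "tmp_a"}, 0: set()}
print("\n".join(summarize_persistent(example)))
```

In real use, the dictionary would come from a call such as make_transients_persistent(sdfg, device).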

dace.transformation.auto.auto_optimize.move_small_arrays_to_stack(sdfg)

Moves all data containers with Default storage that have a constant size smaller than the auto-tile size to the stack (as StorageType.Register).

Parameters:

sdfg (SDFG) – The SDFG to operate on.

Note:

Operates in-place on the SDFG.

Return type:

None

dace.transformation.auto.auto_optimize.set_fast_implementations(sdfg, device, blocklist=None)

Sets fast library node implementations for the given device.

Parameters:
  • sdfg (SDFG) – The SDFG to optimize.

  • device (DeviceType) – the device to optimize for.

  • blocklist (Optional[List[str]]) – list of disallowed implementations.

Note:

Operates in-place on the given SDFG.

dace.transformation.auto.auto_optimize.tile_wcrs(graph_or_subgraph, validate_all, prefer_partial_parallelism=None)

Tiles parallel write-conflict resolution maps in an SDFG, state, or subgraphs thereof. Reduces the number of atomic operations by tiling and introducing transient arrays to accumulate atomics on.

Parameters:
  • graph_or_subgraph (Union[SDFG, SDFGState, SubgraphView]) – The SDFG/state/subgraph to optimize within.

  • validate_all (bool) – If True, runs SDFG validation after every tiling.

  • prefer_partial_parallelism (Optional[bool]) – If set, prefers extracting non-conflicted map dimensions over tiling WCR map (may not perform well if parallel dimensions are small).

Note:

This function operates in-place.

Return type:

None
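The idea behind tiling a write-conflict resolution can be illustrated without DaCe. The sketch below is a conceptual model in plain Python, not the transformation itself: each update of the shared total stands in for an atomic operation, and the tiled version accumulates into a per-tile local variable first. The function names and the update counter are assumptions made for illustration.

```python
def sum_untiled(values):
    """Every element updates the shared result directly."""
    total = 0.0
    shared_updates = 0
    for v in values:
        total += v           # stands in for one atomic update
        shared_updates += 1
    return total, shared_updates


def sum_tiled(values, tile_size):
    """Each tile accumulates locally, then updates the shared result once."""
    total = 0.0
    shared_updates = 0
    for start in range(0, len(values), tile_size):
        local = 0.0          # transient accumulator private to this tile
        for v in values[start:start + tile_size]:
            local += v
        total += local       # one atomic update per tile
        shared_updates += 1
    return total, shared_updates


data = [float(i) for i in range(10)]
print(sum_untiled(data))     # same total, one shared update per element
print(sum_tiled(data, 4))    # same total, one shared update per tile
```

With ten elements and a tile size of four, the tiled version performs three shared updates instead of ten while producing the same sum.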

Module contents

This module initializes the auto-optimization transformations package.

dace.transformation.auto.fpga module

FPGA-Oriented Automatic optimization routines for SDFGs.

dace.transformation.auto.fpga.fpga_global_to_local(sdfg, max_size=1048576)

Takes an entire SDFG and changes the storage type of a global FPGA data container to Local in the following situation:

  • the data is transient,

  • the data is not a transient shared with other states, and

  • the data has a compile-time known size.

Parameters:
  • sdfg (SDFG) – The SDFG to operate on. It must be a top-level SDFG.

  • max_size (int) – maximum size (in bytes) that a container can have to be considered for storage type change

Note:

Operates in-place on the SDFG.

Return type:

None
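The eligibility conditions for fpga_global_to_local can be written as a single predicate. The sketch below uses a hypothetical Container stand-in rather than DaCe's data descriptors, and it assumes that the max_size bound is inclusive, which this page does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Container:
    """Hypothetical stand-in for a global FPGA data container."""
    transient: bool
    shared_across_states: bool
    size_in_bytes: Optional[int]  # None if the size is not known at compile time


def can_move_to_local(container: Container, max_size: int = 1048576) -> bool:
    """Return True if all documented conditions hold for the container."""
    return (
        container.transient
        and not container.shared_across_states
        and container.size_in_bytes is not None
        and container.size_in_bytes <= max_size  # inclusive bound is an assumption
    )


print(can_move_to_local(Container(True, False, 4096)))
print(can_move_to_local(Container(True, True, 4096)))
```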

dace.transformation.auto.fpga.fpga_rr_interleave_containers_to_banks(sdfg, num_banks=4, memory_type='DDR')

Allocates the (global) arrays to FPGA off-chip memory banks, interleaving them in a Round-Robin (RR) fashion. This applies to all the arrays in the SDFG hierarchy.

Parameters:
  • sdfg (SDFG) – The SDFG to operate on.

  • num_banks (int) – number of off-chip memory banks to consider

  • memory_type (str) – type of off-chip memory, either “DDR” or “HBM” (if the target FPGA supports it)

Returns:

a list containing the number of (transient) arrays allocated to each bank

Note:

Operates in-place on the SDFG.
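Round-robin interleaving can be sketched in a few lines of plain Python. This is a conceptual model only: round_robin_banks is a hypothetical function, arrays are represented by their names, and the order in which DaCe visits arrays is not specified on this page.

```python
def round_robin_banks(array_names, num_banks=4, memory_type="DDR"):
    """Assign arrays to banks in round-robin order.

    Returns the per-array assignment and the number of arrays per bank.
    """
    if num_banks < 1:
        raise ValueError("num_banks must be at least 1")
    assignment = {}
    per_bank = [0] * num_banks
    for index, name in enumerate(array_names):
        bank = index % num_banks          # cycle through banks 0..num_banks-1
        assignment[name] = (memory_type, bank)
        per_bank[bank] += 1
    return assignment, per_bank


assignment, per_bank = round_robin_banks(["A", "B", "C", "D", "E"], num_banks=2)
print(assignment)
print(per_bank)
```

The second return value mirrors the documented return of fpga_rr_interleave_containers_to_banks: a list with the number of arrays allocated to each bank.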