Optimizing Programs
Once created, SDFGs can be optimized by a variety of means.
The experimental automatic optimization heuristics provide a good (but likely not optimal) starting point. Be sure to install and use fast libraries, if they exist for your hardware.
Manual optimization can then provide more fine-grained control over the performance aspects of the code. To determine which optimizations work best, SDFGs can be analyzed for performance bottlenecks using Profiling and Instrumentation, or visually through static and runtime analysis. With this information in hand, the SDFG can be optimized using transformations, passes, or auto-tuning.
The most common type of optimization is a local transformation. For example, the InlineSDFG transformation inlines a nested SDFG into its parent SDFG. This transformation is applied to a single SDFG, and does not require any information from other SDFGs.
As opposed to local transformations, passes are applied globally on an SDFG, and can be used to perform whole-program analysis and optimization, such as memory footprint reduction in TransientReuse.
Transforming a program is often a sequence of iterative operations.
Some transformations do not necessarily improve the performance of an SDFG, but are a “stepping stone” for other transformations. For example, applying MapTiling on a map can make InLocalStorage available on its memlets.
When working with specific platforms, be sure to read the Best Practices documentation entries linked below. It is also
recommended to read vendor-provided documentation on how to maximize performance on that platform.
Finally, our experimental auto-tuning API allows for automatic optimization of SDFGs by searching over the set of possible
configurations. This is done by evaluating the performance of each configuration and selecting the best one.
For example, MapPermutationTuner automatically tunes the order of multi-dimensional maps for the best performance, and DataLayoutTuner globally tunes the data layout of arrays.
The following resources are available to help you optimize your SDFG:
Using transformations: Using and Creating Transformations
Creating optimized schedules that can match optimized libraries: Matrix multiplication CPU and GPU optimization example
Auto-tuning and instrumentation: Tuning data layouts sample
The following subsections provide more information on the different types of optimization methods: