Profiling and Instrumentation
For more information and examples, see the Benchmarking and Instrumentation tutorial.
Full-program profiling of Python DaCe programs can be performed by using the dace.profile context manager
(or from the console via daceprof - Profiler and Report Viewer).
This simple profiling mode is performed within Python using timers: it calls the same program a configurable number
of times and prints the median execution time. It is not as accurate as the other profiling modes, but it is easy to
use and does not require any additional tools.
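Conceptually, this timer-based mode can be sketched in plain Python. The sketch below is a simplified illustration, not DaCe's actual implementation; the function name and parameters are hypothetical:

```python
import statistics
import time


def profile_median(func, *args, repetitions=100, warmup=10):
    """Call func repeatedly and report the median wall-clock time.

    Warmup executions run but are discarded; the median of the
    remaining runs is reported, since the median is robust to
    outliers such as OS scheduling jitter.
    """
    times = []
    for i in range(warmup + repetitions):
        start = time.perf_counter()
        func(*args)
        elapsed = time.perf_counter() - start
        if i >= warmup:  # Ignore warmup executions
            times.append(elapsed)
    return statistics.median(times)


median_s = profile_median(sum, range(10_000), repetitions=20, warmup=5)
print(f"median: {median_s * 1e3:.3f} ms")
```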
Every time an SDFG is invoked, a profiling report JSON file (see below) will be generated. You can also use the profiling
results directly in Python by using dace.profile as a context manager (the with ... as syntax). After the profiling
section is complete, each SDFG and its timings will be stored in a list.
For example, the following code will print the execution time of the
my_function function after running it 100 times,
with 10 steps of warmup (where execution time is ignored):
import dace
import numpy as np


@dace.program
def my_function(A: dace.float64[10000]):
    return A + 1


A = np.random.rand(10000)

with dace.profile(repetitions=100, warmup=10) as prof:  # Enable profiling
    my_function(A)

# Optionally, the following code will print each individual time of the first call
sdfg, timing = prof.times[0]
print(timing)
This mode executes the same program multiple times. If the output would be affected by this (e.g., if an array is
incremented), either use
repetitions=1 or use the Instrumentation mode.
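To see why repeated execution matters, consider a program with a side effect. This is a generic Python/numpy illustration, not a DaCe program:

```python
import numpy as np


def increment(A):
    # A program with a side effect: each call mutates its input in place.
    A += 1


A = np.zeros(4)
for _ in range(100):  # Profiling re-runs the program many times...
    increment(A)
print(A)  # ...so A holds the accumulated result, not the single-call result
```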
Instrumentation is a more accurate profiling mode that generates specific measurement code on an SDFG or any sub-part
of it (for example, a single Map). When the SDFG is called, the instrumentation API generates a JSON file for each
execution, containing the measured metrics (see file format below), and places it in the program's .dacecache folder.
The instrumentation API can be used by setting element.instrument to the desired instrumentation type (see the
dace.InstrumentationType enumeration for a list of the default available types). The element can be almost any SDFG
element, from the SDFG itself, through a state, to a variety of nodes, such as a Map, a Tasklet, or a NestedSDFG.
The generated report can then be read programmatically as an InstrumentationReport object. The SDFG class provides the
methods get_latest_report() and get_instrumentation_reports() to read the last or
all generated reports, respectively. See
SDFG for more methods related to instrumentation reports.
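The "latest report" versus "all reports" distinction can be mimicked by scanning a folder of timestamp-named JSON files. The folder layout and file naming below are hypothetical illustrations, not DaCe's exact on-disk format:

```python
import json
import os
import tempfile


def all_reports(folder):
    # Timestamp-named files sort chronologically when sorted lexicographically
    files = sorted(f for f in os.listdir(folder) if f.endswith('.json'))
    return [os.path.join(folder, f) for f in files]


def latest_report(folder):
    reports = all_reports(folder)
    return reports[-1] if reports else None


# Create three dummy reports with increasing timestamps in their names
folder = tempfile.mkdtemp()
for ts in ('1000', '1001', '1002'):
    with open(os.path.join(folder, f'report-{ts}.json'), 'w') as fp:
        json.dump({'traceEvents': []}, fp)

print(latest_report(folder))  # Path ending in 'report-1002.json'
```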
A simple example use of SDFG instrumentation would be to mimic the simple profiling mode from above with a
Timer instrumentation applied on the whole SDFG:
import dace
import numpy as np


@dace.program
def twomaps(A):
    B = np.sin(A)
    return B * 2.0


a = np.random.rand(1000, 1000)
sdfg = twomaps.to_sdfg(a)

sdfg.instrument = dace.InstrumentationType.Timer  # Instrument the whole SDFG
sdfg(a)

# Print the execution time in a human-readable tabular format
report = sdfg.get_latest_report()
print(report)
More in-depth instrumentation can be performed by applying instrumentation to specific nodes. For example, the following code will instrument the individual Map scopes in the above application:
# Instrument the individual Map scopes
for state in sdfg.nodes():
    for node in state.nodes():
        if isinstance(node, dace.nodes.MapEntry):
            node.instrument = dace.InstrumentationType.Timer

# The report will now contain information on each individual map. Example printout:
# Instrumentation report
# SDFG Hash: 0f02b642249b861dc94b7cbc729190d4b27cab79607b8f28c7de3946e62d5977
# ---------------------------------------------------------------------------
# Element                  Runtime (ms)
#                          Min      Mean     Median   Max
# ---------------------------------------------------------------------------
# SDFG (0)
# |-State (0)
# | |-Node (0)
# | | |Map _numpy_sin__map:
# | | |                    11.654   11.654   11.654   11.654
# | |-Node (5)
# | | |Map _Mult__map:
# | | |                    1.524    1.524    1.524    1.524
# ---------------------------------------------------------------------------
There are more instrumentation types available, such as fine-grained GPU kernel timing with GPU events
(dace.InstrumentationType.GPU_Events).
Instrumentation can also collect performance counters on CPUs and GPUs using LIKWID. The LIKWID_Counters
instrumentation type can be configured to collect a wide variety of performance counters; an example use can be found
in the LIKWID instrumentation code sample.
Instrumentation file format
Instrumentation uses a JSON file in the Chrome Trace Event format to store the collected metrics. You can view it in several ways:
- In the Visual Studio Code extension, laid on top of a program (see Runtime Analysis and Instrumentation for an example)
- Printed out in the console with daceprof - Profiler and Report Viewer
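Because the format is standard Chrome Trace Events, reports can also be processed with a few lines of Python. The snippet below parses a generic, hand-written trace document (not a file produced by DaCe itself); complete events ("ph": "X") carry a start timestamp "ts" and duration "dur" in microseconds:

```python
import json

# A minimal Chrome Trace Event document with two complete events
trace = json.loads("""
{"traceEvents": [
  {"name": "my_map", "ph": "X", "ts": 100, "dur": 2500, "pid": 0, "tid": 0},
  {"name": "my_tasklet", "ph": "X", "ts": 2700, "dur": 300, "pid": 0, "tid": 0}
]}
""")

# Summarize each event's runtime, converting microseconds to milliseconds
for event in trace["traceEvents"]:
    print(f'{event["name"]}: {event["dur"] / 1000.0} ms')
```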
Similarly to timing events, data containers and their contents can be serialized for performance and validation
reproducibility purposes. This is done by setting the instrument property of an AccessNode to a
DataInstrumentationType, such as Save.
The data will be serialized (keeping each version if the access node is encountered multiple times) in the
.dacecache/<program name>/data directory. The data can then be reloaded in subsequent executions by setting the
instrument property to Restore.
This feature is crucial for reproducibility and validation purposes, as it allows running the same program multiple times
with the same input data and comparing the output data with the original output data. Data instrumentation powers cutout-based
auto-tuning (see CutoutTuner), which looks at subsets of a program at a time.
The folder structure of a data report is as follows:
.dacecache/<program name>/data/<array name>/<uuid>_<version>.bin,
where <array name> is the data container name in the SDFG,
<uuid> is a unique identifier of the access node from
which this array was saved, and
<version> is a running number for the currently-saved array (e.g., when an access node
is written to multiple times in a loop).
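Given this naming scheme, the saved versions of an array can be recovered in order by parsing the filenames. The sketch below uses illustrative file names that merely follow the pattern described above:

```python
# Sort data-report filenames of the form <uuid>_<version>.bin by version
def sort_versions(filenames):
    def version_of(name):
        stem = name[:-len('.bin')]               # Strip the extension
        uuid, _, version = stem.rpartition('_')  # Split off the version suffix
        return int(version)
    return sorted(filenames, key=version_of)


files = ['ab12_10.bin', 'ab12_2.bin', 'ab12_0.bin']
print(sort_versions(files))  # Numeric order: versions 0, 2, 10
```

Note that sorting numerically (rather than lexicographically) matters once an access node has been saved more than ten times.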
The instrumented data report can be read in the Python API via the InstrumentedDataReport class, which can be obtained by calling
get_instrumented_data() on the SDFG object.
The files themselves are direct binary representations of the whole data (with padding and strides), for complete
reproducibility. When accessed from Python, a numpy wrapper shows the user-accessible view of that array.
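The idea of dumping raw bytes and re-wrapping them as a numpy view can be illustrated in plain numpy. This is a generic sketch of the round-trip for a contiguous array; DaCe's reports additionally preserve padding and strides so non-contiguous layouts are reproduced exactly:

```python
import numpy as np

# Dump an array to its direct binary representation
original = np.arange(12, dtype=np.float64).reshape(3, 4)
raw = original.tobytes()

# Reconstruct by reinterpreting the bytes with the known dtype and shape
restored = np.frombuffer(raw, dtype=np.float64).reshape(3, 4)
assert np.array_equal(original, restored)
print(restored[1, :])  # The wrapper behaves like a regular numpy array
```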
An example of creating and reading such a report is as follows:
import dace
import numpy as np


@dace.program
def data_instrumentation(A: dace.float64[1000, 1000]):
    versioned = np.zeros_like(A)
    for i in range(10):
        versioned += A
    return versioned


sdfg = data_instrumentation.to_sdfg()
# ... Set instrument to Save on the AccessNodes and run the SDFG ...

dreport = sdfg.get_instrumented_data()  # Returns an InstrumentedDataReport
print(dreport.keys())  # Will print "'A', 'versioned'"

array = dreport['A']            # Return value is a single array if there is only one version
varrays = dreport['versioned']  # Otherwise, the return value is a sorted list of versions

# After loading, arrays can be used normally with numpy
# (real_A stands for the original input array passed at runtime)
assert np.allclose(array, real_A)
for arr in varrays:
    print(arr[5, :])