.. _profiling: Profiling and Instrumentation ============================= .. note:: For more information and examples, see the `Benchmarking and Instrumentation `_ tutorial. Simple profiling ---------------- Full-program profiling of Python DaCe programs can be performed by using the :func:`~dace.builtin_hooks.profile` hook (or from the console via :ref:`daceprof`). This simple profiling mode is performed within Python using timers, calling the same program for a configurable number of times and printing the median execution time. It is not as accurate as the other profiling modes, but it is easy to use and does not require any additional tools. Every time an SDFG is invoked, a profiling report JSON file (see below) will be generated. You can also use the profiling results directly in Python with the ``as`` keyword. After the profiling section is complete, each SDFG and its timings will be stored in a list. For example, the following code will print the execution time of the ``my_function`` function after running it 100 times, with 10 steps of warmup (where execution time is ignored): .. code-block:: python import dace import numpy as np @dace.program def my_function(A: dace.float64[10000]): return A + 1 A = np.random.rand(10000) with dace.profile(repetitions=100, warmup=10) as prof: # Enable profiling my_function(A) # Optionally, the following code will print each individual time of the first call sdfg, timing = prof.times[0] print(timing) .. note:: This mode executes the same program multiple times. If the output would be affected by this (e.g., if an array is incremented), either use ``repetitions=1`` or use the :ref:`instrumentation` mode. .. _instrumentation: Instrumentation --------------- Instrumentation is a more accurate profiling mode that generates specific measurement code on an SDFG or any sub-part of it (for example, a single Map). When the SDFG is called, the instrumentation API generates a JSON file for each execution, containing the measured metrics (see file format below) and places it in the ``.dacecache//perf`` directory. The instrumentation API can be used by setting ``element.instrument`` to the desired instrumentation type (see :class:`~dace.dtypes.InstrumentationType` for a list of the default available types). ``element`` can be almost any SDFG element, from the SDFG itself, through a state, to a variety of nodes, such as a Map, a Tasklet, or a NestedSDFG. The generated report can then be read programmatically as a :class:`~dace.codegen.instrumentation.report.InstrumentationReport` object. The SDFG class provides the methods :func:`~dace.sdfg.sdfg.SDFG.get_latest_report` and :func:`~dace.sdfg.sdfg.SDFG.get_instrumentation_reports` to read the last or all generated reports, respectively. See :class:`~dace.sdfg.sdfg.SDFG` for more methods related to instrumentation reports. A simple example use of SDFG instrumentation would be to mimic the simple profiling mode from above with a :class:`~dace.dtypes.InstrumentationType.Timer` instrumentation applied on the whole SDFG: .. code-block:: python import dace import numpy as np @dace.program def twomaps(A): B = np.sin(A) return B * 2.0 a = np.random.rand(1000, 1000) sdfg = twomaps.to_sdfg(a) sdfg.instrument = dace.InstrumentationType.Timer # Instrument the whole SDFG sdfg(a) # Print the execution time in a human-readable tabular format report = sdfg.get_latest_report() print(report) More in-depth instrumentation can be performed by applying instrumentation to specific nodes. For example, the following code will instrument the individual Map scopes in the above application: .. code-block:: python # Instrument the individual Map scopes for state in sdfg.nodes(): for node in state.nodes(): if isinstance(node, dace.nodes.MapEntry): node.instrument = dace.InstrumentationType.Timer # The report will now contain information on each individual map. Example printout: # Instrumentation report # SDFG Hash: 0f02b642249b861dc94b7cbc729190d4b27cab79607b8f28c7de3946e62d5977 # --------------------------------------------------------------------------- # Element Runtime (ms) # Min Mean Median Max # --------------------------------------------------------------------------- # SDFG (0) # |-State (0) # | |-Node (0) # | | |Map _numpy_sin__map: # | | | 11.654 11.654 11.654 11.654 # | |-Node (5) # | | |Map _Mult__map: # | | | 1.524 1.524 1.524 1.524 # --------------------------------------------------------------------------- There are more instrumentation types available, such as fine-grained GPU kernel timing with :class:`~dace.dtypes.InstrumentationType.GPU_Events`. Instrumentation can also collect performance counters on CPUs and GPUs using `LIKWID `_. The :class:`~dace.dtypes.InstrumentationType.LIKWID_Counters` instrumentation type can be configured to collect a wide variety of performance counters on CPUs and GPUs. An example use can be found in the `LIKWID instrumentation code sample `_. The :class:`~dace.dtypes.InstrumentationType.GPU_TX_MARKERS` instrumentation type wraps a DaCe program executed on the GPU with NVTX or rocTX markers. Important parts of the execution of the program on the GPU as the different states, SDFGs and initialization and finalization phases are marked with these markers. These markers can be used to visualize and measure the GPU activity using the NVIDIA Nsight Systems or ROCm Systems profilers. Instrumentation file format ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Instrumentation uses a JSON file in the `Chrome Trace Event `_ format to store the collected metrics. You can view it in several ways: * In the Visual Studio Code extension, laid on top of a program (see :ref:`vscode_trace` for an example) * Separately, using `Speedscope `_ (or, if you have Google Chrome, a viewer can also be accessed locally at ``_) * Printed out in the console with :ref:`daceprof` Data Instrumentation ~~~~~~~~~~~~~~~~~~~~ Similarly to timing events, data containers and their contents can be serialized for performance and validation reproducibility purposes. This is done by setting the ``instrument`` property of an :class:`~dace.sdfg.nodes.AccessNode` to a :class:`~dace.dtypes.DataInstrumentationType`, such as :class:`~dace.dtypes.DataInstrumentationType.Save`. The data will be serialized (keeping each version if the access node is encountered multiple times) in the ``.dacecache//data`` directory. The data can then be reloaded in subsequent executions by setting the ``instrument`` property to :class:`~dace.dtypes.DataInstrumentationType.Restore`. This feature is crucial for reproducibility and validation purposes, as it allows to run the same program multiple times with the same input data, and compare the output data to the original output data. Data instrumentation powers cutout-based auto-tuning (:class:`~dace.optimization.cutout_tuner.CutoutTuner`), which looks at subsets of a program at a time. The folder structure of a data report is as follows: ``.dacecache//data//_.bin``, where ```` is the data container name in the SDFG, ```` is a unique identifier to the access node from which this array was saved, and ```` is a running number for the currently-saved array (e.g., when an access node is written to multiple times in a loop). The instrumented data report can be read in the Python API via the :class:`~dace.codegen.instrumentation.data.data_report.InstrumentedDataReport` class, which can be obtained by calling :func:`~dace.sdfg.sdfg.SDFG.get_instrumented_data` on the SDFG object. The files themselves are direct binary representations of the whole data (with padding and strides), for complete reproducibility. When accessed from Python, a numpy wrapper shows the user-accessible view of that array. Example of creating and reading such a report is as follows: .. code-block:: python @dace.program def data_instrumentation(A: dace.float64[1000, 1000]): versioned = np.zeros_like(A) for i in range(10): versioned += A return versioned sdfg = data_instrumentation.to_sdfg() # ... Set instrument to Save on the AccessNodes and run the SDFG ... dreport = sdfg.get_instrumented_data() # Returns an InstrumentedDataReport print(dreport.keys()) # Will print "'A', 'versioned'" array = dreport['A'] # return value is a single array if there is only one version varrays = dreport['versioned'] # otherwise, return value is a sorted list of versions # after loading, arrays can be used normally with numpy assert np.allclose(array, real_A) for arr in varrays: print(arr[5, :])