There are several sources of potential issues with DaCe programs:
Frontend failures while parsing programs into SDFGs
SDFG validation failures before or during optimization
Transformations that impair the correctness of the SDFG
Failures during the Code Generation process
Segmentation faults or errors in the generated code
In general, DaCe tries to raise a Python exception and clearly print the origin of the issue. However, to shed more light
on the origin of the problem, it can be useful to set the
debugprint configuration entry to true.
There are several other important configuration entries: for frontend issues, and for debugging why Python functions become callbacks, enable
frontend.verbose_errors. For transformations that fail during matching, use optimizer.match_exception (see Transformation Debugging below).
For issues with Properties and serialization, enable testing.serialization.
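Configuration entries can be set programmatically through dace.Config, or through environment variables that prefix the entry with DACE_ and replace dots with underscores (as in DACE_compiler_use_cache=1 further below). A minimal sketch of both forms:
import dace
# Programmatically, before the program is compiled:
dace.Config.set('debugprint', value=True)
dace.Config.set('frontend', 'verbose_errors', value=True)
# ...or equivalently, from the shell:
# $ DACE_debugprint=1 python my_program.py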
Below we provide a more detailed methodology for debugging particular issues. You can find common errors and solutions here.
SDFGs can be validated for soundness. This happens automatically during compilation, but can also be triggered manually with the
dace.sdfg.sdfg.SDFG.validate() method. Validation can detect issues in the graph, including
memlets that mismatch their context, out-of-bounds accesses, undefined symbol use, scopes that are not properly closed,
and many more.
On validation failure, a copy of the failing SDFG is saved (unless configured otherwise) in the current working directory as
_dacegraphs/invalid.sdfg, which includes the source of the error. Opening it in the Visual Studio Code
extension even zooms in on the issue automatically!
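As a sketch, manual validation can catch InvalidSDFGError, the exception type raised on failure (the trivial program below is illustrative):
import dace
from dace.sdfg.validation import InvalidSDFGError

@dace.program
def myprogram(a: dace.float64[20]):
    a += 1

sdfg = myprogram.to_sdfg()
try:
    sdfg.validate()  # Also called automatically during compilation
except InvalidSDFGError as ex:
    print('SDFG is invalid:', ex)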
Debugging and Recompiling Generated Code
For debugging the code generators and the generation process itself, see Debugging Code Generation.
If issues arise during compilation of the generated code, or the code is somehow incorrect, it can be useful to inspect and modify it.
The generated code of an SDFG is saved in the
.dacecache directory, under the name of the SDFG (which corresponds
to the function’s name in Python). You can inspect and modify the code by opening the file in your favorite text editor:
$ python my_program.py
[Some failure happens]
$ cd .dacecache/myprogram/src
$ ls
cpu  cuda
$ code cpu/myprogram.cpp
The generated code is organized in subdirectories for each target based on its code generator name, and in the above case there is both CPU and GPU code.
Note that rerunning python my_program.py will overwrite the generated code.
There are two ways to avoid this overwriting. The first is to set the
regenerate_code flag on the DaCe program to False:
@dace.program(regenerate_code=False)
def myprogram(...):
    ...
This will prevent the code from being regenerated, but DaCe will still recompile it. If you want to compile
the code yourself, you can set the
recompile flag to False:
@dace.program(recompile=False)
def myprogram(...):
    ...
or set the compiler.use_cache configuration entry to
1 (i.e., DACE_compiler_use_cache=1) to achieve the same effect globally (on every program).
Since this will prevent the code from being recompiled, you will need to manually go into the build directory and run
make to recompile the code:
$ cd .dacecache/myprogram/build
$ make
$ cd ../../..
# If recompile=False is used, the below environment variable is not necessary.
$ DACE_compiler_use_cache=1 python my_program.py
# Program will not be regenerated nor recompiled.
Runtime Compilation Issues
If there are issues with the C++ Runtime Headers, you can find their location and edit them manually:
# Print out the runtime folder
$ python -c 'import dace; print(dace.__file__)'
/home/user/.local/lib/python3.8/site-packages/dace/__init__.py
# The files are in include/dace/*.h
$ cd /home/user/.local/lib/python3.8/site-packages/dace/runtime
It is, however, recommended to install DaCe in development mode, so that you can edit the files directly in the source folder.
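For example, a development-mode (editable) installation from a clone of the repository might look as follows:
$ git clone --recursive https://github.com/spcl/dace.git
$ cd dace
$ pip install -e .  # Headers under dace/runtime are now editable in place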
Crashes in Compiled Programs
Compiled programs are built into a shared library (a .so or .dll file) that is linked to the host process. If using
a DaCe program within Python, debugging it simply requires calling any debugger (such as gdb) on the Python process
and potentially setting breakpoints in the generated code (which can be found as described above):
$ gdb --args python myscript.py [args...]
In most cases, debugging in Release mode does not yield actionable results. To better debug compiled programs, set the
compiler.build_type configuration entry to
Debug and rerun the program. The following example shows
a crashing program and how the process works:
import dace
import numpy as np

N = dace.symbol('N')

@dace.program
def example(a: dace.float32[N], b: dace.float32[N]):
    b = a

n = 10
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
example(a, b)  # Calling this function could trigger a segmentation fault
$ python example.py
...
sh: segmentation fault  python example.py
$ gdb --args python example.py
...
(gdb) r
...
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffe7259186 in __program_example_internal(example_t*, float*, float*, int) ()
   from /path/.dacecache/example/build/libexample.so
# No further information is given on the source of the issue. Below we set debug mode:
$ DACE_compiler_build_type=Debug gdb --args python example.py
...
(gdb) r
...
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffe7159186 in __program_example_internal (__state=0x5555574669a0, a=0x55555699efd0, b=0x555556f4c390, N=10)
--Type <RET> for more, q to quit, c to continue without paging--
    at /path/.dacecache/example/src/cpu/example.cpp:27
27          b = __out;
You can also use the Visual Studio Code extension to debug Python programs by using the
DaCe debugger debug provider.
It even supports mapping breakpoints from the Python code to the generated code.
For low-level access to the CMake configuration, you can also go into the program's build
subdirectory (inside its .dacecache folder) and call
ccmake . to modify the configuration. After that, run
make to rebuild.
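For instance, reusing the example program from above:
$ cd .dacecache/example/build
$ ccmake .  # Interactively modify CMake options, e.g., compiler flags
$ make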
GPU Debugging in DaCe
As GPU kernels cannot be debugged directly in
gdb, other tools must be used to debug GPU programs.
The CUDA toolkit provides dedicated tools to debug kernels:
cuda-gdb can break inside and debug CUDA kernels, and cuda-memcheck
can be used to track invalid memory accesses.
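For example, both tools can wrap the Python interpreter directly (the script name below is illustrative):
$ cuda-gdb --args python my_gpu_program.py  # Break inside CUDA kernels
$ cuda-memcheck python my_gpu_program.py    # Report invalid memory accesses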
Additional debugging features in DaCe include GPU stream synchronization debugging. Since GPU toolkits (CUDA, HIP, OpenCL)
mostly run asynchronously using nonblocking calls, it is sometimes hard to pinpoint the source of an issue. Since GPU
programs can be large and run for a while,
Debug mode cannot always be enabled. For these reasons, DaCe provides
a mode that can run directly in
Release mode, called synchronous debugging. The mode inserts device-synchronization
calls after every GPU-related operation (kernel, library call) and checks for errors. This helps debug both crashes
and stream-related data races. Enable it by setting
compiler.cuda.syncdebug to True.
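A minimal sketch of enabling synchronous debugging, using the same configuration mechanisms as above:
import dace
dace.Config.set('compiler', 'cuda', 'syncdebug', value=True)
# ...or from the shell:
# $ DACE_compiler_cuda_syncdebug=1 python my_gpu_program.py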
Transformation Debugging
Transformation debugging serves multiple purposes: understanding why a transformation fails to match on a specific subgraph, debugging exceptions raised during matching, and debugging failures during the application of transformations.
By default, exceptions raised during transformation matching only emit a warning. To debug such exceptions, enable the
optimizer.match_exception configuration entry, which turns them into errors.
When setting breakpoints, note that transformations repeatedly try to apply to matching subgraphs of an SDFG. It is therefore recommended to use conditional breakpoints that filter on labels or other defining properties of the nodes/edges for which you want to debug the transformation.
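Such a conditional breakpoint can also be expressed directly in code. As a sketch, subclassing an existing transformation (MapTiling and the map label below are illustrative choices):
from dace.transformation.dataflow import MapTiling

class DebuggedMapTiling(MapTiling):
    def can_be_applied(self, graph, expr_index, sdfg, permissive=False):
        # Drop into the debugger only for the subgraph of interest
        if self.map_entry.map.label == 'suspicious_map':
            breakpoint()
        return super().can_be_applied(graph, expr_index, sdfg, permissive)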
Another approach is to run the debugger on the Visual Studio Code extension’s optimizer daemon. The daemon is a Python script, so it can be debugged as such. Simply create a new debug configuration that starts the script (see Common issues with the Visual Studio Code extension on how to find the command) with the right port, kill the existing SDFG Optimizer, and debug the script. Breakpoints should now work inside DaCe or your custom transformations.
Debugging Frontend Issues
When debugging frontend issues, it is important to distinguish between the frontend itself and transformations
applied to the initial SDFG. Thus, if an issue is suspected in the frontend, first try disabling automatic simplification
(via the optimizer.automatic_simplification config entry or the API, see below) and validating the initial
SDFG for soundness:
sdfg = bad_program.to_sdfg(simplify=False)
sdfg.validate()
If this works but some programs fail, it might be a serialization issue. Try a save/load roundtrip:
sdfg.save('test.sdfg')
sdfg = dace.SDFG.from_file('test.sdfg')
sdfg.validate()
# ...other validation methods...
Otherwise, the issue could be in the Simplify Pipeline. Try to simplify while validating every step:
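# A sketch, assuming the validate_all flag of simplify(), which validates after every pass:
sdfg = bad_program.to_sdfg(simplify=False)
sdfg.simplify(validate_all=True)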
This helps identify which component causes the issue.