Configuration entry reference
The following configuration entries can be set in .dace.conf, through the API, or as environment variables.
See Configuring DaCe for more information on how to use the interface.
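As a rough illustration of how one entry name carries over to the environment-variable interface, the sketch below derives a variable name from a dotted entry path. It assumes the convention of prefixing `DACE_` and joining the path components with underscores; the helper `env_var_name` is hypothetical and not part of DaCe, so check Configuring DaCe for the authoritative rules.

```python
def env_var_name(entry: str) -> str:
    """Map a dotted entry such as 'compiler.use_cache' to an environment-variable name.

    Assumption: the name is 'DACE_' followed by the path components
    joined by underscores.
    """
    return "DACE_" + "_".join(entry.split("."))


print(env_var_name("compiler.cuda.max_concurrent_streams"))
```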
- cache
Compiled cache entry naming policy
Type: str
Description: Determines the name of the generated .dacecache folder:
  name: uses the name of the SDFG directly, causing it to be overwritten by other programs using the same SDFG name.
  hash: uses a mangled name based on the hash of the SDFG, such that any change to the SDFG will generate a different cache folder.
  unique: uses a name based on the currently running Python process at code generation time, such that no caching or clashes can happen between different processes or subsequent invocations of Python.
  single: uses a single cache folder for all SDFGs, saving space and potentially build time, but disallows executing SDFGs in parallel and caching of more than one simultaneous SDFG.
Default value: name
- call_hooks
Hooks before/after every DaCe program call
Type: str
Description: A comma-separated list of functions (or Context Manager classes) that will be called before every DaCe program (SDFG) is compiled and run. Used for functionality such as automatic tuning or instrumentation.
Default value: (Empty)
- compiled_sdfg_call_hooks
Hooks before/after every compiled SDFG call
Type: str
Description: A comma-separated list of functions (or Context Manager classes) that will be called before every compiled SDFG’s generated code is invoked. Used for functionality such as low-level profiling.
Default value: (Empty)
- debugprint
Debug printing
Type: bool
Description: Enable verbose printouts.
Default value: False
- default_build_folder
Default SDFG build folder
Type: str
Description: Default folder in which compiled DaCe programs and SDFGs are stored. Can either be a relative path (by default) or absolute.
Default value: .dacecache
- profiling
Profiling
Type: bool
Description: Enable profiling support.
Default value: False
- profiling_status
Status bar for profiling
Type: bool
Description: Enable the tqdm status bar while profiling. If tqdm is not installed, a warning will appear. To disable this feature (and the warning), set this option to false.
Default value: True
- progress
Progress reports
Type: bool
Description: Enable progress report printouts.
Default value: True
- store_history
Store SDFG transformation history
Type: bool
Description: Store the history of transformations in the SDFG file.
Default value: True
- treps
Profiling Repetitions
Type: int
Description: Number of times to run the program for profiling.
Default value: 100
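The four naming policies of the `cache` entry above can be sketched as a small function. This is only an illustration of the described behavior: the helper `cache_folder`, the exact folder-name patterns, and the `single_cache` leaf name are assumptions, not DaCe's actual naming scheme.

```python
import os


def cache_folder(policy: str, sdfg_name: str, sdfg_hash: str,
                 base: str = ".dacecache") -> str:
    """Return an illustrative cache folder for the given `cache` policy."""
    if policy == "name":
        leaf = sdfg_name                     # shared by every program with this SDFG name
    elif policy == "hash":
        leaf = f"{sdfg_name}_{sdfg_hash}"    # changes whenever the SDFG changes
    elif policy == "unique":
        leaf = f"{sdfg_name}_{os.getpid()}"  # tied to the running Python process
    elif policy == "single":
        leaf = "single_cache"                # one folder for all SDFGs (hypothetical name)
    else:
        raise ValueError(f"unknown cache policy: {policy!r}")
    return os.path.join(base, leaf)
```

Under this sketch, two programs that share an SDFG name collide with `name` but not with `hash`, provided their contents differ.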
compiler
Preferences of the compiler
- compiler.allow_shadowing
Allow variable shadowing
Type: bool
Description: Allows shadowing of variables in the code (reduces exceptions to warnings when shadowing is encountered).
Default value: True
- compiler.allow_view_arguments
Allow numpy views as arguments
Type: bool
Description: If true, allows users to call DaCe programs with NumPy views (for example, “A[:, 1]” or “w.T”). As this can create pointer aliasing issues with two arrays pointing to the same memory, or analyzability issues with strides and alignment, this option is disabled by default.
Default value: False
- compiler.build_type
Build configuration
Type: str
Description: Configuration type for CMake build (can be Debug, Release, RelWithDebInfo, or MinSizeRel).
Default value: RelWithDebInfo
- compiler.codegen_lineinfo
Annotate code generator lines
Type: bool
Description: Keep a source mapping between generated code and the file/line of the code generator that generated it. Used for debugging code generation.
Default value: False
- compiler.codegen_state_struct_suffix
Suffix used by the code generator to mangle the state struct.
Type: str
Description: For every SDFG the code generator processes, a state struct is generated. The type name of this struct is derived by appending this value to the SDFG’s name. Note that the suffix may contain only letters, digits, and underscores.
Default value: _state_t
- compiler.default_data_types
Default data types
Type: str
Description: Specify the default data types to use in generating code. If set to “Python”, Python’s semantics will be followed (i.e., float and int are represented using 64 bits). If set to “C”, C’s semantics will be used (float and int are represented using 32 bits).
Default value: Python
- compiler.extra_cmake_args
Additional CMake configuration arguments
Type: str
Description: If set, specifies additional arguments to the initial invocation of cmake.
Default value: (Empty)
- compiler.format_code
Format code with clang-format
Type: bool
Description: Formats the generated code with clang-format before saving the files.
Default value: False
- compiler.format_config_file
Path to the clang-format file
Type: str
Description: Configuration file to be used by clang-format. Only used if format_code is true.
Default value: (Empty)
- compiler.indentation_spaces
Indentation width
Type: int
Description: Number of spaces used when indenting generated code.
Default value: 4
- compiler.inline_sdfgs
Inline all nested SDFGs
Type: bool
Description: If set to true, inlines all nested SDFGs upon code generation by default.
Default value: False
- compiler.library_extension
Library extension
Type: str
Description: File extension of shared libraries.
Default value: so
Default value (on Linux): so
Default value (on Windows): dll
Default value (on Darwin): dylib
- compiler.library_prefix
Library prefix
Type: str
Description: Filename prefix for shared libraries.
Default value: (Empty)
Default value (on Linux): lib
Default value (on Darwin): lib
- compiler.max_stack_array_size
Max stack-allocated array size (bytes)
Type: int
Description: All stack-allocated arrays (i.e., StorageType.Register) larger than this size will be allocated on the heap.
Default value: 65536
- compiler.unique_functions
Generate unique functions
Type: str
Description: Determines if and how to generate code for equivalent NestedSDFGs:
  “hash”: hashing is used to determine if multiple NestedSDFGs with equivalent contents exist. If so, the code is generated only once.
  “unique_name”: the unique_name property of the SDFG is used to determine if two NestedSDFGs are equal, generating the code only once. This gives more control to the programmer, who can explicitly decide which NestedSDFG code can be replicated and which cannot.
  “none”: a separate function is generated for each NestedSDFG.
Default value: hash
- compiler.use_cache
Use cache
Type: bool
Description: If enabled, does not recompile code generated from SDFGs if a shared library (.so/.dll) file is present.
Default value: False
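To make the three modes of compiler.unique_functions above concrete, the following sketch counts how many functions a code generator would emit for a list of nested SDFGs. The function `count_generated_functions` and the dictionary keys `content_hash` and `unique_name` are hypothetical stand-ins, and the sketch assumes every nested SDFG has a unique_name set.

```python
def count_generated_functions(nested_sdfgs, mode="hash"):
    """Count the functions emitted for `nested_sdfgs` under a given mode.

    Each nested SDFG is modeled as a dict with the hypothetical keys
    'content_hash' and 'unique_name'.
    """
    if mode == "none":
        return len(nested_sdfgs)          # one function per nested SDFG
    if mode == "hash":
        key = "content_hash"              # deduplicate by contents
    elif mode == "unique_name":
        key = "unique_name"               # deduplicate by programmer-chosen name
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return len({nested[key] for nested in nested_sdfgs})


nested = [
    {"content_hash": "h1", "unique_name": "f"},
    {"content_hash": "h1", "unique_name": "g"},
    {"content_hash": "h2", "unique_name": "g"},
]
print(count_generated_functions(nested, "hash"))
```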
compiler.cpu
CPU compiler preferences
- compiler.cpu.args
Arguments
Type: str
Description: Compiler argument flags.
Default value: -std=c++14 -fPIC -Wall -Wextra -O3 -march=native -ffast-math -Wno-unused-parameter -Wno-unused-label
Default value (on Windows): /O2 /fp:fast /arch:AVX2 /D_USRDLL /D_WINDLL /D__restrict__=__restrict
- compiler.cpu.executable
Compiler executable override
Type: str
Description: File path or name of the compiler executable.
Default value: (Empty)
- compiler.cpu.libs
Additional libraries
Type: str
Description: Additional linked libraries required by the target.
Default value: (Empty)
- compiler.cpu.openmp_sections
Use OpenMP sections
Type: bool
Description: If set to true, multiple connected components will generate “#pragma omp parallel sections” code around them.
Default value: False
compiler.cuda
GPU (CUDA/HIP) compiler preferences
- compiler.cuda.args
nvcc Arguments
Type: str
Description: Compiler argument flags for CUDA.
Default value: -Xcompiler -march=native --use_fast_math -Xcompiler -Wno-unused-parameter
Default value (on Windows): -O3 --use_fast_math
- compiler.cuda.backend
Compilation backend
Type: str
Description: Backend to compile for (‘auto’ for automatic detection, ‘cuda’ for NVIDIA, or ‘hip’ for AMD).
Default value: auto
- compiler.cuda.block_size_lastdim_limit
Maximum last dimension thread-block size in code generation
Type: int
Description: Threshold above which the GPU code generator fails when a kernel specifies a larger block size in the third dimension. The default value is derived from hardware limits on common GPUs.
Default value: 64
- compiler.cuda.block_size_limit
Maximum thread-block size in code generation
Type: int
Description: Threshold above which the GPU code generator fails when a kernel specifies a larger overall block size. The default value is derived from hardware limits on common GPUs.
Default value: 1024
- compiler.cuda.cuda_arch
Additional CUDA architectures
Type: str
Description: Additional CUDA architectures (separated by commas) to compile GPU code for, excluding the current architecture on the compiling machine.
Default value: 60
- compiler.cuda.default_block_size
Default thread-block size
Type: str
Description: Default thread-block size for GPU kernels when explicit GPU block maps are not defined. Can be set to ‘max’ to maximize occupancy.
Default value: 32,1,1
- compiler.cuda.dynamic_map_block_size
Thread-Block size for GPU_ThreadBlock_Dynamic
Type: str
Description: Thread-block size for maps using the GPU_ThreadBlock_Dynamic scheduler. Can be set to ‘max’ to maximize occupancy.
Default value: 128,1,1
- compiler.cuda.dynamic_map_fine_grained
Enable fine grained load balancing for GPU_ThreadBlock_Dynamic
Type: bool
Description: If true, the scheduler will dynamically redistribute the combined work of all threads in the warp equally across the warp (fine grained). Otherwise, each warp works sequentially only on its own tasks (potential load imbalance).
Default value: True
- compiler.cuda.hip_arch
Additional HIP architectures
Type: str
Description: Additional HIP architectures (separated by commas) to compile GPU code for, excluding the current architecture on the compiling machine.
Default value: gfx906
- compiler.cuda.hip_args
hipcc Arguments
Type: str
Description: Compiler argument flags for HIP.
Default value: -std=c++17 -fPIC -O3 -ffast-math -Wno-unused-parameter
- compiler.cuda.libs
Additional libraries
Type: str
Description: Additional linked libraries required by the target.
Default value: (Empty)
- compiler.cuda.max_concurrent_streams
Concurrent execution streams
Type: int
Description: Maximum number of concurrent CUDA/HIP streams to generate. Special values: -1 uses only the default stream, 0 places no limit on the number of concurrent streams.
Default value: 0
- compiler.cuda.mempool_release_threshold
Memory pool memory release threshold
Type: int
Description: A value that determines how large a memory allocation has to be before it is automatically released from the memory pool to the system. The default is -1, which indicates “never release”. Other values may be 0 (always release), or any byte value. For more information, see cudaMemPoolAttrReleaseThreshold in the CUDA toolkit documentation.
Default value: -1
- compiler.cuda.path
CUDA/HIP path override
Type: str
Description: Path to the CUDA toolkit or ROCm/HIP root directory.
Default value: (Empty)
- compiler.cuda.persistent_map_SM_fraction
Fraction of SMs to use for persistent GPU map
Type: float
Description: Sets the fraction of the device’s SMs that the GPU_Persistent map can use. Together with persistent_map_occupancy, this specifies the grid size of the kernel being launched. The value must satisfy 0.0 < persistent_map_SM_fraction <= 1.0. The resulting count is rounded up to the next integer number of SMs. The maximum number of SMs that can be used is equal to cudaDevAttrMultiProcessorCount.
Default value: 1.0
- compiler.cuda.persistent_map_occupancy
Number of blocks to launch per SM used
Type: int
Description: Sets the number of thread blocks to be launched per SM being used. Essentially, this is a simple multiplier to persistent_map_SM_fraction. It is up to the user to check if the resulting number of thread blocks can run efficiently on the GPU.
Default value: 2
- compiler.cuda.syncdebug
Synchronous Debugging
Type: bool
Description: Enables Synchronous Debugging mode, where each library call is followed by full-device synchronization and error checking.
Default value: False
- compiler.cuda.thread_id_type
Thread/block index data type
Type: str
Description: Defines the data type for a thread and block index in the generated code. The type is based on the type-classes in dace.dtypes. For example, uint64 is equivalent to dace.uint64. Change this setting when large index types are needed to address memory offsets that are beyond the 32-bit range, or to reduce memory usage.
Default value: int32
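The grid size that compiler.cuda.persistent_map_SM_fraction and compiler.cuda.persistent_map_occupancy describe together can be worked out as below. This is a minimal sketch of the arithmetic stated in the two descriptions, not the code generator's implementation; the helper `persistent_grid_size` is hypothetical, and `num_sms` stands in for the value of cudaDevAttrMultiProcessorCount.

```python
import math


def persistent_grid_size(num_sms: int, sm_fraction: float = 1.0,
                         occupancy: int = 2) -> int:
    """Number of thread blocks launched for a persistent GPU map (sketch)."""
    if not 0.0 < sm_fraction <= 1.0:
        raise ValueError("sm_fraction must satisfy 0.0 < sm_fraction <= 1.0")
    # Round the fractional SM count up, but never exceed the device's SM count.
    sms_used = min(num_sms, math.ceil(num_sms * sm_fraction))
    # The occupancy acts as a multiplier: thread blocks per SM in use.
    return sms_used * occupancy


print(persistent_grid_size(80, sm_fraction=0.25, occupancy=2))
```

For a device with 10 SMs and a fraction of 0.25, the 2.5 SMs are rounded up to 3 before the occupancy multiplier is applied.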
compiler.fpga
Common preferences for FPGA compilation.
- compiler.fpga.autobuild_bitstreams
Automatically build bitstreams
Type: bool
Description: If set to true, CMake will automatically build missing bitstreams when running an FPGA program. This can take a very long time, and users might want to do this manually. If set to false, the program will optimistically assume that the bitstream is present in the build directory, and will crash if this is not the case.
Default value: True
- compiler.fpga.concurrent_kernel_detection
Detect parts of an SDFG that can run in parallel
Type: bool
Description: If set to false, DaCe will place each weakly connected component found in an SDFG state in a different Kernel/Processing Element. If true, a heuristic will further inspect each independent component for other parallelism opportunities (e.g., branches of the SDFG that can be executed in parallel), creating the corresponding kernels.
Default value: False
- compiler.fpga.minimum_fifo_depth
Minimum depth of FIFOs
Type: int
Description: Sets the minimum depth of any generated FIFO.
Default value: (Empty)
- compiler.fpga.vendor
FPGA vendor
Type: str
Description: Target Xilinx (“xilinx”) or Intel (“intel_fpga”) FPGAs when generating code.
Default value: xilinx
compiler.intel_fpga
Intel FPGA compiler preferences.
- compiler.intel_fpga.board
Target FPGA board
Type: str
Description: FPGA board to compile for. Obtain the list of boards by running aoc --list-boards.
Default value: a10gx
- compiler.intel_fpga.enable_debugging
Enable debugging for hardware kernels
Type: bool
Description: Injects debugging cores where available.
Default value: False
- compiler.intel_fpga.host_flags
Host arguments
Type: str
Description: Extra host compiler argument flags.
Default value: -Wno-unknown-pragmas
- compiler.intel_fpga.kernel_flags
Kernel flags
Type: str
Description: High-level synthesis C++ flags.
Default value: -fp-relaxed -cl-no-signed-zeros -cl-fast-relaxed-math -cl-single-precision-constant -no-interleaving=default
- compiler.intel_fpga.mode
Compilation mode
Type: str
Description: Target of FPGA kernel build (emulator/simulator/hardware).
Default value: emulator
- compiler.intel_fpga.path
Intel FPGA OpenCL SDK installation override
Type: str
Description: Path to a specific Intel FPGA OpenCL SDK installation to use instead of searching PATH and environment variables.
Default value: (Empty)
compiler.linker
Linker preferences
- compiler.linker.args
Arguments
Type: str
Description: Linker argument flags.
Default value: -Wl,--disable-new-dtags
Default value (on Darwin): (Empty)
Default value (on Windows): (Empty)
- compiler.linker.executable
Linker executable override
Type: str
Description: File path or name of the linker executable.
Default value: (Empty)
compiler.mpi
MPI compiler preferences
- compiler.mpi.executable
Compiler executable override
Type: str
Description: File path or name of the compiler executable.
Default value: (Empty)
compiler.rtl
RTL (SystemVerilog) compiler preferences
- compiler.rtl.verbose
Verbose Build & Execution Output
Type: bool
Description: Output the full build and execution log (including internal state).
Default value: False
- compiler.rtl.verilator_enable_debug
Verilator Enable Debug
Type: bool
Description: Enable/disable verbose internal state debug output.
Default value: False
- compiler.rtl.verilator_flags
Additional Verilator Arguments
Type: str
Description: Additional arguments passed to Verilator.
Default value: (Empty)
- compiler.rtl.verilator_lint_warnings
Verilator Lint Warnings
Type: bool
Description: Enable/disable detailed SystemVerilog lint checker output.
Default value: True
compiler.xilinx
FPGA (Xilinx) compiler preferences
- compiler.xilinx.build_flags
Arguments
Type: str
Description: Kernel build C++ flags.
Default value: (Empty)
- compiler.xilinx.decouple_array_interfaces
Decouple array memory interfaces
Type: bool
Description: If an array is both read and written, this option decouples its accesses by creating one memory interface for reading and one for writing. Note that this may hide potential Read-After-Write or Write-After-Read dependencies.
Default value: False
- compiler.xilinx.enable_debugging
Enable debugging for hardware kernels
Type: bool
Description: Injects debugging cores on the interfaces of the kernel, allowing fine-grained debugging of hardware runs at the cost of additional resources. This is always enabled for emulation runs.
Default value: False
- compiler.xilinx.frequency
Target frequency for Xilinx kernels
Type: str
Description: Target frequency, in MHz, when compiling kernels for Xilinx. Will not necessarily be achieved in practice. To enable multiple clocks, enter values in the format “clock_id:frequency”, with each frequency specified in MHz and entries separated by an escaped bar, all enclosed in quotes. E.g., “0:250|1:500”.
Default value: (Empty)
- compiler.xilinx.host_flags
Host arguments
Type: str
Description: Extra host compiler argument flags.
Default value: -Wno-unknown-pragmas -Wno-unused-label
- compiler.xilinx.mode
Compilation mode
Type: str
Description: Target of FPGA kernel build (simulation/software_emulation/hardware_emulation/hardware).
Default value: simulation
- compiler.xilinx.path
Vitis installation override
Type: str
Description: Path to a specific Vitis/SDx/SDAccel installation to use instead of searching PATH and environment variables.
Default value: (Empty)
- compiler.xilinx.platform
Target platform for Xilinx
Type: str
Description: Platform name of the Vitis/SDx/SDAccel target.
Default value: xilinx_u250_xdma_201830_2
- compiler.xilinx.synthesis_flags
Synthesis arguments
Type: str
Description: High-level synthesis C++ flags.
Default value: -std=c++14
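The multi-clock format accepted by compiler.xilinx.frequency above (for example “0:250|1:500”) can be read as a mapping from clock ID to frequency. The parser below is a sketch of that reading only: the helper `parse_frequency` is hypothetical, and treating a bare number as clock 0 is an assumption rather than documented behavior.

```python
def parse_frequency(value: str) -> dict:
    """Parse a frequency setting into {clock_id: frequency in MHz} (sketch)."""
    # Drop surrounding quotes and undo an escaped bar, if present.
    text = value.strip().strip('"').replace("\\|", "|")
    if not text:
        return {}                      # empty setting: no target frequency
    if ":" not in text:
        return {0: float(text)}        # assumption: a bare value applies to clock 0
    clocks = {}
    for item in text.split("|"):
        clock_id, mhz = item.split(":")
        clocks[int(clock_id)] = float(mhz)
    return clocks


print(parse_frequency('"0:250|1:500"'))
```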
experimental
Experimental features
- experimental.check_race_conditions
Check race conditions
Type: bool
Description: Check for potential race conditions during validation.
Default value: False
- experimental.validate_undefs
Undefined Symbol Check
Type: bool
Description: Check for undefined symbols in memlets during SDFG validation.
Default value: False
frontend
Python frontend preferences
- frontend.avoid_wcr
Avoid using WCR for augmented assignments when possible
Type: bool
Description: Perform a map-symbol-dependency check on the write-subsets of augmented assignments that appear inside Maps to avoid using WCR when possible. This feature works correctly only when there is a single augmented assignment for each data dimension inside a Map.
Default value: False
- frontend.cache_size
Program cache size
Type: int
Description: The number of compiled programs to cache (based on argument types, closure constants, and closure array types) to avoid reparsing/compiling when calling a @dace.program or method.
Default value: 32
- frontend.check_args
Check arguments on SDFG call
Type: bool
Description: Perform an early type check on arguments passed to an SDFG when called directly (from SDFG.__call__). Another type check is performed when calling compiled SDFGs.
Default value: False
- frontend.dont_fuse_callbacks
Do not fuse callbacks
Type: bool
Description: Stricter mode of operation where callbacks into Python don’t participate in state fusion transformations.
Default value: False
- frontend.implicit_recursion_depth
Auto-parsing recursion depth
Type: int
Description: The maximum call-stack depth allowed when automatically parsing called dace functions or methods.
Default value: 64
- frontend.preprocessing_passes
Number of preprocessing passes on Python code
Type: int
Description: Number of times to run the Python preprocessing passes (e.g., constant folding) on the input code. Set to zero to disable preprocessing optimizations, or to -1 to run until the code no longer changes.
Default value: 5
- frontend.raise_nested_parsing_errors
Raise nested parsing errors
Type: bool
Description: Raise all errors out of nested function parsing contexts instead of trying to create a callback implicitly.
Default value: False
- frontend.typed_callbacks_only
Only allow typed callbacks
Type: bool
Description: Stricter mode of operation where callbacks into Python must have explicit return value types in order to compile.
Default value: False
- frontend.unroll_threshold
Automatic unroll loop size threshold
Type: int
Description: Threshold for automatic loop unrolling of any generator (e.g., range) with a compile-time size. A value of -1 (default) means not to unroll any loop automatically, a value of 0 means unrolling every loop, and a value above zero sets a size threshold beyond which a constant-sized loop will not be automatically unrolled.
Default value: -1
- frontend.verbose_errors
Show preprocessed AST on parsing errors
Type: bool
Description: Prints out the preprocessed unparsed AST in case of a parsing error.
Default value: False
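The three cases of frontend.unroll_threshold above are easy to mix up, so here they are as a small decision function. This is a sketch of the documented semantics only; `should_unroll` is a hypothetical helper, not a DaCe function, and treating a loop whose size equals the threshold as unrolled is an assumption.

```python
def should_unroll(loop_size: int, threshold: int = -1) -> bool:
    """Decide whether a loop with a compile-time size is unrolled automatically."""
    if threshold < 0:
        return False                  # -1 (default): never unroll automatically
    if threshold == 0:
        return True                   # 0: unroll every constant-sized loop
    return loop_size <= threshold     # > 0: unroll only loops up to the threshold


print(should_unroll(8, threshold=16))
```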
instrumentation
Instrumentation preferences
- instrumentation.print_fpga_runtime
Print FPGA runtime
Type: bool
Description: Prints the runtime of instrumented FPGA kernel states to standard output.
Default value: False
- instrumentation.report_each_invocation
Save report for each invocation
Type: bool
Description: Save an instrumentation report file for each invocation of the SDFG, rather than one report that spans from SDFG initialization to finalization.
Default value: True
instrumentation.papi
PAPI configuration
- instrumentation.papi.default_counters
Default PAPI counters
Type: str
Description: Sets the default PAPI counter list, formatted as a Python list of strings.
Default value: ['PAPI_TOT_INS', 'PAPI_TOT_CYC', 'PAPI_L2_TCM', 'PAPI_L3_TCM']
- instrumentation.papi.overhead_compensation
Compensate Overhead
Type: bool
Description: Subtracts the minimum measured overhead from every measurement.
Default value: True
- instrumentation.papi.vectorization_analysis
Enable vectorization check
Type: bool
Description: Enables analysis of gcc vectorization information. Only gcc/g++ is supported.
Default value: False
library
Settings for handling the use of DaCe libraries.
library.blas
Built-in BLAS DaCe library.
- library.blas.default_implementation
Default implementation
Type: str
Description: Default implementation for BLAS library nodes.
Default value: pure
- library.blas.override
Force configured implementation
Type: bool
Description: Force the default implementation, even if an implementation has been explicitly set on a node.
Default value: False
library.blas.fpga
FPGA-specific BLAS options.
- library.blas.fpga.default_stream_depth
Default FPGA stream depth
Type: int
Description: Default FPGA stream depth used in the BLAS library nodes and the corresponding streaming transformations.
Default value: 32
library.lapack
Built-in LAPACK DaCe library.
- library.lapack.default_implementation
Default implementation
Type: str
Description: Default implementation for LAPACK library nodes.
Default value: OpenBLAS
- library.lapack.override
Force configured implementation
Type: bool
Description: Force the default implementation, even if an implementation has been explicitly set on a node.
Default value: False
library.linalg
Built-in NumPy linalg DaCe library.
- library.linalg.default_implementation
Default implementation
Type: str
Description: Default implementation for linalg library nodes.
Default value: OpenBLAS
- library.linalg.override
Force configured implementation
Type: bool
Description: Force the default implementation, even if an implementation has been explicitly set on a node.
Default value: False
library.pblas
Built-in PBLAS DaCe library.
- library.pblas.default_implementation
Default implementation
Type: str
Description: Default implementation for PBLAS library nodes.
Default value: MKLMPICH
- library.pblas.override
Force configured implementation
Type: bool
Description: Force the default implementation, even if an implementation has been explicitly set on a node.
Default value: False
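Each library section above pairs a default_implementation entry with an override entry. How the two interact can be sketched as a small resolution rule; `resolve_implementation` is a hypothetical helper written from the descriptions, not DaCe's library-expansion code, and using `None` to model a node with no explicit implementation is an assumption.

```python
def resolve_implementation(node_implementation, default_implementation,
                           override=False):
    """Pick the implementation used to expand a library node (sketch).

    `node_implementation` is None when nothing was set explicitly on the node.
    """
    if override or node_implementation is None:
        return default_implementation   # forced by override, or nothing set on the node
    return node_implementation          # an explicit per-node choice wins otherwise


print(resolve_implementation("MKL", "pure", override=True))
```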
optimizer
Preferences of the SDFG Optimizer
- optimizer.automatic_simplification
Automatic SDFG simplification
Type: bool
Description: Automatically performs SDFG simplification on programs.
Default value: True
- optimizer.autooptimize
Run auto-optimization heuristics
Type: bool
Description: Automatically runs the set of optimizing transformation heuristics on any program called via the Python frontend.
Default value: False
- optimizer.autospecialize
Auto-specialize symbols
Type: bool
Description: Automatically specialize every SDFG to the symbol values at call-time. Requires all symbols to be set.
Default value: False
- optimizer.autotile_partial_parallelism
Prefer partial parallelism over write-conflict tiling
Type: bool
Description: If true, sets the auto-optimizer to prefer extracting map parallel dimensions over tiling for atomic write-conflict resolution edges. This may be slower in case of small parallel dimensions vs. conflicted dimensions. This preference only applies to symbolic ranges or ranges over the autotile_size parameter.
Default value: True
- optimizer.autotile_size
Default tile size in auto-optimization
Type: int
Description: Sets the default tile size for the optimization heuristics.
Default value: 128
- optimizer.detect_control_flow
Detect control flow from state transitions
Type: bool
Description: Attempts to infer control flow constructs “if”, “for” and “while” from state transitions, allowing code generators to generate appropriate code.
Default value: True
- optimizer.match_exception
Treat exceptions in “can_be_applied” as errors
Type: bool
Description: When an exception is raised in a transformation’s “can_be_applied” function, the exception is re-raised if this setting is True. Otherwise, the exception is printed as a warning.
Default value: False
- optimizer.save_intermediate
Save intermediate SDFGs
Type: bool
Description: Save SDFG files after every transformation.
Default value: False
- optimizer.symbolic_positive
Treat all symbolic expressions as positive
Type: bool
Description: Every expression in which a symbolic value appears is treated as strictly positive. This is necessary for certain Range evaluations using Subgraph Fusion.
Default value: True
- optimizer.visualize_sdfv
Visualize SDFG
Type: bool
Description: Opens the SDFG in a browser after every transformation.
Default value: False
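The raise-or-warn behavior that optimizer.match_exception describes above can be sketched with a small wrapper. `safe_can_be_applied` is a hypothetical helper, and returning False (no match) after the warning is an assumption about what happens once the exception is suppressed.

```python
import warnings


def safe_can_be_applied(can_be_applied, match_exception=False):
    """Call a transformation's match check, raising or warning on failure."""
    try:
        return bool(can_be_applied())
    except Exception as ex:
        if match_exception:
            raise                     # setting is True: propagate the exception
        # Setting is False: report the problem as a warning and treat as no match.
        warnings.warn(f"can_be_applied raised {type(ex).__name__}: {ex}")
        return False


def broken_check():
    # Stand-in for a can_be_applied implementation that fails.
    raise ValueError("unexpected graph structure")
```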
testing
Unit testing settings
- testing.deserialize_exception
Treat exceptions in deserialization as errors
Type: bool
Description: When an exception is raised in a deserialization process (e.g., due to a missing library node), by default a warning is issued. If this setting is True, the exception will be raised as-is.
Default value: False
- testing.serialization
Test Serialization on validation
Type: bool
Description: Before generating code, verify that a serialization/deserialization round trip produces the same SDFG.
Default value: False
- testing.serialize_all_fields
Serialize all unmodified fields in SDFG files
Type: bool
Description: If False (default), saving an SDFG keeps only the modified non-default properties. If True, saves all fields.
Default value: False
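The difference between the two modes of testing.serialize_all_fields above can be illustrated with plain dictionaries. This is a sketch under the assumption that properties and their defaults can be modeled as dictionaries; `serialize_properties` is a hypothetical helper, not the SDFG serializer.

```python
def serialize_properties(values: dict, defaults: dict,
                         serialize_all_fields: bool = False) -> dict:
    """Return the properties that would be written to the SDFG file (sketch)."""
    if serialize_all_fields:
        return dict(values)           # True: save every field
    # False (default): keep only properties that differ from their default.
    return {
        name: value
        for name, value in values.items()
        if name not in defaults or defaults[name] != value
    }


defaults = {"schedule": "Default", "unroll": False}
values = {"schedule": "GPU_Device", "unroll": False}
print(serialize_properties(values, defaults))
```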