Configuration entry reference
The following configuration entries are available in .dace.conf, as part of the API, and as environment variables.
See Configuring DaCe for more information on how to use the interface.
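The sketch below roughly illustrates how the dotted entry names in this reference map onto the nested sections of .dace.conf and onto environment variables. It assumes that an entry such as compiler.cpu.args can be overridden by an environment variable named DACE_compiler_cpu_args. The helpers env_var_name and lookup are hypothetical and are not DaCe's own lookup code. See Configuring DaCe for the actual interface.

```python
import os
from typing import Any, Mapping, Optional


def env_var_name(key: str) -> str:
    """Hypothetical helper: 'compiler.cpu.args' -> 'DACE_compiler_cpu_args' (assumed convention)."""
    return "DACE_" + key.replace(".", "_")


def lookup(config: Mapping[str, Any], key: str,
           environ: Optional[Mapping[str, str]] = None) -> Any:
    """Resolve a dotted key, letting an environment variable take precedence (assumption)."""
    environ = os.environ if environ is None else environ
    override = environ.get(env_var_name(key))
    if override is not None:
        return override  # environment values always arrive as strings
    node: Any = config
    for part in key.split("."):
        node = node[part]  # walk the nested sections, e.g. compiler -> cpu -> args
    return node


# A tiny nested structure mirroring the sections of this reference.
conf = {"compiler": {"cpu": {"args": "-O3"}}, "cache": "name"}
print(lookup(conf, "compiler.cpu.args", environ={}))
print(lookup(conf, "cache", environ={"DACE_cache": "unique"}))
```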
- cache
Compiled cache entry naming policy
Type:
str
Description: Determine the name of the generated .dacecache folder:
name: uses the name of the SDFG directly, causing it to be overwritten by other programs using the same SDFG name.
hash: uses a mangled name based on the hash of the SDFG, such that any change to the SDFG will generate a different cache folder.
unique: uses a name based on the currently running Python process at code generation time, such that no caching or clashes can happen between different processes or subsequent invocations of Python.
single: uses a single cache folder for all SDFGs, saving space and potentially build time, but disallows executing SDFGs in parallel and caching more than one SDFG at a time.
Default value:
name
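A small, hypothetical helper can model the four cache policies. The hash function (SHA-256), the folder-name layout, and the folder name used for single are illustrative assumptions. They are not DaCe's actual naming scheme.

```python
import hashlib
import os


def cache_folder(policy: str, sdfg_name: str, sdfg_json: str,
                 root: str = ".dacecache") -> str:
    """Hypothetical model of the four documented cache naming policies."""
    if policy == "name":
        leaf = sdfg_name  # same SDFG name -> same folder, so later programs overwrite it
    elif policy == "hash":
        digest = hashlib.sha256(sdfg_json.encode()).hexdigest()[:16]
        leaf = f"{sdfg_name}_{digest}"  # any change to the SDFG yields a new folder
    elif policy == "unique":
        leaf = f"{sdfg_name}_{os.getpid()}"  # tied to the current Python process
    elif policy == "single":
        leaf = "single"  # one shared folder for every SDFG (illustrative name)
    else:
        raise ValueError(f"unknown cache policy: {policy!r}")
    return os.path.join(root, leaf)
```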
- call_hooks
Hooks before/after every DaCe program call
Type:
str
Description: A comma-separated list of functions (or Context Manager classes) that will be called before every DaCe program (SDFG) is compiled and run. Used for functionality such as automatic tuning or instrumentation.
Default value: (Empty)
- compiled_sdfg_call_hooks
Hooks before/after every compiled SDFG call
Type:
str
Description: A comma-separated list of functions (or Context Manager classes) that will be called before every compiled SDFG’s generated code is invoked. Used for functionality such as low-level profiling.
Default value: (Empty)
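The sketch below shows one way a comma-separated hook setting could be parsed and applied, with each hook used as a context manager around the call. The hook functions and the REGISTRY dictionary are hypothetical. REGISTRY stands in for resolving names to importable functions.

```python
import contextlib
from typing import Callable, Dict, List

log: List[str] = []


@contextlib.contextmanager
def timer_hook():
    # Hypothetical hook: runs code before and after the wrapped call.
    log.append("timer:before")
    yield
    log.append("timer:after")


@contextlib.contextmanager
def trace_hook():
    log.append("trace:before")
    yield
    log.append("trace:after")


# Hypothetical registry standing in for importing the named functions.
REGISTRY: Dict[str, Callable] = {"timer_hook": timer_hook, "trace_hook": trace_hook}


def parse_hooks(setting: str) -> List[Callable]:
    """Split a comma-separated setting such as 'timer_hook, trace_hook'."""
    names = [n.strip() for n in setting.split(",") if n.strip()]
    return [REGISTRY[n] for n in names]


def call_with_hooks(setting: str, program: Callable[[], object]) -> object:
    with contextlib.ExitStack() as stack:
        for hook in parse_hooks(setting):
            stack.enter_context(hook())  # hooks are entered in list order
        log.append("program")
        return program()  # ExitStack exits the hooks in reverse order


result = call_with_hooks("timer_hook, trace_hook", lambda: 42)
```

Because contextlib.ExitStack unwinds in reverse order, the first hook listed becomes the outermost one.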
- debugprint
Debug printing
Type:
bool
Description: Enable verbose printouts.
Default value:
False
- default_build_folder
Default SDFG build folder
Type:
str
Description: Default folder in which compiled DaCe programs and SDFGs are stored. Can either be a relative path (by default) or absolute.
Default value:
.dacecache
- profiling
Profiling
Type:
bool
Description: Enable profiling support.
Default value:
False
- profiling_status
Status bar for profiling
Type:
bool
Description: Enable a tqdm status bar while profiling. If tqdm is not installed, a warning will appear. To disable this feature (and the warning), set this option to false.
Default value:
True
- progress
Progress reports
Type:
bool
Description: Enable progress report printouts.
Default value:
True
- store_history
Store SDFG transformation history
Type:
bool
Description: Store the history of transformations on the SDFG file.
Default value:
True
- treps
Profiling Repetitions
Type:
int
Description: Number of times to run program for profiling.
Default value:
100
compiler
Preferences of the compiler
- compiler.allow_shadowing
Allow variable shadowing
Type:
bool
Description: Allows shadowing of variables in the code (reduces exceptions to warnings when shadowing is encountered).
Default value:
True
- compiler.allow_view_arguments
Allow numpy views as arguments
Type:
bool
Description: If true, allows users to call DaCe programs with NumPy views (for example, “A[:, 1]” or “w.T”). Because this can create pointer-aliasing issues (two arrays pointing to the same memory) or analyzability issues with strides and alignment, this option is disabled by default.
Default value:
False
- compiler.build_type
Build configuration
Type:
str
Description: Configuration type for CMake build (can be Debug, Release, RelWithDebInfo, or MinSizeRel).
Default value:
RelWithDebInfo
- compiler.codegen_lineinfo
Annotate code generator lines
Type:
bool
Description: Keep a source mapping between generated code and the file/line of the code generator that generated it. Used for debugging code generation.
Default value:
False
- compiler.codegen_state_struct_suffix
Suffix used by the code generator to mangle the state struct.
Type:
str
Description: For every SDFG the code generator processes, a state struct is generated. The typename of this struct is derived by appending this value to the SDFG’s name. Note that the suffix may only contain letters, digits, and underscores.
Default value:
_state_t
- compiler.default_data_types
Default data types
Type:
str
Description: Specify the default data types to use in generating code. If “Python”, Python’s semantics will be followed (i.e., float and int are represented using 64 bits). If the property is set to “C”, C’s semantics will be used (float and int are represented using 32 bits).
Default value:
Python
- compiler.extra_cmake_args
Additional CMake configuration arguments
Type:
str
Description: If set, specifies additional arguments to the initial invocation of cmake.
Default value: (Empty)
- compiler.indentation_spaces
Indentation width
Type:
int
Description: Number of spaces used when indenting generated code.
Default value:
4
- compiler.inline_sdfgs
Inline all nested SDFGs
Type:
bool
Description: If set to true, inlines all nested SDFGs upon code generation by default.
Default value:
False
- compiler.library_extension
Library extension
Type:
str
Description: File extension of shared libraries.
Default value:
so
Default value (on Linux):
so
Default value (on Windows):
dll
Default value (on Darwin):
dylib
- compiler.library_prefix
Library prefix
Type:
str
Description: Filename prefix for shared libraries.
Default value: (Empty)
Default value (on Linux):
lib
Default value (on Darwin):
lib
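Combining the platform-specific defaults of compiler.library_prefix and compiler.library_extension gives the shared-library filenames below. shared_library_name is a hypothetical helper for illustration only. DaCe may construct the full library path differently.

```python
import platform

# Defaults taken from the compiler.library_extension / compiler.library_prefix entries.
EXTENSION = {"Linux": "so", "Windows": "dll", "Darwin": "dylib"}
PREFIX = {"Linux": "lib", "Darwin": "lib"}  # Windows and the generic default: empty prefix


def shared_library_name(name: str, system: str = "") -> str:
    """Compose a shared-library filename, e.g. 'libprog.so' on Linux."""
    system = system or platform.system()
    ext = EXTENSION.get(system, "so")  # generic default extension
    prefix = PREFIX.get(system, "")    # generic default prefix is empty
    return f"{prefix}{name}.{ext}"
```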
- compiler.max_stack_array_size
Max stack-allocated array size (bytes)
Type:
int
Description: All stack allocated arrays (i.e. StorageType.Register) with size larger than this will be allocated on the heap.
Default value:
65536
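The sketch below shows the size test described above. It assumes the array size is the element count times the element size in bytes. allocation_site is a hypothetical helper. An array of exactly max_stack_array_size bytes stays on the stack, because only larger arrays are moved to the heap.

```python
import math
from typing import Sequence


def allocation_site(shape: Sequence[int], itemsize: int,
                    max_stack_array_size: int = 65536) -> str:
    """Hypothetical: decide where a Register-storage array ends up."""
    nbytes = math.prod(shape) * itemsize
    return "heap" if nbytes > max_stack_array_size else "stack"
```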
- compiler.unique_functions
Generate unique functions
Type:
str
Description: Determine if and how to generate the code for equivalent NestedSDFGs. “hash”: hashing is used to determine whether multiple NestedSDFGs with equivalent contents exist; if so, their code is generated only once. “unique_name”: the unique_name property of the SDFG is used to determine whether two NestedSDFGs are equal, generating the code only once. This gives the programmer more control, since they can explicitly decide which NestedSDFG code can be replicated and which cannot. “none”: a separate function is generated for each NestedSDFG.
Default value:
hash
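The hypothetical sketch below makes the three modes concrete by counting how many functions would be emitted for a list of nested SDFGs. NestedSDFG here is a stand-in dataclass, not DaCe's class. Hashing the serialized contents with SHA-256 is an assumption.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class NestedSDFG:  # hypothetical stand-in for a nested SDFG
    unique_name: str
    contents: str  # e.g. its serialized form


def functions_to_generate(nodes: List[NestedSDFG], mode: str = "hash") -> int:
    """Count how many functions would be emitted under each documented mode."""
    if mode == "none":
        return len(nodes)  # one function per nested SDFG
    if mode == "hash":
        keys = {hashlib.sha256(n.contents.encode()).hexdigest() for n in nodes}
    elif mode == "unique_name":
        keys = {n.unique_name for n in nodes}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return len(keys)


nodes = [NestedSDFG("k", "X"), NestedSDFG("k", "Y"), NestedSDFG("m", "X"), NestedSDFG("n", "X")]
```

With this list, hash emits 2 functions (two distinct contents), unique_name emits 3, and none emits 4. Under unique_name, the two nodes named k share one function even though their contents differ. This is why that mode leaves the responsibility with the programmer.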
- compiler.use_cache
Use cache
Type:
bool
Description: If enabled, does not recompile code generated from SDFGs if shared library (.so/.dll) file is present.
Default value:
False
compiler.cpu
CPU compiler preferences
- compiler.cpu.args
Arguments
Type:
str
Description: Compiler argument flags
Default value:
-std=c++14 -fPIC -Wall -Wextra -O3 -march=native -ffast-math -Wno-unused-parameter -Wno-unused-label
Default value (on Windows):
/O2 /fp:fast /arch:AVX2 /D_USRDLL /D_WINDLL /D__restrict__=__restrict
- compiler.cpu.executable
Compiler executable override
Type:
str
Description: File path or name of compiler executable
Default value: (Empty)
- compiler.cpu.libs
Additional libraries
Type:
str
Description: Additional linked libraries required by target
Default value: (Empty)
- compiler.cpu.openmp_sections
Use OpenMP sections
Type:
bool
Description: If set to true, multiple connected components will generate “#pragma omp parallel sections” code around them.
Default value:
False
compiler.cuda
GPU (CUDA/HIP) compiler preferences
- compiler.cuda.args
nvcc Arguments
Type:
str
Description: Compiler argument flags for CUDA
Default value:
-Xcompiler -march=native --use_fast_math -Xcompiler -Wno-unused-parameter
Default value (on Windows):
-O3 --use_fast_math
- compiler.cuda.backend
Compilation backend
Type:
str
Description: Backend to compile for (‘auto’ for automatic detection, ‘cuda’ for NVIDIA, or ‘hip’ for AMD).
Default value:
auto
- compiler.cuda.block_size_lastdim_limit
Maximum last dimension thread-block size in code generation
Type:
int
Description: Maximum thread-block size in the third dimension. The GPU code generator fails to generate a kernel whose specified block size in that dimension is larger. The default value is derived from hardware limits on common GPUs.
Default value:
64
- compiler.cuda.block_size_limit
Maximum thread-block size in code generation
Type:
int
Description: Maximum overall thread-block size. The GPU code generator fails to generate a kernel whose specified overall block size is larger. The default value is derived from hardware limits on common GPUs.
Default value:
1024
- compiler.cuda.cuda_arch
Additional CUDA architectures
Type:
str
Description: Additional CUDA architectures (separated by commas) to compile GPU code for, excluding the current architecture on the compiling machine.
Default value:
60
- compiler.cuda.default_block_size
Default thread-block size
Type:
str
Description: Default thread-block size for GPU kernels when explicit GPU block maps are not defined. Can be set to ‘max’ to maximize occupancy.
Default value:
32,1,1
- compiler.cuda.dynamic_map_block_size
Thread-Block size for GPU_ThreadBlock_Dynamic
Type:
str
Description: Thread-Block size for maps using GPU_ThreadBlock_Dynamic scheduler. Can be set to ‘max’ to maximize occupancy.
Default value:
128,1,1
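The block-size entries above are comma-separated strings, and generated kernels must respect the two block-size limits. The sketch below parses such a string and checks it against block_size_limit and block_size_lastdim_limit. The helper names are hypothetical. Padding omitted dimensions with 1 is an assumption made for illustration.

```python
from typing import Tuple, Union

BlockSize = Tuple[int, int, int]


def parse_block_size(value: str) -> Union[str, BlockSize]:
    """Parse a setting such as '32,1,1'; the special value 'max' is returned as-is."""
    value = value.strip()
    if value == "max":
        return "max"
    dims = [int(d) for d in value.split(",")]
    if not 1 <= len(dims) <= 3:
        raise ValueError(f"expected 1 to 3 dimensions, got {value!r}")
    dims += [1] * (3 - len(dims))  # assumption: omitted dimensions default to 1
    return (dims[0], dims[1], dims[2])


def check_block_size(block: BlockSize, block_size_limit: int = 1024,
                     block_size_lastdim_limit: int = 64) -> BlockSize:
    """Reject block sizes beyond the two documented code-generation limits."""
    x, y, z = block
    if x * y * z > block_size_limit:
        raise ValueError(f"{x * y * z} threads per block exceeds {block_size_limit}")
    if z > block_size_lastdim_limit:
        raise ValueError(f"third dimension {z} exceeds {block_size_lastdim_limit}")
    return block
```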
- compiler.cuda.dynamic_map_fine_grained
Enable fine grained load balancing for GPU_ThreadBlock_Dynamic
Type:
bool
Description: If true, the scheduler dynamically redistributes the combined work of all threads in the warp equally across the warp (fine-grained). Otherwise, each warp works sequentially only on its own tasks (potential load imbalance).
Default value:
True
- compiler.cuda.hip_arch
Additional HIP architectures
Type:
str
Description: Additional HIP architectures (separated by commas) to compile GPU code for, excluding the current architecture on the compiling machine.
Default value:
gfx906
- compiler.cuda.hip_args
hipcc Arguments
Type:
str
Description: Compiler argument flags for HIP
Default value:
-std=c++17 -fPIC -O3 -ffast-math -Wno-unused-parameter
- compiler.cuda.libs
Additional libraries
Type:
str
Description: Additional linked libraries required by target
Default value: (Empty)
- compiler.cuda.max_concurrent_streams
Concurrent execution streams
Type:
int
Description: Maximum number of concurrent CUDA/HIP streams to generate. Special values: -1 uses only the default stream; 0 uses an unlimited number of concurrent streams.
Default value:
0
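The sketch below is a simplified model of the special values of max_concurrent_streams. It assumes every kernel is independent and that streams are handed out round-robin. assign_streams is hypothetical, and None stands for the default stream. DaCe's actual stream assignment follows the structure of the SDFG.

```python
from typing import List, Optional


def assign_streams(num_kernels: int, max_concurrent_streams: int = 0) -> List[Optional[int]]:
    """Hypothetical round-robin stream assignment; None denotes the default stream."""
    if max_concurrent_streams == -1:
        return [None] * num_kernels  # everything runs on the default stream
    if max_concurrent_streams == 0:
        return list(range(num_kernels))  # unlimited: a new stream per independent kernel
    return [i % max_concurrent_streams for i in range(num_kernels)]
```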
- compiler.cuda.mempool_release_threshold
Memory pool memory release threshold
Type:
int
Description: A value that determines how large a memory allocation has to be before it is automatically released from the memory pool to the system. The default is -1, which indicates “never release”. Other values may be 0 (always release), or any byte value. For more information, see cudaMemPoolAttrReleaseThreshold in the CUDA toolkit documentation.
Default value:
-1
- compiler.cuda.path
CUDA/HIP path override
Type:
str
Description: Path to CUDA toolkit or ROCm/HIP root directory
Default value: (Empty)
- compiler.cuda.persistent_map_SM_fraction
Fraction of SMs to use for persistent GPU map
Type:
float
Description: Sets the fraction of the device’s SMs that a GPU_Persistent map can use. Together with persistent_map_occupancy, this determines the grid size of the launched kernel. Valid values satisfy 0.0 < persistent_map_SM_fraction <= 1.0. The resulting number of SMs is rounded up to the next integer, and at most cudaDevAttrMultiProcessorCount SMs are used.
Default value:
1.0
- compiler.cuda.persistent_map_occupancy
Number of blocks to launch per SM used
Type:
int
Description: Sets the number of thread blocks to launch per SM used. This is effectively a multiplier on the number of SMs derived from persistent_map_SM_fraction. It is up to the user to check whether the resulting number of thread blocks can run efficiently on the GPU.
Default value:
2
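The grid size of a GPU_Persistent map follows from persistent_map_SM_fraction and persistent_map_occupancy as described above. In this hypothetical helper, num_sms stands for the value of cudaDevAttrMultiProcessorCount, which a real program would query from the device.

```python
import math


def persistent_grid_size(num_sms: int, sm_fraction: float = 1.0, occupancy: int = 2) -> int:
    """Hypothetical: number of thread blocks launched for a persistent map."""
    if not 0.0 < sm_fraction <= 1.0:
        raise ValueError("persistent_map_SM_fraction must be in (0.0, 1.0]")
    # Round up to whole SMs, never exceeding the device's SM count.
    used_sms = min(math.ceil(sm_fraction * num_sms), num_sms)
    return used_sms * occupancy
```

For a device with 80 SMs, the defaults give 160 blocks. A fraction of 0.5 gives 80 blocks, and a fraction of 0.01 still gives 2 blocks, because 0.8 SMs is rounded up to one SM.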
- compiler.cuda.syncdebug
Synchronous Debugging
Type:
bool
Description: Enables Synchronous Debugging mode, where each library call is followed by full-device synchronization and error checking.
Default value:
False
- compiler.cuda.thread_id_type
Thread/block index data type
Type:
str
Description: Defines the data type for a thread and block index in the generated code. The type is based on the type-classes in dace.dtypes. For example, uint64 is equivalent to dace.uint64. Change this setting when large index types are needed to address memory offsets that are beyond the 32-bit range, or to reduce memory usage.
Default value:
int32
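When choosing thread_id_type, the key question is whether the largest linear offset fits into a 32-bit signed index. The hypothetical helper below makes that check for a given element count. It ignores strides and padding, which can push offsets higher.

```python
def index_type_for(num_elements: int) -> str:
    """Hypothetical: smallest signed index type covering every linear offset."""
    max_offset = num_elements - 1
    if max_offset <= 2**31 - 1:
        return "int32"
    if max_offset <= 2**63 - 1:
        return "int64"
    raise OverflowError("offset does not fit in 64 bits")
```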
compiler.fpga
Common preferences for FPGA compilation.
- compiler.fpga.autobuild_bitstreams
Automatically build bitstreams
Type:
bool
Description: If set to true, CMake will automatically build missing bitstreams when running an FPGA program. This can take a very long time, and users might want to do this manually. If set to false, the program will optimistically assume that the bitstream is present in the build directory, and will crash if this is not the case.
Default value:
True
- compiler.fpga.concurrent_kernel_detection
Detect parts of an SDFG that can run in parallel
Type:
bool
Description: If set to false, DaCe will place each weakly connected component found in an SDFG state in a different Kernel/Processing Element. If true, a heuristic will further inspect each independent component for other parallelism opportunities (e.g., branches of the SDFG that can be executed in parallel), creating the corresponding kernels.
Default value:
False
- compiler.fpga.minimum_fifo_depth
Minimum depth of FIFOs
Type:
int
Description: Sets the minimum depth of any generated FIFO.
Default value: (Empty)
- compiler.fpga.vendor
FPGA vendor
Type:
str
Description: Target Xilinx (“xilinx”) or Intel (“intel_fpga”) FPGAs when generating code.
Default value:
xilinx
compiler.intel_fpga
Intel FPGA compiler preferences.
- compiler.intel_fpga.board
Target FPGA board
Type:
str
Description: FPGA board to compile for; obtain a list by running aoc --list-boards.
Default value:
a10gx
- compiler.intel_fpga.enable_debugging
Enable debugging for hardware kernels
Type:
bool
Description: Injects debugging cores where available.
Default value:
False
- compiler.intel_fpga.host_flags
Host arguments
Type:
str
Description: Extra host compiler argument flags
Default value:
-Wno-unknown-pragmas
- compiler.intel_fpga.kernel_flags
Kernel flags
Type:
str
Description: High-level synthesis C++ flags
Default value:
-fp-relaxed -cl-no-signed-zeros -cl-fast-relaxed-math -cl-single-precision-constant -no-interleaving=default
- compiler.intel_fpga.mode
Compilation mode
Type:
str
Description: Target of FPGA kernel build (emulator/simulator/hardware).
Default value:
emulator
- compiler.intel_fpga.path
Intel FPGA OpenCL SDK installation override
Type:
str
Description: Path to specific Intel FPGA OpenCL SDK installation to use instead of just searching PATH and environment variables.
Default value: (Empty)
compiler.linker
Linker preferences
- compiler.linker.args
Arguments
Type:
str
Description: Linker argument flags
Default value:
-Wl,--disable-new-dtags
Default value (on Darwin): (Empty)
Default value (on Windows): (Empty)
- compiler.linker.executable
Linker executable override
Type:
str
Description: File path or name of linker executable
Default value: (Empty)
compiler.mpi
MPI compiler preferences
- compiler.mpi.executable
Compiler executable override
Type:
str
Description: File path or name of compiler executable
Default value: (Empty)
compiler.rtl
RTL (SystemVerilog) compiler preferences
- compiler.rtl.verbose
Verbose Build & Execution Output
Type:
bool
Description: Output the full build and execution log (including internal state).
Default value:
False
- compiler.rtl.verilator_enable_debug
Verilator Enable Debug
Type:
bool
Description: Enable/disable verbose internal state debug output.
Default value:
False
- compiler.rtl.verilator_flags
Additional Verilator Arguments
Type:
str
Description: Additional arguments passed to Verilator.
Default value: (Empty)
- compiler.rtl.verilator_lint_warnings
Verilator Lint Warnings
Type:
bool
Description: Enable/Disable detailed SV lint checker output.
Default value:
True
compiler.xilinx
FPGA (Xilinx) compiler preferences
- compiler.xilinx.build_flags
Arguments
Type:
str
Description: Kernel build C++ flags
Default value: (Empty)
- compiler.xilinx.decouple_array_interfaces
Decouple array memory interfaces
Type:
bool
Description: If an array is both read and written, this option decouples its accesses by creating one memory interface for reading and one for writing. Note that this may hide potential Read-After-Write or Write-After-Read dependencies.
Default value:
False
- compiler.xilinx.enable_debugging
Enable debugging for hardware kernels
Type:
bool
Description: Injects debugging cores on the interfaces of the kernel, allowing fine-grained debugging of hardware runs at the cost of additional resources. This is always enabled for emulation runs.
Default value:
False
- compiler.xilinx.frequency
Target frequency for Xilinx kernels
Type:
str
Description: Target frequency, in MHz, when compiling kernels for Xilinx. It will not necessarily be achieved in practice. To enable multiple clocks, enter entries in the format “clock_id:frequency” (frequency in MHz), separated by an escaped bar and all enclosed in quotes, e.g., “0:250|1:500”.
Default value: (Empty)
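The hypothetical parser below handles the compiler.xilinx.frequency format and maps clock IDs to MHz. Two behaviours are assumptions made for illustration: accepting an escaped bar (\|), and treating a bare number as clock 0.

```python
from typing import Dict


def parse_xilinx_frequency(value: str) -> Dict[int, float]:
    """Parse 'clock_id:frequency' entries (MHz) separated by '|'."""
    value = value.strip().strip('"')  # the setting is documented as quoted
    value = value.replace("\\|", "|")  # accept an escaped bar as a separator
    if not value:
        return {}  # empty setting: no explicit target frequency
    clocks: Dict[int, float] = {}
    for entry in value.split("|"):
        if ":" in entry:
            clock_id, mhz = entry.split(":", 1)
            clocks[int(clock_id)] = float(mhz)
        else:
            clocks[0] = float(entry)  # assumption: a bare number targets clock 0
    return clocks
```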
- compiler.xilinx.host_flags
Host arguments
Type:
str
Description: Extra host compiler argument flags
Default value:
-Wno-unknown-pragmas -Wno-unused-label
- compiler.xilinx.mode
Compilation mode
Type:
str
Description: Target of FPGA kernel build (simulation/software_emulation/hardware_emulation/hardware)
Default value:
simulation
- compiler.xilinx.path
Vitis installation override
Type:
str
Description: Path to specific Vitis/SDx/SDAccel installation to use instead of just searching PATH and environment variables.
Default value: (Empty)
- compiler.xilinx.platform
Target platform for Xilinx
Type:
str
Description: Platform name of Vitis/SDx/SDAccel target.
Default value:
xilinx_u250_xdma_201830_2
- compiler.xilinx.synthesis_flags
Synthesis arguments
Type:
str
Description: High-level synthesis C++ flags
Default value:
-std=c++14
experimental
Experimental features
- experimental.validate_undefs
Undefined Symbol Check
Type:
bool
Description: Check for undefined symbols in memlets during SDFG validation.
Default value:
False
frontend
Python frontend preferences
- frontend.avoid_wcr
Avoid using WCR for augmented assignments when possible
Type:
bool
Description: Perform a map-symbol-dependency check on the write-subsets of augmented assignments that appear inside Maps to avoid using WCR when possible. This feature works correctly only when there is a single augmented assignment for each data dimension inside a Map.
Default value:
False
- frontend.cache_size
Program cache size
Type:
int
Description: The number of compiled programs to cache (based on argument types, closure constants, and closure array types) to avoid reparsing/compiling when calling a @dace.program or method.
Default value:
32
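The program cache can be pictured as a bounded mapping from a call signature to a compiled program. The sketch below models it as a least-recently-used cache. Both the eviction policy and the ProgramCache class are assumptions for illustration, not DaCe's implementation.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class ProgramCache:
    """Hypothetical LRU cache keyed by a call signature (e.g., argument types)."""

    def __init__(self, size: int = 32):
        self.size = size
        self.entries = OrderedDict()  # signature -> compiled program
        self.compilations = 0

    def get(self, key: Hashable, compile_fn: Callable[[], object]) -> object:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.compilations += 1
        compiled = compile_fn()
        self.entries[key] = compiled
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return compiled


cache = ProgramCache(size=2)
for signature in [("float64",), ("int32",), ("float64",), ("float32",), ("int32",)]:
    cache.get(signature, lambda: f"compiled for {signature}")
```

In this run, the second float64 call is a cache hit. The float32 call then evicts int32, so the final int32 call has to recompile.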
- frontend.check_args
Check arguments on SDFG call
Type:
bool
Description: Perform an early type check on arguments passed to an SDFG when called directly (from SDFG.__call__). Another type check is performed when calling compiled SDFGs.
Default value:
False
- frontend.dont_fuse_callbacks
Do not fuse callbacks
Type:
bool
Description: Stricter mode of operation where callbacks into Python don’t participate in state fusion transformations.
Default value:
False
- frontend.implicit_recursion_depth
Auto-parsing recursion depth
Type:
int
Description: The maximum call-stack depth allowed when automatically parsing called dace functions or methods.
Default value:
64
- frontend.preprocessing_passes
Number of preprocessing passes on Python code
Type:
int
Description: Number of times to run the Python preprocessing passes (e.g., constant folding) on the input code. Set to zero to disable preprocessing optimizations, or to -1 to run until the code no longer changes.
Default value:
5
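The special values 0 and -1 can be sketched as a loop around a preprocessing function. Here a toy pass that unwraps one level of redundant parentheses stands in for real passes such as constant folding. run_preprocessing and unwrap_once are hypothetical. Stopping early once the code is unchanged is an assumption.

```python
from typing import Callable


def run_preprocessing(code: str, preprocess: Callable[[str], str], passes: int = 5) -> str:
    """Hypothetical driver: 0 disables preprocessing, -1 repeats until a fixed point."""
    remaining = passes
    while remaining != 0:
        new_code = preprocess(code)
        if new_code == code:
            break  # nothing changed, so further passes would be no-ops
        code = new_code
        if remaining > 0:
            remaining -= 1  # -1 never counts down, so it runs until the fixed point
    return code


def unwrap_once(code: str) -> str:
    """Toy pass standing in for e.g. constant folding: drop one redundant '(x)'."""
    return code.replace("(x)", "x", 1)
```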
- frontend.raise_nested_parsing_errors
Raise nested parsing errors
Type:
bool
Description: Raise all errors out of nested function parsing contexts instead of trying to create a callback implicitly.
Default value:
False
- frontend.typed_callbacks_only
Only allow typed callbacks
Type:
bool
Description: Stricter mode of operation where callbacks into Python must have explicit return value types in order to compile.
Default value:
False
- frontend.unroll_threshold
Automatic unroll loop size threshold
Type:
int
Description: Threshold for automatic loop unrolling of any generator (such as range) with a compile-time size. A value of -1 (default) means not to unroll any loop automatically, a value of 0 means unrolling every loop, and a value above zero sets a size threshold beyond which a constant-sized loop will not be automatically unrolled.
Default value:
-1
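The three regimes of unroll_threshold reduce to a small decision function. should_unroll is a hypothetical helper. Treating any negative value like -1 is an assumption.

```python
def should_unroll(trip_count: int, unroll_threshold: int = -1) -> bool:
    """Hypothetical reading of frontend.unroll_threshold for a constant-sized loop."""
    if unroll_threshold < 0:
        return False  # default (-1): never unroll automatically
    if unroll_threshold == 0:
        return True  # unroll every constant-sized loop
    return trip_count <= unroll_threshold  # loops beyond the threshold stay as loops
```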
- frontend.verbose_errors
Show preprocessed AST on parsing errors
Type:
bool
Description: Prints out the preprocessed unparsed AST in case of a parsing error.
Default value:
False
instrumentation
Instrumentation preferences
- instrumentation.print_fpga_runtime
Print FPGA runtime
Type:
bool
Description: Prints the runtime of instrumented FPGA kernel states to standard output.
Default value:
False
- instrumentation.report_each_invocation
Save report for each invocation
Type:
bool
Description: Save an instrumentation report file for each invocation of the SDFG, rather than one report that spans from SDFG initialization to finalization.
Default value:
True
instrumentation.papi
PAPI configuration
- instrumentation.papi.default_counters
Default PAPI counters
Type:
str
Description: Sets the default PAPI counter list, formatted as a Python list of strings.
Default value:
['PAPI_TOT_INS', 'PAPI_TOT_CYC', 'PAPI_L2_TCM', 'PAPI_L3_TCM']
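Because default_counters is written as a Python list of strings, ast.literal_eval is a safe way to read it: it evaluates literals without executing code. Whether DaCe parses the setting this way is an assumption. parse_counters is a hypothetical helper.

```python
import ast
from typing import List


def parse_counters(value: str) -> List[str]:
    """Parse the default_counters setting, written as a Python list of strings."""
    counters = ast.literal_eval(value)
    if not isinstance(counters, list) or not all(isinstance(c, str) for c in counters):
        raise ValueError("expected a Python list of strings, e.g. ['PAPI_TOT_INS']")
    return counters
```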
- instrumentation.papi.overhead_compensation
Compensate Overhead
Type:
bool
Description: Subtracts the minimum measured overhead from every measurement.
Default value:
True
- instrumentation.papi.vectorization_analysis
Enable vectorization check
Type:
bool
Description: Enables analysis of gcc vectorization information. Only gcc/g++ is supported.
Default value:
False
library
Settings for handling the use of DaCe libraries.
library.blas
Built-in BLAS DaCe library.
- library.blas.default_implementation
Default implementation
Type:
str
Description: Default implementation for BLAS library nodes.
Default value:
pure
- library.blas.override
Force configured implementation
Type:
bool
Description: Force the default implementation, even if an implementation has been explicitly set on a node.
Default value:
False
library.blas.fpga
FPGA-specific BLAS options.
- library.blas.fpga.default_stream_depth
Default FPGA stream depth
Type:
int
Description: Default FPGA stream depth used in the BLAS library nodes and the corresponding streaming transformations
Default value:
32
library.lapack
Built-in LAPACK DaCe library.
- library.lapack.default_implementation
Default implementation
Type:
str
Description: Default implementation for LAPACK library nodes.
Default value:
OpenBLAS
- library.lapack.override
Force configured implementation
Type:
bool
Description: Force the default implementation, even if an implementation has been explicitly set on a node.
Default value:
False
library.linalg
Built-in NumPy linalg DaCe library.
- library.linalg.default_implementation
Default implementation
Type:
str
Description: Default implementation for linalg library nodes.
Default value:
OpenBLAS
- library.linalg.override
Force configured implementation
Type:
bool
Description: Force the default implementation, even if an implementation has been explicitly set on a node.
Default value:
False
library.pblas
Built-in PBLAS DaCe library.
- library.pblas.default_implementation
Default implementation
Type:
str
Description: Default implementation for PBLAS library nodes.
Default value:
MKLMPICH
- library.pblas.override
Force configured implementation
Type:
bool
Description: Force the default implementation, even if an implementation has been explicitly set on a node.
Default value:
False
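The default_implementation and override entries of the BLAS, LAPACK, linalg, and PBLAS libraries combine into a simple resolution order. The sketch below assumes that, without override, an implementation set explicitly on a node is respected. select_implementation is hypothetical, and the implementation names in the tests are only example strings.

```python
from typing import Optional


def select_implementation(node_implementation: Optional[str], default_implementation: str,
                          override: bool = False) -> str:
    """Hypothetical resolution order for a library node's implementation."""
    if override:
        return default_implementation  # library.<name>.override forces the default
    if node_implementation is not None:
        return node_implementation  # otherwise an explicit per-node choice wins
    return default_implementation  # fall back to library.<name>.default_implementation
```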
optimizer
Preferences of the SDFG Optimizer
- optimizer.automatic_simplification
Automatic SDFG simplification
Type:
bool
Description: Automatically performs SDFG simplification on programs.
Default value:
True
- optimizer.autooptimize
Run auto-optimization heuristics
Type:
bool
Description: Automatically runs the set of optimizing transformation heuristics on any program called via the Python frontend.
Default value:
False
- optimizer.autospecialize
Auto-specialize symbols
Type:
bool
Description: Automatically specialize every SDFG to the symbol values at call-time. Requires all symbols to be set.
Default value:
False
- optimizer.autotile_partial_parallelism
Prefer partial parallelism over write-conflict tiling
Type:
bool
Description: If true, sets the auto-optimizer to prefer extracting map parallel dimensions over tiling for atomic write-conflict resolution edges. This may be slower in case of small parallel dimensions vs. conflicted dimensions. This preference only applies to symbolic ranges or ranges over the autotile_size parameter.
Default value:
True
- optimizer.autotile_size
Default tile size in auto-optimization
Type:
int
Description: Sets the default tile size for the optimization heuristics.
Default value:
128
- optimizer.detect_control_flow
Detect control flow from state transitions
Type:
bool
Description: Attempts to infer control flow constructs “if”, “for” and “while” from state transitions, allowing code generators to generate appropriate code.
Default value:
True
- optimizer.match_exception
Treat exceptions in “can_be_applied” as errors
Type:
bool
Description: Determines how exceptions raised in a transformation’s “can_be_applied” function are handled. If True, the exception is propagated further; otherwise, it is printed as a warning.
Default value:
False
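The hypothetical wrapper below illustrates the two behaviours of optimizer.match_exception. Returning False after the warning (treating the transformation as not applicable) is an assumption for this sketch.

```python
import warnings
from typing import Callable


def safe_can_be_applied(check: Callable[[], bool], match_exception: bool = False) -> bool:
    """Hypothetical wrapper showing the two documented behaviours."""
    try:
        return check()
    except Exception as exc:
        if match_exception:
            raise  # propagate the original exception unchanged
        warnings.warn(f"can_be_applied raised {exc!r}; treating as not applicable")
        return False
```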
- optimizer.save_intermediate
Save intermediate SDFGs
Type:
bool
Description: Save SDFG files after every transformation.
Default value:
False
- optimizer.symbolic_positive
Treat all symbolic expressions as positive
Type:
bool
Description: Every expression in which a symbolic value appears is treated as strictly positive. This is necessary for certain Range evaluations using Subgraph Fusion.
Default value:
True
- optimizer.visualize_sdfv
Visualize SDFG
Type:
bool
Description: Open the SDFG in a browser after every transformation.
Default value:
False
testing
Unit testing settings
- testing.deserialize_exception
Treat exceptions in deserialization as errors
Type:
bool
Description: When an exception is raised in a deserialization process (e.g., due to missing library node), by default a warning is issued. If this setting is True, the exception will be raised as-is.
Default value:
False
- testing.serialization
Test Serialization on validation
Type:
bool
Description: Before generating code, verify that a serialization/deserialization loop generates the same SDFG.
Default value:
False
- testing.serialize_all_fields
Serialize all unmodified fields in SDFG files
Type:
bool
Description: If False (default), saving an SDFG keeps only the modified non-default properties. If True, saves all fields.
Default value:
False
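The sketch below illustrates serialize_all_fields together with testing.serialization. It saves either all properties or only the non-default ones as JSON, then restores the omitted fields from the defaults and checks that the round trip reproduces the original. serialize and restore are hypothetical helpers over plain dictionaries, not DaCe's SDFG serializer.

```python
import json
from typing import Any, Dict


def serialize(props: Dict[str, Any], defaults: Dict[str, Any],
              serialize_all_fields: bool = False) -> str:
    """Hypothetical: keep only non-default properties unless all fields are requested."""
    if serialize_all_fields:
        kept = dict(props)
    else:
        kept = {k: v for k, v in props.items() if k not in defaults or defaults[k] != v}
    return json.dumps(kept, sort_keys=True)


def restore(text: str, defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Fill omitted fields back in from the defaults."""
    return {**defaults, **json.loads(text)}


defaults = {"label": "", "repetitions": 1, "debug": False}
props = {"label": "prog", "repetitions": 10, "debug": False}
compact = serialize(props, defaults)
full = serialize(props, defaults, serialize_all_fields=True)
```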