Configuration entry reference

The following configuration entries are available in .dace.conf, as part of the API, and as environment variables. See Configuring DaCe for more information on how to use the interface.
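
As a rough illustration of how one entry name from this reference relates to the different interfaces, the sketch below splits a dotted entry name into a key hierarchy and derives an environment variable name from it. The `DACE_` prefix with underscore-joined keys is an assumption about the naming convention and should be checked against Configuring DaCe. `entry_to_keys` and `entry_to_env_var` are hypothetical helpers, not part of the DaCe API.

```python
def entry_to_keys(entry: str) -> tuple:
    """Split a dotted entry name such as 'compiler.cpu.args' into its key hierarchy."""
    return tuple(entry.split("."))


def entry_to_env_var(entry: str, prefix: str = "DACE_") -> str:
    """Build an environment variable name for a configuration entry.

    Assumption: the name is the prefix followed by the key hierarchy
    joined with underscores.
    """
    return prefix + "_".join(entry_to_keys(entry))


print(entry_to_keys("compiler.cpu.args"))
print(entry_to_env_var("compiler.use_cache"))
```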

cache

Compiled cache entry naming policy

Type: str

Description: Determine the name of the generated .dacecache folder:

name uses the name of the SDFG directly, causing it to be overridden by other programs using the same SDFG name.

hash uses a mangled name based on the hash of the SDFG, such that any change to the SDFG will generate a different cache folder.

unique uses a name based on the currently running Python process at code generation time, such that no caching or clashes can happen between different processes or subsequent invocations of Python.

single uses a single cache folder for all SDFGs, saving space and potentially build time, but disallows executing SDFGs in parallel and caching of more than one simultaneous SDFG.

Default value: name
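
The four policies can be sketched as a function that picks a folder for an SDFG. This is only a model of the behavior described above: the exact folder names, the hash function, and the `single_cache` name are assumptions made for illustration, and `cache_folder` is a hypothetical helper rather than DaCe code.

```python
import hashlib
import os


def cache_folder(policy: str, sdfg_name: str, sdfg_contents: str,
                 base: str = ".dacecache") -> str:
    """Return an illustrative cache folder path for the given naming policy."""
    if policy == "name":
        # Same SDFG name -> same folder, regardless of contents.
        return os.path.join(base, sdfg_name)
    if policy == "hash":
        # Any change to the contents changes the digest and thus the folder.
        digest = hashlib.sha256(sdfg_contents.encode()).hexdigest()[:16]
        return os.path.join(base, f"{sdfg_name}_{digest}")
    if policy == "unique":
        # Tied to the running process, so separate processes never share a folder.
        return os.path.join(base, f"{sdfg_name}_{os.getpid()}")
    if policy == "single":
        # One folder shared by all SDFGs (hypothetical folder name).
        return os.path.join(base, "single_cache")
    raise ValueError(f"unknown cache policy: {policy!r}")
```

Under this model, `name` and `single` trade isolation for reuse, while `hash` and `unique` trade disk space for safety against clashes.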

call_hooks

Hooks before/after every DaCe program call

Type: str

Description: A comma-separated list of functions (or Context Manager classes) that will be called before every DaCe program (SDFG) is compiled and run. Used for functionality such as automatic tuning or instrumentation.

Default value: (Empty)
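
A hook written as a context manager might look like the sketch below, which records how long each call takes. The single `sdfg` argument and the way the hook is invoked are assumptions for illustration. `FakeSDFG` is a stand-in object so that the example runs without DaCe installed.

```python
import time
from contextlib import contextmanager

timings = []  # (program name, elapsed seconds) pairs collected by the hook


@contextmanager
def timing_hook(sdfg):
    """Measure the wall-clock time of whatever runs inside the with-block."""
    start = time.perf_counter()
    try:
        yield  # the program would be compiled and run here
    finally:
        elapsed = time.perf_counter() - start
        timings.append((getattr(sdfg, "name", str(sdfg)), elapsed))


class FakeSDFG:
    """Hypothetical stand-in for an SDFG object."""
    name = "example"


# Simulate one program call wrapped by the hook.
with timing_hook(FakeSDFG()):
    sum(range(1000))
```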

compiled_sdfg_call_hooks

Hooks before/after every compiled SDFG call

Type: str

Description: A comma-separated list of functions (or Context Manager classes) that will be called before every compiled SDFG’s generated code is invoked. Used for functionality such as low-level profiling.

Default value: (Empty)

debugprint

Debug printing

Type: bool

Description: Enable verbose printouts.

Default value: False

default_build_folder

Default SDFG build folder

Type: str

Description: Default folder in which compiled DaCe programs and SDFGs are stored. Can either be a relative path (by default) or absolute.

Default value: .dacecache

external_transformations_path

External transformations path

Type: str

Description: Path to a directory containing external transformations that are not included in the main DaCe package. This path is added to the Python path and can be used to import custom transformation modules.

Default value: $HOME/dace_transformations/external_transformations

Default value (on Windows): %USERPROFILE%\dace_transformations\external_transformations

profiling

Profiling

Type: bool

Description: Enable profiling support.

Default value: False

profiling_status

Status bar for profiling

Type: bool

Description: Enable tqdm status bar while profiling. If tqdm is not installed a warning will appear. To disable this feature (and the warning) set this option to false.

Default value: True

progress

Progress reports

Type: bool

Description: Enable progress report printouts.

Default value: True

store_history

Store SDFG transformation history

Type: bool

Description: Store the history of applied transformations in the SDFG file.

Default value: True

treps

Profiling Repetitions

Type: int

Description: Number of times to run the program when profiling.

Default value: 100

compiler

Preferences of the compiler

compiler.allow_shadowing

Allow variable shadowing

Type: bool

Description: Allow shadowing of variables in the code. When enabled, shadowing produces a warning instead of an exception.

Default value: True

compiler.allow_view_arguments

Allow numpy views as arguments

Type: bool

Description: If true, allows users to call DaCe programs with NumPy views (for example, “A[:, 1]” or “w.T”). As this can create pointer aliasing issues with two arrays pointing to the same memory, or analyzability issues with strides and alignment, this option is disabled by default.

Default value: False

compiler.build_folder_mode

Save mode for the build folder

Type: str

Description: Selects which content is saved in the build folder. Two modes are currently supported: development, which saves everything, and production, which saves only the compiled library and the folder mode file.

Default value: development

compiler.build_type

Build configuration

Type: str

Description: Configuration type for CMake build (can be Debug, Release, RelWithDebInfo, or MinSizeRel).

Default value: RelWithDebInfo

compiler.codegen_lineinfo

Annotate code generator lines

Type: bool

Description: Keep a source mapping between generated code and the file/line of the code generator that generated it. Used for debugging code generation.

Default value: False

compiler.codegen_state_struct_suffix

Suffix used by the code generator to mangle the state struct.

Type: str

Description: For every SDFG the code generator processes, a state struct is generated. The type name of this struct is derived by appending this value to the SDFG’s name. Note that the suffix may only contain letters, digits and underscores.

Default value: _state_t

compiler.cpp_standard

C++ standard version

Type: str

Description: C++ standard to use for compilation (e.g., 14, 17, 20, 23, 26).

Default value: 20

compiler.default_data_types

Default data types

Type: str

Description: Specify the default data types to use in generated code. If set to “Python”, Python’s semantics are followed (i.e., float and int are represented using 64 bits). If set to “C”, C’s semantics are used (float and int are represented using 32 bits).

Default value: Python

compiler.extra_cmake_args

Additional CMake configuration arguments

Type: str

Description: If set, specifies additional arguments to the initial invocation of cmake.

Default value: (Empty)

compiler.format_code

Format code with clang-format

Type: bool

Description: Formats the generated code with clang-format before saving the files.

Default value: False

compiler.format_config_file

Path to the .clang-format file

Type: str

Description: Path to the configuration file passed to clang-format. Only used if format_code is true.

Default value: (Empty)

compiler.indentation_spaces

Indentation width

Type: int

Description: Number of spaces used when indenting generated code.

Default value: 4

compiler.inline_sdfgs

Inline all nested SDFGs

Type: bool

Description: If set to true, inlines all nested SDFGs upon code generation by default.

Default value: False

compiler.library_extension

Library extension

Type: str

Description: File extension of shared libraries.

Default value: so

Default value (on Linux): so

Default value (on Windows): dll

Default value (on Darwin): dylib

compiler.library_prefix

Library prefix

Type: str

Description: Filename prefix for shared libraries.

Default value: (Empty)

Default value (on Linux): lib

Default value (on Darwin): lib
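
Taken together, the prefix and extension defaults suggest how a shared library file name is composed on each platform. The sketch below uses the defaults listed in the two entries above. The composition rule `prefix + name + "." + extension` is an assumption, and `library_filename` is a hypothetical helper.

```python
import platform

# Per-platform defaults copied from compiler.library_prefix and
# compiler.library_extension, keyed by the value of platform.system().
LIBRARY_PREFIX = {"Linux": "lib", "Darwin": "lib", "Windows": ""}
LIBRARY_EXTENSION = {"Linux": "so", "Darwin": "dylib", "Windows": "dll"}


def library_filename(name: str, system: str = "") -> str:
    """Compose a shared library file name for the given (or current) platform."""
    system = system or platform.system()
    prefix = LIBRARY_PREFIX.get(system, "")        # generic default: (Empty)
    extension = LIBRARY_EXTENSION.get(system, "so")  # generic default: so
    return f"{prefix}{name}.{extension}"


for system in ("Linux", "Windows", "Darwin"):
    print(system, library_filename("gemm", system))
```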

compiler.lineinfo

Add line info

Type: str

Description: Whether or not to add line info from the parsed code to the generated SDFG. Valid options are inspect and none. “inspect”: During parsing, inspect the Python call stack and automatically add line info from the parsed source code to the resulting SDFG. “none”: Do not save any line info in the resulting SDFG.

Default value: inspect

compiler.max_stack_array_size

Max stack-allocated array size (bytes)

Type: int

Description: All stack allocated arrays (i.e. StorageType.Register) with size larger than this will be allocated on the heap.

Default value: 65536
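
The rule can be read as a simple size comparison. The sketch below models it with a hypothetical helper, `allocation_site`, that takes an element count and element size. How DaCe computes the array size internally is not covered here.

```python
def allocation_site(num_elements: int, itemsize: int,
                    max_stack_array_size: int = 65536) -> str:
    """Return 'heap' if the array is larger than the threshold, else 'stack'."""
    size_bytes = num_elements * itemsize
    # Only arrays strictly larger than the threshold move to the heap.
    return "heap" if size_bytes > max_stack_array_size else "stack"


print(allocation_site(8192, 8))  # 65536 bytes, exactly at the threshold
print(allocation_site(8193, 8))  # 65544 bytes, above the threshold
```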

compiler.unique_functions

Generate unique functions

Type: str

Description: Determine if and how to generate the code for equivalent NestedSDFGs:

hash uses hashing to determine whether multiple NestedSDFGs with equivalent contents exist. If so, the code is generated only once.

unique_name uses the unique_name property of the SDFG to determine whether two NestedSDFGs are equal, generating the code only once. This gives the programmer more control to decide explicitly which NestedSDFG code may be replicated.

none generates a separate function for each NestedSDFG.

Default value: hash

compiler.use_cache

Use cache

Type: bool

Description: If enabled, does not recompile code generated from SDFGs if a shared library (.so/.dll) file is already present.

Default value: False

compiler.cpu

CPU compiler preferences

compiler.cpu.args

Arguments

Type: str

Description: Compiler argument flags

Default value: -fPIC -Wall -Wextra -O3 -march=native -ffast-math -Wno-unused-parameter -Wno-unused-label

Default value (on Windows): /O2 /fp:fast /arch:AVX2 /D_USRDLL /D_WINDLL /D__restrict__=__restrict

compiler.cpu.executable

Compiler executable override

Type: str

Description: File path or name of compiler executable

Default value: (Empty)

compiler.cpu.libs

Additional libraries

Type: str

Description: Additional linked libraries required by target

Default value: (Empty)

compiler.cpu.openmp_sections

Use OpenMP sections

Type: bool

Description: If set to true, multiple connected components will generate “#pragma omp parallel sections” code around them.

Default value: False

compiler.cuda

GPU (CUDA/HIP) compiler preferences

compiler.cuda.allow_implicit_memlet_to_map

Allow the implicit conversion of Memlets to Maps during code generation.

Type: bool

Description: If true, the code generator will implicitly convert Memlets that cannot be represented by a native library call (such as cudaMemcpy()) into Maps that explicitly copy the data around. If false, the code generator will raise an exception when such a Memlet is encountered. This allows the user to have full control over all Maps in the SDFG.

Default value: True

compiler.cuda.args

nvcc Arguments

Type: str

Description: Compiler argument flags for CUDA

Default value: -Xcompiler -march=native --use_fast_math -Xcompiler -Wno-unused-parameter

Default value (on Windows): -O3 --use_fast_math

compiler.cuda.backend

Compilation backend

Type: str

Description: Backend to compile for (‘auto’ for automatic detection, ‘cuda’ for NVIDIA, or ‘hip’ for AMD).

Default value: auto

compiler.cuda.block_size_lastdim_limit

Maximum last dimension thread-block size in code generation

Type: int

Description: Upper limit on the thread-block size in the last (third) dimension. The GPU code generator fails if a kernel specifies a larger block size in that dimension. The default value is derived from hardware limits on common GPUs.

Default value: 64

compiler.cuda.block_size_limit

Maximum thread-block size in code generation

Type: int

Description: Upper limit on the overall thread-block size. The GPU code generator fails if a kernel specifies a larger overall block size. The default value is derived from hardware limits on common GPUs.

Default value: 1024
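
The two block-size limits can be checked together, as in the sketch below. It assumes that the overall block size is the product of the three dimensions and that block sizes are written as comma-separated strings such as 32,1,1. `parse_block_size` and `block_size_violations` are hypothetical helpers, and the messages are illustrative rather than DaCe's actual errors.

```python
def parse_block_size(value: str) -> tuple:
    """Parse a comma-separated block size such as '32,1,1' into three integers."""
    x, y, z = (int(part) for part in value.split(","))
    return (x, y, z)


def block_size_violations(block: tuple, block_size_limit: int = 1024,
                          block_size_lastdim_limit: int = 64) -> list:
    """List which of the two limits a block size exceeds (empty if none)."""
    x, y, z = block
    problems = []
    if x * y * z > block_size_limit:
        problems.append(f"total size {x * y * z} exceeds {block_size_limit}")
    if z > block_size_lastdim_limit:
        problems.append(f"last dimension {z} exceeds {block_size_lastdim_limit}")
    return problems


print(block_size_violations(parse_block_size("32,1,1")))
print(block_size_violations(parse_block_size("32,32,2")))
```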

compiler.cuda.cuda_arch

Additional CUDA architectures

Type: str

Description: Additional CUDA architectures (separated by commas) to compile GPU code for, excluding the current architecture on the compiling machine.

Default value: 60

compiler.cuda.default_block_size

Default thread-block size

Type: str

Description: Default thread-block size for GPU kernels when explicit GPU block maps are not defined. Can be set to ‘max’ to maximize occupancy.

Default value: 32,1,1

compiler.cuda.dynamic_map_block_size

Thread-Block size for GPU_ThreadBlock_Dynamic

Type: str

Description: Thread-Block size for maps using GPU_ThreadBlock_Dynamic scheduler. Can be set to ‘max’ to maximize occupancy.

Default value: 128,1,1

compiler.cuda.dynamic_map_fine_grained

Enable fine grained load balancing for GPU_ThreadBlock_Dynamic

Type: bool

Description: If true the scheduler will dynamically redistribute the combined work of all threads in the warp equally across the warp (fine grained). Otherwise, each warp works sequentially only on its tasks (potential load imbalance).

Default value: True

compiler.cuda.hip_arch

Additional HIP architectures

Type: str

Description: Additional HIP architectures (separated by commas) to compile GPU code for, excluding the current architecture on the compiling machine.

Default value: gfx906

compiler.cuda.hip_args

hipcc Arguments

Type: str

Description: Compiler argument flags for HIP

Default value: -fPIC -O3 -ffast-math -Wno-unused-parameter

compiler.cuda.libs

Additional libraries

Type: str

Description: Additional linked libraries required by target

Default value: (Empty)

compiler.cuda.max_concurrent_streams

Concurrent execution streams

Type: int

Description: Maximum number of concurrent CUDA/HIP streams to generate. Special values: -1 only uses the default stream, 0 uses infinite concurrent streams.

Default value: 0

compiler.cuda.mempool_release_threshold

Memory pool memory release threshold

Type: int

Description: A value that determines how large a memory allocation has to be before it is automatically released from the memory pool to the system. The default is -1, which indicates “never release”. Other values may be 0 (always release), or any byte value. For more information, see cudaMemPoolAttrReleaseThreshold in the CUDA toolkit documentation.

Default value: -1

compiler.cuda.path

CUDA/HIP path override

Type: str

Description: Path to CUDA toolkit or ROCm/HIP root directory

Default value: (Empty)

compiler.cuda.persistent_map_SM_fraction

Fraction of SMs to use for persistent GPU map

Type: float

Description: Sets the fraction of the device’s SMs that a GPU_Persistent map can use. Together with persistent_map_occupancy, this specifies the grid size of the kernel being launched. The value must satisfy 0.0 < persistent_map_SM_fraction <= 1.0. The fraction is rounded up to the next integer number of SMs. The maximum number of SMs that can be used is equal to cudaDevAttrMultiProcessorCount.

Default value: 1.0

compiler.cuda.persistent_map_occupancy

Number of blocks to launch per SM used

Type: int

Description: Sets the number of thread blocks to be launched per SM being used. Essentially, this is a simple multiplier to persistent_map_SM_fraction. It is up to the user to check whether the resulting number of thread blocks can run efficiently on the GPU.

Default value: 2
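
The interaction between persistent_map_SM_fraction and persistent_map_occupancy can be worked through numerically. The sketch below assumes the grid size is the rounded-up number of SMs multiplied by the occupancy, which is one reading of the two descriptions above. `persistent_grid_size` is a hypothetical helper, and `num_sms` stands for the device's multiprocessor count.

```python
import math


def persistent_grid_size(num_sms: int, sm_fraction: float = 1.0,
                         occupancy: int = 2) -> int:
    """Estimate the number of thread blocks launched for a persistent map."""
    if not 0.0 < sm_fraction <= 1.0:
        raise ValueError("sm_fraction must satisfy 0.0 < sm_fraction <= 1.0")
    # Round the fraction up to a whole number of SMs, capped at the device total.
    sms_used = min(num_sms, math.ceil(sm_fraction * num_sms))
    return sms_used * occupancy


print(persistent_grid_size(80))            # all SMs, two blocks per SM
print(persistent_grid_size(80, 0.5, 2))    # half of the SMs
```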

compiler.cuda.syncdebug

Synchronous Debugging

Type: bool

Description: Enables Synchronous Debugging mode, where each library call is followed by full-device synchronization and error checking.

Default value: False

compiler.cuda.thread_id_type

Thread/block index data type

Type: str

Description: Defines the data type for a thread and block index in the generated code. The type is based on the type-classes in dace.dtypes. For example, uint64 is equivalent to dace.uint64. Change this setting when large index types are needed to address memory offsets that are beyond the 32-bit range, or to reduce memory usage.

Default value: int32
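
To judge whether the default index type is sufficient, one can compare the largest offset a kernel needs to address against the signed 32-bit range. The sketch below is a hypothetical helper for that check. It only distinguishes int32 from int64 and does not consult DaCe.

```python
INT32_MAX = 2**31 - 1  # largest value representable in a signed 32-bit integer


def smallest_index_type(max_offset: int) -> str:
    """Return the narrowest signed index type that can hold max_offset."""
    return "int32" if max_offset <= INT32_MAX else "int64"


print(smallest_index_type(1_000_000))
print(smallest_index_type(3_000_000_000))
```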

compiler.linker

Linker preferences

compiler.linker.args

Arguments

Type: str

Description: Linker argument flags

Default value: (Empty)

Default value (on Darwin): (Empty)

Default value (on Windows): (Empty)

compiler.linker.executable

Linker executable override

Type: str

Description: File path or name of linker executable

Default value: (Empty)

compiler.mpi

MPI compiler preferences

compiler.mpi.executable

Compiler executable override

Type: str

Description: File path or name of compiler executable

Default value: (Empty)

experimental

Experimental features

experimental.check_race_conditions

Check race conditions

Type: bool

Description: Check for potential race conditions during validation.

Default value: False

experimental.validate_undefs

Undefined Symbol Check

Type: bool

Description: Check for undefined symbols in memlets during SDFG validation.

Default value: False

frontend

Python frontend preferences

frontend.avoid_wcr

Avoid using WCR for augmented assignments when possible

Type: bool

Description: Perform a map-symbol-dependency check on the write-subsets of augmented assignments that appear inside Maps to avoid using WCR when possible. This feature works correctly only when there is a single augmented assignment for each data dimension inside a Map.

Default value: False

frontend.cache_size

Program cache size

Type: int

Description: The number of compiled programs to cache (based on argument types, closure constants, and closure array types) to avoid reparsing/compiling when calling a @dace.program or method.

Default value: 32

frontend.check_args

Check arguments on SDFG call

Type: bool

Description: Perform an early type check on arguments passed to an SDFG when called directly (from SDFG.__call__). Another type check is performed when calling compiled SDFGs.

Default value: False

frontend.dont_fuse_callbacks

Do not fuse callbacks

Type: bool

Description: Stricter mode of operation where callbacks into Python don’t participate in state fusion transformations.

Default value: False

frontend.implicit_recursion_depth

Auto-parsing recursion depth

Type: int

Description: The maximum call-stack depth allowed when automatically parsing called dace functions or methods.

Default value: 64

frontend.preprocessing_passes

Number of preprocessing passes on Python code

Type: int

Description: Number of times to run the Python preprocessing passes (e.g., constant folding) on the input code. Set to zero to disable preprocessing optimizations, or to -1 to run until the code no longer changes.

Default value: 5
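
The three cases (a fixed count, zero, and -1) can be modeled as a bounded fixed-point loop. The sketch below is a minimal illustration using a toy pass on strings. `run_passes` and `fold_once` are hypothetical, and stopping early once the code stops changing is an assumption that does not affect the result for deterministic passes.

```python
from typing import Callable


def run_passes(code: str, apply_pass: Callable[[str], str], passes: int = 5) -> str:
    """Apply a pass up to `passes` times; a negative value runs to a fixed point."""
    count = 0
    while passes < 0 or count < passes:
        new_code = apply_pass(code)
        count += 1
        if new_code == code:
            break  # nothing changed, further passes would be no-ops
        code = new_code
    return code


def fold_once(code: str) -> str:
    """Toy pass: fold the first occurrence of '(1+1)' into '2'."""
    return code.replace("(1+1)", "2", 1)


source = "(1+1)*(1+1)*(1+1)"
print(run_passes(source, fold_once, passes=0))
print(run_passes(source, fold_once, passes=2))
print(run_passes(source, fold_once, passes=-1))
```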

frontend.raise_nested_parsing_errors

Raise nested parsing errors

Type: bool

Description: Raise all errors out of nested function parsing contexts instead of trying to create a callback implicitly.

Default value: False

frontend.typed_callbacks_only

Only allow typed callbacks

Type: bool

Description: Stricter mode of operation where callbacks into Python must have explicit return value types in order to compile.

Default value: False

frontend.unroll_threshold

Automatic unroll loop size threshold

Type: int

Description: Threshold for automatic loop unrolling of any generator (e.g., including range) with a compile-time size. A value of -1 (default) means not to unroll any loop automatically, a value of 0 means unrolling every loop, and a value above zero sets a size threshold beyond which a constant-sized loop will not be automatically unrolled.

Default value: -1
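
The three ranges of this setting can be summarized as a small decision function. In the sketch below, treating a loop whose size equals the threshold as still eligible for unrolling is an assumption about the boundary case, as is treating every negative value like -1. `should_unroll` is a hypothetical helper.

```python
def should_unroll(loop_size: int, unroll_threshold: int = -1) -> bool:
    """Decide whether a loop with a compile-time size is unrolled automatically."""
    if unroll_threshold < 0:
        return False  # -1: never unroll automatically
    if unroll_threshold == 0:
        return True   # 0: unroll every loop
    # Positive threshold: loops larger than the threshold are left alone.
    return loop_size <= unroll_threshold


print(should_unroll(4))          # default setting
print(should_unroll(1000, 0))    # unroll everything
print(should_unroll(9, 8))       # above the threshold
```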

frontend.verbose_errors

Show preprocessed AST on parsing errors

Type: bool

Description: Prints out the preprocessed unparsed AST in case of a parsing error.

Default value: False

instrumentation

Instrumentation preferences

instrumentation.report_each_invocation

Save report for each invocation

Type: bool

Description: Save an instrumentation report file for each invocation of the SDFG, rather than one report that spans from SDFG initialization to finalization.

Default value: True

instrumentation.papi

PAPI configuration

instrumentation.papi.default_counters

Default PAPI counters

Type: str

Description: Sets the default PAPI counter list, formatted as a Python list of strings.

Default value: ['PAPI_TOT_INS', 'PAPI_TOT_CYC', 'PAPI_L2_TCM', 'PAPI_L3_TCM']
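
Because the value is a string formatted as a Python list, it can be turned into an actual list with the standard library. The sketch below uses `ast.literal_eval` for this and adds a type check. `parse_counters` is a hypothetical helper, and whether DaCe parses the value the same way is not confirmed here.

```python
import ast


def parse_counters(value: str) -> list:
    """Parse a counter list written as a Python list literal of strings."""
    counters = ast.literal_eval(value)  # safely evaluates literals only
    if not isinstance(counters, list) or not all(isinstance(c, str) for c in counters):
        raise ValueError("expected a list of strings")
    return counters


default = "['PAPI_TOT_INS', 'PAPI_TOT_CYC', 'PAPI_L2_TCM', 'PAPI_L3_TCM']"
print(parse_counters(default))
```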

instrumentation.papi.overhead_compensation

Compensate Overhead

Type: bool

Description: Subtracts the minimum measured overhead from every measurement.

Default value: True

instrumentation.papi.vectorization_analysis

Enable vectorization check

Type: bool

Description: Enables analysis of gcc vectorization information. Only gcc/g++ is supported.

Default value: False

library

Settings for handling the use of DaCe libraries.

library.blas

Built-in BLAS DaCe library.

library.blas.default_implementation

Default implementation

Type: str

Description: Default implementation for BLAS library nodes.

Default value: pure

library.blas.override

Force configured implementation

Type: bool

Description: Force the default implementation, even if an implementation has been explicitly set on a node.

Default value: False

library.lapack

Built-in LAPACK DaCe library.

library.lapack.default_implementation

Default implementation

Type: str

Description: Default implementation for LAPACK library nodes.

Default value: OpenBLAS

library.lapack.override

Force configured implementation

Type: bool

Description: Force the default implementation, even if an implementation has been explicitly set on a node.

Default value: False

library.linalg

Built-in NumPy linalg DaCe library.

library.linalg.default_implementation

Default implementation

Type: str

Description: Default implementation for linalg library nodes.

Default value: OpenBLAS

library.linalg.override

Force configured implementation

Type: bool

Description: Force the default implementation, even if an implementation has been explicitly set on a node.

Default value: False

library.pblas

Built-in PBLAS DaCe library.

library.pblas.default_implementation

Default implementation

Type: str

Description: Default implementation for PBLAS library nodes.

Default value: MKLMPICH

library.pblas.override

Force configured implementation

Type: bool

Description: Force the default implementation, even if an implementation has been explicitly set on a node.

Default value: False

optimizer

Preferences of the SDFG Optimizer

optimizer.automatic_simplification

Automatic SDFG simplification

Type: bool

Description: Automatically performs SDFG simplification on programs.

Default value: True

optimizer.autooptimize

Run auto-optimization heuristics

Type: bool

Description: Automatically runs the set of optimizing transformation heuristics on any program called via the Python frontend.

Default value: False

optimizer.autospecialize

Auto-specialize symbols

Type: bool

Description: Automatically specialize every SDFG to the symbol values at call-time. Requires all symbols to be set.

Default value: False

optimizer.autotile_partial_parallelism

Prefer partial parallelism over write-conflict tiling

Type: bool

Description: If true, sets the auto-optimizer to prefer extracting map parallel dimensions over tiling for atomic write-conflict resolution edges. This may be slower in case of small parallel dimensions vs. conflicted dimensions. This preference only applies to symbolic ranges or ranges over the autotile_size parameter.

Default value: True

optimizer.autotile_size

Default tile size in auto-optimization

Type: int

Description: Sets the default tile size for the optimization heuristics.

Default value: 128

optimizer.detect_control_flow

Detect control flow from state transitions

Type: bool

Description: Attempts to infer control flow constructs “if”, “for” and “while” from state transitions, allowing code generators to generate appropriate code.

Default value: True

optimizer.match_exception

Treat exceptions in “can_be_applied” as errors

Type: bool

Description: When an exception is raised in a transformation’s “can_be_applied” function, the exception is re-raised if this setting is True. Otherwise, the exception is printed as a warning.

Default value: False

optimizer.save_intermediate

Save intermediate SDFGs

Type: bool

Description: Save SDFG files after every transformation.

Default value: False

optimizer.symbolic_positive

Treat all symbolic expressions as positive

Type: bool

Description: Every expression in which a symbolic value appears is treated as strictly positive. This is necessary for certain Range evaluations using Subgraph Fusion.

Default value: True

optimizer.visualize_sdfv

Visualize SDFG

Type: bool

Description: Open the SDFG in a browser after every transformation.

Default value: False

testing

Unit testing settings

testing.deserialize_exception

Treat exceptions in deserialization as errors

Type: bool

Description: When an exception is raised in a deserialization process (e.g., due to missing library node), by default a warning is issued. If this setting is True, the exception will be raised as-is.

Default value: False

testing.serialization

Test Serialization on validation

Type: bool

Description: Before generating code, verify that a serialization/deserialization loop generates the same SDFG.

Default value: False

testing.serialize_all_fields

Serialize all unmodified fields in SDFG files

Type: bool

Description: If False (default), saving an SDFG keeps only the modified non-default properties. If True, saves all fields.

Default value: False