This page covers the Development Preview of the new CUDA Debugger API implementation for x86_64 Linux.

This implementation of the debugger API is a drop-in replacement for the existing built-in one. In a future toolkit release, support for the older implementation will be deprecated.

The target audience for this developer preview is end users who call the CUDA Debugger API directly, as well as cuda-gdb users who are willing to become early adopters of the new CUDA Debugger API backend implementation.

Bug reports and feedback are appreciated; please submit them using one of the methods below:

You may download the development preview installer here.

Note that the development preview should be installed on top of an existing cuda-gdb installation.


As a cuda-gdb backend

To use the new CUDA Debugger API implementation as the cuda-gdb backend, the following environment variables need to be set on the target machine (that is, the machine running cuda-gdb when debugging locally, or the machine running cuda-gdbserver when debugging remotely):

  • CUDBG_DEBUG_AGENT_PATH=/path/to/NvDebugAgent

To automate setting these variables, source the set-env file in the install directory, for example:

. /path/to/install/folder/set-env

To revert to the current CUDA Debugger API implementation, unset the CUDBG_INJECTION_PATH environment variable. It is not necessary to unset CUDBG_DEBUG_AGENT_PATH.

As a CUDA Debugger API library

To use the new CUDA Debugger API implementation as a library, have your debug driver write the path to the library on disk into the global variable cudbgInjectionPath before calling cudbgGetAPI (cudbgInjectionPath is a char array of size 4096). This makes cudbgGetAPI choose the injected implementation over the current one.

In addition, on the application side, write the path to the library on disk into the same cudbgInjectionPath variable, either before cuInit is called (non-attach scenario), or after it is called but before cudbgApiInit(2) is remotely invoked (attach scenario). This makes the application choose the new implementation over the current one on the application side. For more details on the attach scenario, refer to the CUDA Debugger API documentation.

Note that either both of these actions or neither of them must be performed. Otherwise the debug driver and the application would initialize different CUDA Debugger API implementations, which would lead to API calls failing or hanging.
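
The debug-driver step described above amounts to a bounded string copy into a fixed-size buffer. The following is a minimal sketch of that copy, not the reference implementation: set_injection_path and INJECTION_PATH_SIZE are hypothetical names introduced here for illustration, and the only fact taken from this page is that cudbgInjectionPath is a char array of size 4096.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size of cudbgInjectionPath as stated on this page. */
#define INJECTION_PATH_SIZE 4096

/*
 * Hypothetical helper (not part of the CUDA Debugger API): copy `path`
 * into `dst`, which must point to a buffer of INJECTION_PATH_SIZE bytes.
 * Returns 0 on success, or -1 if the path plus its terminating NUL does
 * not fit, in which case `dst` is left unchanged.
 */
static int set_injection_path(char *dst, const char *path)
{
    assert(dst != NULL && path != NULL);

    size_t len = strlen(path);
    if (len >= INJECTION_PATH_SIZE)
        return -1;               /* too long: do not truncate silently */

    memcpy(dst, path, len + 1);  /* copy including the terminating NUL */
    return 0;
}

/*
 * Intended use in a debug driver (assumption: cudbgInjectionPath is
 * visible to the driver as described above):
 *
 *     if (set_injection_path(cudbgInjectionPath, "/path/to/library") != 0)
 *         ... report the error and do not call cudbgGetAPI ...
 *     ... then call cudbgGetAPI as usual ...
 */
```

Rejecting an over-long path instead of truncating it avoids pointing the driver at a different, shorter path than the one the application side was given.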

Known issues and limitations

As this is a developer preview, no correctness, stability or performance guarantees are made.

Features not supported in this release:

  • Debugging contexts created on Maxwell (SM 5.X) and older GPUs. Please use the existing implementation for debugging on such devices.
  • Enabling memcheck in cuda-gdb.
  • The autostep cuda-gdb feature.
  • Debugging CUDA applications that use CUDA Dynamic Parallelism (CDP).

CUDA Debugger API methods not supported, or only partially supported, in this release:

  • getHostAddrFromDeviceAddr
  • lookupDeviceCodeSymbol
  • readTextureMemoryBindless
  • getElfImage (only returns relocated images)
  • getElfImageByHandle (only returns relocated images)
  • getGridInfo (does not return any CDP-related fields)
  • getGridStatus (does not return any CDP-related fields)
  • kernelReady.parentGridId (it's a CDP-related field)
  • memcheckReadErrorAddress (no memcheck support)

Known issues in this release:

Summary of changes compared to the developer preview version released with CUDA Toolkit 11.4:

  • Improved cuda-gdb remote debugging stability.
  • Improved handling of assert in GPU code.
  • Improved GPU stack trace accuracy.
  • Fixed an issue where GPU breakpoints set close to a currently hit breakpoint could be missed.
  • The readSyscallCallDepth CUDA Debugger API method is now supported.