TotalView is the leading dynamic analysis and debugging tool for complex CPU- and GPU-based multi-threaded, multi-process, and multi-node cluster applications. It lets developers analyze C, C++, Fortran, and mixed-language Python applications to quickly diagnose and fix bugs. With TotalView's reverse debugging, memory debugging, and advanced debugging technologies, developers can shorten development cycles and reduce the time it takes to find and fix difficult bugs in complex codes. TotalView supports the latest CUDA SDKs and NVIDIA GPU hardware; Linux x86-64, Arm64, and OpenPOWER platforms; and applications that use MPI and OpenMP.

The TotalView CUDA features on Linux x86-64, Arm64, and PowerLE include:

  • Support for OpenACC directives
  • Debugging host and device code in the same session
  • CUDA running directly on NVIDIA's latest GPUs
  • Linux and GPU device thread visibility
  • Full visibility to the hierarchical device, block, and thread memory
  • Navigating device threads by logical and device coordinates
  • CUDA function calls, host-pinned memory regions, and CUDA contexts
  • Handling CUDA functions inline and on the stack
  • Command line interface (CLI) commands for CUDA functions
  • Applications that use multiple NVIDIA devices at the same time
  • MPI applications on CUDA-accelerated clusters
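
To make the "logical and device coordinates" item above more concrete, here is a minimal host-side C++ sketch of how a thread's logical coordinates within a block relate to two of its device coordinates, the warp and the lane. It relies on the standard CUDA layout (x varies fastest when threads are linearized, and warps are groups of 32 consecutive linear thread IDs). The names `Dim3`, `WarpLane`, `linear_thread_id`, and `warp_and_lane` are hypothetical helpers for illustration only; they are not part of TotalView or the CUDA API. The SM a block runs on is assigned by the hardware at run time and cannot be computed this way.

```cpp
#include <cassert>

// Hypothetical plain-C++ stand-in for CUDA's dim3 / uint3 (illustration only).
struct Dim3 { unsigned x, y, z; };

// Linear thread ID in the block, plus the warp and lane derived from it.
struct WarpLane { unsigned linear, warp, lane; };

// Warp size on current NVIDIA GPUs.
constexpr unsigned kWarpSize = 32;

// Linearize a thread's logical (x, y, z) index within its block.
// x varies fastest, then y, then z.
inline unsigned linear_thread_id(Dim3 tid, Dim3 bdim) {
    assert(tid.x < bdim.x && tid.y < bdim.y && tid.z < bdim.z);
    return tid.x + tid.y * bdim.x + tid.z * bdim.x * bdim.y;
}

// Derive the warp (within the block) and the lane (within the warp).
inline WarpLane warp_and_lane(Dim3 tid, Dim3 bdim) {
    const unsigned lin = linear_thread_id(tid, bdim);
    return WarpLane{lin, lin / kWarpSize, lin % kWarpSize};
}
```

For example, in a 16x8x1 block, the thread at logical index (5, 3, 0) has linear ID 53, so it sits in warp 1 of its block at lane 21.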

TotalView reduces development time by eliminating compile-run-print cycles, simplifying the addition of complex features such as multi-threading, and identifying the trouble spots in applications.


For a risk-free evaluation of TotalView, please visit Perforce Software.

For more information about TotalView and other CUDA tools: