NVIDIA® Nsight™ Graphics 2024.2 is released with the following changes:

New Features:

  • Shader Debugger
    • We are excited to announce the public beta release of the Shader Debugger for Vulkan.
    • The public builds of Nsight Graphics now enable you to debug your shaders (ray tracing, mesh, raster, and compute) at the source level for your Vulkan applications on Windows and Linux.
    • This is a complete hardware debugger running on the GPU, and requires either 2 GPUs (including any mix of integrated GPUs and discrete GPUs) in a single system, or 2 systems connected via TCP/IP.
    • You can set conditional breakpoints, inspect local variables, step through code, and perform other activities vital to understanding the most challenging bugs.
  • GPU Trace
    • A Flame Graph of shader performance is now shown in GPU Trace. Use this to identify the most expensive shaders, functions, and inlined callstacks at a glance.
    • Identify when a shader, function, or source line was executing on the timeline using the “Select on Timeline” command. This command is available when you right-click any function or hot spot in the Flame Graph, the Shader Pipelines view, or any other shader navigation table.
    • CUDA is now supported in GPU Trace. The API Event List, Timeline, and Shader Profiler views all support the CUDA API.
    • CUDA-in-Graphics (CIG) is now supported in GPU Trace. CIG is a new CUDA/graphics interop mode in which the CUDA context resides in the same hardware context as D3D12.
    • In D3D12 applications, NGX Shaders and workloads are now shown in GPU Trace. This includes work items created by DLSS, DLSS Frame Generation, the Neural Rendering Cache (NRC SDK), and similar NVIDIA SDKs. Vulkan NGX support was introduced in the previous release (Nsight Graphics 2024.1).
    • D3D12 Work Graph shaders are now shown in the Shader Pipelines view. This requires a driver version newer than R560.70.
  • Shader Profiler
    • A Flame Graph of shader performance is now shown in the Shader Profiler.
    • The Ray Tracing Runtime’s contribution to shader performance is now visible in the Shader Pipelines view, on traces collected with an R560 driver. Within a DispatchRays call, it is now possible to see the cost of “Ray Tracing Traversal” and the “Ray Tracing Scheduler” alongside application-defined shaders (RayGen, Closest Hit, etc.).
    • Added navigation hyperlinks: jump to a function definition in the source view, jump from the source view to the Top-Down/Bottom-Up views, and move back and forward with the new navigation buttons.
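Conceptually, a flame graph draws each function as a box whose width is the total ("inclusive") number of samples collected while that function was anywhere on the call stack. The Python sketch below illustrates that aggregation on made-up samples. It is not how Nsight Graphics computes its Flame Graph; `inclusive_costs` and the shader/function names are hypothetical.

```python
from collections import Counter


def inclusive_costs(samples):
    """Sum sample counts over every call-stack prefix.

    samples: iterable of (stack, count), where stack is a tuple of frame
    names ordered root-first, e.g. ("RayGen", "shade", "brdf").
    Returns a Counter mapping each prefix tuple to its inclusive count,
    which corresponds to the width of that box in a flame graph.
    """
    totals = Counter()
    for stack, count in samples:
        # Every caller on the stack is "charged" for this sample.
        for depth in range(1, len(stack) + 1):
            totals[stack[:depth]] += count
    return totals


# Hypothetical samples: (call stack, number of samples)
samples = [
    (("RayGen", "shade", "brdf"), 60),
    (("RayGen", "shade"), 15),
    (("RayGen", "trace_shadow"), 25),
]

# Print an indented tree: deeper frames are nested under their callers.
for prefix, count in sorted(inclusive_costs(samples).items()):
    print("  " * (len(prefix) - 1) + prefix[-1], count)
```

Here `shade` is 75 samples wide (60 inside `brdf` plus 15 of its own), so an expensive inlined callee shows up as a wide box stacked on top of its caller.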

Improvements:

  • GPU Trace
    • The PCI bandwidth (overhead) incurred by collecting GPU stats is now explicitly controllable in the launch settings; previously, it was located in Tools | Options. This setting indirectly controls the PM sampling frequency.
    • It is now possible to collect distinct timings for every action (Draw, Dispatch, Copy, etc.), by enabling the “Time Every Action” setting in the launch settings. Note that this may introduce additional overhead, and is most useful when profiling one workload at a time.
    • VkEvent signals and waits now appear in the Synchronization timeline row.
    • Major improvements in interactive UI performance, including selection changes, especially with applications that contain a huge amount of shader code.
    • Support for the NVTX API has been extended. NVTX is a CPU instrumentation API.
      • NVTX tracing works once again on graphics APIs, at the Queue level only.
      • NVTX tracing works on CUDA, and is projected onto all immediate mode API actions.
      • NVTX domains are now supported. Events from different domains will appear on separate timeline tracks.
      • Significant reduction in CPU overhead.
    • Selecting regions on the timeline is now more intuitive: simply click and drag.
    • DXIL PDBs are now automatically embedded in the GPU Trace report file. This is attempted during collection, and each time the report is loaded. Now you can share single report files with fully resolved symbols, without having to copy additional PDBs.
    • In the launch options, the new default setting for Timeline Metrics is “Top-Level Triage”. The Top-Level Triage configuration provides counters and metrics from every stage of the GPU pipeline. Notable improvements over “Throughput Metrics” include the way memory metrics are presented (including tooltips) and the Screen Pipe Data Flow row, which reveals the number of pixels at each stage of the 3D raster and shading pipe.
    • Top-Level Triage now collects the “Active Threads Per Warp” metric when the Real-Time Shader Profiler is active. This metric is useful for determining when to apply Shader Execution Reordering. The previous default, “Throughput Metrics”, is still available and remains the way to measure Tensor Pipe Active.
  • Shader Profiler
    • DXIL PDB support has been improved. When you compile with the dxc -Fd <directory>\ option, which generates files of the form <directory>\<hash>.pdb, the Shader Profiler resolves these symbols. Symbol resolution requires adding <directory> to Nsight Graphics’ search paths, under Tools | Options … Search Paths | Separate Shader Debug Information. See also: Powerful Shader Insights: Using Shader Debug Info with NVIDIA Nsight Graphics | NVIDIA Technical Blog.
    • Major improvements in interactive UI performance, including selection changes, especially with applications that contain a huge amount of shader code.
    • Fixed an issue that caused PDBs to fail to load when multiple different shaders shared the same PDB.
    • Shaders containing #line directives now have their line mappings resolved correctly. The files referenced by #line directives are represented as separate source files in the UI, and perf stats are correctly attributed to lines within those files.
  • Frame Debugger
    • Added support for VK_NV_low_latency2.
    • Added a “Ray Tracing Validation” project option for automatically enabling the Ray Tracing Validation layer and collecting the validation feedback into the Nsight Graphics UI. For more information, see this link.
  • Aftermath
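A #line directive tells the compiler to report the lines that follow as starting at a given line number and, optionally, as belonging to a given file; this is what lets the Shader Profiler show those files separately and attribute stats to them. The Python sketch below illustrates that bookkeeping, assuming C-preprocessor semantics and only the `#line N "file"` form. It is an illustration, not Nsight Graphics’ implementation; `build_line_map`, `attribute_samples`, and the file names are hypothetical.

```python
import re
from collections import Counter

# Matches '#line 42' or '#line 42 "file.hlsl"' (C-preprocessor style).
LINE_DIRECTIVE = re.compile(r'^\s*#line\s+(\d+)(?:\s+"([^"]*)")?')


def build_line_map(source, default_file):
    """Map each physical line number (1-based) to (file, logical line)."""
    mapping = {}
    current_file, next_line = default_file, 1
    for physical, text in enumerate(source.splitlines(), start=1):
        m = LINE_DIRECTIVE.match(text)
        if m:
            # The line AFTER the directive receives the number it names.
            next_line = int(m.group(1))
            if m.group(2) is not None:
                current_file = m.group(2)
            continue  # the directive itself maps to no source line
        mapping[physical] = (current_file, next_line)
        next_line += 1
    return mapping


def attribute_samples(samples_by_physical_line, line_map):
    """Re-key per-physical-line sample counts by (file, logical line)."""
    stats = Counter()
    for physical, count in samples_by_physical_line.items():
        stats[line_map[physical]] += count
    return stats


# Hypothetical shader text with a #line directive pointing into an include.
shader = """float4 main() : SV_Target
{
#line 10 "lighting.hlsli"
    float3 n = normal();
    return shade(n);
}"""

line_map = build_line_map(shader, "main.hlsl")
print(attribute_samples({4: 30, 5: 70}, line_map))
```

Samples taken on physical lines 4 and 5 end up attributed to lines 10 and 11 of lighting.hlsli rather than to main.hlsl.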

Known Issues:

  • GPU Trace
    • The GPU Contexts row may contain incomplete data in scenarios with high frequency context switching. This is a limitation of the NVIDIA driver.
    • On Windows, CommandList timeline events may appear to be active for longer than their true duration, when in reality the underlying hardware queue was in a wait state for the initial portion of that time. This only occurs when Windows Hardware Accelerated GPU Scheduling is enabled.
    • The Average Warp Latency column in the Shader Profiler provides accurate values within regions of execution where a single workload was executing; selecting broader regions averages across multiple workloads and washes out the results. The recommended approach is to use the Avg. Warp Latency timeline row to identify a region of execution, select a perf marker or range within it, and then read the Shader Profiler’s values.
    • When the Real-Time Shader Profiler is enabled, the new SM Warp Occupancy timeline row (per-shader-stage) is only able to show complete stats for D3D12, Vulkan, OpenGL (including EGL), and CUDA contexts in the traced process. Other contexts (D3D11 and older) will have incomplete values in this row, with a striped pattern showing on the timeline.
    • In trace reports where the Real-Time Shader Profiler was enabled, the Warp Occupancy row does not explicitly show the “Ray Tracing Runtime” contribution (e.g. Traversal, Acceleration Structure Build). These categories appear as Unattributed in this row; the Shader Pipelines view, however, lists them accurately.
  • Shader Profiler
    • Work Graph shaders cannot show their corresponding HLSL source code; this is a limitation of the R560 driver.
    • The Vulkan Shader Profiler’s support for KHR_non_semantic_info is contingent on shader compiler support. dxc -fspv-debug=vulkan-with-source works well, aside from a known compiler bug. At the time of this release, glslangValidator -gVS in the Vulkan SDK does not produce sufficient info.
    • Unattributed samples are shown when the following types of shader code execute:
      • Driver-internal shaders, other than those explicitly listed (such as the Ray Tracing runtime).
      • NGX Shaders.
      • CUDA kernels belonging to NVIDIA SDKs like OptiX.
    • The Flame Graph, Top-Down Calls, Bottom-Up Calls, and Hot Spots views do not show the following information:
      • Shaders that weren’t compiled with debug info.
      • Ray Tracing Runtime shaders, such as “Traversal” and “Acceleration Structure Build”.
  • Frame Debugger
    • The replay window will not be shown if the application uses VK_NV_low_latency2, due to a driver limitation.
  • Shader Debugger
    • Debugging applications that use asynchronous compute queues can cause intermittent instability issues and GPU hangs.
    • If a shader is used in multiple pipelines, there are cases, especially with raster and compute shaders, where the breakpoint is set only for the requested pipeline rather than for all pipelines that use the shader instance.
    • The symbolics for Ray Tracing shaders are still being finalized. While they work in many situations, there are cases where some local variable data can be missing. This will continue to be improved over the next few driver releases.
    • The Shader Debugger can have bad interactions with the Vulkan Validation Layer due to how shader handles are modified by the layer. Therefore, we advise users to disable the Vulkan Validation Layer when using the Shader Debugger.
    • JIT recompiles for Ray Tracing pipelines can be slower than desired and will continue to be optimized to reduce the time taken to generate the needed symbolic information.
    • In the R560 driver series, replacement of shaders with just-in-time compiled debug shader instances fails on command buffers that were recorded with VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT. Source breakpoints set in such shaders will resolve on optimized shader instances with only a line table available.
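The GPU Trace limitation on Average Warp Latency, where broad selections wash out the results, comes down to simple averaging. The Python sketch below assumes, purely for illustration, that a selection’s average is weighted by the number of warps each workload executed; the weighting Nsight Graphics actually uses may differ, and `weighted_avg_latency` and the numbers are hypothetical.

```python
def weighted_avg_latency(regions):
    """Average latency over a selection, weighted by warp count.

    regions: list of (warps_completed, avg_latency_cycles), one per workload.
    """
    total_warps = sum(warps for warps, _ in regions)
    return sum(warps * latency for warps, latency in regions) / total_warps


# Hypothetical numbers: a short, stall-heavy workload next to a long, fast one.
slow = (1_000, 900.0)
fast = (9_000, 100.0)

print(weighted_avg_latency([slow]))        # the slow workload selected alone
print(weighted_avg_latency([slow, fast]))  # a broad selection spanning both
```

Selected on its own, the slow workload reports 900 cycles; a selection spanning both workloads reports 180 cycles, which hides the problem. This is why narrowing the selection to a single perf marker or range gives more useful values.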

For more details and known issues, please see the full release notes!

For an overview of Nsight™ Graphics and access to resources, please visit the main Nsight™ Graphics page.

NVIDIA® Nsight™ Graphics 2024.2 is available for download under the NVIDIA Registered Developer Program.
