Nsight Graphics 2024.3.0
NVIDIA® Nsight™ Graphics 2024.3.0 is released with the following changes:
New Features:
- GPU Trace
- An Active Threads Per Warp histogram is now available throughout the Shader Profiler views within GPU Trace, including Shader Pipelines, Top-Down, Bottom-Up, Hot Spots, and Source/Disassembly. To see it, set the “Timeline Metrics” setting to either “Top-Level Triage” or “Ray Tracing Triage” (if available) and enable the Real-Time Shader Profiler. Use this to identify shaders, functions, and source lines that paid a performance penalty due to branch divergence or a suboptimal number of threads launched. Values on the right of the histogram (closer to 32) indicate more efficient instruction execution. The values shown in GPU Trace are approximated from the sampling of performance counters at the time of each code block's execution; if you need a precise value, consider using the Frame Debugger | Shader Profiler.
- In the Shader Pipelines view, the Average Warp Latency is now shown as a histogram. This histogram reveals the distribution of average warp durations across separate invocations in the trace. A popup tooltip shows the histogram in greater detail.
- Shader Profiler
- The Shader Profiler now supports source correlation for D3D12 Work Graph shaders, enabling the full functionality of line-by-line analysis in the Shader Source view and Hot Spots list. This requires the newest R565 series driver.
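To make the Active Threads Per Warp histogram described above more concrete, here is a minimal Python sketch of how such a histogram could be built from per-sample active-thread masks. It is an illustration only, not how Nsight Graphics computes the values (GPU Trace approximates them from sampled performance counters). The function names and the sample masks are hypothetical.

```python
from collections import Counter

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def active_threads(mask: int) -> int:
    """Count the set bits in a 32-bit active-thread mask."""
    return bin(mask & 0xFFFFFFFF).count("1")


def active_threads_histogram(masks):
    """Return WARP_SIZE + 1 bins; bin k counts samples with k active threads."""
    counts = Counter(active_threads(m) for m in masks)
    return [counts.get(k, 0) for k in range(WARP_SIZE + 1)]


# Hypothetical samples: two fully active warps, one warp with half of its
# lanes active (for example after a divergent branch), and one single-lane warp.
samples = [0xFFFFFFFF, 0xFFFFFFFF, 0x0000FFFF, 0x00000001]
hist = active_threads_histogram(samples)

for k, n in enumerate(hist):
    if n:
        print(f"{k:2d} active threads: {n} sample(s)")
```

Samples that land in the bins near 32 correspond to the efficient right-hand side of the histogram; samples in the low bins point at divergence or underfilled launches.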
Improvements:
- GPU Trace
- The Top-Down calls table now has an Aggregate Regions checkbox. When unchecked, separate usages of a shader function are shown in separate rows, allowing you to see the unique performance characteristics of each usage. Use this to identify when performance varied in a data-dependent way.
- The Active Threads Per Warp timeline metric now only takes the active thread mask into account; this applies to the Top-Level Triage and Ray Tracing Triage (Pro) configurations. This provides a better view of when branch divergence occurred, helping to guide your placement of shader execution reordering calls. It is also the basis of the Active Threads Per Warp histogram mentioned above. Previously, this metric was defined as the count of active and predicated-on threads, which is less actionable for these types of optimizations.
- Added support for running GPU Trace of Vulkan SC on Windows and Linux desktop platforms. More information about Vulkan SC and driver support can be found on NVIDIA’s Vulkan Driver Support page.
- Pinned rows in the Timeline in GPU Trace now stay in the same logical order rather than jumping to the top of the list. They will “stick” to the top or bottom of the view when scrolling through Timeline rows. Row order can still be changed by selecting a row and Alt-dragging it to the desired position.
- Frame Debugger
- Added support for frame debugging of applications that use the newly released Vulkan 1.4 standard. At the time of this Nsight Graphics release, new beta drivers supporting Vulkan 1.4 can be found on NVIDIA’s Vulkan Driver Support page.
- Added support for frame debugging of applications that use Vulkan SC on Windows and Linux desktop platforms. More information about Vulkan SC and driver support can be found on NVIDIA’s Vulkan Driver Support page.
- In the Ray Tracing Inspector, a HUD showing the total ray traversal time in milliseconds is now enabled when the traversal heatmap is active. The heatmap shows how the traversal cost is distributed, but the whole-frame traversal time provides an absolute numerical reference. This can make it easier to identify issues with acceleration structure efficiency while moving the camera or switching between camera bookmarks. Note that as we do not reproduce the precise methods and algorithms that your application implements, this measurement is an approximation of traversal cost and not necessarily representative of the performance of your application.
- Shader Debugger
- Fixed stepping issues with shaders using multiple functions in the callstack.
- Fixed a host stability problem when the application destroys pipeline objects.
- Fixed problems attaching to applications launched via Steam due to the 32-bit launcher not being supported.
- Aftermath
- The Aftermath SDK included in Nsight Graphics 2024.3.0 has been updated to the latest Nsight Aftermath SDK 2024.3.0. See the Nsight Aftermath SDK 2024.3.0 Release Notes for more information.
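The redefinition of the Active Threads Per Warp timeline metric noted under GPU Trace above can be illustrated with a small Python sketch. It assumes that each executed instruction has a 32-bit active-thread mask and a 32-bit predicate mask, and that the previous definition counted threads set in both. The helper names and the mask values are hypothetical; this is not Nsight Graphics code.

```python
def popcount32(x: int) -> int:
    """Count the set bits in the low 32 bits of x."""
    return bin(x & 0xFFFFFFFF).count("1")


def active_threads_current(active_mask: int) -> int:
    """Current definition: only the active-thread mask is considered."""
    return popcount32(active_mask)


def active_threads_previous(active_mask: int, predicate_mask: int) -> int:
    """Previous definition (as assumed here): active AND predicated-on threads."""
    return popcount32(active_mask & predicate_mask)


# Hypothetical instruction: 16 lanes survived a divergent branch, and the
# instruction's predicate is true on only 8 of those lanes.
active_mask = 0x0000FFFF
predicate_mask = 0x000000FF

print(active_threads_current(active_mask))                   # 16
print(active_threads_previous(active_mask, predicate_mask))  # 8
```

In this sketch the current value reflects only how many lanes the branch left active, which is the quantity that matters when deciding where to place shader execution reordering calls, while the previous value also drops with per-instruction predication.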
Known Issues:
- GPU Trace
- The GPU Contexts row may contain incomplete data in scenarios with high frequency context switching. This is a limitation of the NVIDIA driver.
- On Windows, CommandList timeline events may appear to be active for longer than their true duration, when in reality the underlying hardware queue was in a wait state for the initial portion of that time. This only occurs when Windows Hardware Accelerated GPU Scheduling is enabled.
- When the Real-Time Shader Profiler is enabled, the new SM Warp Occupancy timeline row (per-shader-stage) is only able to show complete stats for D3D12, Vulkan, OpenGL (including EGL), and CUDA contexts in the traced process. Other contexts (D3D11 and older) will have incomplete values in this row, with a striped pattern showing on the timeline.
- In trace reports where the Real-Time Shader Profiler was enabled, the Warp Occupancy row does not explicitly show the “Ray Tracing Runtime” contribution (e.g., Traversal, Acceleration Structure Build). These categories are Unattributed in this row. However, the Shader Pipelines view does accurately list them.
- Frame Debugger
- Event markers may have incorrect nesting in captures from the newest versions of Unreal Engine 5.
- Pixel History is not supported for Vulkan SC applications.
- Shader View is not supported for Vulkan SC applications on Windows due to some limitations. This may be fixed in a future version.
- C++ Capture export of applications using VK_NV_acquire_winrt_display will not function properly due to a bug in the Vulkan SC Loader.
- Aftermath
- For applications that use CUDA alongside a graphics API, either explicitly or via an NVIDIA provided library, Aftermath will sometimes generate a crash dump for a fault that occurs on the CUDA workload. This will be improved in a future driver to only generate crash dumps from graphics workloads.
- Shader Debugger
- The debug symbolic information provided by the driver is limited for ray tracing shaders. This can result in not seeing all of the local variables in the Locals/Watch windows while stepping through code. This will be improved in future drivers.
- For shaders whose call stacks are multiple layers deep, there can be errors in identifying the correct local variables for the current scope. This will be improved in future drivers.
For more details and known issues, please see the full release notes!
For an overview of Nsight™ Graphics and access to resources, please visit the main Nsight™ Graphics page.
NVIDIA® Nsight™ Graphics 2024.3.0 is available for download under the NVIDIA Registered Developer Program.