Nsight Compute 2025.1 - New Features

Date: 2025-01-01

NVIDIA Nsight Compute

Added support for OptiX 9.0 functions optixClusterAccelComputeMemoryUsage and optixClusterAccelBuild.

Resolved Issues

Fixed a possible deadlock condition while handling the launch of child processes on Linux systems.



Fixed a possible crash of the Nsight Compute UI when switching to the Source Page.



Fixed the missing roofline ceilings in the Floating Point Operations Roofline for GB20x chips.

General

All roofline sections are now included in the full section set.

Range Replay and app-range replay now support the collection of instruction-level source metrics.



Rules are now supported for range replays.



Improved the set of launch metrics available for ranges.



Added a new launch__stack_size metric in the Launch Statistics section to report the configured stack size.

Added a new sass__inst_executed_register_spilling metric which counts the number of load and store instructions that were created by the compiler due to register spilling.

The Nsight Compute host GUI now natively supports macOS arm64.

NVIDIA Nsight Compute

Added interactive tooltips to the Details and Source pages. An interactive tooltip can be used to compare different baselines. Its content can be copied to the clipboard using the copy icon button.



CUDA Green Contexts support is improved by showing TPC mask information in the Launch Statistics section, the Resources tool window, and on the Session page.



Added a heatmap to the Source Comparison document to visualize the source code differences.



Added a Diff By drop-down menu to the Source Comparison document in the SASS view. It lets you choose whether the diff is based on Opcode or Full Instruction.
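To illustrate why the diff basis matters, the following is a minimal sketch using Python's standard `difflib`. It is not the algorithm Nsight Compute uses. The helper names `opcode` and `changed_lines` and the SASS snippets are made up for illustration. Comparing by opcode ignores operand changes such as register renaming, while comparing by full instruction flags them.

```python
import difflib


def opcode(sass_line: str) -> str:
    """Return the opcode token of a SASS line, skipping a leading predicate."""
    tokens = sass_line.split()
    # A leading token such as @P0 or @!P1 is a predicate, not the opcode.
    if tokens and tokens[0].startswith("@"):
        tokens = tokens[1:]
    return tokens[0] if tokens else ""


def changed_lines(old, new, key):
    """Count lines of `new` that have no match in `old` under `key`."""
    matcher = difflib.SequenceMatcher(
        a=[key(line) for line in old],
        b=[key(line) for line in new],
        autojunk=False,
    )
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return len(new) - matched


# Hypothetical SASS listings: the second one renames a register (R2 -> R8)
# and replaces FADD with FMUL.
old_sass = [
    "IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28]",
    "LDG.E R2, [R4.64]",
    "FADD R3, R2, R2",
    "STG.E [R6.64], R3",
]
new_sass = [
    "IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28]",
    "LDG.E R8, [R4.64]",
    "FMUL R3, R8, R8",
    "STG.E [R6.64], R3",
]

by_opcode = changed_lines(old_sass, new_sass, key=opcode)
by_full = changed_lines(old_sass, new_sass, key=lambda line: line)
print(f"changed by opcode: {by_opcode}, by full instruction: {by_full}")
```

In this example the opcode-based comparison reports only the FADD to FMUL change, whereas the full-instruction comparison also reports the load whose destination register changed.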



Improved the performance of the SASS view.



The Resources View for CUDA Graphs can now visualize the graph structure directly in a new Chart mode.



The Memory Chart now supports zoom and pan.



The Metric Details tool window now shows PM sampling metrics from the timeline as context switched.



Improved the performance for deploying to target systems over remote connections.

Resolved Issues

Fixed an issue where, on some systems, not all free GPU memory was considered when saving context memory for multi-pass data collection.



Fixed an incorrect multiplier in the calculation of non-tensor FP16 rooflines.



Fixed the metric Avg. Threads Executed for inlined functions with control flow.

Fixed an issue where, in some situations, no average was shown in the Source Statistics table for Warp Stall sampling metrics.



Fixed several SASS syntax highlighting issues.



Fixed an issue where the SM count wasn't shown correctly in the report header when loading older reports.



Improved interactions between the Metric Details tool window and the memory chart.

Updates in 2024.4

General

Added support for the Blackwell architecture.



Added support for several launch__* metrics for CUDA graphs.

Added support for the cuMemBatchDecompressAsync API in Range Replay.

NVIDIA Nsight Compute

A new feature overview is now shown the first time a new UI version is opened.



Switched the default orientation of the Raw page to show metrics in rows and profile results in columns.



Added support for reporting register spilling compiler annotations on the Source page.



The Source page has improved search with support for regular expression- and value-based lookups.



Added support for setting a Source View Profile as the default profile so that it is applied automatically when a report is opened.



Added hyperlinks for the line numbers and inline function addresses in the Inline Table. This enables you to quickly jump to the respective line number in the Source view and address in the SASS view. Added a new column Source File in the Inline Table to show the name of the file to which the source belongs.



The memory chart can indicate or hide inactive elements.



Chart tooltips on the Details page now show more relevant information when a specific value is hovered.



Roofline charts now support showing the formula for ridge point calculation in the metric details tool window.
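As background for the ridge point, here is a minimal sketch of the textbook roofline model in Python. It is not necessarily the exact formula Nsight Compute displays. The function names and the peak values are hypothetical and chosen only to keep the arithmetic easy to follow. The ridge point is the arithmetic intensity at which the memory-bandwidth ceiling meets the compute ceiling.

```python
def ridge_point(peak_flops: float, peak_bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) where the two roofline ceilings meet."""
    return peak_flops / peak_bandwidth


def attainable_flops(intensity: float, peak_flops: float, peak_bandwidth: float) -> float:
    """Upper bound on FLOP/s for a kernel with the given arithmetic intensity."""
    return min(peak_flops, intensity * peak_bandwidth)


# Hypothetical device peaks, not taken from any real GPU specification.
PEAK_FLOPS = 20.0e12      # FLOP/s
PEAK_BANDWIDTH = 1.0e12   # bytes/s

ridge = ridge_point(PEAK_FLOPS, PEAK_BANDWIDTH)
memory_bound = attainable_flops(5.0, PEAK_FLOPS, PEAK_BANDWIDTH)
compute_bound = attainable_flops(40.0, PEAK_FLOPS, PEAK_BANDWIDTH)

print(f"ridge point: {ridge} FLOP/byte")
print(f"attainable at  5 FLOP/byte: {memory_bound:.2e} FLOP/s")
print(f"attainable at 40 FLOP/byte: {compute_bound:.2e} FLOP/s")
```

Kernels whose arithmetic intensity lies left of the ridge point are limited by memory bandwidth in this model, and kernels to the right are limited by the compute ceiling.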



The occupancy calculator now considers the impact of block barriers for Hopper-architecture and newer GPUs. It also has improved controls to adjust input values.
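The idea of an extra resource limit reducing occupancy can be sketched as follows. This is a simplified model, not the occupancy calculator's actual logic. It ignores allocation granularity, and every value in `LIMITS`, in particular the per-SM barrier budget, is a hypothetical number chosen for illustration. The function name `blocks_per_sm` is likewise an assumption.

```python
# Hypothetical per-SM limits; real values depend on the GPU architecture.
LIMITS = {
    "warp_size": 32,
    "max_blocks": 32,
    "max_warps": 64,
    "registers": 65536,
    "shared_memory": 233472,  # bytes
    "barriers": 64,           # assumed per-SM barrier budget
}


def blocks_per_sm(threads_per_block, regs_per_thread, smem_per_block,
                  barriers_per_block, limits=LIMITS):
    """Resident blocks per SM as the minimum over all resource limits."""
    warp_size = limits["warp_size"]
    warps_per_block = -(-threads_per_block // warp_size)  # ceiling division
    regs_per_block = regs_per_thread * warps_per_block * warp_size
    candidates = [
        limits["max_blocks"],
        limits["max_warps"] // warps_per_block,
        limits["registers"] // regs_per_block,
    ]
    # Resources a block does not use impose no limit.
    if smem_per_block > 0:
        candidates.append(limits["shared_memory"] // smem_per_block)
    if barriers_per_block > 0:
        candidates.append(limits["barriers"] // barriers_per_block)
    return min(candidates)


def occupancy(blocks, threads_per_block, limits=LIMITS):
    """Fraction of the SM's warp slots occupied by resident blocks."""
    warps_per_block = -(-threads_per_block // limits["warp_size"])
    return blocks * warps_per_block / limits["max_warps"]


few_barriers = blocks_per_sm(256, 32, 16384, barriers_per_block=1)
many_barriers = blocks_per_sm(256, 32, 16384, barriers_per_block=16)
print(few_barriers, occupancy(few_barriers, 256))
print(many_barriers, occupancy(many_barriers, 256))
```

Under these assumed limits, raising the barrier count per block from 1 to 16 makes barriers the binding resource and halves the number of resident blocks.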



The remote connections dialog now supports placeholders to deploy files to, for example, user-specific directories on the target system.

NVIDIA Nsight Compute CLI

Added a new --nvtx-push-pop-scope command line option, which allows setting the scope of push/pop ranges to process-wide.

Resolved Issues

Fixed UI scrolling issues on macOS trackpads.



Fixed an issue where certain Python script errors were not properly reported when loading rule files.



On CUDA 12.7 drivers, context switch trace can now filter events more precisely to the profiled CUDA context, even when profiling in containers.



NVTX filtering now properly supports start/end ranges that start and end in different threads.



Fixed several issues with Range Replay when capturing CUDA memcpy APIs.

For a complete overview of all NVIDIA® Nsight™ Compute features and access to resources, please visit the main Nsight™ Compute page.

NVIDIA® Nsight™ Compute 2025.1 is available for download under the NVIDIA Registered Developer Program.

