Nsight Compute 2025.4 - New Features
Updates in 2025.4.0
- Added support for profiling CUDA tile workloads.
- Introduced a new Tile section that summarizes tile dimensions and pipeline utilization. The section is displayed when it is enabled and a tile workload is profiled.
- Source page supports correlation between SASS and high-level Tile code (limited to cuTile Python code).
- Added a new ncu-repz file format for zstd compressed report files.
- Added support for locking GPUs to boost clock instead of base clock on Ampere and newer GPUs. Use the boost and force-boost options on supported drivers.
- Warp sampling by default now focuses on the Not Issued (_not_issued) variants of the metrics. This avoids pointing to source locations where warp stalls are mitigated by having sufficient numbers of warps during an issue cycle to hide latency.
- Added support for node-level profiling of CUDA conditional graphs, including device-updatable nodes and nodes that can set conditional graph handles.
- Added support for node-level profiling of CUDA graphs launched from the device (DGL), including host graph nodes that can launch DGL.
- Source page now displays symbol labels: A new column for symbol labels has been added, and symbol labels are shown alongside addresses in SASS instruction disassembly. This change aligns the output with that of the nvdisasm tool.
- Added support for collecting Warp sampling metrics with PM sampling, allowing users to see function-level warp stalls for the selected time range in the timeline. See the Function Stats tool window for details.
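The reasoning behind the Not Issued default for warp sampling can be illustrated with a toy model. The sketch below is not Nsight Compute code and does not use any Nsight Compute API; the names WarpState and sample, and the simplification that every stalled warp is sampled on every cycle, are assumptions made purely for illustration. It counts stall samples per source line in two ways: all stall samples, and only those taken in cycles where no warp could issue.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WarpState:
    """Hypothetical snapshot of one warp in one cycle (illustration only)."""
    line: int      # source line the warp is currently at
    stalled: bool  # True if the warp cannot issue in this cycle


def sample(cycles):
    """Count stall samples per source line.

    Returns (all_samples, not_issued_samples). The second counter only
    includes samples from cycles in which no warp was able to issue,
    which is the idea behind the "not issued" metric variants.
    """
    all_samples = Counter()
    not_issued_samples = Counter()
    for warps in cycles:
        # The scheduler issues if at least one warp is not stalled.
        issued = any(not w.stalled for w in warps)
        for w in warps:
            if w.stalled:
                all_samples[w.line] += 1
                if not issued:
                    not_issued_samples[w.line] += 1
    return all_samples, not_issued_samples


# Four cycles: one warp waits at line 10 while another warp issues from line 11,
# so the stall at line 10 is hidden by the second warp.
hidden = [[WarpState(10, True), WarpState(11, False)]] * 4
# Two cycles: both warps are stalled at line 20, so nothing issues.
exposed = [[WarpState(20, True), WarpState(20, True)]] * 2

all_samples, not_issued_samples = sample(hidden + exposed)
print(dict(all_samples))         # {10: 4, 20: 4}
print(dict(not_issued_samples))  # {20: 4}
```

In this model, line 10 and line 20 collect the same number of stall samples overall, but only line 20 remains in the not-issued view, because the stalls at line 10 always coincided with another warp issuing.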
For a complete overview of all NVIDIA® Nsight™ Compute features and access to resources, please visit the main Nsight™ Compute page.
NVIDIA® Nsight™ Compute 2025.4 is available for download under the NVIDIA Registered Developer Program.