Updates in 2025.4.1

    General

    • Added support for profiling OptiX workloads with the interactive profile activity.
    • The Resources tool window now shows a CUDA ID column for graph resources.

    Resolved Issues

    • Fixed issues in the Report Merge Tool.
    • Fixed issues in the Clustering Window.
    • Fixed an issue with the formatting of entries in the Inline Functions table.
    • Fixed an issue that could cause the UI to crash when switching between kernels in the Source page in certain cases.
    • Fixed an issue where the ncu CUDA Toolkit script could fail when unrecognized versions of Nsight Compute are installed separately.
    • Fixed several incorrect metrics in the PmSampling.section file.
    • Fixed an issue where PM Sampling timelines could be distorted by broken initial samples.
    • Fixed a crash with node-level graph profiling in app-range replay mode. If graphs are uploaded outside the range, an error is now shown indicating that profiling them is not supported.
    • Fixed a crash when instantiating device-side graphs with memory nodes.

Updates in 2025.4.0

    General

    • Added support for profiling CUDA tile workloads.
    • Introduced a new Tile section that summarizes tile dimensions and pipeline utilization. The section is displayed when it is enabled and a tile workload is profiled.
    • The Source page now supports correlation between SASS and high-level Tile code (limited to cuTile Python code).
    • Added a new ncu-repz file format for zstd compressed report files.
    • Added support for locking GPU clocks to boost instead of base on Ampere and newer GPUs. Use the boost and force-boost options on supported drivers.
    • By default, warp sampling now focuses on the Not Issued (_not_issued) variants of the metrics. This avoids pointing to source locations where warp stalls are mitigated because enough warps are available during an issue cycle to hide the latency.
    • Added support for node-level profiling of CUDA conditional graphs, including device-updatable nodes and nodes that can set conditional graph handles.
    • Added support for node-level profiling of CUDA graphs launched from the device (DGL), including host graph nodes that can launch DGL.
    • Source page now displays symbol labels: A new column for symbol labels has been added, and symbol labels are shown alongside addresses in SASS instruction disassembly. This change aligns the output with that of the nvdisasm tool.
    • Added support for collecting warp sampling metrics with PM sampling, allowing users to see function-level warp stalls for the selected time range in the timeline. See the Function Stats tool window for details.
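
The reasoning behind the Not Issued default for warp sampling can be illustrated with a small toy model. This is only a sketch under simplifying assumptions and not how the hardware sampler or Nsight Compute is implemented: every resident warp is recorded on every cycle, a cycle counts as "issued" if at least one warp is eligible, and the names count_samples, load_A, load_B, and fma are hypothetical.

```python
from collections import Counter

def count_samples(cycles):
    """Count stall samples per source location in a toy scheduler trace.

    cycles: list of cycles; each cycle is a list of (location, stalled)
    pairs, one pair per resident warp (hypothetical trace format).
    Returns (all_samples, not_issued_samples) as Counters.
    """
    all_samples = Counter()
    not_issued = Counter()
    for warps in cycles:
        # Toy assumption: the scheduler issues if any warp is not stalled.
        issued = any(not stalled for _, stalled in warps)
        for location, stalled in warps:
            if stalled:
                all_samples[location] += 1
                if not issued:
                    # Only stalls on cycles with no issue are counted here.
                    not_issued[location] += 1
    return all_samples, not_issued

# load_A stalls often, but another warp always hides the latency.
# load_B stalls on one cycle where no warp can issue at all.
trace = [
    [("load_A", True), ("fma", False)],
    [("load_A", True), ("fma", False)],
    [("load_A", True), ("fma", False)],
    [("load_B", True), ("load_B", True)],
    [("load_B", True), ("fma", False)],
]

all_samples, not_issued = count_samples(trace)
print("all samples:", dict(all_samples))
print("not issued :", dict(not_issued))
```

In this trace both locations collect the same number of stall samples overall, but only load_B appears in the not-issued counts, because it is the only location stalled on a cycle where nothing could be issued.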

    NVIDIA Nsight Compute

    NVIDIA Nsight Compute CLI

    Resolved Issues

For a complete overview of all NVIDIA® Nsight™ Compute features and access to resources, please visit the main Nsight™ Compute page.

NVIDIA® Nsight™ Compute 2025.4 is available for download under the NVIDIA Registered Developer Program.
