Updates in 2025.4.0

    General

    • Added support for profiling CUDA tile workloads.
    • Introduced a new Tile section to summarize tile dimensions and pipeline utilization, displayed when enabled and a tile workload is profiled.
    • Source page supports correlation between SASS and high-level Tile code (limited to cuTile Python code).
    • Added a new ncu-repz file format for zstd compressed report files.
    • Added support for locking GPUs to the boost clock instead of the base clock on Ampere and newer GPUs. Use the boost and force-boost options on supported drivers.
    • Warp sampling now focuses by default on the Not Issued (_not_issued) variants of the metrics. This avoids pointing to source locations where warp stalls are mitigated by having enough warps available during an issue cycle to hide latency.
    • Added support for node-level profiling of CUDA conditional graphs, including device-updatable nodes and nodes that can set conditional graph handles.
    • Added support for node-level profiling of CUDA graphs launched from the device (DGL), including host graph nodes that can launch DGL.
    • Source page now displays symbol labels: A new column for symbol labels has been added, and symbol labels are shown alongside addresses in SASS instruction disassembly. This change aligns the output with that of the nvdisasm tool.
    • Added support for collecting Warp sampling metrics with PM sampling, allowing users to see function-level warp stalls for the selected time range in the timeline. See the Function Stats tool window for details.

    NVIDIA Nsight Compute

    NVIDIA Nsight Compute CLI

    Resolved Issues

For a complete overview of all NVIDIA® Nsight™ Compute features and access to resources, please visit the main Nsight™ Compute page.

NVIDIA® Nsight™ Compute 2025.4 is available for download under the NVIDIA Registered Developer Program.
