Updates in 2025.3.1

    General

    • Improved the charts in the Compute Workload Analysis section to better distinguish between per_cycle_active and per_cycle_elapsed metrics.

    Resolved Issues

    • Fixed an issue where kernels using the compile-time attribute __block_size__ were launched with incorrect grid dimensions.
    • Fixed an issue where timeline y-axis labels showed unexpected units for small maximum values.
    • Fixed a crash when stepping applications in interactive profiling mode.
    • Fixed an issue where roofline charts did not show achieved value data in some cases.
    • Fixed an issue where duplicate tooltips could be shown for some links in the Memory Chart.
    • Fixed a potential hang when setting --pm-sampling-buffer-size to very large values.
    • Fixed several rules so that they no longer show non-actionable warnings for unsupported or missing metrics when profiling on mobile chips.

Updates in 2025.3.0

    General

    • Added support for CUDA 13.0.
    • Added or improved support for Blackwell chips.
    • Removed support for Volta chips.
    • For Green Context launches, launch__waves_per_multiprocessor is now scaled to the number of SMs in the Green Context.
    • Added support for profiling individual nodes of device-launchable CUDA graphs launched from the host.
    • Added metric launch__persisting_l2_cache_size to the Memory Workload Analysis section.
    • Removed metric profiler__pmsampler_dropped_samples.
    • Added support for not importing SASS cubins into the report.

    NVIDIA Nsight Compute

    NVIDIA Nsight Compute CLI

    • Added the option --forward-signals to transparently forward signals to the profiled application.

    Resolved Issues

    • Fixed an issue where some ncu console messages were truncated after 1024 characters.
    • Fixed some display issues related to Green Context tables.
    • Improved the performance of remote profiling in application replay mode.
    • Fixed a hang in certain scenarios when profiling dependent kernels with device-mapped host allocations.
    • Fixed missing correlation between JIT-compiled PTX and SASS in some situations.
    • Fixed an error when profiling a CUDA graph kernel node doing a cluster launch on driver 580 or newer.

For a complete overview of all NVIDIA® Nsight™ Compute features and access to resources, please visit the main Nsight™ Compute page.

NVIDIA® Nsight™ Compute 2025.3 is available for download under the NVIDIA Registered Developer Program.
