Updates in 2024.1.1

    General
    • Clarified that when profiling a range with multiple active CUDA Green Contexts, counter values that are not attributable to SMs are aggregated over all of these Green Contexts.

    Resolved Issues

    • Changed the way the PerfWorks library is loaded into the target application’s process space. This addresses possible connection errors when the library search path includes other directories containing PerfWorks libraries.
    • Fixed an issue that caused PM sampling data to be missing from the results of a Profile Series.
    • Fixed the incorrect calculation of the percentage values in the Inline Function table.
    • Fixed a potential crash of the NVIDIA Nsight Compute UI when PM sampling data was requested, but no sample was collected.

Updates in 2024.1.0

    General
    • Added new metrics available when profiling on CUDA Green Contexts.
    • Reduced the number of passes required for collecting PM Sampling sections.
    • Counter domains can now be specified for PM sampling metrics in section files.
    • PM sampling metrics can now be queried on the command line and in the Metric Details window by specifying the respective collection option.
    • Added a new optional PmSampling_WarpStates section for understanding warp stall reasons over the workload duration.
    • Added a new rule for detecting load imbalances.
    • Improved the performance of graph-level profiling on new drivers.
    • Updated the metrics compatibility table for OptiX cmdlists and instruction-level SASS metrics.

    NVIDIA Nsight Compute

    • Added SASS view and Source Markers support in Source Comparison.
    • Improved Source Comparison diff visualization by adding empty lines on the other side of inserted/deleted lines.
    • The Source page column chooser can now be opened directly from the Navigation drop-down.
    • Added a Launch Details tool window for showing information about individual launches within larger workloads like OptiX command lists.
    • Added support for CUDA Green Contexts in the Resources tool window, the Launch Statistics section and the report header.

    NVIDIA Nsight Compute CLI

    • Improved the documentation on NVTX expressions, as well as the command line output shown when a potentially incorrect expression leads to no workloads being profiled.
    • Improved checking for invalid expressions when using the --target-processes-filter option.

    Resolved Issues

    • Fixed an issue where the achieved L1 cache roofline value was missing when profiling on GH100.
    • Fixed several “Launch Failed” errors when collecting instruction-level SASS metrics.
    • Fixed an issue where Live Register values were too high for some workloads.
    • Fixed a scrolling issue on the Source page when collapsing a multi-file view.
    • Fixed an issue where no PM sampling data was shown in the timeline when context switch trace was not available.
    • Fixed a display issue in the memory chart when adding baselines.
    • Fixed a crash when adding baselines.
    • Fixed a crash in timeline views when not all configured data was available.
    • Fixed an issue where the application history was not always deleted when selecting Reset Application Data.
    • Fixed an error in the metric compatibility documentation.

For a complete overview of all NVIDIA Nsight™ Compute features and access to resources, please visit the main Nsight™ Compute page.

NVIDIA® Nsight™ Compute 2024.1 is available for download under the NVIDIA Registered Developer Program.
