Updates in 2024.2.1

    General
    • Improved performance when filtering by NVTX context in kernel and application replay.
    • Improved documentation for metric units and terms.

    NVIDIA Nsight Compute
    • Improved tooltips for the memory chart.
    • Improved timeline row maximum selection for PM sampling metrics.

    Resolved Issues

    • Fixed an issue where the report result dropdown might not update when some options were changed.
    • Fixed an issue with the Source page statistics table.
    • Fixed an issue with PM sampling reporting incomplete data.
    • Fixed an issue with filtering using NVTX start/end ranges.
    • Fixed an issue with demangling Numba CUDA kernel names.
    • Fixed an issue with profiling multi-ctx applications on vGPU.
    • Fixed an issue that resulted in L1 caches not always being invalidated for every pass.
    • Fixed an issue with applications using execlp.

Updates in 2024.2.0

    General
    • PM sampling timelines now show the sampled GPU workload activities.
    • Added support for collecting Python Call Stacks alongside native ones to better understand the context of a workload in Python applications.
    • Demangled kernel names can now be automatically simplified or manually renamed. For reports with multiple results, all names are considered during simplification to make them easier to distinguish.
    • Removed support for ppc64le.
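
The idea of considering all names during simplification can be illustrated with a small Python sketch. This is not Nsight Compute's actual algorithm, which the notes do not describe; the helper names `strip_brackets`, `candidates`, and `simplify_all`, as well as the sample kernel names, are hypothetical. The sketch tries the shortest form of each name first and only keeps more detail (template arguments, then the parameter list) when the shorter form would collide with another kernel in the same report.

```python
from collections import Counter


def strip_brackets(name: str, open_ch: str, close_ch: str) -> str:
    """Remove every balanced open_ch...close_ch group, including nested ones.

    Toy helper: it does not special-case operators such as operator< or operator().
    """
    out, depth = [], 0
    for ch in name:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def candidates(name: str) -> list:
    """Return forms of a demangled name, from most to least simplified."""
    no_params = strip_brackets(name, "(", ")")
    no_templates = strip_brackets(no_params, "<", ">")
    return [no_templates.strip(), no_params.strip(), name]


def simplify_all(names: list) -> dict:
    """Map each name to the shortest form that no other name shares."""
    unique = list(dict.fromkeys(names))  # de-duplicate, keep order
    result = {}
    for level in range(3):
        forms = {n: candidates(n)[level] for n in unique if n not in result}
        counts = Counter(forms.values())
        taken = set(result.values())
        for n, short in forms.items():
            if counts[short] == 1 and short not in taken:
                result[n] = short
    # Fallback: keep the full name if nothing shorter was unambiguous.
    for n in unique:
        result.setdefault(n, n)
    return result


# Hypothetical demangled kernel names from one report.
names = [
    "gemm<float, 4>(float const*, int)",
    "gemm<double, 4>(double const*, int)",
    "reduce<int>(int*, int)",
]
simplified = simplify_all(names)
for full, short in simplified.items():
    print(f"{short:20s} <- {full}")
```

In this sample, `reduce` is unambiguous on its own, while the two `gemm` instantiations keep their template arguments because dropping them would make the names identical.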

    NVIDIA Nsight Compute

    • Redesigned the report header for easier access to all report pages. All actions are now sorted into clearly labeled buttons. The focused result selection was integrated into the current row. When adding the current result as a baseline, the row itself is updated to reflect this, instead of showing a separate one.
    • Redesigned Source Page and Source Comparison controls to allow more vertical space.
    • The Navigate By dropdowns of the Source Page and Source Comparison views can be linked to each other. Selecting a column in one dropdown then selects the same column in the other view, provided that column is also available there.
    • Added Inline Table support to each side of the Source Comparison document separately.
    • Added rich support for Python and Fortran source syntax highlighting. Enhanced CUDA-C and PTX syntax highlighting.
    • Added a Statistics Table to the Source Page that allows you to quickly see aggregated metrics across a custom selection of lines.
    • Improved tooltips in the memory chart to show more detailed information when metrics are missing.
    • The Acceleration Structure Viewer can now compute ray-geometry intersection and traversal timing heatmaps.
    • Added support for ignoring directories in the section search folder.
    • Added support for specifying custom metric descriptions in section files.
    • Added a warning if the opened report is newer than the UI and may not be fully compatible.
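
As a rough illustration of what aggregating a metric across a custom selection of source lines means, the following Python sketch computes a few summary values over selected lines. The notes do not say which aggregates the Statistics Table shows, so the choice of sum, minimum, maximum, and average is an assumption, and the function `aggregate_selection`, the metric label, and the per-line values are hypothetical sample data.

```python
from statistics import mean


def aggregate_selection(per_line: dict, selected_lines: list, metric: str):
    """Aggregate one metric over the selected source lines.

    per_line maps a source line number to a dict of metric values.
    Lines without data for the metric are skipped.
    """
    values = [
        per_line[line][metric]
        for line in selected_lines
        if line in per_line and metric in per_line[line]
    ]
    if not values:
        return None
    return {
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
    }


# Hypothetical per-line values for a made-up metric label.
per_line = {
    10: {"inst_executed": 128},
    11: {"inst_executed": 64},
    12: {"inst_executed": 256},
}

# Aggregate over a non-contiguous selection of lines.
stats = aggregate_selection(per_line, [10, 12], "inst_executed")
print(stats)
```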

    NVIDIA Nsight Compute CLI

    • The raw page CSV output now includes metric instance values when these are enabled for printing.

    Resolved Issues

    • Improved handling of short workloads during PM sampling.
    • Improved units for several metrics.
    • Fixed an issue where some metrics did not show aggregates on the Summary and Raw pages.
    • Fixed an issue where profiled applications could inadvertently overwrite which PerfWorks library is loaded by the tool.
    • Fixed an issue where kernel names including $ were modified by the shell when profiling them from the System Trace activity.
    • Fixed an issue where reports could not be saved with the expected file extension in certain cases.
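
The `$` issue above is an instance of a general shell-quoting problem: in a POSIX shell, an unquoted `$name` is treated as a variable reference and replaced before the program sees it. The following Python sketch shows how a command line can be assembled so that such a kernel name survives. It is a general illustration, not a description of how the tool was fixed; the kernel name `scale$vec` and the application path `./app` are hypothetical.

```python
import shlex

# Hypothetical kernel name containing a '$' character.
kernel_name = "scale$vec"
argv = ["ncu", "--kernel-name", kernel_name, "./app"]

# Naive joining leaves '$vec' exposed to shell variable expansion.
unsafe = " ".join(argv)

# shlex.join quotes each argument that contains shell-special characters.
safe = shlex.join(argv)

print(unsafe)
print(safe)

# Splitting the quoted command the way a POSIX shell would recovers
# the original kernel name unchanged.
assert shlex.split(safe)[2] == kernel_name
```

When launching from a script, passing the argument list directly (for example to `subprocess.run(argv)` without `shell=True`) avoids the shell altogether, so no quoting is needed.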

For a complete overview of all NVIDIA Nsight Compute features and access to resources, please visit the main Nsight Compute page.

NVIDIA Nsight Compute 2024.2 is available for download under the NVIDIA Registered Developer Program.
