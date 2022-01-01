Nsight Compute 2022.1 - New Features

General

Filtering kernel launches or profile results based on NVTX domains/ranges now takes registered strings in the payload field into account, if the range name is empty.

Added support for the suffix .max_rate for ratio metrics.

Resolved Issues

Fixed a crash during the disassembly of the kernel's SASS code for the Source page.

Fixed a crash on exit of the NVIDIA Nsight Compute UI.

Fixed a hang during profiling when CPU call stack collection is enabled.

Fixed a missing flush of UVM buffers before taking memory checkpoints during Range Replay.

Fixed tracking of memory during Range Replay if the CUDA context has any device-mapped memory allocations.

Fixed the maximum available shared memory sizes in the Occupancy Calculator for NVIDIA Ampere GPUs.

Fixed incorrect initialization of the kernel's shared memory usage when opening the Occupancy Calculator from a profile report.

General

Added support for the CUDA toolkit 11.6.

Added a new Range Replay mode to profile ranges of multiple, concurrent kernels. Range Replay is available in the NVIDIA Nsight Compute CLI and the non-interactive Profile activity.

Added a new rule to detect non-fused floating-point instructions.

The Uncoalesced Memory access rules now show results in a dynamic table.

Unix Domain Sockets and Windows Named Pipes are used for local connection between the host and target processes on x86_64 Linux and Windows, respectively.

The NvRules API now supports querying action names using different function name bases (e.g. demangled).
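As background for the new rule on non-fused floating-point instructions listed above, the following is a minimal sketch of what "fused" means for a multiply-add. It is not taken from the tool: the function names are hypothetical, it uses Python doubles rather than GPU FP32, and it emulates fusion with exact rational arithmetic. It shows only the numerical difference (one rounding instead of two) and does not measure the throughput aspect the rule reports on.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded, then the sum is rounded."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: a*b + c is computed exactly and rounded once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product 1 - 2**-60 is not representable
# as a double and rounds to 1.0 in the non-fused version.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print("non-fused:", unfused_multiply_add(a, b, c))  # the product's low bits are lost
print("fused:    ", fused_multiply_add(a, b, c))    # keeps the -2**-60 term
```

In CUDA code, whether a multiply followed by an add is emitted as a fused instruction depends on the compiler and its floating-point contraction settings, so the Source page remains the place to check what a given kernel actually executes.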

NVIDIA Nsight Compute

The default report page is now chosen automatically when opening a report.

Added coverage for ECC (Error Correction Code) operations in the L2 Cache table of the Memory Analysis section.

Added a new L2 Evict Policies table to the Memory Analysis section.

The Occupancy Calculator now updates automatically when the input changes.

Added the new metric Thread Instructions Executed to the Source page.

Added tooltips to the Register Dependency columns in the Source page to identify the associated register more conveniently.

Improved the selection of Sections and Sets in the Profile activity connection dialog.

NVLink utilization is shown in the NVLink Tables section.

NVLink links are colored according to the measured throughput.

NVIDIA Nsight Compute CLI

The --kernel-regex and --kernel-regex-base options are no longer supported. Alternate options are --kernel-name and --kernel-name-base respectively, added in 2021.1.0.

Added support to resolve CUDA source files in the --page source output with the new --resolve-source-file command line option.

Added new option --target-processes-filter to filter the processes being profiled by name.

The CPU Stack Trace is shown in the NVIDIA Nsight Compute CLI output.
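For scripts that still pass the removed --kernel-regex options, a small migration helper along the following lines may be useful. This is a sketch under assumptions: the helper name migrate_ncu_args is hypothetical, it only handles options given as separate tokens ("--opt value"), and it assumes that --kernel-name takes a "regex:" prefix for regular-expression matching. Check the CLI help for your installed version before relying on it.

```python
from typing import List


def migrate_ncu_args(args: List[str]) -> List[str]:
    """Rewrite removed --kernel-regex* options to their --kernel-name* replacements."""
    migrated: List[str] = []
    tokens = iter(args)
    for arg in tokens:
        if arg == "--kernel-regex":
            # Assumption: --kernel-name matches exactly unless prefixed with "regex:".
            pattern = next(tokens)
            migrated += ["--kernel-name", "regex:" + pattern]
        elif arg == "--kernel-regex-base":
            # The name base value (e.g. demangled) is carried over unchanged.
            migrated += ["--kernel-name-base", next(tokens)]
        else:
            migrated.append(arg)
    return migrated


old_command = ["ncu", "--kernel-regex", "gemm.*",
               "--kernel-regex-base", "demangled", "./app"]
new_command = migrate_ncu_args(old_command)
print(" ".join(new_command))
```

The helper only builds an argument list and does not launch the profiler; the resulting list can be passed to subprocess.run once the options have been verified.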

Resolved Issues

Fixed the calculation of aggregated average instruction execution metrics in non-SASS views on the Source page.

Fixed atomic instructions being counted as both loads and stores in the Memory Analysis tables.

