Nsight Compute 2020.1 - New Features
Memory Workload Analysis showing NVIDIA Ampere Architecture Asynchronous Copy to Shared Memory
Memory Workload Analysis showing NVIDIA Ampere Architecture Compute Data Compression
Roofline Analysis comparing profiling runs of code optimized to near the GPU's full potential (red) and the baseline (purple).
NVIDIA® Nsight™ Compute 2020.1 Videos and Blogs
- 2020.1 Spotlight and GTC 2020 videos
- Unleashing the Power of NVIDIA Ampere Architecture with NVIDIA Nsight Developer Tools blog
- GTC 2020 Lab Video and Materials
Updates in 2020.1.2:
Found in CUDA Toolkit 11.0 Update 1
General
Resolved Issues
NVIDIA® Nsight™ Compute 2020.1.1 has been released with the following update:
Found in CUDA Toolkit 11.0 GA
General
- Metrics passed to --metrics on the NVIDIA Nsight Compute CLI or in the respective Profile activity option are automatically expanded to all first-level sub-metrics if required. See the documentation on --metrics for more details.
- Added new rules for detecting inefficiencies of using the Compute Data Compression on the NVIDIA Ampere architecture.
- The version of the NVIDIA Nsight Compute target collecting the results is shown in the Session page.
- Added new launch__grid_dim_[x,y,z] and launch__block_dim_[x,y,z] metrics.
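The bracket notation in the last item is shorthand for three separate metric names per family. As a minimal sketch, the Python helper below expands that shorthand and assembles a command line that passes the resulting names to --metrics as a comma-separated list. The helper names and the ./my_app binary are hypothetical and are not part of NVIDIA Nsight Compute; ncu is the CLI command name introduced in 2020.1.0.

```python
import re


def expand_metric_pattern(pattern):
    """Expand 'name_[x,y,z]' shorthand into ['name_x', 'name_y', 'name_z']."""
    match = re.fullmatch(r"(.*)\[([^\]]+)\](.*)", pattern)
    if match is None:
        return [pattern]  # no bracket group: the name is already complete
    prefix, choices, suffix = match.groups()
    return [f"{prefix}{choice.strip()}{suffix}" for choice in choices.split(",")]


def build_ncu_command(metric_patterns, app_args):
    """Build an argument list that requests the expanded metrics via --metrics."""
    metrics = []
    for pattern in metric_patterns:
        metrics.extend(expand_metric_pattern(pattern))
    return ["ncu", "--metrics", ",".join(metrics), *app_args]


command = build_ncu_command(
    ["launch__grid_dim_[x,y,z]", "launch__block_dim_[x,y,z]"],
    ["./my_app"],  # hypothetical application binary
)
print(" ".join(command))
```

The resulting list can be handed to subprocess.run on a machine where the tool is installed.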
NVIDIA Nsight Compute
- The Break on API Error functionality has been improved when auto profiling.
NVIDIA Nsight Compute Command Line Interface
- The full path to the report output file is printed after profiling.
- Added and corrected metrics in the nvprof Metric Comparison table.
Resolved Issues
- Documented the breakdown: metrics prefix.
- Fixed handling of escaped domain delimiters in NVTX filter expressions.
- Fixed issues with the occupancy charts for small block sizes.
- Fixed an issue when choosing a default report page in the options dialog.
- Fixed an issue where the scroll bar could overlap the content when exporting the report page as an image.
NVIDIA® Nsight™ Compute 2020.1.0 has been released with the following features and improvements:
Found in CUDA Toolkit 11.0 RC
General
- Added support for the NVIDIA GA100/SM 8.x GPU architecture
- Expanded platform support to include Arm SBSA (Server Base System Architecture)
- Support for CUDA Toolkit 11.0 was added
- Added a rule for reporting uncoalesced memory accesses as part of the Source Counters section
- Added support for report name placeholders %p, %q, %i and %h
- The Kernel Profiling Guide was added to the documentation
- The Special Configurations section was added to the documentation, detailing support for NVIDIA Ampere architecture's Multi-Instance GPU (MIG)
- Added support for Visual Studio integration (Windows only)
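To illustrate the report name placeholders listed above, the following Python sketch emulates their expansion. It assumes the meanings given in the CLI documentation for the output file option: %h is the hostname, %p the process ID, %q{NAME} the value of environment variable NAME, and %i the lowest unused positive integer that makes the file name unique. The function and its existing_names parameter are hypothetical and only mimic the behavior; this is not the tool's implementation.

```python
import os
import re
import socket


def expand_report_name(template, existing_names=()):
    """Emulate %h, %p, %q{VAR} and %i expansion for a report file name."""
    name = template.replace("%h", socket.gethostname())
    name = name.replace("%p", str(os.getpid()))
    # %q{VAR} becomes the value of environment variable VAR (empty if unset).
    name = re.sub(r"%q\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), name)
    if "%i" not in name:
        return name
    # %i becomes the lowest positive integer that yields an unused name.
    counter = 1
    while name.replace("%i", str(counter)) in existing_names:
        counter += 1
    return name.replace("%i", str(counter))


# Hypothetical example: one report per MPI rank, numbered to avoid overwriting.
os.environ["DEMO_RANK"] = "3"
print(expand_report_name("report_rank%q{DEMO_RANK}_%i", {"report_rank3_1"}))
```

A pattern built from %q and an MPI rank variable gives each process its own report file, which fits the concurrent CLI instances described in the CLI section.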
NVIDIA Nsight Compute
- Added support for roofline analysis charts
- NVIDIA Ampere architecture enhancements:
  - Memory Workload Analysis Report now shows Compute Data Compression ratio and amounts
  - Memory Workload Analysis Report now shows Asynchronous Copy to shared memory
- Added linked hot spot tables in section bodies to indicate performance problems in the source code
- Added section navigation links in rule results to quickly jump to the referenced section
- Added a new option to select how kernel names are shown in the UI
- Added new memory tables for the L1/TEX cache and the L2 cache. The old tables are still available for backwards compatibility and moved to a new section containing deprecated UI elements.
- Memory tables now show the metric name as a tooltip
- Source resolution now takes into account file properties when selecting a file from disk
- Results in the profile report can now be filtered by NVTX range
- The Source page now supports collapsing views even for single files
- The UI shows profiler error messages as dismissible banners for increased visibility
- Improved the baseline name control in the profiler report header
- The UI command was renamed from nv-nsight-cu to ncu-ui. Old names remain for backwards compatibility.
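For readers new to the roofline analysis charts mentioned in this list, the sketch below shows the standard roofline bound such charts are based on: attainable throughput is the smaller of the compute roof and the memory roof at a given arithmetic intensity. The peak values are hypothetical numbers chosen for illustration, not figures for any particular GPU, and the function is not part of NVIDIA Nsight Compute.

```python
def attainable_flops(arithmetic_intensity, peak_flops, peak_bandwidth):
    """Roofline bound: min(compute roof, memory roof) at a given FLOP/byte."""
    return min(peak_flops, arithmetic_intensity * peak_bandwidth)


# Hypothetical device limits, chosen only for illustration.
PEAK_FLOPS = 10.0e12      # FLOP/s
PEAK_BANDWIDTH = 1.0e12   # byte/s

# Arithmetic intensity at which the two roofs meet.
ridge_point = PEAK_FLOPS / PEAK_BANDWIDTH

for intensity in (0.5, 10.0, 40.0):
    bound = attainable_flops(intensity, PEAK_FLOPS, PEAK_BANDWIDTH)
    regime = "memory-bound" if intensity < ridge_point else "compute-bound"
    print(f"{intensity:5.1f} FLOP/byte -> {bound:.2e} FLOP/s ({regime})")
```

A kernel plotted well below the bound for its intensity has headroom; one close to it is near what the hardware allows under this model.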
NVIDIA Nsight Compute Command Line Interface
- The CLI command was renamed from nv-nsight-cu-cli to ncu. Old names remain for backwards compatibility.
- Queried metrics on GV100 and newer chips are sorted alphabetically
- Multiple instances of NVIDIA Nsight Compute CLI can now run concurrently on the same system, e.g. for profiling individual MPI ranks. Profiled kernels are serialized across all processes using a system-wide file lock.
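The serialization described in the last item can be pictured with a small Python sketch of a system-wide file lock. It uses an advisory lock from the standard fcntl module (Unix only). The lock path and function names are hypothetical, and the locking mechanism and lock file location that the tool itself uses are not stated in these notes.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock path for this sketch only.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "profiler_demo.lock")


@contextmanager
def system_wide_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    with open(path, "w") as lock_file:
        # Blocks until no other holder has the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def profile_kernel(kernel_name, log):
    """Stand-in for profiling one kernel; runs while holding the lock."""
    with system_wide_lock():
        log.append(kernel_name)


log = []
for name in ("kernel_a", "kernel_b"):
    profile_kernel(name, log)
print(log)
```

Because every process opens the same path, processes that each call profile_kernel take turns, while the rest of each application keeps running concurrently.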
Resolved Issues
- More C++ kernel names can be properly demangled
- Fixed a free(): invalid pointer error when profiling applications using PyTorch > 19.07
- Fixed profiling IBM Spectrum MPI applications that require PAMI GPU hooks (--smpiargs="-gpu")
- Fixed an issue where the first kernel instruction was missed when computing sass__inst_executed_per_opcode
- Reduced surplus DRAM write traffic created from flushing caches during kernel replay
- The Compute Workload Analysis section shows the IMMA pipeline on GV11b GPUs
- Profile reports now scroll properly on macOS when using a trackpad
- Relative output filenames for the Profile activity now use the document directory, instead of the current working directory
- Fixed path expansion of ~ on Windows
- Memory access information is now shown properly for RED assembly instructions on the Source page
- Fixed an issue where user-set PYTHONHOME and PYTHONPATH environment variables were picked up by NVIDIA Nsight Compute, resulting in locale encoding issues.
Drops and Deprecations
- Removed support for the Pascal SM 6.x GPU architecture
- Windows 7 is no longer a supported host or target platform
For a complete overview of all NVIDIA Nsight™ Compute features and access to resources, please visit the main Nsight™ Compute page.
NVIDIA® Nsight™ Compute 2020.1 is available for download under the NVIDIA Registered Developer Program.