Nsight Compute 2020.1 - New Features
Roofline Analysis comparing profiling runs of code optimized to near the GPU's full potential (red) and the baseline (purple).
NVIDIA® Nsight™ Compute 2020.1 Videos and Blogs
- 2020.1 Spotlight and GTC 2020 videos
- Unleashing the Power of NVIDIA Ampere Architecture with NVIDIA Nsight Developer Tools blog
- GTC 2020 Lab Video and Materials
Updates in 2020.1.2 (found in CUDA Toolkit 11.0 Update 1):
General
Resolved Issues
NVIDIA® Nsight™ Compute 2020.1.1 has been released with the following updates (found in CUDA Toolkit 11.0 GA):
General
- Metrics passed to --metrics on the NVIDIA Nsight Compute CLI or in the respective Profile activity option are automatically expanded to all first-level sub-metrics if required. See the documentation on --metrics for more details.
- Added new rules for detecting inefficient use of Compute Data Compression on the NVIDIA Ampere architecture.
- The version of the NVIDIA Nsight Compute target that collected the results is shown on the Session page.
- Added new launch__grid_dim_[x,y,z] and launch__block_dim_[x,y,z] metrics.
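The --metrics expansion described above can be pictured with a small Python sketch. It assumes that counters roll up to .sum, .avg, .min and .max and ratios to .pct, .ratio and .max_rate, modeled on Nsight Compute's metric naming; expand_metric and expand_all are hypothetical helpers, not part of the tool, which performs this expansion itself.

```python
from __future__ import annotations

# Assumed first-level roll-up suffixes per metric kind.
# Illustrative only: these are NOT queried from ncu.
ROLLUPS = {
    "counter": (".sum", ".avg", ".min", ".max"),
    "ratio": (".pct", ".ratio", ".max_rate"),
}


def expand_metric(name: str, kind: str) -> list[str]:
    """Expand a base metric into its first-level sub-metrics (hypothetical helper).

    Names that already carry a suffix, or whose kind has no roll-ups
    (e.g. plain launch__* values), are returned unchanged.
    """
    if "." in name or kind not in ROLLUPS:
        return [name]
    return [name + suffix for suffix in ROLLUPS[kind]]


def expand_all(requested: dict[str, str]) -> list[str]:
    """Expand every requested metric, preserving the request order."""
    expanded: list[str] = []
    for name, kind in requested.items():
        expanded.extend(expand_metric(name, kind))
    return expanded


print(expand_all({"dram__bytes_read": "counter", "launch__grid_dim_x": "value"}))
```

In this sketch a request for dram__bytes_read becomes its four roll-ups, while a fully qualified name such as dram__bytes_read.sum or a plain value such as launch__grid_dim_x passes through untouched.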
NVIDIA Nsight Compute
- The Break on API Error functionality has been improved when using auto profiling.
NVIDIA Nsight Compute Command Line Interface
- The full path to the report output file is printed after profiling.
- Added and corrected metrics in the nvprof Metric Comparison table.
Resolved Issues
- Documented the breakdown: metrics prefix.
- Fixed handling of escaped domain delimiters in NVTX filter expressions.
- Fixed issues with the occupancy charts for small block sizes.
- Fixed an issue when choosing a default report page in the options dialog.
- Fixed an issue where the scroll bar could overlap the content when exporting the report page as an image.
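For context on why small block sizes need special care in occupancy charts, here is a minimal theoretical-occupancy sketch in Python. It considers only the per-SM warp limit and the per-SM resident-block limit (register and shared-memory limits are ignored), and its defaults assume an SM 8.0 (GA100-class) GPU with 64 warps and 32 resident blocks per SM. theoretical_occupancy is a hypothetical helper, not the calculation Nsight Compute uses.

```python
import math

WARP_SIZE = 32


def theoretical_occupancy(block_size: int,
                          max_warps_per_sm: int = 64,
                          max_blocks_per_sm: int = 32) -> float:
    """Fraction of the SM's warp slots that resident blocks can fill.

    Assumed SM 8.0 limits by default; registers and shared memory ignored.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # A partial warp still occupies a full warp slot.
    warps_per_block = math.ceil(block_size / WARP_SIZE)
    # How many blocks fit by the warp limit vs. the hard per-SM block limit.
    blocks_by_warps = max_warps_per_sm // warps_per_block
    resident_blocks = min(blocks_by_warps, max_blocks_per_sm)
    active_warps = resident_blocks * warps_per_block
    return active_warps / max_warps_per_sm


for size in (16, 32, 64, 96, 256):
    print(size, theoretical_occupancy(size))
```

Under these assumptions, 32-thread blocks reach only 50% occupancy because the 32-block limit is hit before the warp limit, while 64-thread blocks can reach 100%.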
NVIDIA® Nsight™ Compute 2020.1.0 has been released with the following features and improvements (found in CUDA Toolkit 11.0 RC):
General
- Added support for the NVIDIA GA100/SM 8.x GPU architecture
- Expanded platform support to include Arm SBSA (Server Base System Architecture)
- Added support for CUDA Toolkit 11.0
- Added a rule for reporting uncoalesced memory accesses as part of the Source Counters section
- Added support for report name placeholders %p, %q, %i and %h
- The Kernel Profiling Guide was added to the documentation
- The Special Configurations section was added to the documentation, detailing support for NVIDIA Ampere architecture's Multi-Instance GPU (MIG)
- Added support for Visual Studio integration (Windows only)
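The uncoalesced-access rule concerns how many memory sectors a warp touches per request. A Python sketch of that arithmetic follows, assuming 32 threads per warp, 32-byte sectors, 4-byte elements and a sector-aligned base address; sectors_per_request and excess_sectors are illustrative names, not the rule's implementation.

```python
WARP_SIZE = 32      # threads per warp (assumed)
SECTOR_BYTES = 32   # memory sector size in bytes (assumed)


def sectors_per_request(stride_elems: int, elem_bytes: int = 4, base_addr: int = 0) -> int:
    """Distinct sectors touched when thread t loads element t * stride_elems."""
    addresses = [base_addr + tid * stride_elems * elem_bytes for tid in range(WARP_SIZE)]
    return len({addr // SECTOR_BYTES for addr in addresses})


def excess_sectors(stride_elems: int, elem_bytes: int = 4) -> int:
    """Sectors fetched beyond the minimum a perfectly coalesced warp would need."""
    ideal = (WARP_SIZE * elem_bytes + SECTOR_BYTES - 1) // SECTOR_BYTES
    return sectors_per_request(stride_elems, elem_bytes) - ideal


for stride in (1, 2, 4, 8, 32):
    print(stride, sectors_per_request(stride), excess_sectors(stride))
```

A unit stride touches 4 sectors per warp; a stride of 8 or more elements touches 32, so most of every fetched sector goes unused.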
NVIDIA Nsight Compute
- Added support for roofline analysis charts
- NVIDIA Ampere architecture enhancements
  - Memory Workload Analysis Report now shows Compute Data Compression ratio and amounts
  - Memory Workload Analysis Report now shows Asynchronous Copy to shared memory
- Added linked hot spot tables in section bodies to indicate performance problems in the source code
- Added section navigation links in rule results to quickly jump to the referenced section
- Added a new option to select how kernel names are shown in the UI
- Added new memory tables for the L1/TEX cache and the L2 cache. The old tables are still available for backwards compatibility and moved to a new section containing deprecated UI elements.
- Memory tables now show the metric name as a tooltip
- Source resolution now takes into account file properties when selecting a file from disk
- Results in the profile report can now be filtered by NVTX range
- The Source page now supports collapsing views even for single files
- The UI shows profiler error messages as dismissible banners for increased visibility
- Improved the baseline name control in the profiler report header
- The UI command was renamed from nv-nsight-cu to ncu-ui. Old names remain for backwards compatibility.
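Roofline charts plot achieved performance against arithmetic intensity under two ceilings: peak compute throughput and peak memory bandwidth. A minimal Python sketch of the classic roofline bound follows; the peak values are hypothetical placeholders, not figures for any real GPU and not the values Nsight Compute uses.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved to/from the modeled memory level."""
    return flops / bytes_moved


def attainable_flops(ai: float, peak_flops: float, peak_bandwidth: float) -> float:
    """Roofline bound: min(compute roof, AI * bandwidth roof)."""
    return min(peak_flops, ai * peak_bandwidth)


def ridge_point(peak_flops: float, peak_bandwidth: float) -> float:
    """Arithmetic intensity at which the two roofs meet."""
    return peak_flops / peak_bandwidth


def bound_by(ai: float, peak_flops: float, peak_bandwidth: float) -> str:
    """Which roof limits a kernel with the given arithmetic intensity."""
    return "memory" if ai < ridge_point(peak_flops, peak_bandwidth) else "compute"


# Hypothetical device peaks (placeholders only).
PEAK_FLOPS = 10e12  # 10 TFLOP/s
PEAK_BW = 1e12      # 1 TB/s

ai = arithmetic_intensity(flops=5e9, bytes_moved=2e9)  # 2.5 FLOP/byte
print(ai, attainable_flops(ai, PEAK_FLOPS, PEAK_BW), bound_by(ai, PEAK_FLOPS, PEAK_BW))
```

With these placeholder peaks the ridge point sits at 10 FLOP/byte, so a kernel at 2.5 FLOP/byte is memory-bound and can reach at most a quarter of the compute peak.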
NVIDIA Nsight Compute Command Line Interface
- The CLI command was renamed from nv-nsight-cu-cli to ncu. Old names remain for backwards compatibility.
- Queried metrics on GV100 and newer chips are sorted alphabetically
- Multiple instances of NVIDIA Nsight Compute CLI can now run concurrently on the same system, e.g. for profiling individual MPI ranks. Profiled kernels are serialized across all processes using a system-wide file lock.
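Serializing profiled kernels across processes with a system-wide file lock can be illustrated with a POSIX-only Python sketch using fcntl.flock. The lock path and profile_kernel are hypothetical stand-ins; this shows the general technique, not how Nsight Compute names or manages its own lock.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock location shared by all cooperating processes.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "example-profiler.lock")


@contextmanager
def system_wide_lock(path: str = LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    with open(path, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until no other process holds it
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def profile_kernel(name: str) -> str:
    """Hypothetical stand-in for profiling one kernel; only one process at a time runs the body."""
    with system_wide_lock():
        return f"profiled {name} in pid {os.getpid()}"
```

Each process (for example, each MPI rank) that calls profile_kernel waits in flock until the lock is free, so the guarded sections never overlap while code outside the lock keeps running concurrently.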
Resolved Issues
- More C++ kernel names can be properly demangled
- Fixed a free(): invalid pointer error when profiling applications using PyTorch > 19.07
- Fixed profiling IBM Spectrum MPI applications that require PAMI GPU hooks (--smpiargs="-gpu")
- Fixed an issue where the first kernel instruction was missed when computing sass__inst_executed_per_opcode
- Reduced surplus DRAM write traffic created from flushing caches during kernel replay
- The Compute Workload Analysis section now shows the IMMA pipeline on GV11B GPUs
- Profile reports now scroll properly on macOS when using a trackpad
- Relative output filenames for the Profile activity now use the document directory, instead of the current working directory
- Fixed path expansion of ~ on Windows
- Memory access information is now shown properly for RED assembly instructions on the Source page
- Fixed an issue where the user's PYTHONHOME and PYTHONPATH environment variables were picked up by NVIDIA Nsight Compute, resulting in locale encoding issues.
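One common way for a tool to keep a user's PYTHONHOME and PYTHONPATH from leaking into its own Python runtime is to strip them from the environment before starting that runtime. A Python sketch of the technique follows, where ISOLATED_VARS, sanitized_env and run_isolated are hypothetical names and the launched interpreter stands in for a tool's bundled one; it does not describe how Nsight Compute implemented the fix.

```python
import os
import subprocess
import sys

# Variables that could redirect a bundled interpreter to the user's install.
ISOLATED_VARS = ("PYTHONHOME", "PYTHONPATH")


def sanitized_env(base=None) -> dict:
    """Copy of `base` (default: current environment) without ISOLATED_VARS."""
    env = dict(os.environ if base is None else base)
    for var in ISOLATED_VARS:
        env.pop(var, None)
    return env


def run_isolated(code: str) -> str:
    """Run `code` in a child Python process that never sees ISOLATED_VARS."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=sanitized_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Python's own -E flag (ignore all PYTHON* environment variables) achieves a similar effect when the interpreter command line is under your control.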
Drops and Deprecations
- Removed support for the Pascal SM 6.x GPU architecture
- Windows 7 is no longer a supported host or target platform
