CUDA 11.6 Toolkit New Release Revealed

Jan 17, 2022

By Rob Armstrong, Arthy Sundaram and Fred Oh

NVIDIA announces the newest release of the CUDA development environment, CUDA 11.6. This release is focused on enhancing the programming model and performance of your CUDA applications. CUDA continues to push the boundaries of GPU acceleration and lay the foundation for new applications in HPC, visualization, AI, ML and DL, and data science.

CUDA 11.6 has several important features. This post offers an overview of the key capabilities:

- GSP driver architecture now the default on Turing and Ampere GPUs
- New API to allow disabling nodes in an instantiated graph
- Full support for the 128-bit integer type
- Cooperative groups namespace update
- CUDA compiler update
- Nsight Compute 2022.1 release

CUDA 11.6 ships with the R510 driver, an update branch. CUDA 11.6 Toolkit is available to download.

GSP driver architecture

The GSP driver architecture is now the default driver mode for all listed Turing and Ampere GPUs. The older driver architecture is supported as a fallback. For more information, see R510 Driver Readme.

Instantiated Graph Node API additions

We added a new API, cudaGraphNodeSetEnabled, to allow disabling nodes in an instantiated graph. Support is limited to kernel nodes in this release. A corresponding API, cudaGraphNodeGetEnabled, allows querying the enabled state of a node. We’ve also added the ability to disable NULL kernel graph node launches.
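
As a minimal sketch of how these two calls might be used, the example below builds a graph with two dependent kernel nodes, instantiates it, optionally disables the second node, and launches it. The kernel `add_one`, the helper `run_graph`, and the `CUDA_CHECK` macro are hypothetical names introduced only for illustration. It assumes CUDA 11.6 or later and uses `cudaGraphInstantiateWithFlags` for instantiation.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a runtime call fails.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s failed: %s\n", #call,             \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Hypothetical kernel: a single thread increments a counter.
__global__ void add_one(int *counter) { *counter += 1; }

// Builds a two-node graph (add_one -> add_one), optionally disables the
// second node in the instantiated graph, launches it once, and returns
// the final counter value.
int run_graph(bool disable_second) {
    int *d_counter = nullptr;
    CUDA_CHECK(cudaMalloc(&d_counter, sizeof(int)));
    CUDA_CHECK(cudaMemset(d_counter, 0, sizeof(int)));

    cudaGraph_t graph;
    CUDA_CHECK(cudaGraphCreate(&graph, 0));

    // Both nodes run the same kernel with the same argument.
    void *args[] = {&d_counter};
    cudaKernelNodeParams params = {};
    params.func = (void *)add_one;
    params.gridDim = dim3(1);
    params.blockDim = dim3(1);
    params.sharedMemBytes = 0;
    params.kernelParams = args;
    params.extra = nullptr;

    cudaGraphNode_t first, second;
    CUDA_CHECK(cudaGraphAddKernelNode(&first, graph, nullptr, 0, &params));
    CUDA_CHECK(cudaGraphAddKernelNode(&second, graph, &first, 1, &params));

    cudaGraphExec_t exec;
    CUDA_CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

    // Disable the second kernel node in the executable graph only;
    // the original graph definition is left unchanged.
    if (disable_second) {
        CUDA_CHECK(cudaGraphNodeSetEnabled(exec, second, 0));
    }

    unsigned int enabled = 0;
    CUDA_CHECK(cudaGraphNodeGetEnabled(exec, second, &enabled));
    printf("second node enabled: %u\n", enabled);

    CUDA_CHECK(cudaGraphLaunch(exec, 0));
    CUDA_CHECK(cudaStreamSynchronize(0));

    int h_counter = 0;
    CUDA_CHECK(cudaMemcpy(&h_counter, d_counter, sizeof(int),
                          cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaGraphExecDestroy(exec));
    CUDA_CHECK(cudaGraphDestroy(graph));
    CUDA_CHECK(cudaFree(d_counter));
    return h_counter;
}

int main() {
    printf("counter with both nodes enabled: %d\n", run_graph(false));
    printf("counter with second node disabled: %d\n", run_graph(true));
    return 0;
}
```

With the second node disabled, only the first increment should take effect, so the two runs are expected to report different counter values. Passing a nonzero value to `cudaGraphNodeSetEnabled` re-enables the node without re-instantiating the graph.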

128-bit integer support

CUDA 11.6 includes the full release of the 128-bit integer (__int128) data type, including compiler and developer tools support. To use this feature, the host-side compiler must also support the __int128 type.
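
The following sketch shows one way the type could be used in device code: a full 64-bit by 64-bit multiplication whose 128-bit product is computed on the GPU and compared against the same computation on the host. The function `mul_wide` and the kernel `mul_kernel` are hypothetical names, and the example assumes a host compiler that provides `unsigned __int128` (for example, GCC or Clang).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Full 64x64 -> 128-bit product, compiled for both host and device.
__host__ __device__ inline unsigned __int128 mul_wide(unsigned long long a,
                                                      unsigned long long b) {
    return (unsigned __int128)a * b;
}

// Computes the product on the device and stores its high and low halves.
__global__ void mul_kernel(unsigned long long a, unsigned long long b,
                           unsigned long long *hi, unsigned long long *lo) {
    unsigned __int128 p = mul_wide(a, b);
    *hi = (unsigned long long)(p >> 64);
    *lo = (unsigned long long)p;
}

int main() {
    const unsigned long long a = 0xFFFFFFFFFFFFFFFFull;
    const unsigned long long b = 0xFFFFFFFFFFFFFFFFull;

    unsigned long long *d_out = nullptr;
    if (cudaMalloc(&d_out, 2 * sizeof(unsigned long long)) != cudaSuccess) {
        return 1;
    }

    mul_kernel<<<1, 1>>>(a, b, d_out, d_out + 1);

    unsigned long long h_out[2] = {0, 0};
    // cudaMemcpy waits for the kernel on the default stream to finish.
    if (cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) !=
        cudaSuccess) {
        return 1;
    }
    cudaFree(d_out);

    // Compare against the same computation done on the host.
    unsigned __int128 ref = mul_wide(a, b);
    bool match = h_out[0] == (unsigned long long)(ref >> 64) &&
                 h_out[1] == (unsigned long long)ref;
    printf("device hi=%016llx lo=%016llx (%s host result)\n", h_out[0],
           h_out[1], match ? "matches" : "differs from");
    return match ? 0 : 1;
}
```

The result is split into two 64-bit halves for printing because `printf` has no standard format specifier for 128-bit integers.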

Cooperative groups namespace

The cooperative groups namespace has been updated with new functions to improve consistency in naming, function scope, and unit dimension and size.

| Implicit group/member | Threads | Blocks |
| --- | --- | --- |
| `thread_block::` | `dim_threads`, `num_threads`, `thread_rank`, `thread_index` | (Not needed) |
| `grid_group::` | `num_threads`, `thread_rank` | `dim_blocks`, `num_blocks`, `block_rank`, `block_index` |

Table 1. New functions in cooperative groups namespace
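
As an illustration of the functions in Table 1, the sketch below writes a grid-stride loop with `grid_group::thread_rank` and `grid_group::num_threads` instead of manual `blockIdx` and `threadIdx` arithmetic. The kernel `scale` and the host helper `run_scale` are hypothetical names, and the launch configuration is arbitrary. No grid-wide synchronization is used, so a regular (non-cooperative) launch is assumed to be sufficient.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Multiplies every element by `factor` using a grid-stride loop.
__global__ void scale(float *data, unsigned long long n, float factor) {
    cg::grid_group grid = cg::this_grid();
    cg::thread_block block = cg::this_thread_block();

    // thread_rank() is this thread's linear index in the whole grid;
    // num_threads() is the total number of threads in the grid.
    for (unsigned long long i = grid.thread_rank(); i < n;
         i += grid.num_threads()) {
        data[i] *= factor;
    }

    // One thread reports the launch shape through the new accessors.
    if (grid.thread_rank() == 0) {
        printf("blocks=%llu threads/block=%u threads/grid=%llu\n",
               (unsigned long long)grid.num_blocks(),
               (unsigned int)block.num_threads(),
               (unsigned long long)grid.num_threads());
    }
}

// Hypothetical host helper: fills n floats with 1.0, doubles them on the
// device, and returns true if every element comes back as 2.0.
bool run_scale(unsigned long long n) {
    std::vector<float> host(n, 1.0f);
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(d_data, host.data(), n * sizeof(float),
               cudaMemcpyHostToDevice);

    scale<<<4, 128>>>(d_data, n, 2.0f);

    cudaError_t err = cudaMemcpy(host.data(), d_data, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    if (err != cudaSuccess) return false;

    for (float v : host) {
        if (v != 2.0f) return false;
    }
    return true;
}

int main() {
    bool ok = run_scale(1000);
    printf("scale %s\n", ok ? "succeeded" : "failed");
    return ok ? 0 : 1;
}
```

Because the loop bounds come from the group object rather than from hand-written index math, the same kernel body works unchanged for any one-, two-, or three-dimensional launch configuration.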

CUDA compiler

- Added the -arch=native compilation option to target the GPUs installed in the system during compilation. This extends the existing -gencode=arch=compute_xx,code=sm_xx architecture specification.
- Added the ability to create PTX files from nvlink.
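
To make the idea of "installed GPUs" concrete, the sketch below lists the compute capability of each visible device in the `sm_xx` form used by the architecture flags. It is only an illustration of the information that such targeting depends on and is not a description of how nvcc implements -arch=native. The function name `print_installed_archs` and the file name in the comment are hypothetical.

```cuda
// Example build line (file name is hypothetical):
//   nvcc -arch=native list_archs.cu -o list_archs
#include <cstdio>
#include <cuda_runtime.h>

// Prints the sm_xx target for every visible CUDA device and returns the
// number of devices found (0 if the runtime reports an error).
int print_installed_archs() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;
        }
        // major/minor form the compute capability, e.g. 8 and 6 -> sm_86.
        printf("device %d: %s -> sm_%d%d\n", dev, prop.name, prop.major,
               prop.minor);
    }
    return count;
}

int main() {
    int found = print_installed_archs();
    printf("%d CUDA device(s) found\n", found);
    return 0;
}
```

A binary built this way targets the GPUs of the build machine, so an explicit -gencode list is likely still the better choice when the build and deployment machines differ.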

Deprecated features

- Calling cudaDeviceSynchronize() from device code for on-device fork-and-join parallelism is deprecated in preparation for a replacement programming model with higher performance. Such calls continue to work in this release, but the tools emit a warning about the upcoming change.
- CentOS Linux 8 reached end-of-life on Dec 31, 2021, and support for this OS is now deprecated in the CUDA Toolkit. CentOS Linux 8 support will be completely removed in a future release.

Additional resources

GTC sessions:
- CUDA New Features and Beyond, by Stephen Jones
- Nearly Effortless CUDA Graphs, by Rob Van der Wijngaart and Jiajie Yao
- A Deep Dive Into the Latest HPC Software, by Tim Costa
- Multi-GPU Programming Models, by Jiri Kraus
Blog posts:
- Revealing New Features in the CUDA 11.5 Toolkit
- Reducing Application Build Times Using C++ Compilation Aids
- Employing CUDA Graphs in a Dynamic Environment, by Rob Van der Wijngaart
- A Complete Overview of Nsight Developer Tools, by Chaitrali Joshi

About the Authors

About Rob Armstrong
Rob Armstrong is a principal technical product manager for the CUDA Toolkit. For over 20 years he has focused on accelerating software with heterogeneous hardware platforms, and he has a particular interest in computer architecture and hardware/software interaction.

About Arthy Sundaram
Arthy is a senior product manager for NVIDIA CUDA Math Libraries. Prior to this, Arthy served as senior product manager for the NVIDIA CUDA C++ Compiler and for the enablement of CUDA on WSL and ARM. She joined NVIDIA in 2014 as a senior engineer on the GPU driver team and worked extensively on the Maxwell, Pascal, and Turing architectures. In 2020, she transitioned to product management, where she is focused on solving customer-centric problems. She has bachelor's and master's degrees in computer science.

About Fred Oh
Fred is a senior product marketing manager for CUDA, CUDA on WSL, and CUDA Python. Fred has a B.S. in Computer Science and Math from UC Davis. He began his career as a UNIX software engineer porting kernel services and device drivers to x86 architectures. He loves Star Wars, Star Trek and the NBA Warriors.
