CUDA 11 announced support for the new NVIDIA A100 based on the NVIDIA Ampere architecture. Today CUDA 11.1 introduces support for NVIDIA GeForce RTX 30 Series and Quadro RTX Series GPU platforms.
CUDA is the most powerful software development platform for building GPU-accelerated applications, providing all the components needed to develop applications targeting every GPU platform.
In addition to new platform support, CUDA 11.1 lets CUDA programs use hardware-accelerated asynchronous copy from global to shared memory in a single operation, which reduces register file bandwidth and improves kernel occupancy. You can also increase efficiency by letting threads keep executing other work while they wait on synchronization barriers (for example, while asynchronous copies are in flight). These capabilities are described in the new blog post listed at the end of this article.
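As a rough illustration of the asynchronous copy described above, here is a minimal sketch using the cooperative groups `memcpy_async` and `wait` calls that ship with the CUDA 11.1 toolkit. The kernel name `scale_tiles`, the host helper `count_mismatches`, the tile size `kTile`, and the launch configuration are all hypothetical choices made for this example, not code from the release. On GPUs without hardware support for asynchronous copy, the same calls fall back to an ordinary copy.

```cuda
#include <cooperative_groups.h>
#include <cooperative_groups/memcpy_async.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

constexpr int kTile = 256;  // elements staged per block (hypothetical choice)

// Each block stages one tile of `in` in shared memory, then scales it.
__global__ void scale_tiles(const float* in, float* out, int n, float factor) {
    __shared__ float tile[kTile];
    cg::thread_block block = cg::this_thread_block();

    int base = blockIdx.x * kTile;
    int count = min(kTile, n - base);  // the last tile may be partial

    // Collective copy from global to shared memory; the size is in bytes.
    cg::memcpy_async(block, tile, in + base, sizeof(float) * count);

    // Work that does not depend on `tile` could be placed here.

    // Block until the staged data is visible to every thread in the block.
    cg::wait(block);

    for (unsigned i = block.thread_rank(); i < static_cast<unsigned>(count);
         i += block.size()) {
        out[base + i] = tile[i] * factor;
    }
}

// Hypothetical host helper: runs the kernel and returns the number of
// elements that differ from the CPU result, or -1 on a CUDA error.
int count_mismatches(int n, float factor) {
    std::vector<float> h_in(n), h_out(n, 0.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (n + kTile - 1) / kTile;
    scale_tiles<<<blocks, 128>>>(d_in, d_out, n, factor);

    // This copy also synchronizes with the kernel and surfaces its errors.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != h_in[i] * factor) ++bad;
    }
    return bad;
}
```

Calling `count_mismatches(1000, 2.0f)` from a host `main` should return 0 on a working CUDA device. The copy is issued by the whole thread block rather than by individual threads, so the data does not have to pass through per-thread registers on its way to shared memory; that is the effect the paragraph above refers to.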
The recently released CUDA 11.1 supports a broad base of gaming and graphics developers who use new Ampere technology advances, such as RT Cores, Tensor Cores, and streaming multiprocessors, for realistic ray-traced graphics and cutting-edge AI features. CUDA 11.1 also introduces library optimizations and CUDA graph enhancements, as well as updates to OS and host compiler support.
For additional insights on CUDA for these platforms, check out our blogs and on-demand GTC sessions below:
NEW Blog – Controlling Data Movement to Boost Performance on the NVIDIA Ampere Architecture – Gives specifics on advances in asynchronous data movement and on moving data more efficiently through the memory hierarchy.
CUDA 11 Features Revealed – Provides a broad overview of software capabilities surrounding the latest Ampere GPU.
GTC On-Demand Session: CUDA New Features and Beyond
GTC On-Demand Session: CUDA on NVIDIA Ampere GPU Architecture