Accelerate Applications on NVIDIA Ampere

Researchers, scientists, and developers are focused on solving the world’s most important scientific computing and big data challenges. The NVIDIA® Ampere architecture, along with 150+ SDKs and libraries, delivers the next big leap in high-performance computing (HPC) and artificial intelligence (AI), while providing unmatched acceleration at every scale.


Key Innovations

The NVIDIA Ampere architecture adds several key innovations, including Multi-Instance GPU (MIG), third-generation Tensor Cores with TF32, third-generation NVIDIA® NVLink®, second-generation RT Cores, and structural sparsity. Thousands of GPU-accelerated applications are built on the NVIDIA CUDA parallel computing platform, which is how they leverage these innovations. The flexibility and programmability of CUDA have made it the platform of choice for researching and deploying new deep learning and parallel computing algorithms.

Multi-Instance GPU

With Multi-Instance GPU (MIG), developers can see and schedule jobs on virtual GPU instances as if they were physical GPUs. MIG supports running CUDA applications in containers or on bare metal. It works with Linux operating systems, supports containers through Docker Engine and Kubernetes, and supports virtual machines on the Red Hat Virtualization and VMware vSphere hypervisors.
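As a rough illustration of scheduling jobs on MIG instances as if they were separate GPUs, the sketch below pins each child process to one instance by setting CUDA_VISIBLE_DEVICES to that instance's identifier. The helper names (env_for_mig_instance, launch_on_instances) and the UUID strings are hypothetical placeholders; on a real system the identifiers would come from `nvidia-smi -L`. The demo command only echoes the variable, so it runs without a GPU.

```python
import os
import subprocess
import sys


def env_for_mig_instance(mig_uuid, base_env=None):
    """Copy an environment and restrict CUDA to a single MIG instance."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = mig_uuid
    return env


def launch_on_instances(command, mig_uuids):
    """Start one copy of `command` per MIG instance and collect its stdout."""
    procs = [
        subprocess.Popen(
            command,
            env=env_for_mig_instance(uuid),
            stdout=subprocess.PIPE,
            text=True,
        )
        for uuid in mig_uuids
    ]
    return [proc.communicate()[0].strip() for proc in procs]


if __name__ == "__main__":
    # Hypothetical placeholder identifiers, not real devices.
    instances = ["MIG-placeholder-0", "MIG-placeholder-1"]
    # Stand-in for a CUDA job: report which device the process was given.
    job = [sys.executable, "-c",
           "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
    for seen in launch_on_instances(job, instances):
        print("job sees:", seen)
```

Each process sees only the instance it was handed, so existing single-GPU code can be reused unchanged under these assumptions.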

Learn more about MIG

Third-Generation Tensor Cores

Developers can leverage Tensor Cores via multiple APIs and SDKs. The WMMA (Warp Matrix Multiply and Accumulate) operations in CUDA provide the most direct access, while specialized libraries like cuDNN, RAPIDS, TensorRT, and DLSS take advantage of Tensor Cores to accelerate AI training and inference.

Analyze your models with the NVIDIA Nsight suite of profiler and debugger tools, and optimize your Tensor Core implementation with helpful learning resources.
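To get a feel for the precision of the TF32 format listed under Key Innovations, the sketch below emulates it in plain Python. TF32 keeps the 8-bit exponent of FP32 but only 10 mantissa bits, so the code rounds away the low 13 mantissa bits of an FP32 value. The function name round_to_tf32 is hypothetical, and round-to-nearest-even is an assumption made for illustration rather than a statement about what the hardware does.

```python
import struct

DROPPED_BITS = 23 - 10  # FP32 mantissa bits minus TF32 mantissa bits


def round_to_tf32(x):
    """Round x to FP32, then to 10 mantissa bits (assumed nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits >> 23) & 0xFF == 0xFF:
        # Infinity and NaN are passed through unchanged.
        return struct.unpack("<f", struct.pack("<I", bits))[0]
    half = 1 << (DROPPED_BITS - 1)
    lsb = (bits >> DROPPED_BITS) & 1  # lowest bit that survives rounding
    bits = (bits + half - 1 + lsb) & ~((1 << DROPPED_BITS) - 1)
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


if __name__ == "__main__":
    for value in (1.0, 0.1, 3.14159265, 1.0 + 2.0 ** -11):
        approx = round_to_tf32(value)
        print(f"{value!r:>20} -> {approx!r} (abs error {abs(approx - value):.3e})")
```

Comparing a model's inputs before and after this rounding gives a quick estimate of how much input precision a TF32 matrix multiply gives up relative to FP32.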

Learn more about Tensor Cores

Third-Generation NVLink

NCCL and Magnum IO provide APIs and layers that help developers leverage high-speed interprocess communication over NVLink. The cuFFT and cuBLAS libraries take advantage of NVLink for better multi-GPU scaling, including on problems where communication is a significant bottleneck today. The combination of Unified Memory and NVLink enables faster, easier data sharing between CPU and GPU code.
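To show why interconnect bandwidth matters for multi-GPU scaling, the sketch below simulates an all-reduce, the kind of collective operation a communication library exposes, using the well-known ring scheme. It is a pure-Python model and does not call NCCL; the function name ring_all_reduce is hypothetical, and for brevity each rank's buffer is assumed to hold exactly one element per rank.

```python
def ring_all_reduce(buffers):
    """Simulate a sum all-reduce over a ring; returns each rank's final buffer."""
    n = len(buffers)
    if any(len(buf) != n for buf in buffers):
        raise ValueError("this sketch assumes one element per rank in each buffer")
    data = [list(buf) for buf in buffers]

    # Phase 1, reduce-scatter: after n - 1 steps, rank r holds the full
    # sum for element (r + 1) % n.
    for step in range(n - 1):
        sends = [((r - step) % n, data[r][(r - step) % n]) for r in range(n)]
        for r, (idx, value) in enumerate(sends):
            data[(r + 1) % n][idx] += value

    # Phase 2, all-gather: circulate the finished sums around the ring.
    for step in range(n - 1):
        sends = [((r + 1 - step) % n, data[r][(r + 1 - step) % n]) for r in range(n)]
        for r, (idx, value) in enumerate(sends):
            data[(r + 1) % n][idx] = value

    return data


if __name__ == "__main__":
    per_gpu = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    for rank, result in enumerate(ring_all_reduce(per_gpu)):
        print(f"rank {rank}: {result}")
```

Every rank sends and receives on each of the 2(n - 1) steps, so the time per step is set by the slowest link between neighbors, which is where a faster interconnect helps.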

Learn more about NVLink

Second-Generation RT Cores

Graphics APIs such as DirectX 12 Ultimate, Vulkan, and OptiX can be used to take advantage of RT Cores to accelerate ray tracing workloads.

Higher-level SDKs like RTX Direct Illumination (RTXDI) and RTX Global Illumination (RTXGI) help developers add millions of dynamic lights and compute multi-bounce indirect lighting without worrying about resource constraints or expensive per-frame costs.

The NVIDIA Omniverse Platform is ideal for developers who have adopted USD and want to take advantage of collaboration with simulation and real-time ray tracing.

Learn more about RT Cores

Structural Sparsity

The cuSPARSE library provides GPU-accelerated basic linear algebra subroutines for sparse matrices that perform significantly faster than CPU-only alternatives. It provides functionality that can be used to build GPU-accelerated solvers without sacrificing the accuracy of the matrix multiply-accumulate jobs at the heart of deep learning.
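The structured pattern that Ampere's sparse Tensor Cores accelerate is 2:4, meaning at most two nonzero values in every group of four. The sketch below illustrates that pattern in plain Python: it prunes a row to 2:4, stores only the kept values and their positions, and checks that a dot product on the compressed form matches the pruned dense row. The function names are hypothetical, keeping the two largest magnitudes is one common heuristic assumed here, and no cuSPARSE call or hardware metadata layout is shown.

```python
def prune_2_of_4(row):
    """Return (pruned_row, kept_values, kept_positions) for a 2:4 pattern."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned, values, positions = [], [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Positions of the two largest-magnitude entries, in original order.
        by_magnitude = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)
        keep = sorted(by_magnitude[:2])
        pruned.extend(group[i] if i in keep else 0.0 for i in range(4))
        values.extend(group[i] for i in keep)
        positions.extend(keep)
    return pruned, values, positions


def sparse_dot(values, positions, dense):
    """Dot product using only the two stored values per group of four."""
    total = 0.0
    for k, (value, pos) in enumerate(zip(values, positions)):
        group = k // 2  # two stored values per group
        total += value * dense[4 * group + pos]
    return total


if __name__ == "__main__":
    weights = [0.1, -2.0, 0.3, 4.0, 1.0, 1.0, -5.0, 0.0]
    inputs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    pruned, values, positions = prune_2_of_4(weights)
    dense_result = sum(w * x for w, x in zip(pruned, inputs))
    print("pruned row:    ", pruned)
    print("stored values: ", values, "positions:", positions)
    print("dense == sparse:", dense_result == sparse_dot(values, positions, inputs))
```

The compressed form holds half as many values plus a small position index per value, which is what lets the multiply skip the zeros.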

Learn more about Structural Sparsity

NVIDIA Ampere Architecture Resources

Technical Blogs about the Architecture

Read Blogs

Tuning CUDA® Applications for the Ampere Architecture

View Documentation

Videos about the Architecture

Watch Now