Accelerate Applications on NVIDIA Ampere
Researchers, scientists, and developers are focused on solving the world's most important scientific computing and big data challenges. The NVIDIA® Ampere architecture, along with 150+ SDKs and libraries, delivers the next big leap in high-performance computing (HPC) and artificial intelligence (AI) while providing unmatched acceleration at every scale.
The NVIDIA Ampere architecture adds several key innovations, including Multi-Instance GPU (MIG), third-generation Tensor Cores with TF32, third-generation NVIDIA® NVLink®, second-generation RT Cores, and structural sparsity. To leverage these innovations, thousands of GPU-accelerated applications are built on the NVIDIA CUDA parallel computing platform. The flexibility and programmability of CUDA have made it the platform of choice for researching and deploying new deep learning and parallel computing algorithms.
Multi-Instance GPU (MIG)
With Multi-Instance GPU (MIG), developers can see and schedule jobs on virtual GPU instances as if they were physical GPUs. MIG supports running CUDA applications in containers or on bare metal. MIG works with Linux operating systems and supports containers through Docker Engine, with additional support for Kubernetes and for virtual machines using the Red Hat Virtualization and VMware vSphere hypervisors.
Learn more about MIG
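As a rough sketch of the workflow, a MIG-capable GPU can be partitioned from the command line with `nvidia-smi`. The commands below assume an A100-class GPU, administrative rights, and an idle device; the profile ID used (9, i.e. 3g.20gb on an A100 40GB) varies by GPU model, so list the profiles first.

```shell
# Enable MIG mode on GPU 0 (requires admin rights and an idle GPU)
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles this particular GPU supports
sudo nvidia-smi mig -lgip

# Create two GPU instances from profile 9 (3g.20gb on an A100 40GB),
# with a default compute instance in each (-C)
sudo nvidia-smi mig -cgi 9,9 -C

# Each MIG device now appears with its own UUID,
# schedulable as if it were a physical GPU
nvidia-smi -L
```

A CUDA application can then be pinned to a single instance by setting `CUDA_VISIBLE_DEVICES` to that instance's `MIG-<uuid>` identifier.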
Third-Generation Tensor Cores
Developers can leverage Tensor Cores via multiple APIs and SDKs. The WMMA (Warp Matrix Multiply and Accumulate) operations in CUDA provide the most direct access while specialized libraries like cuDNN, RAPIDS, TensorRT and DLSS take advantage of Tensor Cores to accelerate AI training and inference.
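As a minimal sketch of the WMMA path, assuming a Tensor Core-capable GPU and FP16 inputs with an FP32 accumulator, a single warp can compute one 16×16×16 tile of D = A·B + C:

```cuda
#include <mma.h>
#include <cuda_fp16.h>

using namespace nvcuda;

// One warp computes a single 16x16 tile of D = A*B + C on Tensor Cores.
// All tiles here are 16x16, so the leading dimension passed to the
// load/store calls is 16.
__global__ void wmma_gemm_tile(const half *a, const half *b,
                               const float *c, float *d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::load_matrix_sync(a_frag, a, 16);
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::load_matrix_sync(acc_frag, c, 16, wmma::mem_row_major);

    // The multiply-accumulate itself: one warp-wide Tensor Core operation.
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);

    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}

// Launch with exactly one warp: wmma_gemm_tile<<<1, 32>>>(a, b, c, d);
```

In practice each warp of a larger kernel would own a different tile of the output; the libraries above handle that tiling (and much more) automatically.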
Analyze your models with the NVIDIA Nsight suite of profiling and debugging tools, and optimize your Tensor Core implementation with helpful learning resources.
Learn more about Tensor Cores
Third-Generation NVLink
NCCL and Magnum IO provide APIs and layers that help developers leverage high-speed inter-process communication over NVLink. The cuFFT and cuBLAS libraries take advantage of NVLink for better multi-GPU scaling, including on problems where communication is a significant bottleneck today. The combination of Unified Memory and NVLink enables faster, easier data sharing between CPU and GPU code.
Learn more about NVLink
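The Unified Memory side of that data sharing can be sketched in a few lines: one `cudaMallocManaged` allocation yields a single pointer that both CPU and GPU code read and write, with pages migrated (or accessed remotely) over NVLink where it is available.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *x, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

int main() {
    const int n = 1 << 20;
    float *x = nullptr;

    // One allocation, one pointer, visible to both CPU and GPU.
    // On NVLink-connected systems these pages move at NVLink
    // bandwidth rather than over PCIe.
    cudaMallocManaged(&x, n * sizeof(float));

    for (int i = 0; i < n; ++i) x[i] = 1.0f;      // CPU writes
    scale<<<(n + 255) / 256, 256>>>(x, n, 2.0f);  // GPU updates the same memory
    cudaDeviceSynchronize();

    printf("x[0] = %.1f\n", x[0]);                // CPU reads the result: 2.0
    cudaFree(x);
    return 0;
}
```

No explicit `cudaMemcpy` appears anywhere: the driver handles placement, which is what makes CPU-GPU data sharing "faster and easier" in code as well as in hardware.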
Second-Generation RT Cores
Higher-level SDKs like RTX Direct Illumination (RTXDI) and RTX Global Illumination (RTXGI) help developers add millions of dynamic lights and compute multi-bounce indirect lighting without worrying about resource constraints or expensive per-frame costs.
Learn more about RT Cores
Structural Sparsity
The cuSPARSE library provides GPU-accelerated basic linear algebra subroutines for sparse matrices that perform significantly faster than CPU-only alternatives. It provides functionality that can be used to build GPU-accelerated solvers without sacrificing the accuracy of the matrix multiply-accumulate operations at the heart of deep learning.
Learn more about Structural Sparsity
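As a sketch of cuSPARSE in use, the generic API below computes y = A·x for a tiny 3×3 CSR matrix. This assumes a recent CUDA toolkit (older releases spell the algorithm enum `CUSPARSE_MV_ALG_DEFAULT` instead of `CUSPARSE_SPMV_ALG_DEFAULT`); error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cusparse.h>

int main() {
    // 3x3 matrix in CSR form: row 0 -> {1 at col 0},
    // row 1 -> {2 at col 0, 3 at col 2}, row 2 -> {4 at col 1}.
    const int rows = 3, cols = 3, nnz = 4;
    int   hRowPtr[] = {0, 1, 3, 4};
    int   hColInd[] = {0, 0, 2, 1};
    float hVals[]   = {1.f, 2.f, 3.f, 4.f};
    float hX[]      = {1.f, 1.f, 1.f};
    float hY[3]     = {0.f, 0.f, 0.f};
    float alpha = 1.f, beta = 0.f;

    int *dRowPtr, *dColInd; float *dVals, *dX, *dY;
    cudaMalloc(&dRowPtr, sizeof(hRowPtr));
    cudaMalloc(&dColInd, sizeof(hColInd));
    cudaMalloc(&dVals,   sizeof(hVals));
    cudaMalloc(&dX,      sizeof(hX));
    cudaMalloc(&dY,      sizeof(hY));
    cudaMemcpy(dRowPtr, hRowPtr, sizeof(hRowPtr), cudaMemcpyHostToDevice);
    cudaMemcpy(dColInd, hColInd, sizeof(hColInd), cudaMemcpyHostToDevice);
    cudaMemcpy(dVals,   hVals,   sizeof(hVals),   cudaMemcpyHostToDevice);
    cudaMemcpy(dX,      hX,      sizeof(hX),      cudaMemcpyHostToDevice);
    cudaMemcpy(dY,      hY,      sizeof(hY),      cudaMemcpyHostToDevice);

    cusparseHandle_t handle;  cusparseCreate(&handle);
    cusparseSpMatDescr_t matA;
    cusparseDnVecDescr_t vecX, vecY;
    cusparseCreateCsr(&matA, rows, cols, nnz, dRowPtr, dColInd, dVals,
                      CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                      CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);
    cusparseCreateDnVec(&vecX, cols, dX, CUDA_R_32F);
    cusparseCreateDnVec(&vecY, rows, dY, CUDA_R_32F);

    // cuSPARSE asks for a scratch buffer, then runs y = alpha*A*x + beta*y.
    size_t bufSize = 0; void *dBuf = nullptr;
    cusparseSpMV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                            &alpha, matA, vecX, &beta, vecY,
                            CUDA_R_32F, CUSPARSE_SPMV_ALG_DEFAULT, &bufSize);
    cudaMalloc(&dBuf, bufSize);
    cusparseSpMV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                 &alpha, matA, vecX, &beta, vecY,
                 CUDA_R_32F, CUSPARSE_SPMV_ALG_DEFAULT, dBuf);

    cudaMemcpy(hY, dY, sizeof(hY), cudaMemcpyDeviceToHost);
    printf("y = [%.0f %.0f %.0f]\n", hY[0], hY[1], hY[2]); // y = [1 5 4]

    cusparseDestroySpMat(matA); cusparseDestroyDnVec(vecX);
    cusparseDestroyDnVec(vecY); cusparseDestroy(handle);
    cudaFree(dBuf); cudaFree(dRowPtr); cudaFree(dColInd);
    cudaFree(dVals); cudaFree(dX); cudaFree(dY);
    return 0;
}
```

Only the four nonzeros are stored and touched, which is where the speedup over dense CPU-side alternatives comes from as matrices grow.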