Tag: NVIDIA Ampere

AI / Deep Learning

Using Tensor Cores in CUDA Fortran

Tensor Cores, which are programmable matrix multiply and accumulate units, were first introduced in the V100 GPUs where they operated on half-precision (16-bit)… 28 MIN READ

AI / Deep Learning

Accelerating Matrix Multiplication with Block Sparse Format and NVIDIA Tensor Cores

Sparse-matrix dense-matrix multiplication (SpMM) is a fundamental linear algebra operation and a building block for more complex algorithms such as finding the… 7 MIN READ

AI / Deep Learning

Accelerating AI Training with NVIDIA TF32 Tensor Cores

The NVIDIA Ampere GPU architecture introduced the third generation of Tensor Cores, with the new TensorFloat-32 (TF32) mode for accelerating FP32 convolutions and… 10 MIN READ

Enhancing Memory Allocation with New NVIDIA CUDA 11.2 Features

CUDA is the software development platform for building GPU-accelerated applications, providing all the components needed to develop applications targeting every… 9 MIN READ

Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt

Deep neural networks achieve outstanding performance in a variety of fields, such as computer vision, speech recognition, and natural language processing. 9 MIN READ

Controlling Data Movement to Boost Performance on the NVIDIA Ampere Architecture

The NVIDIA Ampere architecture provides new mechanisms to control data movement within the GPU, and CUDA 11.1 puts those controls into your hands. 8 MIN READ