Tag: NVIDIA Ampere


Discovering New Features in CUDA 11.4

This post shares an overview of the key capabilities released in CUDA 11.4. 14 MIN READ
AI / Deep Learning

Using Tensor Cores in CUDA Fortran

This post describes a CUDA Fortran interface to Tensor Core functionality, focusing on the third-generation Tensor Cores of the Ampere architecture. 28 MIN READ
AI / Deep Learning

Accelerating Matrix Multiplication with Block Sparse Format and NVIDIA Tensor Cores

Sparse-matrix dense-matrix multiplication (SpMM) is a fundamental linear algebra operation and a building block for more complex algorithms such as finding the… 7 MIN READ
AI / Deep Learning

Accelerating AI Training with NVIDIA TF32 Tensor Cores

The NVIDIA Ampere GPU architecture introduced the third generation of Tensor Cores, with the new TensorFloat-32 (TF32) mode for accelerating FP32 convolutions and… 10 MIN READ

Enhancing Memory Allocation with New NVIDIA CUDA 11.2 Features

CUDA is the software development platform for building GPU-accelerated applications, providing all the components needed to develop applications targeting every… 9 MIN READ

Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt

Deep neural networks achieve outstanding performance in a variety of fields, such as computer vision, speech recognition, and natural language processing. 9 MIN READ