Posts by Paulius Micikevicius
Technical Walkthrough
Jan 27, 2021
Accelerating AI Training with NVIDIA TF32 Tensor Cores
The NVIDIA Ampere GPU architecture introduced the third generation of Tensor Cores, with the new TensorFloat32 (TF32) mode for accelerating FP32 convolutions and...
10 MIN READ
Technical Walkthrough
Jun 10, 2019
Tips for Optimizing GPU Performance Using Tensor Cores
Our most popular question is "What can I do to get great GPU performance for deep learning?" We’ve recently published a detailed Deep Learning Performance...
13 MIN READ
Technical Walkthrough
Oct 11, 2017
Mixed-Precision Training of Deep Neural Networks
Deep Neural Networks (DNNs) have led to breakthroughs in a number of areas, including image processing and understanding, language modeling, language...
9 MIN READ