FP16

Apr 27, 2023
End-to-End AI for NVIDIA-Based PCs: Optimizing AI by Transitioning from FP32 to FP16
This post is part of a series about optimizing end-to-end AI. The performance of AI models is heavily influenced by the precision of the computational resources...
4 MIN READ

Jun 26, 2019
Object Detection on GPUs in 10 Minutes
Object detection remains the primary driver for applications such as autonomous driving and intelligent video analytics. Object detection applications require...
21 MIN READ

Jun 10, 2019
Tips for Optimizing GPU Performance Using Tensor Cores
Our most popular question is “What can I do to get great GPU performance for deep learning?” We’ve recently published a detailed Deep Learning Performance...
13 MIN READ

Apr 16, 2019
Machine Learning Acceleration in Vulkan with Cooperative Matrices
Machine learning harnesses computing power to solve a variety of ‘hard’ problems that seemed impossible to program using traditional languages and...
8 MIN READ

Jan 30, 2019
Video Series: Mixed-Precision Training Techniques Using Tensor Cores for Deep Learning
Neural networks with thousands of layers and millions of neurons demand high performance and faster training times. The complexity and size of neural networks...
5 MIN READ

Jan 23, 2019
Using Tensor Cores for Mixed-Precision Scientific Computing
Double-precision floating point (FP64) has been the de facto standard for doing scientific simulation for several decades. Most numerical methods used in...
9 MIN READ

Oct 09, 2018
Mixed Precision Training for NLP and Speech Recognition with OpenSeq2Seq
The success of neural networks thus far has been built on bigger datasets, better theoretical models, and reduced training time. Sequential models, in...
11 MIN READ

Aug 20, 2018
Tensor Ops Made Easier in cuDNN
Neural network models have quickly taken advantage of NVIDIA Tensor Cores for deep learning since their introduction in the Tesla V100 GPU last year. For...
6 MIN READ

Oct 17, 2017
Programming Tensor Cores in CUDA 9
A defining feature of the new NVIDIA Volta GPU architecture is Tensor Cores, which give the NVIDIA V100 accelerator a peak throughput that is 12x...
16 MIN READ
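The nvcuda::wmma API that this post introduces is compact enough to show in a few lines. Below is a minimal, illustrative sketch (not taken from the post): one warp performs a single 16x16x16 FP16 multiply-accumulate with FP32 accumulation. The kernel name and the trivial host setup are assumptions made for this example; a real GEMM would tile many such fragments across warps and thread blocks. It requires a Tensor Core GPU (compute capability 7.0 or later, e.g. compiled with nvcc -arch=sm_70).

```cuda
#include <mma.h>
#include <cuda_fp16.h>

using namespace nvcuda;

// One warp computes a single 16x16x16 FP16 matrix multiply-accumulate
// on Tensor Cores, accumulating into FP32.
__global__ void wmma_16x16x16(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);          // start from a zero accumulator
    wmma::load_matrix_sync(a_frag, a, 16);      // leading dimension = 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

int main() {
    half *d_a, *d_b;
    float *d_c;
    cudaMalloc((void **)&d_a, 16 * 16 * sizeof(half));
    cudaMalloc((void **)&d_b, 16 * 16 * sizeof(half));
    cudaMalloc((void **)&d_c, 16 * 16 * sizeof(float));
    // (Fill d_a and d_b with FP16 data before launching in a real program.)
    wmma_16x16x16<<<1, 32>>>(d_a, d_b, d_c);    // exactly one warp
    cudaDeviceSynchronize();
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}
```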

Oct 11, 2017
Mixed-Precision Training of Deep Neural Networks
Deep Neural Networks (DNNs) have led to breakthroughs in a number of areas, including image processing and understanding, language modeling, language...
9 MIN READ

Oct 19, 2016
Mixed-Precision Programming with CUDA 8
Update, March 25, 2019: The latest Volta and Turing GPUs now incorporate Tensor Cores, which accelerate certain types of FP16 matrix math. This enables faster...
17 MIN READ