Programming Tensor Cores in CUDA 9

Features, Deep Learning, Linear Algebra, Mixed Precision, Tensor Cores, Volta

Nadeem Mohammad, posted Oct 17 2017

A defining feature of the new Volta GPU Architecture is its Tensor Cores, which give the Tesla V100 accelerator a peak throughput 12 times the 32-bit floating point throughput of the previous-generation Tesla P100.


Register Cache: Caching for Warp-Centric CUDA Programs

Features, Cooperative Groups, CUDA, Optimization

Nadeem Mohammad, posted Oct 12 2017

In this post we introduce the “register cache”, an optimization technique that builds a virtual caching layer for the threads of a single warp. It is a software abstraction implemented on top of the NVIDIA GPU shuffle primitive.


Mixed-Precision Training of Deep Neural Networks

Artificial Intelligence, Machine Learning & Artificial Intelligence, Tesla

Nadeem Mohammad, posted Oct 12 2017

Deep Neural Networks (DNNs) have led to breakthroughs in a number of areas, including image processing and understanding, language modeling, language translation, speech processing, game playing, and many others.


Linux Graphics Debugger 2.2 released with enhanced performance analysis and frame capture serialization

Linux Graphics Debugger, GameWorks

Robert Bischof, posted Oct 11 2017

Linux Graphics Debugger 2.2 is available for download under the NVIDIA GameWorks Registered Developer Program.


Mixed-Precision Training of Deep Neural Networks

Features, Deep Learning, FP16, Mixed Precision, Tensor Cores, Volta

Nadeem Mohammad, posted Oct 11 2017

Deep Neural Networks (DNNs) have led to breakthroughs in a number of areas, including image processing and understanding, language modeling, language translation, speech processing, game playing, and many others.
