Technical Walkthrough

Object Detection on GPUs in 10 Minutes

Object detection remains the primary driver for applications such as autonomous driving and intelligent video analytics. Object detection applications require... 21 MIN READ
Technical Walkthrough

Tips for Optimizing GPU Performance Using Tensor Cores

Our most popular question is "What can I do to get great GPU performance for deep learning?" We’ve recently published a detailed Deep Learning Performance... 13 MIN READ
Technical Walkthrough

Machine Learning Acceleration in Vulkan with Cooperative Matrices

Machine learning harnesses computing power to solve a variety of ‘hard’ problems that seemed impossible to program using traditional languages and... 8 MIN READ
Technical Walkthrough

Video Series: Mixed-Precision Training Techniques Using Tensor Cores for Deep Learning

Neural networks with thousands of layers and millions of neurons demand high performance and faster training times. The complexity and size of neural networks... 5 MIN READ
Technical Walkthrough

Using Tensor Cores for Mixed-Precision Scientific Computing

Double-precision floating point (FP64) has been the de facto standard for doing scientific simulation for several decades. Most numerical methods used in... 9 MIN READ
Technical Walkthrough

Mixed Precision Training for NLP and Speech Recognition with OpenSeq2Seq

The success of neural networks thus far has been built on bigger datasets, better theoretical models, and reduced training time. Sequential models, in... 11 MIN READ