Discover How Tensor Cores Accelerate Your Mixed Precision Models

From intelligent assistants to autonomous robots and beyond, your deep learning models are addressing challenges that are rapidly growing in complexity. But getting these models to converge has become increasingly difficult, often leading to underperforming and inefficient training cycles.

You don’t have to let those limitations slow your work. NVIDIA Volta and Turing GPUs powered by Tensor Cores give you an immediate path to faster training and greater deep learning performance. With Tensor Cores enabled, mixed precision matrix multiplies in FP16 with FP32 accumulation dramatically accelerate your throughput and reduce AI training times.

New To Tensor Cores?

See how Tensor Cores accelerate your AI training and deployment


NVIDIA GPUs with Tensor Cores enabled have already helped Fast.AI and AWS achieve impressive performance gains and powered NVIDIA to the top spots on MLPerf, the first industry-wide AI benchmark.

Customer Success Stories

Fast.AI improved the ImageNet training speed record by 40% by leveraging Tensor Cores

Learn More

AWS recommends Tensor Cores for the most complex deep learning models and scientific applications

Learn More

Performance Benchmarks

NVIDIA Captures Top Spots on MLPerf - World’s First Industry-Wide AI Benchmark by Leveraging Tensor Cores

Learn More

See NVIDIA AI product performance across multiple frameworks, models and GPUs

Learn More

High Performance Computing

NVIDIA Tensor Core GPUs Power 5 of 6 Gordon Bell Finalists in Scientific Applications

Learn More

Using Mixed Precision for FP64 Scientific Computing

Learn More

Learn How Mixed Precision On Tensor Cores Accelerates Your Models

Accelerated models speed your time to insight. With NVIDIA Tensor Cores, deep learning model throughput improves by up to 8X. Compared to FP32 alone, enabling Tensor Cores and using mixed precision training (performing matrix multiplies in FP16 and accumulating the results in FP32 while maintaining accuracy) dramatically improves performance by:

  • Halving storage requirements, which enables larger batch sizes on a fixed memory budget, with super-linear benefit.
  • Halving memory traffic by reducing the size of gradient and activation tensors.
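These mechanics can be illustrated outside the GPU. The sketch below uses NumPy (an assumption; the page's own resources use deep learning frameworks) to show the FP16 storage savings and the multiply-in-FP16, accumulate-in-FP32 pattern:

```python
import numpy as np

# FP16 tensors need half the memory of equivalent FP32 tensors,
# which is what allows larger batch sizes on a fixed memory budget.
acts_fp32 = np.ones((1024, 1024), dtype=np.float32)
acts_fp16 = acts_fp32.astype(np.float16)
assert acts_fp16.nbytes == acts_fp32.nbytes // 2  # 2 MB vs 4 MB

# Mixed precision matmul: operands stored in FP16, products accumulated
# in FP32. The product of two FP16 values (11-bit significands) fits
# exactly in FP32 (24-bit significand), so upcasting the inputs and
# multiplying in FP32 reproduces FP16-multiply / FP32-accumulate.
a = np.random.rand(64, 64).astype(np.float16)
b = np.random.rand(64, 64).astype(np.float16)
c = a.astype(np.float32) @ b.astype(np.float32)
print(c.dtype)  # float32
```

On Tensor Core hardware the same pattern runs in a single fused instruction rather than through explicit casts.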

Mixed Precision Training Techniques Using Tensor Cores For Deep Learning

Learn how mixed precision accelerates your models


Integrating Tensor Cores into your deep learning workflows is seamless. NVIDIA provides out-of-the-box models so you can get started immediately, as well as tools that allow you to optimize your own models for Tensor Cores.

Customer Implementation Whitepapers

Facebook Scaling NMT 5X with Mixed Precision (arXiv, Sep 2018)

Learn More

Baidu Research and NVIDIA on Mixed Precision Training (ICLR 2018)

Learn More

Developer Blogs

Open Source Software Optimizations for Mixed Precision Training on Tensor Cores

Learn More

Automatic Mixed Precision for automatically enabling Tensor Cores in PyTorch

Learn More

Automatic Mixed Precision for automatically enabling Tensor Cores in TensorFlow

Learn More

Developer Webinars

Webinar: Real World Examples Training Neural Networks with Mixed Precision

Learn More

Webinar: Automatic Mixed Precision (AMP) – easily enable mixed precision in your model with 2 lines of code

Learn More

Containers And Out-Of-The-Box Optimized Models Get You Running Quickly

You can try Tensor Cores in the cloud (through any major CSP) or on your own datacenter GPUs. NVIDIA NGC is a comprehensive catalog of deep learning and scientific applications in easy-to-use software containers to get you started immediately.

Quickly experiment with Tensor Core-optimized, out-of-the-box deep learning models from NVIDIA. These models are easy to use, cover multiple use cases in MXNet, PyTorch, and TensorFlow, and let you train and test on your own datasets without additional development:

Get Tensor Core Optimized Examples

Application specific examples readily available for popular deep learning frameworks


Access Tensor Core Optimized Examples via NVIDIA NGC and GitHub:

Get NVIDIA NGC Containers
(Pre-Packaged Examples)

PyTorch>    TensorFlow>   MXNet>

Get NVIDIA NGC Model Scripts
(Choose Examples)


GitHub Repository


Implement Tensor Cores To Easily Speed Up Your Own Models

Realize faster performance on your own models with NVIDIA resources. Analyze your models with NVIDIA's profiler tools and optimize your Tensor Core implementation with helpful documentation.

Analyze your model

NVIDIA NVProf is a profiler that helps you analyze your own model and identify opportunities to optimize it for mixed precision on Tensor Cores
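As a sketch of what that analysis can look like on the command line (the script name `train.py` is a placeholder; the metric shown is the Tensor Core utilization metric nvprof exposes on Volta GPUs):

```shell
# Profile a training run and report Tensor Core utilization per kernel.
# A non-idle tensor_precision_fu_utilization confirms Tensor Cores are in use.
nvprof -m tensor_precision_fu_utilization python train.py

# Alternatively, inspect kernel names: mixed precision GEMMs that hit
# Tensor Cores on Volta typically contain "884" (e.g. volta_fp16_s884gemm).
nvprof python train.py
```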



Enabling Automatic Mixed Precision in MXNet

Learn More
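A minimal sketch of what enabling AMP looks like in MXNet, assuming MXNet 1.5 or later (where AMP lives in the `mxnet.contrib` module) and a small Gluon model for illustration:

```python
import mxnet as mx
from mxnet import autograd, gluon
from mxnet.contrib import amp

amp.init()  # call once, before the network is built

net = gluon.nn.Dense(16)
net.initialize()
trainer = gluon.Trainer(net.collect_params(), "sgd", {"learning_rate": 0.01})
amp.init_trainer(trainer)  # adds dynamic loss scaling to the trainer

# In the training loop, scale the loss before calling backward():
x = mx.nd.ones((8, 32))
with autograd.record():
    loss = net(x).sum()
    with amp.scale_loss(loss, trainer) as scaled_loss:
        autograd.backward(scaled_loss)
trainer.step(batch_size=8)
```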


Enabling Automatic Mixed Precision in PyTorch

Learn More
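The enablement pattern these resources describe can be sketched as follows, assuming NVIDIA's Apex extension is installed, a CUDA device is available, and `loader` stands in for your existing DataLoader:

```python
import torch
import torch.nn.functional as F
from apex import amp  # NVIDIA Apex extension for PyTorch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# First added line: wrap the model and optimizer for mixed precision.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

for data, target in loader:  # 'loader' is your existing DataLoader
    optimizer.zero_grad()
    loss = F.mse_loss(model(data.cuda()), target.cuda())
    # Second added line: scale the loss for stable FP16 gradients.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```

The rest of the training loop stays unchanged; the wrapper handles casting and loss scaling behind the scenes.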

Webinar: Automatic Mixed Precision – easily enable mixed precision in your model with 2 lines of code

Learn More

DevBlog: Tools For Easy Mixed Precision Training in PyTorch

Learn More


Enabling Automatic Mixed Precision in TensorFlow

Learn More
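A minimal sketch of the environment-variable route, which applies the automatic mixed precision graph rewrite in TensorFlow 1.14+ and NVIDIA's NGC TensorFlow containers without any model code changes:

```python
import os

# Set before TensorFlow builds its graph/session; the automatic mixed
# precision graph rewrite then casts eligible ops to FP16 and inserts
# loss scaling automatically.
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"

# import tensorflow as tf  # import and build your model afterwards as usual
print(os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"])  # -> 1
```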

Tutorial: TensorFlow ResNet-50 with Mixed-Precision

Learn More

TensorFlow toolkit for sequence models: OpenSeq2Seq

Learn More


SDK: Mixed-Precision Best Practices

Learn More