Discover How Tensor Cores Accelerate Your Mixed Precision Models
From intelligent assistants to autonomous robots and beyond, your deep learning models are addressing challenges that are rapidly growing in complexity. But training these ever-larger models to convergence takes longer and longer, often leading to underperforming, inefficient training cycles.
You don’t have to let those limitations slow your work. NVIDIA Volta and Turing GPUs powered by Tensor Cores give you an immediate path to faster training and greater deep learning performance. With Tensor Cores enabled, mixed FP16/FP32 precision matrix multiplication dramatically accelerates your throughput and reduces AI training times.
New To Tensor Cores?
See how Tensor Cores accelerate your AI training and deployment.
NVIDIA GPUs with Tensor Cores enabled have already helped Fast.AI and AWS achieve impressive performance gains and powered NVIDIA to the top spots on MLPerf, the first industry-wide AI benchmark.
Clova AI builds advanced multi-modal platforms as a partnership between Korea’s top search engine, NAVER, and Japan’s top messenger, LINE. Its LaRva team focuses on language understanding to enable AI-based services on the platform. “Using automatic mixed precision powered by NVIDIA Tensor Core GPUs increased throughput and enabled us to double our batch size for massive models like RoBERTa. With these optimizations we achieved a 2X training speedup while maintaining accuracy. We expect this technology to enhance many of our NLP services, including AI for Contact Center. That means significant cost savings in model production and better services delivered to customers in less time.”
— Dongjun Lee and Sungdong Kim, Machine Learning Engineers, NAVER
“By using automatic mixed precision powered by NVIDIA Tensor Core GPUs, we were able to speed up our segmentation network training by a factor of 2. That means significant time saving and cost reduction for our customers, radiologists, and patients.”
— Yushuo Wenghuang, Technical Director, Infervision
“By using automatic mixed precision powered by NVIDIA Tensor Core GPUs, we were able to speed up our classification network training and model inference by factors of 2.4 and 2.7, respectively. That means significant time saving and cost reduction for our customers, radiologists, and patients.”
— Hancheng Zheng, Director, iCarbonX
“Nuance Research advances and applies conversational AI technologies to power solutions that redefine how humans and computers interact. The rate of our advances reflects the speed at which we train and assess deep learning models. With Automatic Mixed Precision, we’ve realized a 50% speedup in TensorFlow-based ASR model training without loss of accuracy via a minimal code change. We’re eager to achieve a similar impact in our other deep learning language processing applications.”
— Wenxuan Teng, Senior Research Manager, Nuance Communications
“Automatic mixed precision powered by NVIDIA Tensor Core GPUs allows us to instantly speed up AI models at Alibaba by nearly 3X. Our researchers appreciated the ease of turning on this feature to instantly accelerate our AI.”
— Wei Lin, Sr Director, Alibaba Computing Platform
Learn How Mixed Precision On Tensor Cores Accelerates Your Models
Accelerated models speed your time to insight. With NVIDIA Tensor Cores, deep learning model throughput improves by up to 8X compared to FP32 alone. Enabling Tensor Cores and using mixed precision training (performing matrix multiplies in FP16 and accumulating the results in FP32) dramatically improves performance while maintaining accuracy by:
- Halving storage requirements, enabling larger batch sizes on a fixed memory budget (often with super-linear benefit).
- Halving memory traffic by shrinking gradient and activation tensors.
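The FP32 accumulation half of this scheme matters as much as the FP16 storage half. The following is a minimal, illustrative NumPy sketch (it simulates the rounding behavior on the CPU rather than running on Tensor Cores) showing both the storage saving and why long sums are accumulated in FP32:

```python
import numpy as np

ones = np.ones(4096, dtype=np.float16)

# Half precision halves storage: 2 bytes per element instead of 4.
assert ones.nbytes * 2 == ones.astype(np.float32).nbytes

# Sequential accumulation entirely in FP16: once the running sum
# reaches 2048, adding 1.0 rounds back to 2048, because the FP16
# spacing between representable values there is 2.0.
acc16 = np.float16(0.0)
for v in ones:
    acc16 = np.float16(acc16 + v)  # every partial sum rounded to FP16

# Accumulating the same FP16 inputs in FP32, as Tensor Cores do for
# matrix-multiply results, keeps the sum exact.
acc32 = np.float32(0.0)
for v in ones:
    acc32 += np.float32(v)

print(acc16, acc32)  # 2048.0 4096.0
```

The same effect is why the dot products inside a large FP16 matrix multiply need a wider accumulator: each output element is a long sum of products, and rounding every partial sum to FP16 silently caps or degrades the result.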
Mixed Precision Training Techniques Using Tensor Cores For Deep Learning
Learn how mixed precision accelerates your models.
Integrating Tensor Cores into your deep learning workflows is seamless. NVIDIA provides out-of-the-box models to get you started immediately, as well as tools to optimize your own models for Tensor Cores.
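As one concrete illustration of how little code change is needed, NVIDIA's NGC TensorFlow containers expose automatic mixed precision as a single environment variable (the `train.py` name below is a placeholder for your own training script):

```shell
# Enable the automatic mixed precision graph rewrite in NVIDIA's
# TensorFlow containers (and TensorFlow 1.14+) without touching
# model code. train.py is a placeholder for your training script.
export TF_ENABLE_AUTO_MIXED_PRECISION=1
python train.py
```

With the rewrite enabled, eligible operations are cast to FP16 and loss scaling is applied automatically, which is the "minimal code change" path the testimonials above describe.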
Containers And Out-Of-The-Box Optimized Models Get You Running Quickly
You can try Tensor Cores in the cloud (any major CSP) or in your datacenter GPU. NVIDIA NGC is a comprehensive catalog of deep learning and scientific applications in easy-to-use software containers to get you started immediately.
Quickly experiment with Tensor Core-optimized, out-of-the-box deep learning models from NVIDIA. These easy-to-use models cover multiple use cases in MXNet, PyTorch, and TensorFlow, and let you train and test on your own datasets without additional development.
Get Tensor Core Optimized Examples
Application-specific examples readily available for popular deep learning frameworks.
Access Tensor Core-optimized examples via NVIDIA NGC and GitHub.
Implement Tensor Cores To Easily Speed Up Your Own Models
Realize faster performance on your own models with NVIDIA resources. Analyze your models with NVIDIA's profiling tools and optimize your Tensor Core implementation with helpful documentation.
Analyze your model
NVIDIA nvprof is a profiler that makes it easy to analyze your own model and optimize it for mixed precision on Tensor Cores.
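A typical workflow, sketched below with a hypothetical `train.py` and using kernel-name matching as a heuristic, is to profile a short run and check whether Tensor Core GEMM kernels appear in the kernel log:

```shell
# Profile a short training run and write the kernel log to a file.
# train.py is a placeholder for your own training script.
nvprof --log-file prof.log python train.py

# Tensor Core GEMM kernels on Volta carry "884" in their names
# (e.g. volta_fp16_s884gemm...); Turing kernels typically use "1688".
grep -E "884|1688" prof.log
```

If no such kernels show up, common causes include layer dimensions that are not multiples of 8 or operations still running in FP32; the mixed precision documentation covers how to restructure models so the GEMMs are eligible.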