Automatic Mixed Precision for Deep Learning
Deep neural network training has traditionally relied on the IEEE single-precision format. With mixed precision, however, you can train with half precision while maintaining the network accuracy achieved with single precision. This technique of using both single- and half-precision representations is referred to as the mixed-precision technique.
Benefits of Mixed Precision Training
“Nuance Research advances and applies conversational AI technologies to power solutions that redefine how humans and computers interact. The rate of our advances reflects the speed at which we train and assess deep learning models. With Automatic Mixed Precision, we’ve realized a 50% speedup in TensorFlow-based ASR model training without loss of accuracy via a minimal code change. We’re eager to achieve a similar impact in our other deep learning language processing applications.”
Wenxuan Teng, Senior Research Manager, Nuance Communications
Enabling mixed precision involves two steps: porting the model to use the half-precision data type where appropriate, and using loss scaling to preserve small gradient values. Deep learning researchers and engineers can easily get started enabling this feature on Ampere, Volta and Turing GPUs.
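To see why loss scaling is needed, the following minimal sketch shows a small gradient value vanishing when rounded to half precision and surviving when it is scaled first. It uses only the Python standard library; the helper `to_fp16`, the gradient value, and the scale factor of 2**16 are illustrative assumptions, not part of any framework.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

grad = 1e-8            # hypothetical small gradient value
loss_scale = 65536.0   # hypothetical scale factor (2**16)

naive = to_fp16(grad)                 # cast directly: too small for FP16
scaled = to_fp16(grad * loss_scale)   # scale first, then cast to FP16
recovered = scaled / loss_scale       # unscale again in higher precision

print("direct cast:", naive)
print("scaled cast:", scaled)
print("recovered:  ", recovered)
```

The direct cast underflows to zero, while the scaled value stays representable and can be divided by the same factor afterwards to recover approximately the original gradient.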
On Ampere GPUs, automatic mixed precision uses FP16 to deliver a performance boost of 3X versus TF32, the new format that is already ~6X faster than FP32. On Volta and Turing GPUs, automatic mixed precision delivers up to 3X higher performance versus FP32 with just a few lines of code. The best training performance results for NVIDIA GPUs are always available on the NVIDIA deep learning performance page.
Using Automatic Mixed Precision for Major Deep Learning Frameworks
TensorFlow
Automatic Mixed Precision is available both in native TensorFlow and inside the TensorFlow container on the NVIDIA NGC container registry. To enable AMP in NGC TensorFlow 19.07 or upstream TensorFlow 1.14 or later, wrap your tf.keras.optimizers Optimizer as follows:
opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
This change applies automatic loss scaling to your model and enables automatic casting to half precision.
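The idea behind automatic loss scaling can be sketched in a few lines of plain Python. This is a simplified illustration of one common dynamic scheme (shrink the scale when gradients overflow, grow it again after a run of clean steps), not the implementation used by any particular framework. The class name, method names, default values, and the sample gradients are all assumptions made for the example.

```python
import math

class DynamicLossScaler:
    """Toy dynamic loss scaler; names and defaults are illustrative assumptions."""

    def __init__(self, init_scale=2.0**16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=3):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def unscale_and_check(self, scaled_grads):
        """Return unscaled gradients, or None if any value is inf or NaN."""
        if any(math.isinf(g) or math.isnan(g) for g in scaled_grads):
            return None
        return [g / self.scale for g in scaled_grads]

    def update(self, overflowed):
        """Shrink the scale after an overflow; grow it after a run of clean steps."""
        if overflowed:
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                self.scale *= self.growth_factor
                self._good_steps = 0


scaler = DynamicLossScaler()
history = []
# Hypothetical scaled gradients for five steps; the second step overflows.
steps = [[1.0, 2.0], [float("inf"), 1.0], [0.5, 0.25], [1.0, 1.0], [2.0, 4.0]]
for scaled_grads in steps:
    grads = scaler.unscale_and_check(scaled_grads)
    # A real training loop would skip the optimizer step when grads is None.
    scaler.update(overflowed=grads is None)
    history.append(scaler.scale)

print(history)
```

In this sketch the scale is halved on the overflowing step and doubled again after three consecutive clean steps, so the scale tracks the largest value that keeps gradients within the half-precision range.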
“Automated mixed precision powered by NVIDIA Tensor Core GPUs on Alibaba allows us to instantly speedup AI models nearly 3X. Our researchers appreciated the ease of turning on this feature to instantly accelerate our AI.”
Wei Lin, Senior Director at Alibaba Computing Platform, Alibaba
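PyTorch
Automatic Mixed Precision is available in the Apex repository on GitHub and, in PyTorch 1.6 and later, natively as torch.cuda.amp, which provides the GradScaler class used below. To enable it, add the following lines of code to your existing training script: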
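scaler = GradScaler()  # GradScaler and autocast come from torch.cuda.amp
with autocast():  # run the forward pass and loss in mixed precision
    loss = loss_fn(model(input), target)
scaler.scale(loss).backward()  # backpropagate the scaled loss
scaler.step(optimizer)  # optimizer is your existing optimizer; step is skipped on inf/NaN gradients
scaler.update()  # adjust the loss scale for the next iteration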
MXNet
Automatic Mixed Precision is available both in native MXNet (1.5 or later) and inside the MXNet container (19.04 or later) on the NVIDIA NGC container registry. To enable the feature, initialize AMP with amp.init() and amp.init_trainer(trainer) from mxnet.contrib, then add the following lines of code to your existing training script:
with amp.scale_loss(loss, trainer) as scaled_loss:
    autograd.backward(scaled_loss)  # backpropagate the scaled loss
PaddlePaddle
Automatic Mixed Precision is available in PaddlePaddle on GitHub. To enable it, add these two lines of code to your existing training script:
sgd = SGDOptimizer()
mp_sgd = fluid.contrib.mixed_precision.decorator.decorate(sgd)
Additional Resources
- Webinar: Mixed-Precision Training of Neural Networks
- AI’s Latest Precision Format TensorFloat-32 Delivers Dramatic Out-of-the-box Performance While Preserving Accuracy
- Learn more: Tensor Cores for developers
- NVIDIA Ampere Architecture In-Depth
- What’s the Difference Between Single-, Double-, Multi- and Mixed-Precision Computing?
- Tensor Core Optimized Examples
- Developer Blog: Automatic Mixed Precision for NVIDIA Tensor Core Architecture in TensorFlow