Automatic Mixed Precision for Deep Learning

Deep neural network training has traditionally relied on the IEEE single-precision format (FP32). With mixed precision, you can train with half precision (FP16) while maintaining the network accuracy achieved with single precision. This technique of using both single- and half-precision representations is referred to as mixed precision training.

Benefits of Mixed Precision Training

  • Speeds up math-intensive operations, such as linear and convolution layers, by using Tensor Cores.
  • Speeds up memory-limited operations by accessing half the bytes compared to single precision.
  • Reduces memory requirements for training models, enabling larger models or larger minibatches.

    “Nuance Research advances and applies conversational AI technologies to power solutions that redefine how humans and computers interact. The rate of our advances reflects the speed at which we train and assess deep learning models. With Automatic Mixed Precision, we’ve realized a 50% speedup in TensorFlow-based ASR model training without loss of accuracy via a minimal code change. We’re eager to achieve a similar impact in our other deep learning language processing applications.”

    Wenxuan Teng, Senior Research Manager, Nuance Communications

    Enabling mixed precision involves two steps: porting the model to use the half-precision data type where appropriate, and using loss scaling to preserve small gradient values. Deep learning researchers and engineers can easily get started enabling this feature on Ampere, Volta and Turing GPUs.
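
    To make those two steps concrete, the sketch below performs them by hand with a fixed loss scale. It is a minimal illustration assuming PyTorch, with hypothetical layer sizes and hyperparameters; the framework integrations described below automate all of this, including dynamic adjustment of the loss scale.

    import torch

    # Step 1: keep weights, activations, and inputs in half precision.
    model = torch.nn.Linear(128, 10).cuda().half()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    inputs = torch.randn(32, 128, device="cuda", dtype=torch.float16)
    targets = torch.randint(0, 10, (32,), device="cuda")

    # Step 2: scale the loss so small gradient values do not flush to zero in FP16.
    loss_scale = 1024.0
    loss = torch.nn.functional.cross_entropy(model(inputs).float(), targets)
    (loss * loss_scale).backward()

    # Unscale the gradients before the weight update.
    for p in model.parameters():
        p.grad.div_(loss_scale)
    optimizer.step()
    optimizer.zero_grad()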

    On Ampere GPUs, automatic mixed precision uses FP16 to deliver a performance boost of up to 3X versus TF32, the new format, which is itself up to ~6X faster than FP32. On Volta and Turing GPUs, automatic mixed precision delivers up to 3X higher performance than FP32 with just a few lines of code. The latest training performance numbers for NVIDIA GPUs are published on the NVIDIA deep learning performance page.

    Using Automatic Mixed Precision for Major Deep Learning Frameworks

    TensorFlow

    Automatic Mixed Precision is available both in native TensorFlow and inside the TensorFlow container on the NVIDIA NGC container registry. To enable AMP in NGC TensorFlow 19.07 or upstream TensorFlow 1.14 or later, wrap your tf.train or tf.keras.optimizers optimizer as follows:

    opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)

    This change applies automatic loss scaling to your model and enables automatic casting to half precision.
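
    For context, here is a minimal sketch of the wrapper applied to an ordinary tf.keras model (hypothetical layer sizes, learning rate, and data; assumes TensorFlow 1.14 or later as noted above):

    import tensorflow as tf

    # Build a small tf.keras model and optimizer as usual (hypothetical sizes).
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    opt = tf.keras.optimizers.SGD(learning_rate=0.01)

    # Wrapping the optimizer enables the automatic FP16 graph rewrite
    # and automatic loss scaling.
    opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)

    model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")
    # model.fit(x_train, y_train, batch_size=256, epochs=1)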


    “Automated mixed precision powered by NVIDIA Tensor Core GPUs on Alibaba allows us to instantly speed up AI models nearly 3X. Our researchers appreciated the ease of turning on this feature to instantly accelerate our AI.”

    — Wei Lin, Senior Director at Alibaba Computing Platform, Alibaba


    PyTorch

    The Automatic Mixed Precision feature is available natively in PyTorch as torch.cuda.amp and, for older versions, in the Apex repository on GitHub. To enable it, add the following lines of code to your existing training script:

    from torch.cuda.amp import autocast, GradScaler

    scaler = GradScaler()

    # Run the forward pass and loss computation under autocast so that
    # eligible operations execute in half precision.
    with autocast():
        output = model(input)
        loss = loss_fn(output, target)

    # Scale the loss, backpropagate, then step the optimizer through the
    # scaler, which unscales the gradients and updates the loss scale.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
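
    The snippet above assumes that model, input, target, optimizer, and loss_fn already exist in your script. A minimal self-contained training loop built around it might look like the following sketch (hypothetical model, data, and hyperparameters):

    import torch
    from torch.cuda.amp import autocast, GradScaler

    model = torch.nn.Linear(128, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()
    scaler = GradScaler()

    for step in range(100):
        input = torch.randn(32, 128, device="cuda")
        target = torch.randint(0, 10, (32,), device="cuda")

        optimizer.zero_grad()
        with autocast():                          # forward pass runs in mixed precision
            loss = loss_fn(model(input), target)
        scaler.scale(loss).backward()             # backward pass on the scaled loss
        scaler.step(optimizer)                    # unscales gradients, then calls optimizer.step()
        scaler.update()                           # adjusts the loss scale for the next iteration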

    MXNet

    The Automatic Mixed Precision feature is available both in native MXNet (1.5 or later) and inside the MXNet container (19.04 or later) on the NVIDIA NGC container registry. To enable the feature, add the following lines of code to your existing training script:

    amp.init()                                        # initialize AMP before the model is created
    amp.init_trainer(trainer)                         # attach dynamic loss scaling to the Gluon trainer
    with amp.scale_loss(loss, trainer) as scaled_loss:
        autograd.backward(scaled_loss)                # backpropagate the scaled loss
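
    For context, a minimal Gluon training step with these lines in place might look like the following sketch (hypothetical network, shapes, and hyperparameters):

    import mxnet as mx
    from mxnet import autograd, gluon
    from mxnet.contrib import amp

    amp.init()                                            # enable AMP before the model is created

    net = gluon.nn.Dense(10)
    net.initialize(ctx=mx.gpu())
    trainer = gluon.Trainer(net.collect_params(), "sgd", {"learning_rate": 0.01})
    amp.init_trainer(trainer)                             # attach dynamic loss scaling to the trainer

    loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
    data = mx.nd.random.uniform(shape=(32, 128), ctx=mx.gpu())
    label = mx.nd.zeros((32,), ctx=mx.gpu())              # placeholder labels for the sketch

    with autograd.record():
        loss = loss_fn(net(data), label)
        with amp.scale_loss(loss, trainer) as scaled_loss:
            autograd.backward(scaled_loss)
    trainer.step(32)                                      # batch size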

    PaddlePaddle

    The Automatic Mixed Precision feature is available in PaddlePaddle on GitHub. To enable it, add the following lines of code to your existing training script:

    sgd = SGDOptimizer()
    # decorate() wraps the optimizer so that FP16 casting and loss scaling are applied automatically.
    mp_sgd = fluid.contrib.mixed_precision.decorator.decorate(sgd)
    mp_sgd.minimize(loss)
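
    For context, here is a minimal static-graph sketch around those lines (hypothetical network, feature size, and learning rate):

    import paddle.fluid as fluid

    # Hypothetical feature size, class count, and learning rate.
    data = fluid.layers.data(name="x", shape=[128], dtype="float32")
    label = fluid.layers.data(name="y", shape=[1], dtype="int64")
    logits = fluid.layers.fc(input=data, size=10)
    loss = fluid.layers.mean(
        fluid.layers.softmax_with_cross_entropy(logits=logits, label=label))

    sgd = fluid.optimizer.SGDOptimizer(learning_rate=0.01)
    mp_sgd = fluid.contrib.mixed_precision.decorator.decorate(sgd)
    mp_sgd.minimize(loss)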

    Additional Resources