GTC Silicon Valley 2019, Session S91029: Automated Mixed-Precision Tools for TensorFlow Training
Nathan Luehr (NVIDIA), Reed Wanderman-Milne (Google)
To obtain peak performance and energy efficiency on modern deep learning architectures, such as GPUs and TPUs, it is critical to use half-precision arithmetic. Compared to single precision, half precision halves memory traffic, effectively doubling the usable DRAM bandwidth. The smaller memory footprint of half-precision layer activations also allows larger batch sizes and deeper network architectures to fit in the accelerator's memory during training. Finally, architectural features such as Volta's Tensor Cores boost the raw math throughput of half-precision operations by up to 8x compared to single precision. We describe two new streamlined implementations of mixed-precision training being built into TensorFlow. The first is provided through extensions to the tf.keras API and will be available in the coming months. The second is based on a Grappler graph optimization pass and works with TF 1.x graph-based models as well as future TensorFlow 2.0 models that use tf.function decorators. Each method is enabled with a one- or two-line change to the training script. Empirical results show that model accuracy matches that of single-precision training, while the speedup is similar to what can be achieved with hand-coded mixed-precision strategies.
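To make the memory claim concrete, the sketch below compares the footprint of the same activation tensor stored in single and half precision, using NumPy arrays as a stand-in for framework tensors. The commented lines then show, for reference, the shape of the one-line enablement the talk describes; the exact API names (`enable_mixed_precision_graph_rewrite`, the `mixed_float16` Keras policy) are an assumption about the released form of these experimental TensorFlow interfaces and may differ by version.

```python
import numpy as np

# A batch of layer activations, e.g. 8 x 28 x 28 x 256 feature maps.
acts_fp32 = np.ones((8, 28, 28, 256), dtype=np.float32)
acts_fp16 = acts_fp32.astype(np.float16)

# Half precision halves the activation footprint, so the same DRAM
# bandwidth moves twice as many values per second.
print(acts_fp32.nbytes // acts_fp16.nbytes)  # -> 2

# For reference only (assumed API names from TF releases of this period):
#
# Grappler graph-rewrite path, one line wrapping an existing optimizer:
#   opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
#
# tf.keras path, one line before constructing the model:
#   tf.keras.mixed_precision.experimental.set_policy('mixed_float16')
```

Either path leaves the model definition unchanged; the framework decides per-op which computations run in float16 and which (e.g. reductions, softmax) stay in float32 for numerical safety.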