Nathan Luehr, NVIDIA
We’ll describe a new feature in TensorFlow that enables mixed precision by adding a single line of Python code to a training script. Using a mix of single and half precision is critical to obtaining peak performance and energy efficiency while maintaining full accuracy. Compared to single precision, half precision makes 2x better use of the available DRAM bandwidth, halves the memory footprint of tensors so that deeper network architectures can be trained, and boosts raw math throughput by up to 8x on Volta Tensor Cores. We’ll describe our approach, which is based on a Grappler graph optimization pass and works both with TF 1.x graph-based models and with future TensorFlow 2.0 models that use tf.function decorators. Empirical results show that the resulting accuracy matches that of a model trained in single precision, while the training speedups are similar to those achieved with hand-coded mixed precision strategies.
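Two numeric facts behind the abstract can be checked with a short sketch. This is plain Python using the standard `struct` module, not TensorFlow, and it is not the Grappler pass described above. The helper names `round_trip` and `tensor_bytes` are hypothetical and exist only for illustration. The sketch shows that a half precision element takes half the bytes of a single precision one, which is where the 2x bandwidth figure comes from. It also shows why a *mix* of precisions matters rather than pure half precision: a small update added to 1.0 is rounded away in half precision but survives in single precision.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store `value` in the given binary format and read it back (hypothetical helper)."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def tensor_bytes(num_elements: int, fmt: str) -> int:
    """Bytes needed to store `num_elements` values in format `fmt` (hypothetical helper)."""
    return num_elements * struct.calcsize(fmt)


# '<e' is IEEE 754 half precision (2 bytes); '<f' is single precision (4 bytes).
HALF, SINGLE = "<e", "<f"

# A small update applied to a value of 1.0, with the result rounded to each format.
weight, update = 1.0, 1e-4
half_result = round_trip(weight + update, HALF)      # spacing near 1.0 is 2**-10, so the update is lost
single_result = round_trip(weight + update, SINGLE)  # spacing near 1.0 is 2**-23, so the update survives

if __name__ == "__main__":
    n = 1_000_000
    print(tensor_bytes(n, SINGLE) / tensor_bytes(n, HALF))  # ratio of storage per tensor
    print(half_result, single_result)
```

Running it prints a storage ratio of 2.0, a half precision result of exactly 1.0, and a single precision result slightly above 1.0.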