GTC 2020: Performance and Model Fidelity of BERT Training from a Single DGX Through DGX SuperPod
Chris Forster, NVIDIA | Thor Johnsen, NVIDIA
BERT (Bidirectional Encoder Representations from Transformers) is a natural language processing model that performs well on a wide variety of tasks, including question answering, natural language inference, and classification. We'll cover how you can use our open-source code to train BERT models yourself, from dataset creation through fine-tuning for specific NLP tasks such as question answering with the SQuAD dataset. We'll also discuss some of the challenges of, and solutions for, delivering both computational performance and model fidelity on large distributed machines such as the DGX SuperPod. We'll give a brief overview of the model itself, the choice of optimizers, performance optimizations, and testing methodology; describe running BERT at scales of up to 1,472 GPUs; and summarize the results that our open-source multi-node BERT examples in TensorFlow and PyTorch can achieve.