GTC 2020: DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters
Yuxiong He, Microsoft | Samyam Rajbhandari, Microsoft
Explore new techniques in DeepSpeed, Microsoft's open-source library that vastly advances large model training by improving scale, speed, cost, and usability, unlocking the ability to train 100-billion-parameter models. DeepSpeed is compatible with PyTorch. One component of the library, called ZeRO (Zero Redundancy Optimizer), is a new parallelized optimizer that greatly reduces the resources needed for model and data parallelism while massively increasing the number of parameters that can be trained. Researchers have used these breakthroughs to create Turing Natural Language Generation (Turing-NLG), the largest publicly known language model at 17 billion parameters.
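As a rough illustration of how the talk's themes show up in practice, the sketch below builds a DeepSpeed-style JSON configuration that enables ZeRO. The specific keys and values (`train_batch_size`, `fp16`, `zero_optimization`, `stage`) follow DeepSpeed's public configuration format; treat the exact numbers and the commented-out training calls as illustrative assumptions, not a complete recipe.

```python
# Hedged sketch: a DeepSpeed-style config dict enabling ZeRO stage 1,
# which partitions optimizer states across data-parallel workers to
# reduce per-GPU memory. Values here are placeholders for illustration.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},          # mixed precision to cut memory further
    "zero_optimization": {
        "stage": 1,                     # 1 = partition optimizer states
    },
}

# In an actual training script (requires GPUs and the deepspeed package),
# the model would be wrapped roughly like this:
# import deepspeed
# model_engine, optimizer, _, _ = deepspeed.initialize(
#     model=model,
#     model_parameters=model.parameters(),
#     config=ds_config,
# )
```

Higher ZeRO stages partition additional training state (gradients, and eventually parameters), trading communication for further memory savings.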