Scaling the Transformer Model Implementation in PyTorch Across Multiple Nodes

Mohammad Zulfiqar, NVIDIA | Robert Knight, NVIDIA

GTC 2020

We'll dive deep behind the scenes into the Transformer model implementation in PyTorch to understand its performance weaknesses and work to make it scale across multiple nodes. We'll describe an analysis of system-level profiling data from an example Transformer workload spanning multiple DGX-2 systems. We'll present the tools, collection methods, and data-analytics recipes used to evaluate massive amounts of data and pinpoint the GPU or algorithm step causing issues. The methodology described can, in general, be applied to iterative DL and HPC workloads to achieve significant scaling gains.
