Distributed Training and Fast Inter-GPU Communication with NCCL
Sylvain Jeaugey, NVIDIA
NCCL, the NVIDIA Collective Communications Library, is used by all deep learning frameworks to distribute computation across multiple GPUs, allowing users to train very large networks in minutes instead of weeks. In this session, we will present how NCCL combines hardware technologies such as NVLink, PCI, Ethernet, and InfiniBand to achieve maximum speed for inter-GPU communication. We will detail how those technologies compare and how much of a difference they make for users. We will also describe how we continue to innovate to accelerate distributed GPU computing and support new models.