NCCL: Getting Started

Developers of deep learning frameworks can rely on NCCL's highly optimized, MPI-compatible, and topology-aware routines to take full advantage of all available GPUs, both within a single node and across multiple nodes. Leading deep learning frameworks such as Caffe, Caffe2, Chainer, MXNet, TensorFlow, and PyTorch have integrated NCCL to accelerate deep learning training on multi-GPU systems.

We strive to bring the best experience to the developer community, and as a result we have made NCCL 2.3 and later open source. This lets us hold open discussions with the developer community as we continue to build a great product. The source code for NCCL is available on GitHub, and NCCL binaries can be downloaded from the NVIDIA Developer Zone.