NVIDIA Collective Communications Library (NCCL)
The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance-optimized for NVIDIA GPUs. NCCL provides routines such as all-gather, all-reduce, broadcast, reduce, and reduce-scatter that are optimized to achieve high bandwidth and low latency over PCIe and NVLink high-speed interconnects.
Performance
NCCL conveniently removes the need for developers to optimize their applications for specific machines. NCCL provides fast collectives over multiple GPUs both within and across nodes.
Ease of Programming
NCCL uses a simple C API, which can be easily accessed from a variety of programming languages. NCCL closely follows the popular collectives API defined by MPI (Message Passing Interface).
Compatibility
NCCL is compatible with virtually any multi-GPU parallelization model, such as: single-threaded, multi-threaded (using one thread per GPU) and multi-process (MPI combined with multi-threaded operation on GPUs).
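As an illustration of the single-threaded model above, here is a minimal sketch of a one-process, multi-GPU all-reduce using the NCCL C API. The device count, buffer size, and buffer contents are hypothetical placeholders, and error checking is omitted for brevity; this is a sketch, not a complete production program.

```cuda
#include <cuda_runtime.h>
#include <nccl.h>

int main(void) {
  const int nDev = 2;         // assumes two visible GPUs
  const size_t count = 1024;  // elements per buffer (arbitrary for this sketch)
  int devs[2] = {0, 1};

  ncclComm_t comms[2];
  cudaStream_t streams[2];
  float *sendbuff[2], *recvbuff[2];

  // Allocate device buffers and a stream, one set per GPU.
  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(devs[i]);
    cudaMalloc((void**)&sendbuff[i], count * sizeof(float));
    cudaMalloc((void**)&recvbuff[i], count * sizeof(float));
    cudaStreamCreate(&streams[i]);
  }

  // A single call creates a clique of communicators for all listed devices.
  ncclCommInitAll(comms, nDev, devs);

  // Group the per-GPU calls so NCCL launches them as one collective.
  ncclGroupStart();
  for (int i = 0; i < nDev; ++i)
    ncclAllReduce(sendbuff[i], recvbuff[i], count, ncclFloat, ncclSum,
                  comms[i], streams[i]);
  ncclGroupEnd();

  // Wait for completion, then clean up.
  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(devs[i]);
    cudaStreamSynchronize(streams[i]);
    cudaFree(sendbuff[i]);
    cudaFree(recvbuff[i]);
    ncclCommDestroy(comms[i]);
  }
  return 0;
}
```

The same `ncclAllReduce` call also works in the multi-threaded and multi-process models; only communicator creation changes (e.g. exchanging a `ncclUniqueId` across processes and calling `ncclCommInitRank` per rank).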
Key Features
- Automatic topology detection for high-bandwidth paths on AMD, ARM, PCIe Gen4, and InfiniBand HDR platforms
- Up to 2x peak bandwidth with in-network all-reduce operations utilizing SHARPv2
- Graph search for the optimal set of rings and trees with the highest bandwidth and lowest latency
- Support for multi-threaded and multi-process applications
- Internode communication over InfiniBand verbs, libfabric, RoCE, and IP sockets
- InfiniBand adaptive routing to reroute traffic and alleviate congested ports
What's New in NCCL 2.7
NCCL 2.7 highlights include:
- Up to 2x bandwidth on NVIDIA A100 GPUs compared to V100 GPUs
- Preview of point-to-point communication capability to enable model parallel training for models like Wide & Deep and DLRM
- Compatible with CUDA 11
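The point-to-point capability previewed above is exposed through send/receive primitives. A sketch of a ring exchange, where each rank passes a buffer to the next rank, might look like the following; `comm`, `stream`, `rank`, `nRanks`, and the buffers are assumed to be initialized elsewhere (this is a fragment, not a complete program).

```cuda
// Each rank sends its buffer to the next rank and receives from the
// previous one. Grouping the send and receive avoids a deadlock, since
// NCCL can then schedule both transfers as one operation.
ncclGroupStart();
ncclSend(sendbuff, count, ncclFloat, (rank + 1) % nRanks, comm, stream);
ncclRecv(recvbuff, count, ncclFloat, (rank - 1 + nRanks) % nRanks, comm, stream);
ncclGroupEnd();
cudaStreamSynchronize(stream);
```

Pairing `ncclSend`/`ncclRecv` calls like this is the building block for the pipeline- and model-parallel communication patterns used by models such as Wide & Deep and DLRM.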
Read the latest NCCL release notes for a detailed list of new features and enhancements.
Resources
- NVIDIA Deep Learning SDK documentation
- Developer Blog: Massively Scale Your Deep Learning Training with NCCL 2.4
- Developer Blog: Scaling Deep Learning Training with NCCL 2.3
- Report NCCL issues via GitHub
- For questions or to provide feedback, please contact nccl@nvidia.com
- To file bugs or report an issue, register on NVIDIA Developer Zone
Ready to get started developing with NCCL?
Get Started





