NCCL

Jul 22, 2025
Understanding NCCL Tuning to Accelerate GPU-to-GPU Communication
The NVIDIA Collective Communications Library (NCCL) is essential for fast GPU-to-GPU communication in AI workloads, using various optimizations and tuning to...
14 MIN READ

Jul 18, 2025
Optimizing for Low-Latency Communication in Inference Workloads with JAX and XLA
Running inference with large language models (LLMs) in production requires meeting stringent latency constraints. A critical stage in the process is LLM decode,...
6 MIN READ

Jul 14, 2025
Enabling Fast Inference and Resilient Training with NCCL 2.27
As AI workloads scale, fast and reliable GPU communication becomes vital, not just for training, but increasingly for inference at scale. The NVIDIA Collective...
9 MIN READ

Jul 14, 2025
NCCL Deep Dive: Cross Data Center Communication and Network Topology Awareness
As the scale of AI training increases, a single data center (DC) is not sufficient to deliver the required computational power. Most recent approaches to...
9 MIN READ

Jun 18, 2025
Improved Performance and Monitoring Capabilities with NVIDIA Collective Communications Library 2.26
The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode communication primitives optimized for NVIDIA GPUs and networking. NCCL...
11 MIN READ

May 14, 2025
AI Fabric Resiliency and Why Network Convergence Matters
High-performance computing and deep learning workloads are extremely sensitive to latency. Packet loss forces retransmission or stalls in the communication...
7 MIN READ

Mar 13, 2025
Networking Reliability and Observability at Scale with NCCL 2.24
The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode (MGMN) communication primitives optimized for NVIDIA GPUs and networking....
14 MIN READ

Jan 31, 2025
New Scaling Algorithm and Initialization with NVIDIA Collective Communications Library 2.23
The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode communication primitives optimized for NVIDIA GPUs and networking. NCCL...
9 MIN READ

Oct 25, 2024
Advancing Performance with NVIDIA SHARP In-Network Computing
AI and scientific computing applications are great examples of distributed computing problems. The problems are too large and the computations too intensive to...
7 MIN READ

Sep 16, 2024
Memory Efficiency, Faster Initialization, and Cost Estimation with NVIDIA Collective Communications Library 2.22
For the past few months, the NVIDIA Collective Communications Library (NCCL) developers have been working hard on a set of new library features and bug fixes....
8 MIN READ

Sep 06, 2024
Enhancing Application Portability and Compatibility across New Platforms Using NVIDIA Magnum IO NVSHMEM 3.0
NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on...
7 MIN READ

Apr 26, 2024
Perception Model Training for Autonomous Vehicles with Tensor Parallelism
Due to the adoption of multicamera inputs and deep convolutional backbone networks, the GPU memory footprint for training autonomous driving perception models...
10 MIN READ

Mar 06, 2024
CUDA Toolkit 12.4 Enhances Support for NVIDIA Grace Hopper and Confidential Computing
The latest release of CUDA Toolkit, version 12.4, continues to push accelerated computing performance using the latest NVIDIA GPUs. This post explains the new...
9 MIN READ

Oct 12, 2023
Networking for Data Centers and the Era of AI
Traditional cloud data centers have served as the bedrock of computing infrastructure for over a decade, catering to a diverse range of users and applications....
6 MIN READ

Jul 19, 2023
OCI Accelerates HPC, AI, and Database Using RoCE and NVIDIA ConnectX
Oracle is one of the top cloud service providers in the world, supporting over 22,000 customers and reporting revenue of nearly $4 billion per quarter and...
18 MIN READ

May 29, 2023
Turbocharging Generative AI Workloads with NVIDIA Spectrum-X Networking Platform
Large language models (LLMs) and AI applications such as ChatGPT and DALL-E have recently seen rapid growth. Thanks to GPUs, CPUs, DPUs, high-speed storage, and...
8 MIN READ