NVSHMEM

Jan 31, 2025
New Scaling Algorithm and Initialization with NVIDIA Collective Communications Library 2.23
The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node communication primitives optimized for NVIDIA GPUs and networking. NCCL...
9 MIN READ

Sep 06, 2024
Enhancing Application Portability and Compatibility across New Platforms Using NVIDIA Magnum IO NVSHMEM 3.0
NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on...
7 MIN READ

Nov 22, 2022
Improving Network Performance of HPC Systems Using NVIDIA Magnum IO NVSHMEM and GPUDirect Async
Today’s leading-edge high performance computing (HPC) systems contain tens of thousands of GPUs. In NVIDIA systems, GPUs are connected on nodes through the...
14 MIN READ

Jan 22, 2021
Accelerating NVSHMEM 2.0 Team-Based Collectives Using NCCL
NVSHMEM 2.0 is introducing a new API for performing collective operations based on the Team Management feature of the OpenSHMEM 1.5 specification. A team is a...
9 MIN READ

Oct 20, 2020
Accelerating IO in the Modern Data Center: Network IO
This is the second post in the Accelerating IO series, which describes the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern...
19 MIN READ

Aug 25, 2020
Scaling Scientific Computing with NVSHMEM
In the NVSHMEM memory model, each process (PE) has private memory, as well as symmetric memory that forms a partition of the partitioned global...
10 MIN READ