NVLink

Nov 28, 2023
One Giant Superchip for LLMs, Recommenders, and GNNs: Introducing NVIDIA GH200 NVL32
At AWS re:Invent 2023, AWS and NVIDIA announced that AWS will be the first cloud provider to offer NVIDIA GH200 Grace Hopper Superchips interconnected with...
9 MIN READ

May 28, 2023
Announcing NVIDIA DGX GH200: The First 100 Terabyte GPU Memory System
At COMPUTEX 2023, NVIDIA announced the NVIDIA DGX GH200, which marks another breakthrough in GPU-accelerated computing to power the most demanding giant AI...
6 MIN READ

Aug 23, 2022
Upgrading Multi-GPU Interconnectivity with the Third-Generation NVIDIA NVSwitch
Increasing demands in AI and high-performance computing (HPC) are driving a need for faster, more scalable interconnects with high-speed communication between...
13 MIN READ

Jun 02, 2022
Fueling High-Performance Computing with Full-Stack Innovation
High-performance computing (HPC) has become the essential instrument of scientific discovery. Whether it is discovering new, life-saving drugs, battling...
8 MIN READ

Apr 12, 2021
Optimizing Data Movement in GPU Applications with the NVIDIA Magnum IO Developer Environment
Magnum IO is the collection of IO technologies from NVIDIA and Mellanox that make up the IO subsystem of the modern data center and enable applications at...
8 MIN READ

Feb 12, 2021
Improving GPU Application Performance with NVIDIA CUDA 11.2 Device Link Time Optimization
CUDA 11.2 features the powerful link time optimization (LTO) feature for device code in GPU-accelerated applications. Device LTO brings the performance...
14 MIN READ

Jan 22, 2021
Accelerating NVSHMEM 2.0 Team-Based Collectives Using NCCL
NVSHMEM 2.0 introduces a new API for performing collective operations based on the Team Management feature of the OpenSHMEM 1.5 specification. A team is a...
9 MIN READ

Dec 18, 2020
Optimizing Data Transfer Using Lossless Compression with NVIDIA nvcomp
One of the most interesting applications of compression is optimizing communications in GPU applications. GPUs are getting faster every year. For some apps,...
17 MIN READ

Dec 08, 2020
Fast, Flexible Allocation for NVIDIA CUDA with RAPIDS Memory Manager
When I joined the RAPIDS team in 2018, NVIDIA CUDA device memory allocation was a performance problem. RAPIDS cuDF allocates and deallocates memory at high...
24 MIN READ

Oct 20, 2020
Accelerating IO in the Modern Data Center: Network IO
This is the second post in the Accelerating IO series, which describes the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern...
19 MIN READ

Aug 25, 2020
Scaling Scientific Computing with NVSHMEM
In the NVSHMEM memory model, each process (PE) has private memory, as well as symmetric memory that forms a partition of the partitioned global...
10 MIN READ

Oct 31, 2018
Lawrence Livermore Unveils Sierra, World's Third Fastest Supercomputer
The U.S. Department of Energy and the Lawrence Livermore National Laboratory (LLNL) last week announced the unveiling of Sierra, one of the world’s fastest...
2 MIN READ

Sep 26, 2018
Scaling Deep Learning Training with NCCL
The NVIDIA Collective Communications Library (NCCL) provides optimized implementations of inter-GPU communication operations, such as allreduce and its variants....
6 MIN READ

Aug 21, 2018
NVSwitch Accelerates NVIDIA DGX-2
NVIDIA CEO Jensen Huang described the NVIDIA® DGX-2™ server as "the world's largest GPU" at its launch during GPU Technology Conference earlier this...
8 MIN READ

Mar 27, 2018
NVSwitch: Leveraging NVLink to Maximum Effect
GPUs have been PCIe devices for many generations in client systems, and more recently in servers. The rapid growth in deep learning workloads has driven the...
5 MIN READ

Apr 05, 2017
NVIDIA DGX-1: The Fastest Deep Learning System
One year ago today, NVIDIA announced the NVIDIA® DGX-1™,...
12 MIN READ