Technical Walkthrough

Accelerating IO in the Modern Data Center: Network IO

This is the second post in the Accelerating IO series, which describes the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern… 19 MIN READ
Technical Walkthrough

Massively Scale Your Deep Learning Training with NCCL 2.4

Imagine using tens of thousands of GPUs to train your neural network. Using multiple GPUs to train neural networks has become quite common with all deep… 8 MIN READ
Technical Walkthrough

Scaling Deep Learning Training with NCCL

The NVIDIA Collective Communications Library (NCCL) provides optimized implementations of inter-GPU communication operations, such as allreduce and its variants. 6 MIN READ
News

NVIDIA Deep Learning SDK Update for Volta Now Available

At GTC 2017, NVIDIA announced Volta-optimized updates to the NVIDIA Deep Learning SDK. Today, we’re making these updates available as free downloads to members… 2 MIN READ
Figure 5: Ring order of GPUs in PCIe tree.
Technical Walkthrough

Fast Multi-GPU collectives with NCCL

Today many servers contain 8 or more GPUs. In principle then, scaling an application from one to many GPUs should provide a tremendous performance boost. 10 MIN READ