NCCL
Sep 16, 2024
Memory Efficiency, Faster Initialization, and Cost Estimation with NVIDIA Collective Communications Library 2.22
For the past few months, the NVIDIA Collective Communications Library (NCCL) developers have been working hard on a set of new library features and bug fixes....
8 MIN READ
Sep 06, 2024
Enhancing Application Portability and Compatibility across New Platforms Using NVIDIA Magnum IO NVSHMEM 3.0
NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on...
7 MIN READ
Apr 26, 2024
Perception Model Training for Autonomous Vehicles with Tensor Parallelism
Due to the adoption of multicamera inputs and deep convolutional backbone networks, the GPU memory footprint for training autonomous driving perception models...
10 MIN READ
Mar 06, 2024
CUDA Toolkit 12.4 Enhances Support for NVIDIA Grace Hopper and Confidential Computing
The latest release of CUDA Toolkit, version 12.4, continues to push accelerated computing performance using the latest NVIDIA GPUs. This post explains the new...
9 MIN READ
Oct 12, 2023
Networking for Data Centers and the Era of AI
Traditional cloud data centers have served as the bedrock of computing infrastructure for over a decade, catering to a diverse range of users and applications....
6 MIN READ
Jul 19, 2023
OCI Accelerates HPC, AI, and Database Using RoCE and NVIDIA ConnectX
Oracle is one of the top cloud service providers in the world, supporting over 22,000 customers and reporting revenue of nearly $4 billion per quarter and...
18 MIN READ
May 29, 2023
Turbocharging Generative AI Workloads with NVIDIA Spectrum-X Networking Platform
Large language models (LLMs) and AI applications such as ChatGPT and DALL-E have recently seen rapid growth. Thanks to GPUs, CPUs, DPUs, high-speed storage, and...
8 MIN READ
May 25, 2023
Navigating Generative AI for Network Admins
We all know that AI is changing the world. For network admins, AI can improve day-to-day operations in some amazing ways: Automation of repetitive tasks: This...
6 MIN READ
Oct 20, 2020
Accelerating IO in the Modern Data Center: Network IO
This is the second post in the Accelerating IO series, which describes the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern...
19 MIN READ
Feb 04, 2019
Massively Scale Your Deep Learning Training with NCCL 2.4
Imagine using tens of thousands of GPUs to train your neural network. Using multiple GPUs to train neural networks has become quite common with all deep...
8 MIN READ
Sep 26, 2018
Scaling Deep Learning Training with NCCL
NVIDIA Collective Communications Library (NCCL) provides optimized implementations of inter-GPU communication operations, such as allreduce and its variants....
6 MIN READ
Aug 08, 2017
NVIDIA Deep Learning SDK Update for Volta Now Available
At GTC 2017, NVIDIA announced Volta optimized updates to the NVIDIA Deep Learning SDK. Today, we’re making these updates available as free downloads to...
2 MIN READ
Apr 07, 2016
Fast Multi-GPU Collectives with NCCL
Today many servers contain 8 or more GPUs. In principle then, scaling an application from one to many GPUs should provide a tremendous performance boost. But in...
10 MIN READ