InfiniBand
Sep 06, 2024
Enhancing Application Portability and Compatibility across New Platforms Using NVIDIA Magnum IO NVSHMEM 3.0
NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on...
7 MIN READ
Jan 23, 2024
Simplifying Network Operations for AI with NVIDIA Quantum InfiniBand
A common technological misconception is that performance and complexity are directly linked. That is, the highest-performance implementation is also the most...
4 MIN READ
Nov 14, 2023
Energy Efficiency in High-Performance Computing: Balancing Speed and Sustainability
The world of computing is on the precipice of a seismic shift. The demand for computing power, particularly in high-performance computing (HPC), is...
17 MIN READ
Nov 08, 2023
Setting New Records at Data Center Scale Using NVIDIA H100 GPUs and NVIDIA Quantum-2 InfiniBand
Generative AI is rapidly transforming computing, unlocking new use cases and turbocharging existing ones. Large language models (LLMs), such as OpenAI’s GPT...
19 MIN READ
Oct 12, 2023
Networking for Data Centers and the Era of AI
Traditional cloud data centers have served as the bedrock of computing infrastructure for over a decade, catering to a diverse range of users and applications....
6 MIN READ
May 15, 2023
Efficiently Scale LLM Training Across a Large GPU Cluster with Alpa and Ray
Recent years have seen a proliferation of large language models (LLMs) that extend beyond traditional language tasks to generative AI. This includes models like...
16 MIN READ
Apr 05, 2023
Setting New Records in MLPerf Inference v3.0 with Full-Stack Optimizations for AI
The most exciting computing applications currently rely on training and running inference on complex AI models, often in demanding, real-time deployment...
15 MIN READ
May 24, 2022
Optimizing Your Data Center Network
Data centers can be optimized by updating key network architectures in two ways: through networking technologies or operational efficiency in NetDevOps. In this...
5 MIN READ
Nov 10, 2021
Announcing NVIDIA Nsight Systems 2021.5
The latest update to NVIDIA Nsight Systems—a performance analysis tool—is now available for download. Designed to help you tune and scale software across...
3 MIN READ
Nov 09, 2021
Accelerating Cloud-Native Supercomputing with Magnum IO
Supercomputers are significant investments. However, they are extremely valuable tools for researchers and scientists. To effectively and securely share the...
4 MIN READ
Jun 28, 2021
Managing Data Centers Securely and Intelligently with NVIDIA UFM Cyber-AI
Today’s data centers host many users and a wide variety of applications. They have even become the key element of competitive advantage for research,...
6 MIN READ
Feb 05, 2021
Accelerating IO in the Modern Data Center: Computing and IO Management
This is the third post in the Accelerating IO series, which has the goal of describing the architecture, components, and benefits of Magnum IO, the IO subsystem...
14 MIN READ
Sep 26, 2018
Scaling Deep Learning Training with NCCL
NVIDIA Collective Communications Library (NCCL) provides optimized implementations of inter-GPU communication operations, such as allreduce and variants....
6 MIN READ