Posts by Fred Oh
Networking / Communications
Jan 31, 2025
New Scaling Algorithm and Initialization with NVIDIA Collective Communications Library 2.23
The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode communication primitives optimized for NVIDIA GPUs and networking. NCCL...
9 MIN READ
Simulation / Modeling / Design
Jan 31, 2025
Dynamic Loading in the CUDA Runtime
Historically, the GPU device code is compiled alongside the application with offline tools such as nvcc. In this case, the GPU device code is managed internally...
8 MIN READ
Simulation / Modeling / Design
Jan 31, 2025
CUDA Toolkit Now Available for NVIDIA Blackwell
The latest release of the CUDA Toolkit, version 12.8, continues to push accelerated computing performance in data sciences, AI, scientific computing, and...
9 MIN READ
Simulation / Modeling / Design
Jan 14, 2025
Upcoming Event: CUDA Developer Meet Up in Silicon Valley
Whether you're just starting your GPU programming journey or you're a CUDA ninja looking to share advanced techniques, join us in San Jose on 1/30/25.
1 MIN READ
Networking / Communications
Sep 16, 2024
Memory Efficiency, Faster Initialization, and Cost Estimation with NVIDIA Collective Communications Library 2.22
For the past few months, the NVIDIA Collective Communications Library (NCCL) developers have been working hard on a set of new library features and bug fixes....
8 MIN READ
Simulation / Modeling / Design
Sep 11, 2024
Constant Time Launch for Straight-Line CUDA Graphs and Other Performance Enhancements
CUDA Graphs are a way to define and batch GPU operations as a graph rather than a sequence of stream launches. A CUDA Graph groups a set of CUDA kernels and...
8 MIN READ