Posts by Fred Oh
Agentic AI / Generative AI
Dec 16, 2025
Accelerating Long-Context Inference with Skip Softmax in NVIDIA TensorRT-LLM
For machine learning engineers deploying LLMs at scale, the equation is familiar and unforgiving: as context length increases, attention computation costs...
6 MIN READ
Networking / Communications
Jan 31, 2025
New Scaling Algorithm and Initialization with NVIDIA Collective Communications Library 2.23
The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode communication primitives optimized for NVIDIA GPUs and networking. NCCL...
9 MIN READ
Simulation / Modeling / Design
Jan 31, 2025
Dynamic Loading in the CUDA Runtime
Historically, GPU device code has been compiled alongside the application with offline tools such as nvcc. In this case, the GPU device code is managed internally...
8 MIN READ
Simulation / Modeling / Design
Jan 31, 2025
CUDA Toolkit Now Available for NVIDIA Blackwell
The latest release of the CUDA Toolkit, version 12.8, continues to push accelerated computing performance in data sciences, AI, scientific computing, and...
9 MIN READ
Simulation / Modeling / Design
Jan 14, 2025
Upcoming Event: CUDA Developer Meet Up in Silicon Valley
Whether you're just starting your GPU programming journey or you're a CUDA ninja looking to share advanced techniques, join us in San Jose on January 30, 2025.
1 MIN READ
Networking / Communications
Sep 16, 2024
Memory Efficiency, Faster Initialization, and Cost Estimation with NVIDIA Collective Communications Library 2.22
For the past few months, the NVIDIA Collective Communications Library (NCCL) developers have been working hard on a set of new library features and bug fixes....
8 MIN READ