Recent posts
Feb 03, 2026
Accelerating Long-Context Model Training in JAX and XLA
Large language models (LLMs) are rapidly expanding their context windows, with recent models supporting sequences of 128K tokens, 256K tokens, and beyond....
9 MIN READ
Feb 02, 2026
Optimizing Communication for Mixture-of-Experts Training with Hybrid Expert Parallel
In LLM training, Expert Parallel (EP) communication for hyperscale mixture-of-experts (MoE) models is challenging. EP communication is essentially all-to-all,...
11 MIN READ
Jan 30, 2026
Advancing GPU Programming with the CUDA Tile IR Backend for OpenAI Triton
NVIDIA CUDA Tile is a GPU-based programming model that targets portability for NVIDIA Tensor Cores, unlocking peak GPU performance. One of the great things...
7 MIN READ
Jan 30, 2026
Establishing a Scalable Sparse Ecosystem with the Universal Sparse Tensor
Sparse tensors are vectors, matrices, and higher-dimensional generalizations with many zeros. They are crucial in various fields such as scientific computing,...
15 MIN READ
Jan 30, 2026
Practical Security Guidance for Sandboxing Agentic Workflows and Managing Execution Risk
AI coding agents enable developers to work faster by streamlining tasks and driving automated, test-driven development. However, they also introduce a...
13 MIN READ
Jan 28, 2026
Ensuring Balanced GPU Allocation in Kubernetes Clusters with Time-Based Fairshare
NVIDIA Run:ai v2.24 introduces time-based fairshare, a new scheduling mode that brings fair-share scheduling with time awareness for over-quota resources to...
11 MIN READ
Jan 28, 2026
Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core
This post introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core used for LLM post-training or DiT pre-training. It...
12 MIN READ
Jan 28, 2026
Updating Classifier Evasion for Vision Language Models
Advances in AI architectures have unlocked multimodal functionality, enabling transformer models to process multiple forms of data in the same context. For...
10 MIN READ
Jan 27, 2026
Accelerating Diffusion Models with an Open, Plug-and-Play Offering
Recent advances in large-scale diffusion models have revolutionized generative AI across multiple domains, from image synthesis to audio generation, 3D asset...
8 MIN READ
Jan 26, 2026
Adaptive Inference in NVIDIA TensorRT for RTX Enables Automatic Optimization
Deploying AI applications across diverse consumer hardware has traditionally forced a trade-off. You can optimize for specific GPU configurations and achieve...
9 MIN READ
Jan 26, 2026
How to Unlock Local Detail in Coarse Climate Projections with NVIDIA Earth-2
Global climate models are good at the big picture—but local climate extremes, like hurricanes and typhoons, often disappear in the details. Those patterns are...
12 MIN READ
Jan 22, 2026
Scaling NVFP4 Inference for FLUX.2 on NVIDIA Blackwell Data Center GPUs
In 2025, NVIDIA partnered with Black Forest Labs (BFL) to optimize the FLUX.1 text-to-image model series, unlocking FP4 image generation performance on NVIDIA...
9 MIN READ
Jan 21, 2026
Streamlining CUB with a Single-Call API
The C++ template library CUB is a go-to for high-performance GPU primitive algorithms, but its traditional "two-phase" API, which separates memory estimation...
8 MIN READ
Jan 15, 2026
How to Train an AI Agent for Command-Line Tasks with Synthetic Data and Reinforcement Learning
What if your computer-use agent could learn a new Command Line Interface (CLI)—and operate it safely without ever writing files or free-typing shell commands?...
11 MIN READ
Jan 14, 2026
How to Write High-Performance Matrix Multiply in NVIDIA CUDA Tile
This blog post is part of a series designed to help developers learn NVIDIA CUDA Tile programming for building high-performance GPU kernels, using matrix...
13 MIN READ
Jan 14, 2026
NVIDIA DLSS 4.5 Delivers Super Resolution Upgrades and New Dynamic Multi Frame Generation
NVIDIA DLSS 4 with Multi Frame Generation has become the fastest-adopted NVIDIA gaming technology ever. Over 250 games and apps use it to make real-time path...
6 MIN READ