CUDA
Jun 16, 2026
Building AI Agents for AR Glasses and XR Devices with NVIDIA XR AI
Developers building for AR glasses and wearable devices face an infrastructure gap. The hardware is ready, but creating AI experiences requires integrating...
8 MIN READ
Jun 16, 2026
Build Your Own Transaction Foundation Model for Financial Intelligence
Every swipe, transfer, and payment on a modern financial network encodes a pattern of human behavior. Transaction data is one of the richest signals an...
11 MIN READ
Jun 16, 2026
How to Optimize Transformer-Based Models for Low-Precision Training
Transformer architectures are the backbone of many modern large language and generative AI models. As these models grow in size, training runs consume more GPU...
9 MIN READ
Jun 16, 2026
NVIDIA Blackwell Tops MLPerf Training 6.0 with Industry-Leading Scale and Performance
NVIDIA delivered a clean sweep in MLPerf Training v6.0, the latest edition of industry-standard AI training benchmarks developed by the MLCommons consortium....
11 MIN READ
Jun 15, 2026
Boosting MoE Training Throughput with Advanced Fusion Kernels
Mixture-of-experts (MoE) models have quickly become a foundational component of modern, large-scale AI systems. They are widely adopted because they enable...
9 MIN READ
Jun 01, 2026
Deploy Agentic-Ready AI at the Edge with Memory Efficiency in NVIDIA JetPack 7.2
As AI agents move from the digital world to the physical environment, they can readily use NVIDIA Jetson to accelerate real-world deployment with optimized...
10 MIN READ
May 26, 2026
Extract More Kernel Performance with NVIDIA CompileIQ Auto-Tuning
NVIDIA CompileIQ tackles one of the hardest problems in performance engineering: finding the compiler options that unlock the best performance for a specific...
12 MIN READ
May 26, 2026
Develop High-Performance GPU Kernels in C++ with NVIDIA CUDA Tile
Developers can now use NVIDIA CUDA Tile programming within large existing C++ GPU codebases to develop highly optimized GPU kernels using tile-based...
14 MIN READ
May 26, 2026
NVIDIA CUDA 13.3 Enhances GPU Development with Tile Programming in C++, Compiler Autotuning, and Python Updates
NVIDIA CUDA 13.3 brings new capabilities and performance optimizations to developers across the CUDA ecosystem. The launch of NVIDIA CUDA Tile programming in...
13 MIN READ
May 13, 2026
Accelerated X-Ray Analysis for Nanoscale Imaging (XANI) of Novel Materials
A massive-scale X-ray free-electron laser (XFEL) enables tracking structural and electron dynamics in novel systems, including fusion materials,...
11 MIN READ
May 04, 2026
Optimize Supply Chain Decision Systems Using NVIDIA cuOpt Agent Skills
Modern supply chains operate under the constant pressures of fluctuating demand, volatile costs, constrained capacity, and interdependent decision-making....
6 MIN READ
Apr 30, 2026
Automating GPU Kernel Translation with AI Agents: cuTile Python to cuTile.jl
NVIDIA CUDA Tile (cuTile) is a tile-based programming model that enables developers to write GPU kernels in terms of tile-level operations—loads, stores, and...
9 MIN READ
Apr 22, 2026
Simplify Sparse Deep Learning with Universal Sparse Tensor in nvmath-python
In a previous post, we introduced the Universal Sparse Tensor (UST), enabling developers to decouple a tensor’s sparsity from its memory layout for greater...
11 MIN READ
Apr 14, 2026
NVIDIA NVbandwidth: Your Essential Tool for Measuring GPU Interconnect and Memory Performance
When you’re writing CUDA applications, one of the most important things you need to focus on to write great code is data transfer performance. This applies to...
8 MIN READ
Apr 09, 2026
How to Accelerate Protein Structure Prediction at Proteome-Scale
Proteins rarely function in isolation as individual monomers. Most biological processes are governed by proteins interacting with other proteins, forming...
10 MIN READ
Apr 01, 2026
CUDA Tile Programming Now Available for BASIC!
Note: CUDA Tile Programming in BASIC is an April Fools’ joke, but it’s also real and actually works, demonstrating the flexibility of CUDA. CUDA 13.1...
7 MIN READ