CUDA
May 26, 2026
Extract More Kernel Performance with NVIDIA CompileIQ Auto-Tuning
NVIDIA CompileIQ tackles one of the hardest problems in performance engineering: finding the compiler options that unlock the best performance for a specific...
12 MIN READ
May 26, 2026
Develop High-Performance GPU Kernels in C++ with NVIDIA CUDA Tile
Developers can now use NVIDIA CUDA Tile programming within large existing C++ GPU codebases to develop highly optimized GPU kernels using tile-based...
14 MIN READ
May 26, 2026
NVIDIA CUDA 13.3 Enhances GPU Development with Tile Programming in C++, Compiler Autotuning, and Python Updates
NVIDIA CUDA 13.3 brings new capabilities and performance optimizations to developers across the CUDA ecosystem. The launch of NVIDIA CUDA Tile programming in...
13 MIN READ
May 13, 2026
Accelerated X-Ray Analysis for Nanoscale Imaging (XANI) of Novel Materials
A massive-scale X-ray free-electron laser (XFEL) enables tracking structural and electron dynamics in novel systems, including fusion materials,...
11 MIN READ
May 04, 2026
Optimize Supply Chain Decision Systems Using NVIDIA cuOpt Agent Skills
Modern supply chains operate under the constant pressures of fluctuating demand, volatile costs, constrained capacity, and interdependent decision-making....
6 MIN READ
Apr 30, 2026
Automating GPU Kernel Translation with AI Agents: cuTile Python to cuTile.jl
NVIDIA CUDA Tile (cuTile) is a tile-based programming model that enables developers to write GPU kernels in terms of tile-level operations: loads, stores, and...
9 MIN READ
Apr 22, 2026
Simplify Sparse Deep Learning with Universal Sparse Tensor in nvmath-python
In a previous post, we introduced the Universal Sparse Tensor (UST), enabling developers to decouple a tensor's sparsity from its memory layout for greater...
11 MIN READ
Apr 14, 2026
NVIDIA NVbandwidth: Your Essential Tool for Measuring GPU Interconnect and Memory Performance
When you're writing CUDA applications, one of the most important things you need to focus on to write great code is data transfer performance. This applies to...
8 MIN READ
Apr 14, 2026
NVIDIA Ising Introduces AI-Powered Workflows to Build Fault-Tolerant Quantum Systems
NVIDIA Ising is the world's first family of open AI models for building quantum processors, launching with two model domains: Ising Calibration and Ising...
9 MIN READ
Apr 09, 2026
How to Accelerate Protein Structure Prediction at Proteome-Scale
Proteins rarely function in isolation as individual monomers. Most biological processes are governed by proteins interacting with other proteins, forming...
10 MIN READ
Apr 01, 2026
CUDA Tile Programming Now Available for BASIC!
Note: CUDA Tile Programming in BASIC is an April Fools' joke, but it's also real and actually works, demonstrating the flexibility of CUDA. CUDA 13.1...
7 MIN READ
Mar 25, 2026
Maximize AI Infrastructure Throughput by Consolidating Underutilized GPU Workloads
In production Kubernetes environments, the difference between model requirements and GPU size creates inefficiencies. Lightweight automatic speech recognition...
9 MIN READ
Mar 16, 2026
How NVIDIA Dynamo 1.0 Powers Multi-Node Inference at Production Scale
Reasoning models are growing rapidly in size and are increasingly being integrated into agentic AI workflows that interact with other models and external...
14 MIN READ
Mar 09, 2026
CUDA 13.2 Introduces Enhanced CUDA Tile Support and New Python Features
CUDA 13.2 arrives with a major update: NVIDIA CUDA Tile is now supported on devices of compute capability 8.X architectures (NVIDIA Ampere and NVIDIA Ada), as...
15 MIN READ
Mar 05, 2026
Tuning Flash Attention for Peak Performance in NVIDIA CUDA Tile
In this post, we dive into one of the most critical workloads in modern AI: Flash Attention, where you'll learn: How to implement Flash Attention using NVIDIA...
20 MIN READ
Mar 05, 2026
Controlling Floating-Point Determinism in NVIDIA CCCL
A computation is considered deterministic if multiple runs with the same input data produce the same bitwise result. While this may seem like a simple property...
7 MIN READ