CUDA

Feb 19, 2026

Accelerating Data Processing with NVIDIA Multi-Instance GPU and NUMA Node Localization

NVIDIA flagship data center GPUs in the NVIDIA Ampere, NVIDIA Hopper, and NVIDIA Blackwell families all feature non-uniform memory access (NUMA) behaviors, but...

12 MIN READ

Feb 18, 2026

Topping the GPU MODE Kernel Leaderboard with NVIDIA cuda.compute

Python dominates machine learning for its ergonomics, but writing truly fast GPU code has historically meant dropping into C++ to write custom kernels and to...

5 MIN READ

Feb 10, 2026

Using Accelerated Computing to Live-Steer Scientific Experiments at Massive Research Facilities

Scientists and engineers who design and build unique scientific research facilities face similar challenges. These include managing massive data rates that...

13 MIN READ

Jan 30, 2026

Advancing GPU Programming with the CUDA Tile IR Backend for OpenAI Triton

NVIDIA CUDA Tile is a GPU-based programming model that targets portability for NVIDIA Tensor Cores, unlocking peak GPU performance. One of the great things...

7 MIN READ

Jan 30, 2026

Establishing a Scalable Sparse Ecosystem with the Universal Sparse Tensor

Sparse tensors are vectors, matrices, and higher-dimensional generalizations with many zeros. They are crucial in various fields such as scientific computing,...

15 MIN READ

Jan 21, 2026

Streamlining CUB with a Single-Call API

The C++ template library CUB is a go-to for high-performance GPU primitive algorithms, but its traditional "two-phase" API, which separates memory estimation...

8 MIN READ

Jan 14, 2026

How to Write High-Performance Matrix Multiply in NVIDIA CUDA Tile

This blog post is part of a series designed to help developers learn NVIDIA CUDA Tile programming for building high-performance GPU kernels, using matrix...

13 MIN READ

Jan 05, 2026

Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

AI has entered an industrial phase. What began as systems performing discrete AI model training and human-facing inference has evolved into always-on AI...

62 MIN READ

Dec 17, 2025

Solving Large-Scale Linear Sparse Problems with NVIDIA cuDSS

Solving large-scale problems in Electronic Design Automation (EDA), Computational Fluid Dynamics (CFD), and advanced optimization workflows has become the norm...

16 MIN READ

Dec 16, 2025

Boost GPU Memory Performance with No Code Changes Using NVIDIA CUDA MPS

NVIDIA CUDA developers have access to a wide range of tools and libraries that simplify development and deployment, enabling users to focus on the “what”...

14 MIN READ

Dec 15, 2025

Reducing CUDA Binary Size to Distribute cuML on PyPI

Starting with the 25.10 release, pip-installable cuML wheels can now be downloaded directly from PyPI. No more complex installation steps or managing Conda...

8 MIN READ

Dec 10, 2025

Better Bug Detection: How Compile-Time Instrumentation for Compute Sanitizer Enhances Memory Safety

CUDA C++ is standard C++ with extensions that enable functions to run on many parallel threads on a GPU. It has facilitated widespread adoption while allowing...

11 MIN READ

Dec 04, 2025

NVIDIA CUDA 13.1 Powers Next-Gen GPU Programming with NVIDIA CUDA Tile and Performance Gains

NVIDIA CUDA 13.1 introduces the largest and most comprehensive update to the CUDA platform since it was invented two decades ago. In this release,...

11 MIN READ

Dec 04, 2025

Simplify GPU Programming with NVIDIA CUDA Tile in Python

The release of NVIDIA CUDA 13.1 introduces tile-based programming for GPUs, making it one of the most fundamental additions to GPU programming since CUDA was...

7 MIN READ

Dec 04, 2025

Focus on Your Algorithm—NVIDIA CUDA Tile Handles the Hardware

With its largest advancement since the NVIDIA CUDA platform was invented in 2006, CUDA 13.1 is launching NVIDIA CUDA Tile. This exciting innovation introduces a...

5 MIN READ

Nov 19, 2025

Building Better Qubits with GPU-Accelerated Computing

Quantum computing promises to revolutionize science and industry, from drug discovery to materials science. But building a useful, large-scale quantum computer...

5 MIN READ