CUDA C++

Dec 15, 2025

Reducing CUDA Binary Size to Distribute cuML on PyPI

Starting with the 25.10 release, pip-installable cuML wheels can now be downloaded directly from PyPI. No more complex installation steps or managing Conda...

8 MIN READ

Aug 27, 2025

How to Improve CUDA Kernel Performance with Shared Memory Register Spilling

When a CUDA kernel requires more hardware registers than are available, the compiler is forced to move the excess variables into local memory, a process known...

9 MIN READ

Aug 07, 2025

Efficient Transforms in cuDF Using JIT Compilation

RAPIDS cuDF offers a broad set of ETL algorithms for processing data with GPUs. For pandas users, cuDF accelerated algorithms are available with the zero code...

9 MIN READ

May 09, 2025

CUDA C++ Compiler Updates Impacting ELF Visibility and Linkage

In the next CUDA major release, CUDA 13.0, NVIDIA is introducing two significant changes to the NVIDIA CUDA Compiler Driver (NVCC) that will impact ELF...

11 MIN READ

Nov 28, 2024

Supercharging Deduplication in pandas Using RAPIDS cuDF

A common operation in data analytics is to drop duplicate rows. Deduplication is critical in Extract, Transform, Load (ETL) workflows, where you might want to...

12 MIN READ

Decorative image of light fields in green, purple, and blue.

Aug 08, 2024

Improving GPU Performance by Reducing Instruction Cache Misses

GPUs are specially designed to crunch through massive amounts of data at high speed. They have a large amount of compute resources, called streaming...

12 MIN READ

Nov 13, 2023

Simplifying GPU Programming for HPC with NVIDIA Grace Hopper Superchip

The new hardware developments in NVIDIA Grace Hopper Superchip systems enable some dramatic changes to the way developers approach GPU programming. Most...

17 MIN READ

Aug 22, 2023

Simplifying GPU Application Development with Heterogeneous Memory Management

Heterogeneous Memory Management (HMM) is a CUDA memory management feature that extends the simplicity and productivity of the CUDA Unified Memory programming...

16 MIN READ

Apr 20, 2023

Debugging a Mixed Python and C Language Stack

Debugging is difficult. Debugging across multiple languages is especially challenging, and debugging across devices often requires a team with varying skill...

18 MIN READ

Jun 23, 2022

Just Released: CUTLASS v2.9

The latest version of CUTLASS offers users BLAS3 operators accelerated by tensor cores, Python integrations, GEMM compatibility extensions, and more.

1 MIN READ

Mar 23, 2022

Boosting Application Performance with GPU Memory Prefetching

NVIDIA GPUs have enormous compute power and typically must be fed data at high speed to deploy that power. That is possible, in principle, because GPUs also...

10 MIN READ

Feb 10, 2022

Implementing High-Precision Decimal Arithmetic with CUDA int128

“Truth is much too complicated to allow anything but approximations.” -- John von Neumann The history of computing has demonstrated that there is no limit...

19 MIN READ

Image depicting NVIDIA CEO Jen-Hsun Huang explaining the importance of the RAPIDS launch demo at GTC Europe 2018.

Dec 08, 2020

Fast, Flexible Allocation for NVIDIA CUDA with RAPIDS Memory Manager

When I joined the RAPIDS team in 2018, NVIDIA CUDA device memory allocation was a performance problem. RAPIDS cuDF allocates and deallocates memory at high...

24 MIN READ

Jun 19, 2017

Unified Memory for CUDA Beginners

My previous introductory post, "An Even Easier Introduction to CUDA C++", introduced the basics of CUDA programming by showing how to write a simple program...

16 MIN READ

Feb 23, 2016

High-Performance Geometric Multi-Grid with GPU Acceleration

Linear solvers are probably the most common tool in scientific computing applications. There are two basic classes of methods that can be used to solve an...

16 MIN READ

Oct 19, 2015

Cutting Edge Parallel Algorithms Research with CUDA

Leyuan Wang, a Ph.D. student in the UC Davis Department of Computer Science, presented one of only two “Distinguished Papers” of the 51 accepted at Euro-Par...

14 MIN READ