Technical Walkthrough

Boosting Application Performance with GPU Memory Prefetching

This CUDA post examines the effectiveness of methods to hide memory latency using explicit prefetching.
Technical Walkthrough

Implementing High-Precision Decimal Arithmetic with CUDA int128

This post details CUDA's new int128 support and how to implement decimal fixed-point arithmetic on top of it.
Technical Walkthrough

Fast, Flexible Allocation for NVIDIA CUDA with RAPIDS Memory Manager

When I joined the RAPIDS team in 2018, NVIDIA CUDA device memory allocation was a performance problem. RAPIDS cuDF allocates and deallocates memory at high…
Technical Walkthrough

Unified Memory for CUDA Beginners

This post introduces CUDA programming with Unified Memory, a single memory address space that is accessible from any GPU or CPU in a system.
Technical Walkthrough

High-Performance Geometric Multi-Grid with GPU Acceleration

Algorithms and optimizations for accelerating geometric multi-grid in the HPGMG benchmark with GPUs, including scalability on supercomputers.
Technical Walkthrough

Cutting Edge Parallel Algorithms Research with CUDA

Leyuan Wang, a Ph.D. student in the UC Davis Department of Computer Science, presented one of only two “Distinguished Papers” of the 51 accepted at Euro-Par…