Posts by Mark Harris
Technical Walkthrough
Feb 10, 2022
Implementing High-Precision Decimal Arithmetic with CUDA int128
“Truth is much too complicated to allow anything but approximations.” (John von Neumann) The history of computing has demonstrated that there is no limit...
19 MIN READ
Technical Walkthrough
Dec 08, 2020
Fast, Flexible Allocation for NVIDIA CUDA with RAPIDS Memory Manager
When I joined the RAPIDS team in 2018, NVIDIA CUDA device memory allocation was a performance problem. RAPIDS cuDF allocates and deallocates memory at high...
24 MIN READ
Technical Walkthrough
Aug 20, 2019
CUDA Pro Tip: The Fast Way to Query Device Properties
CUDA applications often need to know the maximum available shared memory per block or to query the number of multiprocessors in the active GPU. One way to do...
3 MIN READ
Technical Walkthrough
Oct 15, 2018
RAPIDS Accelerates Data Science End-to-End
Today's data science problems demand a dramatic increase in the scale of data as well as the computational power required to process it. Unfortunately, the...
10 MIN READ
Technical Walkthrough
Oct 04, 2017
Cooperative Groups: Flexible CUDA Thread Programming
In efficient parallel algorithms, threads cooperate and share data to perform collective computations. To share data, the threads must synchronize. The...
16 MIN READ
Technical Walkthrough
Jun 19, 2017
Unified Memory for CUDA Beginners
My previous introductory post, "An Even Easier Introduction to CUDA C++", introduced the basics of CUDA programming by showing how to write a simple program...
16 MIN READ