Mark Harris

Mark is an NVIDIA Distinguished Engineer working on RAPIDS. Mark has over twenty years of experience developing software for GPUs, ranging from graphics and games, to physically based simulation, to parallel algorithms and high-performance computing. While a Ph.D. student at The University of North Carolina, he recognized a nascent trend and coined a name for it: GPGPU (General-Purpose computing on Graphics Processing Units).

Posts by Mark Harris

Data Science May 02, 2025

An Even Easier Introduction to CUDA (Updated)

Note: This blog post was originally published on Jan 25, 2017, but has been edited to reflect new updates. This post is a super simple introduction to CUDA, the... 16 MIN READ

Data Science Feb 10, 2022

Implementing High-Precision Decimal Arithmetic with CUDA int128

“Truth is much too complicated to allow anything but approximations.” -- John von Neumann The history of computing has demonstrated that there is no limit... 19 MIN READ

Simulation / Modeling / Design Dec 08, 2020

Fast, Flexible Allocation for NVIDIA CUDA with RAPIDS Memory Manager

When I joined the RAPIDS team in 2018, NVIDIA CUDA device memory allocation was a performance problem. RAPIDS cuDF allocates and deallocates memory at high... 24 MIN READ

Data Science Aug 20, 2019

CUDA Pro Tip: The Fast Way to Query Device Properties

This post was updated in April 2025 to reflect performance on current hardware and software. CUDA applications often need to know the maximum available shared... 3 MIN READ

Simulation / Modeling / Design Oct 15, 2018

RAPIDS Accelerates Data Science End-to-End

Today's data science problems demand a dramatic increase in the scale of data as well as the computational power required to process it. Unfortunately, the... 10 MIN READ

Simulation / Modeling / Design Oct 04, 2017

Cooperative Groups: Flexible CUDA Thread Programming

In efficient parallel algorithms, threads cooperate and share data to perform collective computations. To share data, the threads must synchronize. The... 16 MIN READ