CUDACasts Episode #2: Your First CUDA C Program

In the last episode of CUDACasts, we learned how to install the CUDA Toolkit on Windows. Now we’re going to move straight on to accelerating code on the GPU. For this episode, we’ll use the CUDA C programming language. However, as I will show in future CUDACasts, there are other CUDA-enabled languages, including C++, Fortran, and Python.

The simple code we’ll be writing is a kernel called VectorAdd, which adds two vectors, a and b, in parallel and stores the results in a third vector, c. You can follow along in the video or download the source code for this episode from GitHub.

The process for moving VectorAdd from the CPU to the massively parallel GPU follows three simple steps.

1. Parallelize the VectorAdd function by converting the serial for loop that adds each pair of elements on the CPU into a parallel kernel that uses an independent GPU thread to add each pair of elements.
2. Copy the initialized data from CPU memory to GPU memory, and copy the results back to the CPU afterwards.
3. Modify the VectorAdd function call to launch the now-parallelized kernel on the GPU.
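
The three steps above can be sketched in CUDA C roughly as follows. This is a minimal illustration, not the code from the video: the `int` element type, the host helper name `vector_add_on_gpu`, the `d_` prefix for device pointers, and the block size of 256 threads are all assumptions made for the example.

```cuda
#include <stddef.h>
#include <cuda_runtime.h>

// Step 1: the serial loop body becomes a kernel. Each GPU thread
// computes its own global index i and adds one pair of elements.
__global__ void VectorAdd(const int *a, const int *b, int *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // the last block may contain threads past the end
        c[i] = a[i] + b[i];
}

// Hypothetical host helper (name and signature are assumptions).
// Takes host arrays a, b, c of length n and returns the first CUDA
// error encountered, or cudaSuccess.
cudaError_t vector_add_on_gpu(const int *a, const int *b, int *c, int n)
{
    if (n <= 0)
        return cudaSuccess;  // nothing to do; a zero-block launch is invalid

    int *d_a = NULL, *d_b = NULL, *d_c = NULL;
    size_t bytes = (size_t)n * sizeof(int);

    // Allocate GPU memory for the three vectors.
    cudaError_t err = cudaMalloc((void **)&d_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_c, bytes);

    // Step 2 (first half): copy the initialized inputs to the GPU.
    if (err == cudaSuccess)
        err = cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    // Step 3: replace the function call with a kernel launch.
    if (err == cudaSuccess) {
        int threads = 256;                         // assumed block size
        int blocks = (n + threads - 1) / threads;  // round up to cover n
        VectorAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
        err = cudaGetLastError();                  // catch launch errors
    }

    // Step 2 (second half): copy the results back to the CPU.
    // This cudaMemcpy waits for the kernel to finish first.
    if (err == cudaSuccess)
        err = cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);

    // cudaFree on a null pointer is a no-op, so this is safe after
    // a partial failure.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return err;
}
```

A caller would fill `a` and `b` on the CPU, call `vector_add_on_gpu(a, b, c, n)`, and check that the return value is `cudaSuccess` before reading `c`. The file would be compiled with `nvcc`.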

If you’re interested in learning more about CUDA C, you can watch my in-depth Introduction to CUDA C/C++ recorded here. In the next CUDACast, we’ll explore an alternative method for accelerating code using the OpenACC directive-based approach.

If you would like to request a topic for a future episode of CUDACasts, or if you have any other feedback, please leave a comment to let us know!
