Whenever I hear about a developer who wants to accelerate a C++ application on a GPU, I make sure to tell them about Thrust. Thrust is a parallel algorithms library loosely based on the C++ Standard Template Library. It provides a number of building blocks, such as sorts, scans, transforms, and reductions, that let developers quickly take advantage of parallel computing. In addition to targeting the massive parallelism of NVIDIA GPUs, Thrust supports multiple system back-ends, such as OpenMP and Intel’s Threading Building Blocks. This means that you can compile the same code for different parallel processors simply by changing a compiler switch.
In this first episode of a mini-series of screencasts about Thrust, we’ll write a simple sorting program and run it on both a GPU and a multi-core CPU. In upcoming episodes, we’ll explore more of Thrust’s capabilities that really show its flexibility and power. For more examples of using Thrust, read the post Expressive Algorithmic Programming with Thrust, and check out the Thrust Quick Start Guide.
To suggest a topic for a future episode of CUDACasts, or if you have any other feedback, please leave a comment below.