In the last episode of CUDACasts, we wrote our first accelerated program using CUDA C. In this episode, we will explore an alternate method of accelerating code: OpenACC directives. These directives give the compiler hints about how to parallelize sections of code, without requiring you to write CUDA code or restructure the underlying source.
The algorithm we’ll be accelerating is the Jacobi iteration; you can get a copy of the OpenACC accelerated code from GitHub.
The video presents the typical process for accelerating code with OpenACC.
- Identify the computationally intensive sections of code you want to offload to the massively parallel GPU.
- Use OpenACC directives to offload the parallel loops to the GPU, then verify the results are still correct.
- Optimize any data movement between the host and device.
For a more in-depth look at OpenACC and the example shown here, you might want to read these past Parallel Forall blog posts (#1, #2, #3). If you would like to try OpenACC for yourself, you can get a free 30-day trial of the PGI compiler with OpenACC support here.
To request a topic for a future episode of CUDACasts, or if you have any other feedback, please leave a comment to let us know!