The pace of scientific discovery keeps accelerating as technologies like GPUs advance: scientific results are being published faster than ever before.
This three-step tutorial is designed to show you how to take advantage of compilers and libraries to quickly accelerate your codes with GPUs so that you can spend more time on real breakthroughs.
All the tools mentioned are freely available as part of the PGI Community Edition, which includes OpenACC compilers, tools, and GPU-accelerated libraries.
OpenACC is a directives-based programming approach to parallel computing, designed for performance and portability across CPUs and GPUs. Scientists report 2-10X performance increases with as little as a few weeks' effort.
Here are three simple steps to start accelerating your code with GPUs. We will be using the PGI OpenACC compilers for C, C++, and Fortran, along with tools from the PGI Community Edition.
Start by profiling your code to identify the functions and loops that will run faster on GPUs. A baseline CPU profile shows where the executable spends most of its time. Check whether any of the hotspots the profiler identifies are already covered by existing GPU-accelerated libraries; call those first, then apply OpenACC directives to the loops that remain.
Now we can begin exposing parallelism, starting with the functions and loops that take the most time on the CPU. The OpenACC compiler offloads the regions of code marked by directives (pragmas in C/C++) to the GPU. Use #pragma acc parallel to initiate parallel execution, #pragma acc kernels to let the compiler find and offload the parallelizable loops in a region, and the loop directive to map individual loops onto the GPU.
Optimizing data movement between the CPU and GPU can bring a significant further performance increase, and loop optimizations can yield even faster results. Note that on a Pascal GPU, data movement can be handled automatically by the hardware's page migration engine (for example when compiling with PGI's managed memory option), without the need for additional data directives.