The OpenACC Toolkit from NVIDIA offers scientists and researchers a simple way to accelerate scientific computing without significant programming effort. Simply insert hints (or “directives”) into C or Fortran code, and the OpenACC compiler compiles the marked code to run on the GPU.
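As a minimal sketch of what inserting a directive looks like, the C function below computes y = a*x + y. The function name `saxpy_acc` is illustrative and not part of the toolkit. The only OpenACC-specific line is the `#pragma acc kernels` directive. Its `copyin`/`copy` clauses describe which arrays move to and from the GPU. A C compiler built without OpenACC support typically ignores the pragma and runs the plain serial loop.

```c
/* Illustrative SAXPY: y[i] = a * x[i] + y[i] for i in [0, n).
   The "acc kernels" directive asks an OpenACC compiler to generate
   accelerator code for the loop; copyin/copy describe data movement.
   Without OpenACC enabled, the pragma is ignored and the loop runs serially. */
void saxpy_acc(int n, float a, const float *restrict x, float *restrict y)
{
#pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body is unchanged from ordinary serial C. The directive is a hint layered on top of it, so the same file still builds with a regular C compiler.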
LS-DALTON: Benchmark on Oak Ridge Titan Supercomputer, AMD CPU vs Tesla K20X GPU. Test input: Alanine-3 on CCSD(T) module.
Additional information: COSMO.
NICAM: Benchmark on TiTech TSUBAME 2.5, Westmere CPU vs. Tesla K20X GPU.
"OpenACC makes GPU computing approachable for domain scientists. Initial OpenACC implementation required only minor effort, and more importantly, no modifications of our existing CPU implementation."
Janus Juul Eriksen, PhD Fellow, qLEAP Center for Theoretical Chemistry, Aarhus University
The toolkit includes a complete set of developer tools designed to deliver significant application speedups with minimal coding. It features the popular PGI Accelerator Fortran/C Workstation Compiler Suite for Linux, which supports OpenACC 2.0. The compiler is available free of charge to academic users*, and non-academic developers receive a free 90-day trial.
Other tools include:
Every registered toolkit user also receives two free 90-minute, on-demand training sessions to quickly learn and master OpenACC techniques.
* A Free University Developer license is a special single-user, node-locked license for the 64-bit Linux version of PGI Accelerator Fortran/C/C++ Workstation™.
With OpenACC, programmers keep their existing code intact and get faster performance when an accelerator is available in the system. OpenACC directives can be added to existing serial CPU code, or to code that is already parallelized with approaches such as OpenMP.
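The sketch below shows one way a single loop can carry both kinds of annotation. It assumes the standard `_OPENACC` and `_OPENMP` macros that OpenACC and OpenMP compilers define when those models are enabled. The function name `sum_array` is hypothetical, and the guard pattern is an illustrative choice rather than the toolkit's own example. The serial loop body stays untouched. Only the directive in front of it changes with the compiler settings.

```c
/* Illustrative reduction: returns the sum of a[0..n-1].
   The loop body is the original serial code. Preprocessor guards select
   an OpenACC directive when the compiler defines _OPENACC, an OpenMP
   directive when it defines _OPENMP, and no directive otherwise. */
double sum_array(int n, const double *a)
{
    double total = 0.0;
#if defined(_OPENACC)
#pragma acc parallel loop reduction(+:total) copyin(a[0:n])
#elif defined(_OPENMP)
#pragma omp parallel for reduction(+:total)
#endif
    for (int i = 0; i < n; ++i) {
        total += a[i];
    }
    return total;
}
```

In both models the `reduction(+:total)` clause expresses the same idea. Each parallel worker accumulates a private partial sum, and the partial sums are combined into `total` at the end.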
OpenACC is designed to deliver high performance that is portable across many types of platforms, such as GPUs and multi-core CPUs. Performance portability allows researchers to optimize their code just once and expect accelerated results on different processors and platforms. The PGI OpenACC compiler can now accelerate code on x86 multi-core CPUs as well as on GPUs. When no GPU is present, the compiler parallelizes the code across the CPU cores, which can run many times faster than on a single CPU core.
This CPU portability feature is in private beta today and is planned for wider availability in the fourth quarter of 2015.
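Performance portability means the source itself does not name a target device. As a sketch, the hypothetical `jacobi_step` below performs one Jacobi relaxation sweep on a 2D grid using a single `parallel loop collapse(2)` directive. According to the text above, the same file could be built for a GPU or, once the multicore feature is available, for x86 multi-core CPUs. The target is chosen when compiling, not in the code.

```c
/* One Jacobi relaxation sweep on an n x m grid stored row-major.
   Each interior point of `out` becomes the average of its four
   neighbours in `in`; boundary rows and columns of `out` are left
   untouched (copy() brings their existing values back unchanged).
   collapse(2) lets the compiler parallelize both loops together. */
void jacobi_step(int n, int m, const float *restrict in, float *restrict out)
{
#pragma acc parallel loop collapse(2) copyin(in[0:n*m]) copy(out[0:n*m])
    for (int i = 1; i < n - 1; ++i) {
        for (int j = 1; j < m - 1; ++j) {
            out[i * m + j] = 0.25f * (in[(i - 1) * m + j] + in[(i + 1) * m + j]
                                    + in[i * m + j - 1] + in[i * m + j + 1]);
        }
    }
}
```

With the PGI compilers, the target is selected through compiler options. This sketch assumes that `pgcc -acc -ta=multicore` requests the multicore CPU target and `-ta=tesla` the NVIDIA GPU target. Confirm the exact options in the documentation for your compiler release.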