Introducing the NVIDIA OpenACC Toolkit

Programmability is crucial to accelerated computing, and NVIDIA’s CUDA Toolkit has been critical to the success of GPU computing. The CUDA Toolkit has been downloaded more than three million times since its first launch. However, many scientists and researchers have yet to benefit from GPU computing. They have limited time to learn and apply a parallel programming language, and they often have huge existing code bases that must remain portable across platforms.

NVIDIA is introducing the new OpenACC Toolkit to help these researchers and scientists achieve science and engineering goals faster.

OpenACC tutorial: Three Steps to More Science

Over the last few years, OpenACC has established itself as a higher-level approach to GPU acceleration that is simple, powerful, and portable. The membership of the OpenACC organization has grown to include accelerator manufacturers, tool vendors, supercomputing centers, and educational institutions. The OpenACC 2.0 specification significantly expands the functionality and improves the portability of OpenACC and is now available in many commercial tools.
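To make the "higher-level" idea concrete, here is a minimal sketch of an OpenACC-annotated loop in C. The function name `saxpy` and the choice of data clauses are illustrative assumptions, not code shipped with the toolkit. The directive is a hint: a compiler without OpenACC support ignores the pragma and runs the same loop on the CPU, which is what keeps the source portable.

```c
/* Illustrative sketch: y = a*x + y over n elements.
   With an OpenACC compiler, the pragma requests that the loop be offloaded
   to the accelerator; x is copied in, y is copied in and back out.
   Without OpenACC support, the pragma is ignored and the loop runs serially. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With the PGI compilers, a file containing this routine would typically be built with something like `pgcc -acc -Minfo=accel`, where `-Minfo=accel` asks the compiler to report how it mapped the loop to the accelerator.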

The NVIDIA OpenACC Toolkit provides the tools and documentation that scientists and researchers need to be successful with OpenACC. To remove barriers to academic use, the toolkit includes a free OpenACC compiler for university developers.

The new OpenACC Toolkit includes the following in a single package:

  • PGI Accelerator Fortran/C/C++ workstation compiler suite
  • NVProf Profiler beta
  • GPU-accelerated libraries
  • Code samples and examples
  • Documentation

PGI Accelerator Fortran/C/C++ workstation compiler suite

The toolkit includes a 90-day trial of the PGI Accelerator Fortran 2003, C11, and C++11 high-performance parallelizing compilers for x64 + accelerator platforms running Linux. PGI Accelerator compilers support the OpenACC 2.0 API.

Qualified university developers can register for a free renewable annual license to the PGI Accelerator compilers. This free university developer license is node-locked and has an OpenMP run-time limit of four threads.

For more information about the capabilities of the free university developer license and commercial PGI licenses, see Three Steps to More Science.

NVProf Profiler beta

Figure 1: The new version of the `nvprof` profiler included with the NVIDIA OpenACC Toolkit can profile your CPU code to find the hot spots, so you can focus OpenACC efforts on the most valuable parts of your code.

The toolkit includes a beta version of the NVProf profiler version 7.5. Of particular interest in this version is a new CPU profiling capability.

The first step to accelerating existing code is to identify which parts of your code can benefit the most from GPU acceleration. NVProf’s new CPU profiling feature shows the percentage of run time spent in each routine by sampling the CPU program counter and call stacks at high frequency, as Figure 1 shows.

The profiler uses the sample data to construct an easy-to-interpret call graph, with nodes representing frames in each call stack, and the fraction of run time spent in each frame.
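The arithmetic behind a sampling profile is simple: the share of run time attributed to a routine is roughly the share of samples that landed in it. The sketch below illustrates only that idea. The helper `sample_percentages` and its integer routine ids are hypothetical, and this is not a description of how NVProf is implemented.

```c
#include <stddef.h>

/* Hypothetical helper: sample_ids[s] is the id (0 <= id < n_routines) of the
   routine that was executing when sample s was taken. On return, percent[r]
   holds the percentage of samples attributed to routine r. */
void sample_percentages(const int *sample_ids, size_t n_samples,
                        size_t n_routines, double *percent)
{
    for (size_t r = 0; r < n_routines; ++r) {
        percent[r] = 0.0;
    }
    if (n_samples == 0) {
        return; /* no samples: leave every share at zero */
    }
    /* Count the samples per routine. */
    for (size_t s = 0; s < n_samples; ++s) {
        percent[(size_t)sample_ids[s]] += 1.0;
    }
    /* Convert the counts to percentages of all samples. */
    for (size_t r = 0; r < n_routines; ++r) {
        percent[r] = 100.0 * percent[r] / (double)n_samples;
    }
}
```

A routine that collects most of the samples is the natural first candidate for OpenACC directives.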

GPU-accelerated libraries

Adding GPU acceleration to your application can be as easy as substituting a GPU library function call for an existing CPU library call. The OpenACC Toolkit includes an extensive collection of GPU-accelerated libraries. The libraries are available in the linux86-64/2015/CUDA/7.0/lib directory inside the PGI installation directory.

To help you find potential CPU library acceleration opportunities, the toolkit includes the new GPU Wizard. GPU Wizard analyzes your code’s execution and lists library replacement opportunities and the potential performance boost the GPU libraries could deliver.

There are GPU-accelerated versions of standard libraries such as BLAS (for example, MKL BLAS) and FFTW that give significant performance improvements. For example, the NVBLAS library can directly replace the MKL BLAS library and deliver speedups of 6x to 17x, depending on BLAS API usage.
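As a rough illustration of what a "drop-in" replacement means, the sketch below wraps a matrix multiply behind one call site. The helper `matmul_colmajor` and the `USE_BLAS` macro are hypothetical names introduced here. When `USE_BLAS` is defined, the routine calls the standard Fortran BLAS symbol `dgemm_`, which is the kind of Level-3 BLAS call a GPU library can take over when it is substituted for the CPU BLAS at link or load time, without changes to the source. When the macro is not defined, a plain loop stands in so the example builds without any BLAS installed.

```c
#include <stddef.h>

#ifdef USE_BLAS
/* Standard Fortran BLAS DGEMM entry point, as exported by CPU BLAS libraries. */
extern void dgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const double *alpha, const double *a, const int *lda,
                   const double *b, const int *ldb,
                   const double *beta, double *c, const int *ldc);
#endif

/* Hypothetical helper: C = A * B for column-major matrices,
   where A is m x k, B is k x n, and C is m x n. */
void matmul_colmajor(int m, int n, int k,
                     const double *a, const double *b, double *c)
{
#ifdef USE_BLAS
    /* The application only ever sees the BLAS interface; which library
       services the call is decided when the program is linked or loaded. */
    const double alpha = 1.0, beta = 0.0;
    dgemm_("N", "N", &m, &n, &k, &alpha, a, &m, b, &k, &beta, c, &m);
#else
    /* Reference fallback (assumption for illustration only). */
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int l = 0; l < k; ++l) {
                sum += a[i + (size_t)l * m] * b[l + (size_t)j * k];
            }
            c[i + (size_t)j * m] = sum;
        }
    }
#endif
}
```

The point of the design is that the call site stays the same in both cases, so switching libraries does not require touching application code.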

Code samples and examples

The OpenACC Toolkit includes several OpenACC code samples and an OpenACC SDK to illustrate aspects of programming with OpenACC. Find them in the linux86-64/2015/OpenACC/examples directory.


Documentation

The OpenACC Toolkit provides complete documentation, including the following tutorials and guides:

  • OpenACC Toolkit Quick Start Guide: A summary of the tools in the toolkit, with instructions for setting them up and getting support
  • OpenACC Toolkit Installation Guide
  • OpenACC Programming and Best Practices Guide: Best practices for porting existing applications to OpenACC
  • NVProf Tutorial for OpenACC Users: A guide to using the CPU profiling feature of NVProf with examples
  • PGI Accelerator Compilers OpenACC Users Guide
  • NVProf Profiler Users Guide
  • GPU Wizard Readme

Download today!

At NVIDIA, we’re committed to helping scientists and researchers become successful with OpenACC, and we plan to continuously add capabilities to the toolkit to make GPU computing simpler. Download the new OpenACC Toolkit and give us feedback.