Drop-in Acceleration on GPUs with Libraries

Access the massively parallel power of a GPU without writing the GPU code yourself. Libraries provide highly optimized algorithms and functions that you can incorporate into new or existing applications. Many GPU-accelerated libraries are designed as drop-in replacements for existing CPU libraries, which minimizes changes to existing code.

Getting Started

  1. Make sure you understand what GPU computing is.
  2. Try CUDA by taking a self-paced lab on nvidia.qwiklab.com. These labs require only a supported web browser and a network that allows Web Sockets. To verify that your network and system support Web Sockets, check the section "Web Sockets (Port 80)" of the linked test page; all check marks should be green.
  3. Download and install the CUDA Toolkit.
  4. Look through the available GPU-accelerated libraries and find one that provides functionality you can use.
    • If you are working in C++, check out the Thrust parallel template library.
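
A minimal sketch of what Thrust code can look like, assuming the CUDA Toolkit is installed and the file is compiled by nvcc as CUDA source (for example as a .cu file). The helper name sum_of_largest is hypothetical and not part of Thrust. So that the sketch also builds with an ordinary C++ compiler, it falls back to the standard-library calls that Thrust's interface mirrors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__CUDACC__)
// Compiled by nvcc as CUDA source: use Thrust's GPU container and algorithms.
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>
#else
// Ordinary C++ compiler: use the standard-library calls that Thrust mirrors.
#include <algorithm>
#include <functional>
#include <numeric>
#endif

// Hypothetical helper: returns the sum of the k largest values in `host`.
float sum_of_largest(const std::vector<float>& host, std::size_t k) {
  assert(k <= host.size());
  const std::ptrdiff_t count = static_cast<std::ptrdiff_t>(k);
#if defined(__CUDACC__)
  // Copy the input from host memory to GPU memory.
  thrust::device_vector<float> d(host.begin(), host.end());
  // Sort in descending order on the GPU.
  thrust::sort(d.begin(), d.end(), thrust::greater<float>());
  // Sum the first k elements on the GPU; the result comes back to the host.
  return thrust::reduce(d.begin(), d.begin() + count, 0.0f);
#else
  std::vector<float> d(host);
  std::sort(d.begin(), d.end(), std::greater<float>());
  return std::accumulate(d.begin(), d.begin() + count, 0.0f);
#endif
}
```

Calling sum_of_largest on the values 3, 1, 4, 1, 5 with k = 2 gives 9 on either path. The two branches differ only in the container type and the namespace of the algorithms, which is the property that makes Thrust approachable for C++ developers.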

Learning Libraries

  1. Use the following resources to learn how to add library-based GPU acceleration to your applications:
    • If you use the FFTW API, NVIDIA provides a drop-in replacement with CUFFT.
    • Look through the CUDA library code samples that come installed with the CUDA Toolkit.
  2. Browse and ask questions on stackoverflow.com or NVIDIA’s DevTalk forum.
  3. Learn more by:
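
For the FFTW route mentioned in step 1 above, the following is a sketch under stated assumptions rather than a definitive recipe. It assumes that cufftw.h from the CUDA Toolkit accepts the same basic FFTW calls used here (fftw_plan_dft_1d, fftw_execute, fftw_destroy_plan). The macros USE_CUFFTW and USE_FFTW and the helper forward_dft are hypothetical names chosen for illustration. With neither macro defined, a naive O(n^2) DFT stands in so that the sketch still builds without either library and can serve as a reference when checking results.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

#if defined(USE_CUFFTW)
#include <cufftw.h>  // cuFFT's FFTW-compatible interface (GPU)
#elif defined(USE_FFTW)
#include <fftw3.h>   // original FFTW (CPU)
#endif

typedef std::complex<double> cplx;

// Hypothetical helper: forward 1-D complex DFT of `in`.
std::vector<cplx> forward_dft(const std::vector<cplx>& in) {
  const int n = static_cast<int>(in.size());
  std::vector<cplx> src(in), dst(in.size());
  if (n == 0) return dst;
#if defined(USE_CUFFTW) || defined(USE_FFTW)
  // The same FFTW calls serve both builds; only the header and the
  // libraries on the link line differ.
  // fftw_complex is double[2], layout-compatible with std::complex<double>.
  fftw_complex* a = reinterpret_cast<fftw_complex*>(src.data());
  fftw_complex* b = reinterpret_cast<fftw_complex*>(dst.data());
  fftw_plan plan = fftw_plan_dft_1d(n, a, b, FFTW_FORWARD, FFTW_ESTIMATE);
  fftw_execute(plan);
  fftw_destroy_plan(plan);
#else
  // Reference fallback: naive DFT, X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n).
  const double two_pi = 2.0 * std::acos(-1.0);
  for (int k = 0; k < n; ++k) {
    cplx sum(0.0, 0.0);
    for (int j = 0; j < n; ++j) {
      const double angle = -two_pi * j * k / n;
      sum += src[j] * std::polar(1.0, angle);
    }
    dst[k] = sum;
  }
#endif
  return dst;
}
```

An existing FFTW build would typically link with -lfftw3, while the cuFFTW build is expected to link against the cuFFTW and cuFFT libraries (for example -lcufftw -lcufft); treat the exact flags as an assumption to confirm against your toolkit version. A unit impulse of length 4 should transform to four values equal to 1 on every path, which makes a quick sanity check after switching headers.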

Ready to deploy your application?
You can register today for free access to NVIDIA Tesla K40 GPUs.
Develop your code on the fastest accelerator in the world. Try a Tesla K40 GPU and accelerate your development.


  • The CUDA Toolkit is a free download from NVIDIA and is supported on Windows, Mac, and most standard Linux distributions. The toolkit contains the following libraries:
    • CUBLAS – an implementation of BLAS (Basic Linear Algebra Subprograms).
    • CUFFT – a Fast Fourier Transform library with support for the FFTW API.
    • CURAND – simple and efficient generation of high-quality pseudorandom and quasirandom numbers.
    • CUSPARSE – a set of basic linear algebra subroutines for sparse matrices.
    • NPP (NVIDIA Performance Primitives) – functions for image and video processing.
  • In addition to the libraries provided by NVIDIA, many other GPU-accelerated libraries are available.
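
As an illustration of calling one of the toolkit libraries listed above, here is a hedged sketch of the BLAS operation SAXPY (y = alpha * x + y) through CUBLAS. The macro USE_CUBLAS and the wrapper name saxpy are hypothetical. Without the macro, a plain CPU loop is used so the sketch builds on a machine without the toolkit. Error checking of the CUDA and CUBLAS return codes is omitted for brevity and would be needed in real code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef USE_CUBLAS
#include <cublas_v2.h>
#include <cuda_runtime.h>
#endif

// Hypothetical wrapper: computes y = alpha * x + y in place.
void saxpy(float alpha, const std::vector<float>& x, std::vector<float>& y) {
  assert(x.size() == y.size());
  const int n = static_cast<int>(x.size());
  if (n == 0) return;
#ifdef USE_CUBLAS
  // Allocate GPU buffers for both vectors.
  float* dx = nullptr;
  float* dy = nullptr;
  cudaMalloc(reinterpret_cast<void**>(&dx), n * sizeof(float));
  cudaMalloc(reinterpret_cast<void**>(&dy), n * sizeof(float));

  cublasHandle_t handle;
  cublasCreate(&handle);

  // Copy the inputs to the GPU, run SAXPY there, and copy the result back.
  cublasSetVector(n, sizeof(float), x.data(), 1, dx, 1);
  cublasSetVector(n, sizeof(float), y.data(), 1, dy, 1);
  cublasSaxpy(handle, n, &alpha, dx, 1, dy, 1);
  cublasGetVector(n, sizeof(float), dy, 1, y.data(), 1);

  cublasDestroy(handle);
  cudaFree(dx);
  cudaFree(dy);
#else
  // CPU reference: the loop that the library call replaces.
  for (int i = 0; i < n; ++i) {
    y[static_cast<std::size_t>(i)] =
        alpha * x[static_cast<std::size_t>(i)] + y[static_cast<std::size_t>(i)];
  }
#endif
}
```

With alpha = 2, x = (1, 2, 3) and y = (10, 20, 30), y becomes (12, 24, 36) on either path. The GPU build would typically be compiled with -DUSE_CUBLAS and linked against the CUBLAS and CUDA runtime libraries. For a vector this small the transfers dominate the cost, so the library call pays off mainly for large inputs or when the data already lives on the GPU.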