With the CUDA Toolkit from NVIDIA, you can accelerate your C or C++ code by moving its computationally intensive portions to an NVIDIA GPU. In addition to drop-in library acceleration, you can efficiently access the massively parallel power of a GPU by using a few new syntactic elements and calling functions from the CUDA Runtime API.
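
To make the "few new syntactic elements" and Runtime API calls mentioned above concrete, here is a minimal sketch of a vector addition. The kernel name `add_vectors`, the helpers `check` and `count_mismatches`, and the file name `vector_add.cu` are illustrative assumptions, not part of the toolkit. The `__global__` qualifier and the `<<<blocks, threads>>>` launch syntax are the CUDA C/C++ language extensions, and `cudaMalloc`, `cudaMemcpy`, and `cudaFree` are Runtime API functions.

```cuda
// Build sketch, assuming nvcc is on your PATH and the file is saved as
// vector_add.cu (a hypothetical name):  nvcc -o vector_add vector_add.cu
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// __global__ marks a function that runs on the GPU and is launched from the host.
__global__ void add_vectors(const float *a, const float *b, float *out, int n)
{
    // Each thread handles one element, selected by its global index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

// Illustrative helper (not a toolkit function): abort if a runtime call fails.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Adds two length-n vectors on the GPU and returns how many results
// differ from the expected value 1.0f + 2.0f.
int count_mismatches(int n)
{
    std::vector<float> a(n, 1.0f), b(n, 2.0f), out(n, 0.0f);
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Allocate device memory and copy the inputs to the GPU.
    float *d_a = NULL, *d_b = NULL, *d_out = NULL;
    check(cudaMalloc((void **)&d_a, bytes), "cudaMalloc(d_a)");
    check(cudaMalloc((void **)&d_b, bytes), "cudaMalloc(d_b)");
    check(cudaMalloc((void **)&d_out, bytes), "cudaMalloc(d_out)");
    check(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice), "copy a");
    check(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice), "copy b");

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_vectors<<<blocks, threads>>>(d_a, d_b, d_out, n);
    check(cudaGetLastError(), "kernel launch");

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost), "copy out");

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (out[i] != 3.0f) {
            ++mismatches;
        }
    }
    return mismatches;
}

int main()
{
    int bad = count_mismatches(1 << 20);
    std::printf("mismatched elements: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

The explicit allocate, copy, launch, copy-back sequence is used here because it shows each Runtime API call separately. Real applications often hide these steps behind libraries.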

The CUDA Toolkit from NVIDIA is free and includes:

  • Visual and command-line debugger
  • Visual and command-line GPU profiler
  • Many GPU-optimized libraries
  • The CUDA C/C++ compiler
  • GPU management tools
  • Lots of other features

Getting Started:

  1. Make sure you have an understanding of what CUDA is.
    • Read through the Introduction to CUDA C/C++ series on Mark Harris’ Parallel Forall blog.
  2. Try CUDA by taking a self-paced lab on nvidia.qwiklab.com. These labs require only a supported web browser and a network that allows Web Sockets. To verify that your network and system support Web Sockets, run the linked check: in the section "Web Sockets (Port 80)", all check marks should be green.
  3. Download and install the CUDA Toolkit.
  4. See how to quickly write your first CUDA C program by watching the following video:

Learning CUDA:

  1. Read the An Even Easier Introduction to CUDA blog post on Parallel Forall.
  2. Take the free, high-quality, and easily digestible Udacity Intro to Parallel Programming course, which uses CUDA as its parallel programming platform.
  3. Visit docs.nvidia.com for CUDA C/C++ documentation.
  4. Work through hands-on examples:
  5. Look through the code samples that come installed with the CUDA Toolkit.
  6. If you are working in C++, you should definitely check out the Thrust parallel template library.
  7. Browse and ask questions on stackoverflow.com or NVIDIA’s DevTalk forum.
  8. Learn more by:
  9. Look at the following for more advanced hands-on examples:
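
As a small taste of the Thrust library recommended in the list above, the following sketch computes a sum of squares on the GPU without writing a kernel by hand. The function name `sum_of_squares` is an illustrative assumption. The Thrust calls used (`thrust::device_vector`, `thrust::sequence`, `thrust::transform`, `thrust::reduce`) ship with the CUDA Toolkit.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>

// Illustrative helper: returns 0*0 + 1*1 + ... + (n-1)*(n-1), computed on the GPU.
long long sum_of_squares(int n)
{
    // device_vector allocates its storage in GPU memory.
    thrust::device_vector<long long> v(n);

    // Fill with 0, 1, ..., n-1.
    thrust::sequence(v.begin(), v.end());

    // Square each element in place: v[i] = v[i] * v[i].
    thrust::transform(v.begin(), v.end(), v.begin(), v.begin(),
                      thrust::multiplies<long long>());

    // Sum all elements on the device and return the result to the host.
    return thrust::reduce(v.begin(), v.end(), 0LL, thrust::plus<long long>());
}

int main()
{
    std::printf("sum of squares below 1000: %lld\n", sum_of_squares(1000));
    return 0;
}
```

Thrust's containers and algorithms mirror the C++ standard library, so allocation, host-device copies, and kernel launches stay behind familiar iterator-based calls.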

So, now you’re ready to deploy your application?
You can register today for FREE access to NVIDIA Tesla K40 GPUs.
Develop your code on the fastest accelerator in the world. Try a Tesla K40 GPU and accelerate your development.


The CUDA Toolkit is a free download from NVIDIA and is supported on Windows, Mac, and most standard Linux distributions.

  • Starting with CUDA 5.5, CUDA also supports the ARM architecture.
  • For the host-side code in your application, the nvcc compiler will use your default host compiler.