GPU Accelerated Computing with C and C++

Using the CUDA Toolkit, you can accelerate your C or C++ applications by updating the computationally intensive portions of your code to run on GPUs. You can call functions from drop-in libraries, or develop custom applications in languages including C, C++, Fortran and Python. Below you will find some resources to help you get started using CUDA.


Install the free CUDA Toolkit on a Linux, Mac or Windows system with one or more CUDA-capable GPUs. Follow the instructions in the CUDA Quick Start Guide to get up and running quickly.

Or, watch the short video below and follow along.

If you do not have a GPU, you can access one of the thousands of GPUs available from cloud service providers including Amazon AWS, Microsoft Azure and IBM SoftLayer. The NVIDIA-maintained CUDA Amazon Machine Image (AMI) on AWS, for example, comes pre-installed with CUDA and is available for use today.

For more detailed installation instructions, refer to the CUDA installation guides. For help with troubleshooting, browse and participate in the CUDA Setup and Installation forum.


You are now ready to write your first CUDA program. The article An Even Easier Introduction to CUDA introduces key concepts through simple examples that you can follow along with.

The video below walks through writing a program that adds two vectors.

The Programming Guide in the CUDA Documentation introduces the key concepts covered in the video, including the CUDA programming model, important APIs, and performance guidelines.
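As a rough illustration of the vector-addition example mentioned above, here is a minimal sketch in CUDA C++. It is not the code from the video or the article; it assumes a CUDA-capable GPU, a toolkit recent enough to support managed memory, and compilation with `nvcc`. The helper name `max_error_after_add` and the fill values 1.0 and 2.0 are hypothetical choices made for this sketch.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread starts at its global index and advances by the total
// number of threads in the grid (a grid-stride loop), computing y = x + y.
__global__ void add(int n, const float* x, float* y) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride) {
        y[i] = x[i] + y[i];
    }
}

// Hypothetical helper for this sketch: fills x with 1.0 and y with 2.0, adds
// them on the GPU, and returns the largest deviation from 3.0.
// Returns -1.0f if any CUDA call reports an error.
float max_error_after_add(int n) {
    float* x = nullptr;
    float* y = nullptr;

    // Managed (unified) memory is accessible from both the CPU and the GPU.
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) {
        return -1.0f;
    }
    if (cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return -1.0f;
    }

    for (int i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    // Launch enough 256-thread blocks to cover all n elements.
    const int blockSize = 256;
    const int numBlocks = (n + blockSize - 1) / blockSize;
    add<<<numBlocks, blockSize>>>(n, x, y);

    // Check for launch errors, then wait for the GPU before reading y on the CPU.
    cudaError_t status = cudaGetLastError();
    if (status == cudaSuccess) {
        status = cudaDeviceSynchronize();
    }

    float maxError = -1.0f;
    if (status == cudaSuccess) {
        maxError = 0.0f;
        for (int i = 0; i < n; ++i) {
            maxError = std::fmax(maxError, std::fabs(y[i] - 3.0f));
        }
    }

    cudaFree(x);
    cudaFree(y);
    return maxError;
}

int main() {
    const float err = max_error_after_add(1 << 20);  // about one million elements
    if (err < 0.0f) {
        std::fprintf(stderr, "A CUDA call failed; is a CUDA-capable GPU available?\n");
        return 1;
    }
    std::printf("Max error: %f\n", err);
    return 0;
}
```

Assuming the file is saved as `vector_add.cu` (a name chosen here for illustration), it can be built with `nvcc vector_add.cu -o vector_add`. The grid-stride loop keeps the kernel correct for any launch configuration, so the block size of 256 is a starting point to tune rather than a requirement.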


NVIDIA provides hands-on training in CUDA through a collection of self-paced and instructor-led courses. The self-paced online training, powered by GPU-accelerated workstations in the cloud, guides you step by step through editing and running code and working with visual tools. All you need is a laptop and an internet connection to access the complete suite of free courses and certification options.

The CUDA C Best Practices Guide presents established parallelization and optimization techniques and explains programming approaches that can greatly simplify programming GPU-accelerated applications.

Additional Resources

Code Samples


The CUDA Toolkit is a free download from NVIDIA and is supported on Windows, Mac, and most standard Linux distributions.

Ready to deploy your application?
Register today for free access to NVIDIA Tesla GPUs in the cloud.

Latest News

Microsoft and NVIDIA Announce June Preview for GPU-Acceleration Support for WSL

Microsoft announced a Public Preview of GPU support in Windows Subsystem for Linux (WSL). WSL is a layer that enables executing Linux binaries on Microsoft Windows computing systems.

Nsight Developer Tools Unleash Performance Advantages of NVIDIA Ampere Architecture

To help unleash the performance advantages of the NVIDIA Ampere Architecture, CUDA Toolkit 11 and the Nsight Systems 2020.3 and Nsight Compute 2020.1 developer tools have been enhanced and are scheduled for general availability at the end of May.

NVIDIA Announces CUDA Toolkit 11

CUDA 11 provides support for the new NVIDIA A100 based on the NVIDIA Ampere architecture, Arm server processors, performance-optimized libraries, and new developer tools and improvements for A100.

Developer Blog: Introducing Low-Level GPU Virtual Memory Management

CUDA 10.2 introduces a new set of API functions for virtual memory management. In this post, we explain how to use the new API functions and go over some real-world application use cases.

Blogs: Parallel ForAll

Detecting Rotated Objects Using the NVIDIA Object Detection Toolkit

Getting a Real Time Factor Over 60 for Text-To-Speech Services Using NVIDIA Jarvis

Building an Intelligent Robot Dog with the NVIDIA Isaac SDK

Accelerating Deep Learning Research in Medical Imaging Using MONAI