GPU Accelerated Computing with C and C++

Using the CUDA Toolkit you can accelerate your C or C++ applications by updating the computationally intensive portions of your code to run on GPUs. To accelerate your applications, you can call functions from drop-in libraries as well as develop custom applications using languages including C, C++, Fortran and Python. Below you will find some resources to help you get started using CUDA.


Install the free CUDA Tookit on a Linux, Mac or Windows system with one or more CUDA-capable GPUs. Follow the instructions in the CUDA Quick Start Guide to get up and running quickly.

Or, watch the short video below and follow along.

If you do not have a GPU, you can access one of the thousands of GPUs available from cloud service providers including Amazon AWS, Microsoft Azure and IBM SoftLayer. The NVIDIA-maintained CUDA Amazon Machine Image (AMI) on AWS, for example, comes pre-installed with CUDA and is available for use today.

For more detailed installation instructions, refer to the CUDA installation guides. For help with troubleshooting, browse and participate in the CUDA Setup and Installation forum.


You are now ready to write your first CUDA program. The article, Even Easier Introduction to CUDA, introduces key concepts through simple examples that you can follow along.

The video below walks through an example of how to write an example that adds two vectors.

The Programming Guide in the CUDA Documentation introduces key concepts covered in the video including CUDA programming model, important APIs and performance guidelines.

NVIDIA also provides hands-on training through a collection of self-paced labs. The labs guide you step-by-step through editing and execution of code, and even interaction with visual tools is all woven together into a simple immersive experience.


Practice the techniques you learned in the materials above through more hands-on labs created for intermediate and advanced users.

The CUDA C Best Practices Guide presents established parallelization and optimization techniques and explains programming approaches that can greatly simplify programming GPU-accelerated applications.

For a more formal, instructor-led introduction to CUDA, explore the Introduction to Parallel Programming on UDACITY. The course covers a series of image processing algorithms such as you might find in Photoshop or Instagram. You'll be able to program and run your assignments on high-end GPUs, even if you don't have one yourself.

Additional Resources

CODE Samples


The CUDA Toolkit is a free download from NVIDIA and is supported on Windows, Mac, and most standard Linux distributions.

So, now you’re ready to deploy your application?
Register today for free access to NVIDIA TESLA GPUs in the cloud.

Latest News

Download DeepStream SDK 2.0 Today to Develop Scalable Video Analytics Applications

NVIDIA has released the DeepStream Software Development Kit (SDK) 2.0 for Tesla GPUs, which is a key part of the NVIDIA Metropolis platform.

A Trio of New Nsight Tools That Empower Developers to Fully Optimize their CPU and GPU Performance

Three big NVIDIA Nsight releases on the same day! NSight Systems is a brand new optimization tool; Nsight Visual Studio Edition 5.6 extends support to Volta GPUs and Win10 RS4; and NSight GRAPHICS 1.2 replaces the current Linux Graphics Debugger.

CUDA 9.2 Now Available

CUDA 9.2 includes updates to libraries, a new library for accelerating custom linear-algebra algorithms, and lower kernel launch latency.

Drink up! Beer Tasting Robot Uses AI to Assess Quality

Can a beer tasting robot do a better job than humans in judging a beer? Researchers in Australia developed a robot that uses machine learning to assess the quality of the beer.

Blogs: Parallel ForAll

Coffee Break Series: Ray Tracing in Games with NVIDIA RTX

Ray tracing will soon revolutionize the way video games look. Ray tracing simulates how rays of light hit and bounce off of objects, enabling developers to create stunning imagery that lives up to the word “photorealistic”.

TensorRT 4 Accelerates Neural Machine Translation, Recommenders, and Speech

NVIDIA has released TensorRT 4 at CVPR 2018.

Accelerate Video Analytics Development with DeepStream 2.0

The sheer scale of the smart city boggles the mind. Tens of billions of sensors will be deployed worldwide, used to make every street, highway, park, airport, parking lot, and building more efficient.

Summit GPU Supercomputer Enables Smarter Science

Today the world of open science received its greatest asset in the form of the Summit supercomputer at Oak Ridge National Laboratory (ORNL).