GPU Accelerated Computing with C and C++

Using the CUDA Toolkit, you can accelerate your C or C++ applications by updating the computationally intensive portions of your code to run on GPUs. To accelerate your applications, you can call functions from drop-in libraries or develop custom applications in languages including C, C++, Fortran, and Python. Below you will find resources to help you get started using CUDA.
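As an illustration of the drop-in library approach, the sketch below uses cuBLAS to run a SAXPY (y = a·x + y) on the GPU. It assumes a system with the CUDA Toolkit installed and a CUDA-capable GPU; error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Unified (managed) memory is accessible from both host and device.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 3.0f;
    // y = alpha * x + y, computed on the GPU by the cuBLAS library.
    cublasSaxpy(handle, n, &alpha, x, 1, y, 1);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]); // 3.0 * 1.0 + 2.0 = 5.0

    cublasDestroy(handle);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Compile with `nvcc saxpy.cu -lcublas`. The point of the drop-in style is that no kernel code is written at all: the library handles the parallelization.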


Install the free CUDA Toolkit on a Linux, Mac, or Windows system with one or more CUDA-capable GPUs. Follow the instructions in the CUDA Quick Start Guide to get up and running quickly.

Or, watch the short video below and follow along.

If you do not have a GPU, you can access one of the thousands of GPUs available from cloud service providers including Amazon AWS, Microsoft Azure and IBM SoftLayer. The NVIDIA-maintained CUDA Amazon Machine Image (AMI) on AWS, for example, comes pre-installed with CUDA and is available for use today.

For more detailed installation instructions, refer to the CUDA installation guides. For help with troubleshooting, browse and participate in the CUDA Setup and Installation forum.


You are now ready to write your first CUDA program. The article, Even Easier Introduction to CUDA, introduces key concepts through simple examples that you can follow along with.

The video below walks through writing a program that adds two vectors.

The Programming Guide in the CUDA Documentation introduces the key concepts covered in the video, including the CUDA programming model, important APIs, and performance guidelines.
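The vector-addition example discussed above can be sketched as follows. This is a minimal illustration, not the exact code from the video: each GPU thread computes one element of the output.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one element; the grid as a whole covers the array.
__global__ void add(int n, const float *x, const float *y, float *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y, *out;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    const int blockSize = 256;
    const int numBlocks = (n + blockSize - 1) / blockSize;
    add<<<numBlocks, blockSize>>>(n, x, y, out);
    cudaDeviceSynchronize();

    printf("out[0] = %f\n", out[0]); // 1.0 + 2.0 = 3.0

    cudaFree(x);
    cudaFree(y);
    cudaFree(out);
    return 0;
}
```

Compile with `nvcc add.cu` and run the resulting binary; the `<<<numBlocks, blockSize>>>` syntax is the CUDA kernel-launch configuration described in the Programming Guide.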


NVIDIA provides hands-on training in CUDA through a collection of self-paced and instructor-led courses. The self-paced online training, powered by GPU-accelerated workstations in the cloud, guides you step-by-step through editing and execution of code along with interaction with visual tools. All you need is a laptop and an internet connection to access the complete suite of free courses and certification options.

The CUDA C Best Practices Guide presents established parallelization and optimization techniques and explains programming approaches that can greatly simplify programming GPU-accelerated applications.
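One widely used pattern in this vein is the grid-stride loop, which lets a fixed-size grid process arrays of any length while keeping memory accesses coalesced. The kernel below is a generic sketch of the pattern (the scaling operation is just an example):

```cuda
// Grid-stride loop: the whole grid sweeps the array in strides, so a
// kernel launched with any grid size correctly handles any n, and
// consecutive threads still touch consecutive elements (coalesced access).
__global__ void scale(int n, float a, float *x) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += blockDim.x * gridDim.x) {
        x[i] *= a;
    }
}
```

Because the loop decouples grid size from problem size, you can tune the launch configuration for occupancy without changing the kernel's correctness.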

Additional Resources

Code Samples


The CUDA Toolkit is a free download from NVIDIA and is supported on Windows, Mac, and most standard Linux distributions.

So, now you’re ready to deploy your application?
Register today for free access to NVIDIA Tesla GPUs in the cloud.

Latest News

Developer Spotlight: Visualizing High-Resolution Atomic Structures to Simulate Molecular Dynamics

Experimental sciences deliver high-resolution atomic structures for biological complexes, but researchers need to refine those structures, prove their accuracy, and simulate their dynamics while retaining all of the information…

Researchers at VideoGorillas Use AI to Remaster Archived Content to 4K Resolution and Above

To meet the growing pace of innovation, one company is developing a new AI-enhanced solution to exceed visual expectations at lower costs.

Deep Learning Helps UCLA Scientists Identify Cancer Cells in the Blood Instantaneously

UCLA researchers have just developed a deep learning, GPU-powered device that can detect cancer cells in a few milliseconds, hundreds of times faster than previous methods.

CUDA 10.1 Update 2 Now Available

CUDA 10.1 Update 2 is now available for download. This version is a compatible update to CUDA 10.1 and includes updates to libraries and developer tools, as well as bug fixes.

Blogs: Parallel ForAll

Using Nsight Compute to Inspect your Kernels

By now, hopefully you have read the first two blogs in this series, “Migrating to NVIDIA Nsight Tools from NVVP and Nvprof” and “Transitioning to Nsight Systems from NVIDIA Visual Profiler / nvprof,” and you've discovered NVIDIA added a few new tools…

Neural Modules for Fast Development of Speech and Language Models

As a researcher building state-of-the-art speech and language models, you need to be able to quickly experiment with novel network architectures.

NVDLA Deep Learning Inference Compiler is Now Open Source

Designing new custom hardware accelerators for deep learning is clearly popular, but achieving state-of-the-art performance and efficiency with a new design is a complex and challenging problem.

Generate Natural Sounding Speech from Text in Real-Time

This blog, intended for developers with a professional-level understanding of deep learning, will help you produce a production-ready AI text-to-speech model.