GPU Accelerated Computing with Python

Python is one of the most popular programming languages today for science, engineering, data analytics and deep learning applications. However, as an interpreted language, it has been considered too slow for high-performance computing. That has changed with CUDA Python from Continuum Analytics.

With CUDA Python, using the Numba Python compiler, you get the best of both worlds: rapid iterative development with Python combined with the speed of a compiled language targeting both CPUs and NVIDIA GPUs.


To run CUDA Python, you will need the CUDA Toolkit installed on a system with a CUDA-capable GPU.

If you do not have a CUDA-capable GPU, you can access one of the thousands of GPUs available from cloud service providers including Amazon AWS, Microsoft Azure and IBM SoftLayer. The NVIDIA-maintained CUDA Amazon Machine Image (AMI) on AWS, for example, comes pre-installed with CUDA and is available for use today.

Use this guide for easy steps to install CUDA. To set up CUDA Python, first install the Anaconda Python distribution. Then install the latest version of the Numba package. You can find detailed installation instructions in the Numba documentation.

Or, watch the short video below and follow along.


You are now ready for your first Python program on the GPU. The video below walks through a simple example that adds two vectors, so you can follow along.
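For readers who prefer text, here is a minimal sketch of vector addition with Numba. It is not necessarily the code shown in the video. It assumes Numba's `vectorize` decorator with `target="cuda"`, and, as a convenience added for this example, falls back to the CPU target (or to plain NumPy when Numba is missing) so the script still runs on a machine without a GPU.

```python
import numpy as np

try:
    from numba import cuda, vectorize
    HAVE_NUMBA = True
except ImportError:
    # Numba is not installed; the example falls back to NumPy below.
    HAVE_NUMBA = False


def gpu_available():
    """Return True only if Numba is present and reports a usable CUDA GPU."""
    if not HAVE_NUMBA:
        return False
    try:
        return cuda.is_available()
    except Exception:
        return False


def build_add():
    """Build an element-wise float32 add, compiled for the GPU when possible."""
    if not HAVE_NUMBA:
        return np.add

    target = "cuda" if gpu_available() else "cpu"

    # The signature string tells Numba which types to compile for.
    @vectorize(["float32(float32, float32)"], target=target)
    def add(a, b):
        return a + b

    return add


add = build_add()

n = 1_000_000
x = np.arange(n, dtype=np.float32)
y = 2 * x  # stays float32

# With the CUDA target, Numba copies the inputs to the GPU and the result back.
result = add(x, y)
print(result[:5])
```

The function body is ordinary scalar Python; the decorator is what turns it into a compiled routine that is applied across whole arrays.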

If you are new to Python, explore the beginner section of the Python website for some excellent getting-started resources. The blog post "An Even Easier Introduction to CUDA" introduces key CUDA concepts through simple examples.

In the Numba documentation you will find information about how to vectorize functions to accelerate them automatically as well as how to write CUDA code in Python. Download and execute Jupyter Notebooks for the Mandelbrot and Monte Carlo Option Pricer examples on your local machine.
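To give a flavor of what writing CUDA code in Python looks like, the sketch below defines a small kernel with Numba's `@cuda.jit` decorator. The names `scale_kernel` and `scale` are invented for this illustration, and the NumPy fallback is an assumption added so the script also runs where no GPU is present; it is not one of the examples from the Numba documentation.

```python
import numpy as np

try:
    from numba import cuda
    GPU_OK = cuda.is_available()
except Exception:
    # Numba missing or CUDA driver unusable: use the NumPy fallback.
    GPU_OK = False

if GPU_OK:
    @cuda.jit
    def scale_kernel(data, factor, out):
        # Each GPU thread handles one element; cuda.grid(1) is its global index.
        i = cuda.grid(1)
        if i < data.size:  # guard threads past the end of the array
            out[i] = data[i] * factor


def scale(data, factor):
    """Multiply every element of a float32 array by factor (hypothetical helper)."""
    factor = np.float32(factor)
    if not GPU_OK:
        return data * factor

    out = np.empty_like(data)
    threads_per_block = 128
    # Round up so that every element is covered by some thread.
    blocks = (data.size + threads_per_block - 1) // threads_per_block
    # Passing NumPy arrays lets Numba handle the host/device copies.
    scale_kernel[blocks, threads_per_block](data, factor, out)
    return out


values = np.arange(10, dtype=np.float32)
doubled = scale(values, 2.0)
print(doubled)
```

Unlike a vectorized function, a kernel makes the thread indexing and launch configuration explicit, which is closer to how CUDA C/C++ code is written.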


Check out Numba's GitHub repository for additional examples to practice with.

NVIDIA also provides hands-on training through a collection of self-paced labs. The labs guide you step by step through editing and executing code, and even interacting with visual tools, all woven together into a simple, immersive experience. Use them to practice the techniques you learned in the materials above.

For a more formal, instructor-led introduction to CUDA, explore the Introduction to Parallel Programming course on Udacity. The course covers a series of image processing algorithms such as you might find in Photoshop or Instagram. You'll be able to program and run your assignments on high-end GPUs, even if you don't have one yourself.


The Numba package is available as an open-source project sponsored by Continuum Analytics.

The CUDA Toolkit is a free download from NVIDIA and is supported on Windows, Mac, and most standard Linux distributions.

So, now you're ready to deploy your application?

Register today for free access to NVIDIA Tesla GPUs in the cloud.
