Python is one of the most popular programming languages today for science, engineering, data analytics, and deep learning applications. However, as an interpreted language, it has long been considered too slow for high-performance computing.

Numba, a Python compiler from Anaconda, can compile Python code for execution on CUDA-capable GPUs, providing Python developers with an easy entry into GPU-accelerated computing and a path to increasingly sophisticated CUDA code with a minimum of new syntax and jargon. With CUDA Python and Numba, you get the best of both worlds: rapid iterative development with Python combined with the speed of a compiled language targeting both CPUs and NVIDIA GPUs.


To run CUDA Python, you will need the CUDA Toolkit installed on a system with CUDA-capable GPUs. Use this guide for easy steps to install CUDA. If you do not have a CUDA-capable GPU, you can access one of the thousands of GPUs available from cloud service providers, including Amazon AWS, Microsoft Azure, and IBM SoftLayer. The NVIDIA-maintained CUDA Amazon Machine Image (AMI) on AWS, for example, comes pre-installed with CUDA and is available for use today.

To get started with Numba, the first step is to download and install the Anaconda Python distribution, which includes many popular packages (NumPy, SciPy, Matplotlib, IPython, etc.) and conda, a powerful package manager. Once you have Anaconda installed, install the required CUDA packages by typing conda install numba cudatoolkit pyculib.


The blog post Numba: High-Performance Python with CUDA Acceleration is a great resource to get you started. Also refer to the Numba tutorial for CUDA on the ContinuumIO GitHub repository and the Numba posts on Anaconda's blog.

If you are new to Python, explore the beginner section of the Python website for some excellent getting-started resources. The blog post An Even Easier Introduction to CUDA introduces key CUDA concepts through simple examples.

Check out Numba's GitHub repository for additional examples to practice. You can also get the full Jupyter notebook for the Mandelbrot example on GitHub.


The developer blog posts Seven Things You Might Not Know about Numba and GPU-Accelerated Graph Analytics in Python with Numba provide additional insights into GPU computing with Python.

NVIDIA also provides hands-on training through a collection of self-paced courses and instructor-led workshops. The courses guide you step-by-step through editing and execution of code and interaction with visualization tools, woven together into a simple immersive experience. Practice the techniques you learned in the materials above through hands-on content.


You can register for free access to NVIDIA Tesla GPUs in the cloud to deploy your Python applications once they are ready.

Latest News

SONY Breaks ResNet-50 Training Record with NVIDIA V100 Tensor Core GPUs

Researchers from SONY today announced a new speed record for training ImageNet/ResNet-50 in only 224 seconds (three minutes and 44 seconds) with 75 percent accuracy using 2,100 NVIDIA Tesla V100 Tensor Core GPUs.

AI Research Detects Glaucoma with 94 Percent Accuracy

Glaucoma affects more than 2.7 million people in the U.S. and is one of the leading causes of blindness in the world.

AI Study Predicts Alzheimer’s Six Years Before Diagnosis

A new study published in Radiology describes how deep learning can improve the ability of brain imaging to predict Alzheimer’s disease years before an actual diagnosis.

Visualizing Star Polymers in Record Time

In the last five minutes, you have probably come into contact with more polymers than you can count. In fact, they are everywhere: in grocery bags, water bottles, phones, computers, food packaging, auto parts, tires, airplanes, and toys.

Blogs: Parallel ForAll

Getting Started with PGI Compilers on AWS

PGI Community Edition compilers and tools for Linux/x86-64 provide a low-cost option for people interested in GPU-accelerated computing.

Kubernetes For AI Hyperparameter Search Experiments

The software industry has recently seen a huge shift in how software deployments are done thanks to technologies such as containers and orchestrators.

CatBoost Enables Fast Gradient Boosting on Decision Trees Using GPUs

Machine Learning techniques are widely used today for many different tasks. Different types of data require different methods.

NVIDIA Jetson AGX Xavier Delivers 32 TeraOps for New Era of AI in Robotics

The world’s ultimate embedded solution for AI developers, Jetson AGX Xavier, is now shipping as standalone production modules from NVIDIA.