Teaching Accelerated CUDA Programming with GPUs
This page is a “Getting Started” guide for educators looking to teach introductory massively parallel programming on GPUs with the CUDA Platform.
The past decade has seen a tectonic shift from serial to parallel computing. No longer the exotic domain of supercomputing, parallel hardware is ubiquitous and software must follow: a serial, sequential program will use less than 1% of a modern PC's computational horsepower and less than 4% of a high-end smartphone's. This presents an enormous and critical challenge: we must educate students and programmers to make the most of this new parallel world. Educators have recognized this need; for example, the 2013 curriculum guidelines from the ACM and IEEE stress parallel computing as one of the most important new fundamentals to be taught.
Getting Started:
- Make sure you have an understanding of what CUDA is.
- Teach yourself how to accelerate code on GPUs by visiting some or all of GPU Libraries, CUDA C/C++, CUDA Python, or CUDA Fortran; a short CUDA C kernel sketch follows this list.
- Visit our CUDA Education Resources page for PowerPoint slides, code samples, and other material.
- Look at the available textbooks, such as:
  - CUDA by Example: An Introduction to General-Purpose GPU Programming. After a concise introduction to the CUDA platform and architecture, as well as a quick-start guide to CUDA C, the book details the techniques and trade-offs associated with each key CUDA feature.
  - The CUDA Handbook: A Comprehensive Guide to GPU Programming. The CUDA Handbook begins where CUDA by Example leaves off, discussing CUDA hardware and software in greater detail and covering both CUDA 5.0 and Kepler.
  - Parallel algorithms books such as An Introduction to Parallel Programming.
  - There are also many older parallel algorithm books that are relevant today, though since they predate CUDA their examples and problems would have to be translated.
  - Also of note is the Parallel and Distributed Computing PRAM Algorithms (PDF).
- Apply for access to the CUDA Cloud Training Platform. The systems available are:
  - NVIDIA Training Cluster – a queue-based cluster containing the latest publicly available Tesla-class GPUs. Intended for multi-week classes or workshops.
  - Amazon Interactive Instances – Easily get your students connected to a GPU instance in the Amazon Cloud. Intended for single-day classes or workshops.
  - Self-paced labs hosted on nvidia.qwiklab.com that require only a supported browser. Great for prerequisite work, homework, or in-class exercises.
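To give a concrete sense of what "accelerating code on GPUs with CUDA C/C++" looks like, here is a minimal sketch of the kind of kernel most introductory courses and the textbooks above open with: an element-wise vector addition offloaded to the GPU. The kernel name, array size, and block size are illustrative choices, not drawn from any particular course material.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Each GPU thread computes one element of the output vector.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                     // guard threads that fall past the end of the array
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;         // 1M elements (illustrative size)
    size_t bytes = n * sizeof(float);

    // Allocate and initialize host data.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate device memory and copy the inputs to the GPU.
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back to the host and spot-check it.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f (expected 3.0)\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Compiled with nvcc (for example, nvcc vecadd.cu -o vecadd), this makes a compact in-class demonstration of the host/device split, explicit memory transfers, and the thread/block launch configuration that the textbooks and courses listed on this page cover in depth.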
Other Resources:
- Free self-paced online courses:
  - Intro to Parallel Programming on Udacity.com
  - Heterogeneous Parallel Programming on Coursera.com
- Participate in sessions, tutorials, and Birds-of-a-Feather meetings at conferences such as SC14, ISC14, and SIGCSE 2014, as well as many smaller conferences.
- NVIDIA’s GPU Technology Conference – hundreds of recorded sessions, GTC Express Webinars, and the GTC 2014 conference.
- Attend local GPU or accelerated computing Meetups.
- Additional course materials made available online by various universities.