
This page has online courses to help you get started programming or teaching CUDA, as well as links to universities teaching CUDA.
This page is organized into three sections to get you started:
Introductory CUDA Technical Training Courses
- Volume I: Introduction to CUDA Programming
- Exercises (for Linux and Mac)
- Visual Studio Exercises (for Windows)
- Instructions for Exercises
- Volume II: CUDA Case Studies
CUDAcasts - Downloadable CUDA Training Podcasts
- Introduction to GPU Computing
- CUDA Programming Model Overview
- CUDA Programming Basics - Part I
- CUDA Programming Basics - Part II
Follow this link for additional GPU Computing Online Seminars
CUDA University Courses
University of Illinois: ECE 498AL
Taught by Professor Wen-mei W. Hwu and David Kirk, NVIDIA CUDA Scientist.
- Introduction to GPU Computing (60.2 MB)
- CUDA Programming Model (75.3 MB)
- CUDA API (32.4 MB)
- Simple Matrix Multiplication in CUDA (46.0 MB)
- CUDA Memory Model (109 MB)
- Shared Memory Matrix Multiplication (81.4 MB)
- Additional CUDA API Features (22.4 MB)
- Useful Information on CUDA Tools (15.7 MB)
- Threading Hardware (140 MB)
- Memory Hardware (85.8 MB)
- Memory Bank Conflicts (115 MB)
- Parallel Thread Execution (32.6 MB)
- Control Flow (96.6 MB)
- Precision (137 MB)
These classes are each downloadable CUDAcasts with video pre-scaled to be compatible with major players.
All PowerPoint class presentations can be found on the course syllabus: ECE 498AL
Stanford University: NVIDIA Short Course on High Performance Computing with CUDA
Taught by the NVIDIA Dev Tech Team. (Links to Flash video are provided below; for Silverlight, visit the course home page.)
- Introduction to High Performance Computing with CUDA.
- Introduction to High Performance Computing with CUDA. (CUDA C Basics 2)
- Fundamental Optimizations 1 - Global Memory
- Fundamental Optimizations 2 - Shared Memory
- Finite Difference Stencils on Regular Grids
- Determining Kernel Performance Limiters
Stanford University: CS193G
Taught by Jared Hoberock and David Tarjan
- Introduction to Massively Parallel Computing
- GPU History and CUDA Programming Basics
- CUDA Threads and Atomics
- CUDA Memories
- Performance Considerations
- Parallel Patterns I
- Parallel Patterns II
- Introduction to Thrust
- Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors
- PDE Solvers
- The Fermi Architecture
- Ray Tracing Case Study
- Future of Throughput
- Path Planning Case Study
- Optimizing GPU Performance
- Final lecture TBD
PowerPoint versions of these presentations can be found here.
CS193G Assignments
CS193G Tutorials
UC Davis: EE171, Parallel Computer Architecture
Taught by John Owens, Associate Professor
University of Wisconsin, Madison: ME964, High Performance Computing for Engineering Applications
Taught by Dan Negrut, Assistant Professor
University of North Carolina Charlotte (UNCC): SIGCSE 2011 Workshop: General purpose computing using GPUs: Developing a hands-on undergraduate course on CUDA programming
Taught by Barry Wilkinson & Yaohang Li
ITCS 6010/8010 Topics in Computer Science: GPU Programming for High Performance Computing (CUDA programming)
Taught by Barry Wilkinson
Universities teaching CUDA where you can apply to enroll or register for courses.
CUDA Seminars and Tutorials
- GPU Technology Conference: search for recordings
- SC10
- SC09
- SC08 Tutorial: High Performance Computing with CUDA
- SC07 Tutorial: High Performance Computing with CUDA
- NVISION 08 Tutorials
- Getting Started with CUDA (covers CUDA programming model, basics of CUDA programming, and BLAS and FFT libraries)
- Advanced CUDA Training (covers 10-series architecture and optimization techniques using particle simulation and finite difference case studies)
- All presentations from NVISION 08
- ISC 2008 Case Study: Computational Fluid Dynamics (CFD)
CUDA Consultants and Training Services
Dr. Dobb's Article Series
- CUDA, Supercomputing for the Masses: Part 1 : CUDA lets you work with familiar programming concepts
- CUDA, Supercomputing for the Masses: Part 2 : A first kernel
- CUDA, Supercomputing for the Masses: Part 3 : Error handling and global memory performance limitations
- CUDA, Supercomputing for the Masses: Part 4 : Understanding and using shared memory (1)
- CUDA, Supercomputing for the Masses: Part 5 : Understanding and using shared memory (2)
- CUDA, Supercomputing for the Masses: Part 6 : Global memory and the CUDA profiler
- CUDA, Supercomputing for the Masses: Part 7 : Double the fun with next-generation CUDA hardware
- CUDA, Supercomputing for the Masses: Part 8 : Using libraries with CUDA
- CUDA, Supercomputing for the Masses: Part 9 : Extending High-level Languages with CUDA
- CUDA, Supercomputing for the Masses: Part 10 : CUDPP, a powerful data-parallel CUDA library
- CUDA, Supercomputing for the Masses: Part 11 : Revisiting CUDA memory spaces
- CUDA, Supercomputing for the Masses: Part 12 : CUDA 2.2 changes the data movement paradigm
- CUDA, Supercomputing for the Masses: Part 13 : Using texture memory in CUDA
- CUDA, Supercomputing for the Masses: Part 14 : Debugging CUDA and using CUDA-GDB
- CUDA, Supercomputing for the Masses: Part 15 : Using Pixel Buffer Objects with CUDA and OpenGL
- CUDA, Supercomputing for the Masses: Part 16 : CUDA 3.0 provides expanded capabilities
- CUDA, Supercomputing for the Masses: Part 17 : CUDA 3.0 provides expanded capabilities and makes development easier
- CUDA, Supercomputing for the Masses: Part 18 : Using Vertex Buffer Objects with CUDA and OpenGL
- CUDA, Supercomputing for the Masses: Part 19 : Parallel Nsight Part 1: Configuring and Debugging Applications
- CUDA, Supercomputing for the Masses: Part 20 : Parallel Nsight Part 2: Using the Parallel Nsight Analysis capabilities
- CUDA, Supercomputing for the Masses: Part 21 : The Fermi architecture and CUDA


