New NVIDIA Technical Webinars registration links and recent downloads now listed at:

GTC Express Webinars

Parallel programming has never been this easy. The CUDA programming model, tools and powerful libraries have provided the foundation, and this webinar series will fuel your development. Get trained directly by GPU Computing experts in CUDA, OpenCL and DirectCompute, and find out about the latest developments from companies around the world leading the GPU Computing revolution.

Advance registration is required. You will be kept informed of updates and future webinars, added to our CUDA Newsletter mailing list, and invited to become a registered developer.

Additional Parallel Nsight and Tools Webinar Records  GTC Express Webinar Series

Archived Webinars 

Webinar Title Links to recordings 
Understanding OpenACC Directives by CAPS
In this webinar you will refine your knowledge of OpenACC directive programming and how you can get better acceleration for real scientific applications.

Presented by CAPS, one of the leading experts in parallel computing.

Video(mp4)
OpenACC For Cray Supercomputers
In this webinar you will get an overview of OpenACC support in Cray's compilers and an insight into one of the easiest-to-use solutions for high performance parallel computing. Presented by Cray, one of the leading experts in supercomputing.
Video(mp4)
OpenACC Acceleration for Real Science - using CAPS HMPP
In this webinar you will refine your knowledge of OpenACC directive programming and how you can get better acceleration for real scientific applications. Presented by CAPS, one of the leading experts in parallel computing.
Video(mp4)
Getting the most of OpenACC Directives - Optimization with PGI Accelerator
In this webinar you will refine your knowledge of OpenACC directive programming and how you can get better acceleration using some of the more advanced optimization options offered by the OpenACC standard and supported by PGI Accelerator. 
Video(mp4)
OpenACC and the New PGI Accelerator
Presented by senior technologists from The Portland Group. In this webinar you will learn about OpenACC directives, the easy way to express parallelism in your code, using the latest version of PGI Accelerator. The PGI Accelerator programming model uses directives and compiler analysis to compile for the GPU; this often allows you to maintain a single source version, since ignoring the directives will compile the same program for the x64 CPU, preserving your software development investment.
Video(mp4)
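The single-source idea described above can be sketched in C. This is a minimal illustration and is not taken from the webinar: the function name `saxpy` and the choice of data clauses are assumptions made for the example. `#pragma acc kernels` is the OpenACC directive that asks the compiler to offload a region; a compiler without OpenACC support simply ignores the pragma and builds the same loop for the CPU.

```c
/* Hypothetical example: computes y = a*x + y over n elements.
 * With an OpenACC-capable compiler the loop may be offloaded to the GPU.
 * Without one, the pragma is ignored and the loop runs on the CPU,
 * so a single source file serves both targets. */
void saxpy(int n, float a, const float *x, float *y)
{
    /* copyin: x is only read on the device; copy: y is read and written. */
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Whether the loop is actually parallelized depends on the compiler's analysis; the result is the same either way.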
Changing The Paradigm of Higher-Performance Computing With GPU-Based Analytics by Fuzzy Logix
In today’s global economy, it is becoming more and more evident that in order to remain competitive, enterprises need to analyze large volumes of data and at the same time perform complex computations in real time. Many applications require real-time or near real-time response over large volumes of data, such as risk management, high frequency trading, fraud identification and online advertising.
To address these needs, Fuzzy Logix has developed a high performance analytics library ‘Tanay ZX Lib’ powered by NVIDIA’s Tesla GPUs. Tanay ZX Lib includes a library of 300+ massively parallelized functions in the areas of Mathematical and Statistical analysis, Data Mining, Monte Carlo Simulation and Financial Engineering.
Video(mp4)
GPU-Accelerated Quant Finance: The Way Forward
In this webinar, Gerald A. Hanweck, Jr., Ph.D., CEO and Founder of Hanweck Associates, LLC, will discuss first-hand experiences using NVIDIA CUDA and GPUs to accelerate quantitative financial computation, which include 100x faster Monte Carlo simulations, 70x speedup in stochastic volatility models, and the pricing of complex derivatives in real time.
Coming Soon
CUDA Concurrency & Streams
Presented by Steven Rennich, a member of our NVIDIA Developer Technology team. Learn about the performance impact and how to achieve maximum concurrency in your CUDA applications.

Video(mp4)

PDF

Debugging CUDA with TotalView
With TotalView 8.9.2 and the NVIDIA CUDA add-on, you can debug both the CPU and the GPU code in applications that use CUDA. You can set breakpoints, step, and dive in code running on the CUDA device using all the familiar TotalView GUI methods. TotalView supports unified virtual addressing and multi-device debugging, handles CUDA function inlining, and provides type qualification in the expression system. You can display how your logical threads are being mapped to hardware and navigate kernel threads using either hardware or logical coordinates.

The webinar will also preview the upcoming TotalView 8.10 with support for CUDA 4.1

Links Coming Soon
OpenACC 1.0 - Technical Overview
The new standard for compiler directives for parallel programming, enabling GPU acceleration with just hours of programming effort. OpenACC is a major new initiative providing an open standard for compiler hints or directives, the easiest way to leverage the performance of any parallel computer. This webinar will provide a technical overview of the OpenACC API 1.0 specification, featuring a live Q&A session with members of the OpenACC board.

Video(mp4)

PDF

CUDA Toolkit 4.1 - Technical & Performance Overview
Now in production release, CUDA Toolkit 4.1 features many improvements, including:
A new LLVM-based compiler, over 1000 new image processing functions, major improvements in the Visual Profiler, and much more. Presented by NVIDIA's CUDA PM, featuring a live Q&A session.
Video(mp4)
CUDA X86 - Running your CUDA Code on multi-core CPUs
PGI's CUDA X86 compiler enables developers to create a single code base using CUDA C/C++ optimized for parallel execution on systems with and without GPU Computing acceleration. This webinar will provide an overview and insight into this powerful solution.
Featuring a live Q&A session and technical presenters from the Portland Group
Video(mp4)
5x in 5 Hours: Accelerating SEISMIC_CPML Using High-level GPU Programming
Programming GPU accelerators involves 3 basic aspects: splitting the source code between host and GPU, managing data allocation and movement between host memory and GPU memory, and optimizing GPU kernels. Much of this process can be automated using modern compiler technology and high level programming techniques. In this webinar, Mat Colgrove, Applications Engineer for The Portland Group, will present a case study on using PGI Accelerator compiler directives to achieve a 5x speed-up in approximately 5 hours of programming time on this popular geophysics code.

Video(mp4)

PDF

Code
 

Debugging workshop for CUDA 4.1 using Allinea DDT
Presented by David Lecomber, CTO of Allinea Software. Learn how Allinea DDT for CUDA can address your GPU debugging requirements. See how easy and powerful it is to debug your code on the host CPU and CUDA boards.
The agenda for this meeting:
A) Overview of the capabilities of the DDT debugger
B) Multiple use-cases to highlight how DDT can improve programmer productivity
C) Mini training on how to install, configure and run the debugger 

With Live Q&A.

Links Coming Soon
CUDA Toolkit 4.1 Feature Overview
Now available for all developers, this new release features 3 major improvements:
A new LLVM-based compiler, over 1000 new image processing functions, major improvements in the Visual Profiler, and much more. Presented by NVIDIA's CUDA PM, featuring a Q&A session. Updated Dec 2011

Video(mp4)

PDF

Heterogeneous Data-Parallel Programming
Presented by Satnam Singh, Professor of Reconfigurable Computing, School of Computer Science, University of Birmingham (UK)
Easy and efficient data-parallel programming will be essential for getting performance from today's massively parallel systems. In this talk Dr Singh will share his vision and demonstrate the novel Microsoft Accelerator system, a language-neutral solution working on NVIDIA GPUs.
Video(mp4)
Getting Started with TotalView 8.9.2 and CUDA 4.0
Presented by Chris Gottbrath, Principal Product Manager, Rogue Wave
With TotalView 8.9.2 and the CUDA add-on you can debug both the CPU and the GPU code. Set breakpoints, step, and dive in code running on the GPU using all the familiar TotalView GUI methods. TotalView supports many advanced features such as UVA and multi-GPU debugging.
Video(mp4)
CUDA Optimization : Register Spilling and Local Memory Usage with Live Q&A  by Dr Paulius Micikevicius, NVIDIA
Final installment of our optimization series.
Focusing on one of the more subtle factors affecting performance, this webinar provides strategies that you will be able to implement immediately to extract extra performance from your application.

Video(mp4)

PDF

Trace Based Performance Analysis for GPUs Using Vampir Trace Collector
Learn how to use Vampir Trace Collector event logging to identify performance bottlenecks. Understand how the graphical analysis of this data provides insight into how the various layers of parallelism interact, so you can see what is really happening.
Video(mp4)
PGI Accelerator for Fortran - Simplified GPU Programming Using Directives, by Michael Wolfe, Portland Group
This webinar provides an overview and real examples of how to parallelize your application with intuitive compiler directives. A fast and easy complement to programming directly in CUDA Fortran or CUDA C/C++. 
Video(mp4)

Overview and Usage of LibJacket CUDA Library, Presented by Accelereyes
A powerful library with hundreds of fast prepacked convolutions, reductions, matrix indexing, linear algebra, image processing, signal processing, and statistics functions. Handles N-dimensional data and scales to multiple GPUs. Includes a powerful GFOR loop for running FOR-loop iterations in parallel on the GPU cores, and a graphics library.

Video(mp4)

PDF

CUDA Optimization : Memory Bandwidth Limited Kernels + Live Q&A  by Tim Schroeder
This webinar focuses on one of the main performance limiters and provides actionable ideas and strategies for performance optimization. Don't miss the live Q&A session which will follow.

Video(mp4)

PDF

PGI Accelerator for C -  Simplified GPU Programming Using Directives

Presented by Michael Wolfe, PGI's leading compiler expert, providing a technical overview and real code examples using this exciting and innovative programming solution.

Video(mp4)
CUDA Optimization : Instruction Limited Kernels with Live Q&A  by Gernot Ziegler
Get field-proven advice on methods to improve kernels whose performance is limited by instructions. A not-to-be-missed live webinar.

Video(mp4)

PDF

GPU Direct and Unified Virtual Addressing+ Live Q&A by Tim Schroeder
These new features of CUDA 4.0 have enabled huge opportunities. If you haven't looked at how to leverage them for your application, this is an excellent way to find out the latest hints and tips.

Video(mp4)
PDF

Multi-GPU and Host Multi-Threading Considerations+ Live Q&A by Dr Paulius Micikevicius
Get critical, field-proven tips from one of the world's leading CUDA experts.

Video(mp4) 
PDF

CUDA Optimization: Identifying Performance Limiters by Dr Paulius Micikevicius
A great opportunity to learn how to identify what is limiting the performance of your application. This is the first of a series of optimization webinars which will help you extract even more performance from your applications. A must-attend for all CUDA developers.

Video(mp4)

PDF

Introduction to CUDA Libraries+ Live Q&A by Dr Justin Luitjens
Get insight into how to best leverage the powerful libraries that come as part of the CUDA Toolkit.

Video(mp4) 
PDF

CUDA Texture Memory & Graphics interop+ Live Q&A With Gernot Ziegler Video(mp4)
Special Live Q&A with Numeric Library Team  + Live Q&A
Your opportunity to engage with NVIDIA Engineering and ask questions about CUDA and the CUDA Libraries. These fast-paced webinars are a must-attend for all CUDA developers.
Team Manager: Ujval Kapasi
Video(mp4)
CUDA Warps and Occupancy Considerations+ Live with  Dr Justin Luitjens, NVIDIA

Topics  covered in this talk include:
* Thread Blocks and Warps
* Tesla vs Fermi Warps
* Hiding memory access latency
* Smart use of profiler information
* Other considerations which may have a higher impact than occupancy

Presented by  Dr Justin Luitjens, GPU Computing Expert and NVIDIA Devtech

Video(mp4)

PDF

Overview of Latest Release of CULA Tools, An Accelerated Linear Algebra Solution for Professionals by John Humphrey, Engineering Director, EM Photonics Inc

EM Photonics produces the popular CULA library for dense linear algebra functions using CUDA GPUs. CULA has hundreds of routines for system solutions, eigenvalue problems, and matrix factorizations. Recently, we have introduced an exciting new feature called the Link Interface which is compile-time and link-time compatible with other matrix libraries. In the link interface, you re-link your application and CULA handles all the details so you can try GPUs with very little effort! This webinar will cover the CULA library and show real-world usage of the link interface, including use in Matlab.

Video(mp4)
CUDA Shared Memory  & Cache + Live Q&A with Dr Steve Rennich, NVIDIA

Some of the topics discussed in this technical webinar include:
Shared memory usage vs L2 Cache
Shared memory banking overview
Minimizing bank conflict and maximizing performance
Hints and Tips for optimal use of Fermi's L2 Cache

Presented by Dr Steven Rennich, GPU Computing Expert and NVIDIA DevTech
 
Video(mp4)
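The bank-conflict topic listed above comes down to simple arithmetic, which the following plain C sketch models on the host. It is not taken from the webinar: the function name `bank_conflict_degree` is hypothetical, and the model assumes Fermi-style shared memory with 32 banks of 4-byte words and a 32-thread warp in which thread `t` reads the word at index `t * stride_words`. It only covers strides of 1 or more and does not model the broadcast case where several threads read the same word.

```c
#define WARP_SIZE 32   /* threads per warp */
#define NUM_BANKS 32   /* assumed: 32 banks, each 4 bytes wide (Fermi) */

/* Returns the largest number of threads in one warp that map to the same
 * bank when thread t accesses the 4-byte word at index t * stride_words.
 * A result of 1 means conflict-free; a result of k means a k-way conflict.
 * Assumes stride_words >= 1, so every thread touches a distinct word. */
int bank_conflict_degree(int stride_words)
{
    int hits[NUM_BANKS] = {0};
    int worst = 0;
    for (int t = 0; t < WARP_SIZE; ++t) {
        int bank = (t * stride_words) % NUM_BANKS;  /* word index -> bank */
        hits[bank]++;
        if (hits[bank] > worst) {
            worst = hits[bank];
        }
    }
    return worst;
}
```

Under this model a stride of 32 words sends every thread to the same bank, while a stride of 33 spreads the warp across all banks, which is why padding an array row by one word is a common way to avoid conflicts.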
The Practical Reality of Heterogeneous Super Computing

This presentation will cover the typical concerns that developers and owners of HPC solutions have about GPU adoption and how they are being resolved; in particular, the concern that two code bases have to be maintained. The presentation emphasizes that GPU Computing has evolved to a point that it is now possible to write and maintain codes that will work on GPUs and CPUs using CUDA & CUDA x86.

Presented by Dr Rob Farber, Author and GPU Computing Expert

Video(mp4)

PDF

CUDA  Global Memory Usage & Strategy + Live Q&A with Dr Justin Luitjens, NVIDIA

Smartly using the CUDA memory model is critical for writing high performance code.
This talk will cover global memory coalescing and strategies for data structures and access patterns, followed by a live Q&A.
Presented by  Dr Justin Luitjens, GPU Computing Expert and NVIDIA Devtech

Video (mp4)
PDF  
How to accelerate the incomplete-LU and Cholesky preconditioned iterative methods on a GPU using CUSPARSE and CUBLAS libraries Video (mp4)
WhitePaper
Introduction to GPU Computing & CUDA By Sarah Tariq, NVIDIA Video (mp4)
PDF
Getting Started with CUDA & GPU Computing  + Live Q&A
A detailed review of how to get up and running using CUDA C/C++ by Sarah Tariq
Video (mp4)
Floating Point Capabilities and Accuracy of Latest NVIDIA GPUs

Video (mp4)
Slides(PDF)
WhitePaper

Introduction to MainConcept's CUDA H.264/AVC Encoder
Learn how to take advantage of NVIDIA GPUs with the new MainConcept CUDA H.264 Video Encoder, which delivers video encoding up to 700% faster than on the CPU. Presented by MainConcept, an industry-leading supplier of video and audio codec solutions for the Multimedia, Consumer Electronics, Broadcast & Professional video production, Digital Signage, Medical and Security markets.
Video (.mp4)
Slides (PDF)
Monitoring and Managing GPU Clusters with Bright Cluster Management
Presented by the CEO and Founder of Bright Computing, Dr. Matthijs van Leeuwen. This technical presentation covers Bright Cluster Manager, a complete cluster management software solution that offers comprehensive functionality for monitoring and managing GPUs.
Video (.mp4)
GPU Computing using CUDA C – An Introduction (2010)
An introduction to the basics of GPU computing using CUDA C. Concepts will be illustrated with walkthroughs of code samples. No prior GPU Computing experience required
Video (mp4 )
GPU Computing using CUDA C – Advanced 1 (2010)
First level optimization techniques such as global memory optimization, and processor utilization. Concepts will be illustrated using real code examples
Video (wmv )
Slides (pdf)
GPU Computing using CUDA C - Advanced 2 (2010)
Advanced topics such as execution configuration, instruction and warp optimization with a focus on real applications
Video (wmv)
Slides (pdf)
GPU Computing using OpenCL- An Introduction (2010)
An introduction to the OpenCL API leveraging NVIDIA's CUDA parallel computing architecture. Topics covered include comparison between CUDA C and OpenCL API, memory and threading models. No prior GPU Computing experience is required.
Video (wmv )
Slides (pdf)
GPU Computing using OpenCL Advanced 1 (2010)
NVIDIA presents tricks and tips on how to write great OpenCL code. Topics covered include memory usage best practises, achieving best processor occupancy and instruction throughput.
Video (wmv)
Slides (pdf)
Parallel Nsight - An Introduction and Overview (2010)
This webinar will provide an overview of the powerful features of Parallel Nsight, NVIDIA's latest tool to help develop, debug and analyse parallel code on CPUs and GPUs.
Video (mp4 format)
Thrust, A C++ Standard Template Library for CUDA C - An Introduction (2010)
Thrust is a powerful C++ standard template library providing highly optimized high-level CUDA C kernels which can be a fast track for adding GPU acceleration to existing applications. Thrust is one of NVIDIA's many open source projects. This webinar provides an overview and example uses.
Video (mp4 format)
Slides (pdf format)
PGI Fortran - An Overview Video (mp4 )
DirectCompute for GPU Computing - An Introduction by Microsoft Video (.mp4 )
An Introduction to the MAGMA project - acceleration of dense linear algebra by Prof. Jack Dongarra Video (mp4 )
An Introduction to CULA GPU Accelerated Linear Algebra Video (mp4 )
Rapid Application Development platform for GPGPUs – Jacket with MATLAB® Video (mp4 )
CUDA 4.0 Overview Video (.mp4)
Live Q&A with Ian Buck Audio Only (.mp3)
CUDA-MEMCHECK Overview and Demo Video (.mp4)
CUDA-GDB Overview and Demo Video (.mp4)
CUDA Numeric Libraries 3.2  Performance Overview Video (.mp4)
CUDA 3.2 Introduction & Overview Video (.mp4)
Introduction to GPU.NET Video (.mp4)

OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.