What’s New in CUDA

CUDA 9.2

CUDA 9.2 includes updates to libraries, a new library for accelerating custom linear-algebra algorithms, and lower kernel launch latency.

With CUDA 9.2, you can:

  • Speed up recurrent and convolutional neural networks through cuBLAS optimizations
  • Speed up FFTs of prime sizes through Bluestein kernels in cuFFT
  • Accelerate custom linear algebra algorithms with CUTLASS 1.0
  • Launch CUDA kernels up to 2X faster than CUDA 9 with new optimizations to the CUDA runtime

Additionally, CUDA 9.2 includes bug fixes and supports new operating systems and popular development tools. CUDA 9.2 is freely available for download today!
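From the caller's side, the cuFFT change needs no new code. A prime-length plan is created and executed through the usual `cufftPlan1d`/`cufftExecC2C` calls, and the library selects the algorithm internally. The sketch below assumes a single-precision complex-to-complex transform of a prime length and checks it against a naive host DFT. `prime_fft_max_error` is a hypothetical helper, not part of cuFFT.

```cuda
#include <cufft.h>
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Hypothetical helper: forward C2C FFT of length n on the GPU via cuFFT, compared
// with a naive O(n^2) host DFT. Returns the largest absolute difference, or -1.0
// if a CUDA or cuFFT call fails.
double prime_fft_max_error(int n) {
    std::vector<cufftComplex> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) {
        h_in[i].x = static_cast<float>(i % 5);  // arbitrary real part
        h_in[i].y = static_cast<float>(i % 3);  // arbitrary imaginary part
    }

    cufftComplex* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(cufftComplex)) != cudaSuccess) return -1.0;
    cudaMemcpy(d_data, h_in.data(), n * sizeof(cufftComplex), cudaMemcpyHostToDevice);

    cufftHandle plan;
    if (cufftPlan1d(&plan, n, CUFFT_C2C, 1) != CUFFT_SUCCESS) {
        cudaFree(d_data);
        return -1.0;
    }
    // In-place forward transform; cuFFT picks the algorithm for this size.
    cufftResult r = cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);
    cudaMemcpy(h_out.data(), d_data, n * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
    cufftDestroy(plan);
    cudaFree(d_data);
    if (r != CUFFT_SUCCESS) return -1.0;

    // Reference: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n), unnormalized like cuFFT.
    const double kPi = 3.14159265358979323846;
    double max_err = 0.0;
    for (int k = 0; k < n; ++k) {
        double re = 0.0, im = 0.0;
        for (int j = 0; j < n; ++j) {
            double ang = -2.0 * kPi * j * k / n;
            re += h_in[j].x * std::cos(ang) - h_in[j].y * std::sin(ang);
            im += h_in[j].x * std::sin(ang) + h_in[j].y * std::cos(ang);
        }
        max_err = std::fmax(max_err, std::fabs(re - h_out[k].x));
        max_err = std::fmax(max_err, std::fabs(im - h_out[k].y));
    }
    return max_err;
}
```

Compile with `nvcc` and link with `-lcufft`. For a prime length such as 17, the returned error should be small and non-negative.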


“Red Hat works closely with NVIDIA to help bring the full power of NVIDIA CUDA to our users. Collaborating with NVIDIA, we’ve paired the new features and performance improvements of CUDA 9.2 with new Red Hat Enterprise Linux versions, giving our expanding community of CUDA developers an easier-to-install, more tightly integrated software stack that helps deliver greater application performance for demanding AI and HPC workloads.”

Chris Wright, Vice President and Chief Technology Officer at Red Hat, Inc.

CUDA - New Features and Beyond

Learn about new features in CUDA including updates to the programming model, computing libraries and development tools.

CUTLASS: CUDA Primitives for Dense Linear Algebra

Learn how to implement high-performance matrix-multiplication (GEMM) using open-source C++ template abstractions.
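CUTLASS composes GEMM kernels from C++ templates parameterized by tile shapes and data types. The kernel below is not CUTLASS code and does not use its API. It is a minimal hand-written sketch of the underlying idea, assuming row-major FP32 matrices: the tile width is a template parameter, and each thread block stages one tile of A and one tile of B in shared memory before accumulating. `tiled_gemm` and `gemm_max_error` are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// C = A * B for row-major A (M x K), B (K x N), C (M x N).
// TILE is a compile-time parameter, so each tile width is a separate kernel instantiation.
template <int TILE>
__global__ void tiled_gemm(const float* A, const float* B, float* C, int M, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        // Stage one tile of A and one of B, padding out-of-range elements with zero.
        As[threadIdx.y][threadIdx.x] = (row < M && a_col < K) ? A[row * K + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < K && col < N) ? B[b_row * N + col] : 0.0f;
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish reading before the next tile overwrites shared memory
    }
    if (row < M && col < N) C[row * N + col] = acc;
}

// Hypothetical test helper: runs tiled_gemm<16> and returns max |GPU - CPU| error.
double gemm_max_error(int M, int N, int K) {
    std::vector<float> hA(M * K), hB(K * N), hC(M * N, 0.0f);
    for (int i = 0; i < M * K; ++i) hA[i] = static_cast<float>(i % 7) - 3.0f;
    for (int i = 0; i < K * N; ++i) hB[i] = static_cast<float>(i % 5) - 2.0f;
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, hA.size() * sizeof(float));
    cudaMalloc(&dB, hB.size() * sizeof(float));
    cudaMalloc(&dC, hC.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    constexpr int T = 16;
    dim3 block(T, T);
    dim3 grid((N + T - 1) / T, (M + T - 1) / T);
    tiled_gemm<T><<<grid, block>>>(dA, dB, dC, M, N, K);
    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    double max_err = 0.0;
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            double ref = 0.0;
            for (int k = 0; k < K; ++k) ref += hA[i * K + k] * hB[k * N + j];
            max_err = std::fmax(max_err, std::fabs(ref - hC[i * N + j]));
        }
    return max_err;
}
```

Integer-valued inputs keep every partial sum exact in FP32, so the GPU result should match the CPU reference exactly. CUTLASS generalizes this kind of compile-time specialization much further, for example with warp-level tiling and Tensor Core paths.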

Multi-GPU Programming Techniques in CUDA

Learn techniques and pitfalls of direct multi-GPU programming in CUDA and a novel method using NVLink to scale programs with minimal effort.
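Direct multi-GPU programming in CUDA starts with explicit peer-to-peer operations. A minimal sketch, assuming one process that addresses the GPUs by device ordinal: check whether one GPU can access another (`cudaDeviceCanAccessPeer`), enable peer access if it can, and copy with `cudaMemcpyPeer`. `peer_copy_roundtrip` is a hypothetical helper, and it does not show the NVLink-based method the talk describes.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: copies n ints from device `src` to device `dst` with
// cudaMemcpyPeer and verifies them on the host. Direct peer access (for example
// over NVLink or PCIe P2P) is enabled first when the pair supports it; without it,
// cudaMemcpyPeer still works and the runtime stages the transfer.
bool peer_copy_roundtrip(int src, int dst, int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(int);
    if (src != dst) {
        int can_access = 0;
        cudaDeviceCanAccessPeer(&can_access, dst, src);
        if (can_access) {
            cudaSetDevice(dst);
            // A failure here (e.g. cudaErrorPeerAccessAlreadyEnabled on a repeat
            // call) is not fatal for the copy below, so clear the error state.
            if (cudaDeviceEnablePeerAccess(src, 0) != cudaSuccess) cudaGetLastError();
        }
    }

    std::vector<int> h_in(n), h_out(n, 0);
    for (int i = 0; i < n; ++i) h_in[i] = 3 * i + 1;

    int *d_src = nullptr, *d_dst = nullptr;
    cudaSetDevice(src);
    if (cudaMalloc(&d_src, bytes) != cudaSuccess) return false;
    cudaMemcpy(d_src, h_in.data(), bytes, cudaMemcpyHostToDevice);
    cudaSetDevice(dst);
    if (cudaMalloc(&d_dst, bytes) != cudaSuccess) {
        cudaSetDevice(src);
        cudaFree(d_src);
        return false;
    }

    // Argument order: destination pointer, destination device, source pointer, source device, size.
    cudaError_t err = cudaMemcpyPeer(d_dst, dst, d_src, src, bytes);
    cudaMemcpy(h_out.data(), d_dst, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_dst);
    cudaSetDevice(src);
    cudaFree(d_src);
    return err == cudaSuccess && h_out == h_in;
}
```

On a single-GPU machine, passing the same ordinal for `src` and `dst` still exercises the code as an ordinary device-to-device copy.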

Everything You Need to Know About Unified Memory

Learn fundamental principles, important use cases, performance considerations and optimization ideas using Unified Memory.
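The basic Unified Memory pattern is a single pointer from `cudaMallocManaged` that both host code and kernels dereference, with no explicit `cudaMemcpy`. A minimal sketch; `scale` and `managed_scale_sum` are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Multiplies every element by `factor` in place.
__global__ void scale(float* x, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= factor;
}

// Hypothetical helper: the same managed allocation is written by the CPU, updated
// by a kernel, then read by the CPU again. Returns the CPU-side sum, or -1.0 on error.
double managed_scale_sum(int n, float factor) {
    float* x = nullptr;
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) return -1.0;
    for (int i = 0; i < n; ++i) x[i] = 1.0f;        // host writes through the managed pointer
    scale<<<(n + 255) / 256, 256>>>(x, n, factor);  // kernel uses the very same pointer
    // Kernel launches are asynchronous: wait before the host touches the data again.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(x);
        return -1.0;
    }
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += x[i];
    cudaFree(x);
    return sum;
}
```

The `cudaDeviceSynchronize` call is what makes the host read safe: the launch returns immediately, and on pre-Pascal GPUs the CPU may not access managed memory at all while a kernel is running.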


CUDA 9

CUDA 9 is the most powerful software platform for GPU-accelerated applications. It is built for Volta GPUs and includes faster GPU-accelerated libraries, a new programming model for flexible thread management, and improvements to the compiler and developer tools. With CUDA 9 you can speed up your applications while making them more scalable and robust.

Key Features

Libraries

  • Speed up high performance computing (HPC) and deep learning apps with new GEMM kernels in cuBLAS
  • Execute image and signal processing apps faster with performance optimizations across multiple GPU configurations in cuFFT and NVIDIA Performance Primitives (NPP)
  • Solve linear and graph analytics problems common in HPC with new algorithms in cuSOLVER and nvGRAPH

Cooperative Groups

  • Express rich parallel algorithms with groups of threads at several granularities, from sub-warp tiles to warps, thread blocks, and grids
  • Manage and reuse threads efficiently within an application with new APIs and function primitives
  • Replace implicit warp-synchronous programming with a robust, explicit programming model on Kepler-architecture GPUs and later
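Cooperative Groups turns the set of cooperating threads into an explicit object. A minimal sketch, assuming a sum reduction over `int`s: each block is split into 32-thread tiles with `cg::tiled_partition<32>`, and the tile's `shfl_down` stands in for warp-synchronous code that assumed lockstep execution. `tile_sum`, `tiled_sum` and `gpu_sum` are hypothetical names.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Reduces within an explicit 32-thread tile: the tile object names exactly which
// threads take part instead of assuming that a warp executes in lockstep.
__device__ int tile_sum(cg::thread_block_tile<32> tile, int v) {
    for (int offset = 16; offset > 0; offset /= 2)
        v += tile.shfl_down(v, offset);
    return v;  // lane 0 of the tile now holds the tile's total
}

// Each block is partitioned into 32-thread tiles; tile leaders add partial sums to *out.
__global__ void tiled_sum(const int* in, int* out, int n) {
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> tile = cg::tiled_partition<32>(block);
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int v = (i < n) ? in[i] : 0;  // out-of-range threads add 0 but still join the shuffles
    int s = tile_sum(tile, v);
    if (tile.thread_rank() == 0) atomicAdd(out, s);
}

// Hypothetical host helper: sums 0, 1, ..., n-1 on the GPU; returns -1 on allocation failure.
int gpu_sum(int n) {
    std::vector<int> h(n);
    for (int i = 0; i < n; ++i) h[i] = i;
    int *d_in = nullptr, *d_out = nullptr, result = -1;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(int));
    tiled_sum<<<(n + 255) / 256, 256>>>(d_in, d_out, n);  // block size is a multiple of 32
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

For example, `gpu_sum(10000)` adds 0 through 9999 and should return 49995000, which fits in a 32-bit `int`.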

Volta Architecture

  • Execute AI applications faster with Tensor Cores performing 5X faster than Pascal GPUs
  • Scale multi-GPU applications with next-generation NVLink, which delivers 2X the throughput of the prior generation
  • Increase GPU utilization with Volta Multi-Process Service (MPS)
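Tensor Cores are programmable from CUDA C++ through the warp-level matrix multiply-accumulate (WMMA) API in `<mma.h>`. A minimal sketch, assuming one warp and a single 16x16x16 tile with FP16 inputs and FP32 accumulation. It must be compiled for `sm_70` or newer (for example `nvcc -arch=sm_70`) for the Tensor Core path to exist, and it assumes a toolkit in which `__float2half` is callable from host code. `wmma_16x16x16` and `wmma_check` are hypothetical names.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <vector>

// One warp computes C = A * B on Tensor Cores: A and B are 16x16 FP16 tiles
// (A row-major, B column-major), C is accumulated in FP32 and stored row-major.
// The body is compiled only for sm_70+, because WMMA is unavailable on older GPUs.
__global__ void wmma_16x16x16(const half* a, const half* b, float* c) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
    using namespace nvcuda;
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;
    wmma::fill_fragment(c_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, a, 16);  // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
#endif
}

// Hypothetical helper: 1 = Tensor Core result matches a CPU reference, 0 = mismatch,
// -1 = no CUDA device, or device 0 is older than Volta (compute capability 7.0).
int wmma_check() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess || prop.major < 7) return -1;

    std::vector<float> fa(256), fb(256), ref(256, 0.0f), hc(256, 0.0f);
    std::vector<half> ha(256), hb(256);
    for (int i = 0; i < 256; ++i) {
        fa[i] = static_cast<float>(i % 4);  // small integers are exact in FP16
        fb[i] = static_cast<float>(i % 3) - 1.0f;
        ha[i] = __float2half(fa[i]);
        hb[i] = __float2half(fb[i]);
    }
    // C(m, n) = sum_k A(m, k) * B(k, n), with A(m, k) = fa[m*16 + k] and B(k, n) = fb[n*16 + k].
    for (int m = 0; m < 16; ++m)
        for (int n = 0; n < 16; ++n)
            for (int k = 0; k < 16; ++k)
                ref[m * 16 + n] += fa[m * 16 + k] * fb[n * 16 + k];

    half *da = nullptr, *db = nullptr;
    float* dc = nullptr;
    cudaMalloc(&da, 256 * sizeof(half));
    cudaMalloc(&db, 256 * sizeof(half));
    cudaMalloc(&dc, 256 * sizeof(float));
    cudaMemcpy(da, ha.data(), 256 * sizeof(half), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), 256 * sizeof(half), cudaMemcpyHostToDevice);
    wmma_16x16x16<<<1, 32>>>(da, db, dc);  // exactly one warp
    cudaMemcpy(hc.data(), dc, 256 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    return hc == ref ? 1 : 0;
}
```

Production GEMMs tile larger matrices across many warps, each owning one or more such fragments; cuBLAS and CUTLASS handle that tiling.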

Development Tools

  • Optimize memory access and add prefetching by identifying the source code that causes Unified Memory page faults
  • Profile NVLink efficiently with events added to the timeline and color-coded connections
  • Inspect Unified Memory performance bottlenecks with new event filters based on virtual address, migration reason, and page fault access type

See Release Notes for details.

CUDA 9 Features Revealed

Learn about new features in CUDA 9 including updates to the programming model, computing libraries and development tools.

Inside Volta

Learn about new technologies and features introduced in the NVIDIA Volta GPU architecture.

Cooperative Groups

Learn about the new CUDA parallel programming model for managing threads in scalable applications.

Optimizing Performance With CUDA 9

Learn about new profiling capabilities in CUDA 9 for Volta GPUs and technologies such as Unified Memory and NVLink.

Archived Releases

CUDA 8

Pascal Architecture Support

  • Get better performance out of the box on Pascal GPUs
  • Simplify programming with Unified Memory, including support for datasets larger than GPU memory, concurrent CPU/GPU data access, and atomics
  • Optimize Unified Memory performance using new data migration APIs
  • Increase throughput with NVIDIA® NVLink™, a new high-speed interconnect
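The Unified Memory data migration APIs let you state where data should live instead of relying only on demand paging. A minimal sketch, assuming a Linux system with a Pascal-or-later GPU (it checks `cudaDevAttrConcurrentManagedAccess` and skips the hints otherwise): the input is marked read-mostly with `cudaMemAdvise`, both arrays are prefetched to the GPU with `cudaMemPrefetchAsync` before the kernel, and the output is prefetched back to the CPU afterwards. It uses the original four-argument signatures of these calls, which may differ in later toolkits, so check the runtime API reference for your version. `add_one` and `managed_prefetch_sum` are hypothetical names.

```cuda
#include <cuda_runtime.h>

// out[i] = in[i] + 1
__global__ void add_one(const int* in, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1;
}

// Hypothetical helper: returns the sum of out[] as read by the CPU, or -1 on allocation failure.
long long managed_prefetch_sum(int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *in = nullptr, *out = nullptr;
    if (cudaMallocManaged(&in, bytes) != cudaSuccess) return -1;
    if (cudaMallocManaged(&out, bytes) != cudaSuccess) {
        cudaFree(in);
        return -1;
    }
    for (int i = 0; i < n; ++i) in[i] = i;  // first touched on the CPU

    int device = 0, concurrent = 0;
    cudaGetDevice(&device);
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, device);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    if (concurrent) {
        // `in` is only read by the GPU: allow read-only copies instead of migrating it back and forth.
        cudaMemAdvise(in, bytes, cudaMemAdviseSetReadMostly, device);
        // Move both arrays ahead of the kernel so it does not fault on first touch.
        cudaMemPrefetchAsync(in, bytes, device, stream);
        cudaMemPrefetchAsync(out, bytes, device, stream);
    }
    add_one<<<(n + 255) / 256, 256, 0, stream>>>(in, out, n);
    if (concurrent) {
        // The host reads the results next, so start migrating them back now.
        cudaMemPrefetchAsync(out, bytes, cudaCpuDeviceId, stream);
    }
    cudaDeviceSynchronize();  // all GPU work must finish before the host reads managed memory

    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += out[i];
    cudaStreamDestroy(stream);
    cudaFree(in);
    cudaFree(out);
    return sum;
}
```

The hints only affect performance: with or without them, `out[i]` equals `i + 1`, so `managed_prefetch_sum(1000)` returns 500500 either way.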

Development Tools

  • Identify latent system-level bottlenecks using critical path analysis
  • Improve productivity with NVCC compile times up to 2X faster
  • Tune OpenACC applications and overall host code using new profiling extensions


Libraries

  • Accelerate graph analytics algorithms with nvGRAPH
  • Speed up deep learning applications with native FP16 and INT8 support and batched operations in cuBLAS
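Batched routines let a single cuBLAS call process many small, independent matrix products, each of which would be too small to keep a GPU busy on its own. A minimal sketch using one such routine, `cublasSgemmStridedBatched`, assuming single precision, column-major storage, and matrices laid out at a fixed stride in one buffer. `strided_batched_max_error` is a hypothetical helper that compares the result with a CPU loop. Link with `-lcublas`.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Hypothetical helper: computes C_i = A_i * B_i for `batch` independent n x n
// matrices in one cublasSgemmStridedBatched call and returns the largest
// |GPU - CPU| difference, or -1.0 on failure. cuBLAS expects column-major storage.
double strided_batched_max_error(int n, int batch) {
    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) return -1.0;

    const long long stride = static_cast<long long>(n) * n;  // elements between matrices
    const size_t total = static_cast<size_t>(stride) * batch;
    const size_t bytes = total * sizeof(float);
    std::vector<float> hA(total), hB(total), hC(total, 0.0f);
    for (size_t i = 0; i < total; ++i) {
        hA[i] = static_cast<float>(i % 5) - 2.0f;
        hB[i] = static_cast<float>(i % 3);
    }
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    const float alpha = 1.0f, beta = 0.0f;
    // One call covers the whole batch: matrix i starts at offset i * stride in each buffer.
    cublasStatus_t st = cublasSgemmStridedBatched(
        handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
        &alpha, dA, n, stride, dB, n, stride,
        &beta, dC, n, stride, batch);
    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    if (st != CUBLAS_STATUS_SUCCESS) return -1.0;

    double max_err = 0.0;
    for (int b = 0; b < batch; ++b) {
        const float* A = hA.data() + b * stride;
        const float* B = hB.data() + b * stride;
        const float* C = hC.data() + b * stride;
        for (int col = 0; col < n; ++col)
            for (int row = 0; row < n; ++row) {
                double ref = 0.0;
                for (int k = 0; k < n; ++k)
                    ref += A[k * n + row] * B[col * n + k];  // A(row, k) * B(k, col), column-major
                max_err = std::fmax(max_err, std::fabs(ref - C[col * n + row]));
            }
    }
    return max_err;
}
```

With small integer inputs every product and partial sum is exact, so the returned error should be zero on a working device.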

See Release Notes for details.

