What’s New in CUDA

CUDA 9

CUDA 9 is the most powerful software platform for GPU-accelerated applications. It has been built for Volta GPUs and includes faster GPU-accelerated libraries, a new programming model for flexible thread management, and improvements to the compiler and developer tools. With CUDA 9 you can speed up your applications while making them more scalable and robust.

Download CUDA 9 today to get started.

Release Highlights

  • Up to 5X faster libraries with optimizations and heuristics
  • Powerful thread management with Cooperative Groups
  • Up to 1.5X faster HPC apps with Volta GPUs, NVLink and HBM2


Key Features

Libraries
  • Speed up high performance computing (HPC) and deep learning apps with new GEMM kernels in cuBLAS
  • Execute image and signal processing apps faster with performance optimizations across multiple GPU configurations in cuFFT and NVIDIA Performance Primitives
  • Solve linear and graph analytics problems common in HPC with new algorithms in cuSOLVER and nvGRAPH
Cooperative Groups
  • Express rich parallel algorithms with threads from sub-tiles to warps, blocks and grids
  • Manage and reuse threads efficiently within an application with new APIs and function primitives
  • Replace warp-synchronous programming with a robust programming model on Kepler and later architectures
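As a rough illustration of the points above, the sketch below sums an array by partitioning each thread block into 32-thread tiles and reducing within each tile through the tile's own shuffle primitive, rather than relying on implicit warp-synchronous behavior. The names `tile_sum`, `sum_kernel` and `sum_on_gpu` are hypothetical and not part of CUDA. The launch configuration (4 blocks of 256 threads) is an arbitrary assumption, and error checking is omitted for brevity.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical helper: sums the values held by the threads of one 32-thread tile.
// The group is passed explicitly, so the set of participating threads is
// visible in the function signature.
__device__ int tile_sum(cg::thread_block_tile<32> tile, int value) {
    for (int offset = tile.size() / 2; offset > 0; offset /= 2) {
        value += tile.shfl_down(value, offset);
    }
    return value;  // the complete sum is held by the thread with thread_rank() == 0
}

// Hypothetical kernel: each thread accumulates a partial sum over a
// grid-stride loop, then each tile combines its partial sums.
__global__ void sum_kernel(const int* input, int n, int* result) {
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> tile = cg::tiled_partition<32>(block);

    int local = 0;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        local += input[i];
    }

    int total = tile_sum(tile, local);
    if (tile.thread_rank() == 0) {
        atomicAdd(result, total);  // one atomic per tile
    }
}

// Hypothetical host wrapper (assumes a CUDA-capable device is present;
// return codes are not checked in this sketch).
int sum_on_gpu(const std::vector<int>& host_input) {
    int n = static_cast<int>(host_input.size());
    int* d_input = nullptr;
    int* d_result = nullptr;
    int host_result = 0;

    cudaMalloc(&d_input, n * sizeof(int));
    cudaMalloc(&d_result, sizeof(int));
    cudaMemcpy(d_input, host_input.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_result, 0, sizeof(int));

    // The block size is a multiple of 32 so every tile is fully populated.
    sum_kernel<<<4, 256>>>(d_input, n, d_result);

    cudaMemcpy(&host_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_input);
    cudaFree(d_result);
    return host_result;
}
```

Calling `sum_on_gpu(std::vector<int>(1000, 1))` should yield 1000 on a working device. The same `tile_sum` function could be reused with tiles obtained elsewhere in a program, which is one way to read the "manage and reuse threads" point.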
Volta Architecture
  • Execute AI applications faster with Tensor Cores performing 5X faster than Pascal GPUs
  • Scale multi-GPU applications with next-generation NVLink delivering 2X the throughput of the prior generation
  • Increase GPU utilization with Volta Multi-Process Service (MPS)
Development Tools
  • Optimize and prefetch memory accesses by identifying the source code that causes page faults in Unified Memory
  • Profile NVLink efficiently by adding events to the timeline and color-coding connections
  • Inspect unified memory performance bottlenecks with new event filters based on virtual address, migration reason and page fault access type
See Release Notes for details.

CUDA 9 Features Revealed

Learn about new features in CUDA 9 including updates to the programming model, computing libraries and development tools.

Inside Volta

Learn about new technologies and features introduced in the NVIDIA Volta GPU architecture.

Cooperative Groups

Learn about the new CUDA parallel programming model for managing threads in scalable applications.

Optimizing Performance With CUDA 9

Learn about new profiling capabilities in CUDA 9 for Volta GPUs and technologies such as Unified Memory and NVLink.


Archived Releases

CUDA 8.0 - 28 Sep, 2016

Pascal Architecture Support

  • Enhance performance out-of-the-box on Pascal GPUs
  • Simplify programming using Unified Memory including support for large datasets, concurrent data access and atomics
  • Optimize Unified Memory performance using new data migration APIs
  • Increase throughput using NVIDIA® NVLink™, a new high-speed interconnect
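To make the Unified Memory points above more concrete, here is a minimal sketch that allocates managed memory and uses `cudaMemPrefetchAsync` to move the data to the GPU before a kernel runs and back to the host afterwards. The names `scale_kernel` and `scale_with_prefetch` are hypothetical. The sketch assumes the four-argument `cudaMemPrefetchAsync` signature found in CUDA 8 and the toolkits that followed it; later toolkits may use a different signature. Prefetching is only attempted when the device reports concurrent managed access, since it is not supported everywhere.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel: multiplies every element by a constant factor.
__global__ void scale_kernel(float* data, size_t n, float factor) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical helper: fills a managed buffer with 1.0f on the host, scales
// it on the GPU, and returns the last element (or -1.0f on failure).
float scale_with_prefetch(size_t n, float factor) {
    if (n == 0) return -1.0f;

    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return -1.0f;

    const size_t bytes = n * sizeof(float);
    float* data = nullptr;
    if (cudaMallocManaged(&data, bytes) != cudaSuccess) return -1.0f;

    // First touch on the host: the pages start out resident in CPU memory.
    for (size_t i = 0; i < n; ++i) {
        data[i] = 1.0f;
    }

    // Prefetching needs concurrent managed access; skip it otherwise and
    // fall back to on-demand migration.
    int concurrent = 0;
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, device);

    if (concurrent) {
        // Migrate the data to the GPU ahead of the kernel to avoid page faults.
        cudaMemPrefetchAsync(data, bytes, device, 0);
    }

    const unsigned int threads = 256;
    const unsigned int blocks = static_cast<unsigned int>((n + threads - 1) / threads);
    scale_kernel<<<blocks, threads>>>(data, n, factor);

    if (concurrent) {
        // Migrate the results back before the host reads them.
        cudaMemPrefetchAsync(data, bytes, cudaCpuDeviceId, 0);
    }
    cudaDeviceSynchronize();

    float last = data[n - 1];
    cudaFree(data);
    return last;
}
```

Without the prefetch calls the program still produces the same result, because pages migrate on demand when they are first touched. The prefetches are a performance hint only, which is why they can be skipped safely on devices that do not support them.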

Development Tools

  • Identify latent system-level bottlenecks using critical path analysis
  • Improve productivity by up to 2x with faster NVCC compile times
  • Tune OpenACC applications and overall host code using new profiling extensions

Libraries

  • Accelerate graph analytics algorithms with nvGRAPH
  • Speed up deep learning applications using native support for FP16 and INT8 and support for batched operations in cuBLAS

See Release Notes for details.

Latest News

Deep Learning Helps Reconstruct and Improve Optical Microscopy

Researchers from UCLA developed a deep learning approach that could quickly produce more accurate images to aid diagnostic medicine.

Maximizing Unified Memory Performance in CUDA

Many of today’s applications process large volumes of data. While GPU architectures have very fast HBM or GDDR memory, they have limited capacity. Making the most of GPU performance requires the data to be as close to the GPU as possible.

Generating Photorealistic Images of Fake Celebrities with Artificial Intelligence

Researchers from NVIDIA recently published a paper detailing their new methodology for generative adversarial networks (GANs) that generated photorealistic pictures of fake celebrities.

High-Performance GPU Computing in the Julia Programming Language

Julia is a high-level programming language for mathematical computing that is as easy to use as Python, but as fast as C.

Blogs: Parallel ForAll

Malware Detection in Executables Using Neural Networks

The detection of malicious software (malware) is an increasingly important cyber security problem for all of society. Single incidents of malware can cause millions of dollars in damage.

Pro Tip: Pinpointing Runtime Errors in CUDA Fortran

We’ve all been there. Your CUDA Fortran code is humming along and suddenly you get a runtime error: copyin, copyout, usually accompanied by FAILED in all caps.
