What’s New in CUDA

Announcing CUDA 9

CUDA 9 is the fastest software platform for GPU-accelerated applications. It has been built for Volta GPUs and provides faster GPU-accelerated libraries along with improvements to the programming model, compiler, and developer tools. With CUDA 9 you can speed up your applications while making them more scalable and robust.

To be notified when CUDA 9 is available for download, join the NVIDIA Developer Program.

CUDA 9 Features Revealed

Learn about new features in CUDA 9 including updates to the programming model, computing libraries and development tools.

Inside Volta

Learn about new technologies and features introduced in the NVIDIA Volta GPU architecture.

Cooperative Groups

Learn about the new CUDA parallel programming model for managing threads in scalable applications.
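
As a rough illustration of what the Cooperative Groups model looks like in code, the sketch below partitions a thread block into 32-thread tiles and sums values within each tile. The names `tile_sum`, `sum_kernel`, and `sum_on_gpu` are hypothetical, made up for this example. It assumes a CUDA 9 or later toolkit, a block size that is a multiple of 32, and n > 0; error checking is omitted for brevity.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

// Sum `val` across a 32-thread tile with shuffle-down steps.
// After the loop, the thread with rank 0 in the tile holds the tile total.
__device__ int tile_sum(cg::thread_block_tile<32> tile, int val) {
    for (int offset = tile.size() / 2; offset > 0; offset /= 2) {
        val += tile.shfl_down(val, offset);
    }
    return val;
}

// Hypothetical kernel: each tile reduces its 32 inputs, then one thread
// per tile adds the partial sum to the global result.
__global__ void sum_kernel(const int* in, int* out, int n) {
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> tile = cg::tiled_partition<32>(block);

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int val = (i < n) ? in[i] : 0;  // out-of-range threads contribute 0

    int partial = tile_sum(tile, val);
    if (tile.thread_rank() == 0) {
        atomicAdd(out, partial);
    }
}

// Hypothetical host helper: copies the input to the GPU, runs the kernel,
// and returns the sum.
int sum_on_gpu(const int* host_in, int n) {
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, host_in, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(int));

    const int threads = 256;  // multiple of the 32-thread tile size
    const int blocks = (n + threads - 1) / threads;
    sum_kernel<<<blocks, threads>>>(d_in, d_out, n);

    int result = 0;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Calling `sum_on_gpu` on a host array returns the sum of its elements. The point of the tile object is that the group of cooperating threads is an explicit value passed to `tile_sum`, rather than an implicit assumption about warp size.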

Optimizing Application Performance With CUDA 9

Learn about new profiling capabilities in CUDA 9 for Volta GPUs and technologies such as Unified Memory and NVLink.


CUDA 8 brings major improvements to the memory model and profiling tools, along with new libraries. Using CUDA 8, you can improve performance, simplify memory usage, and profile and debug your applications more efficiently.

What’s New in CUDA 8 Webinar

Release Highlights

Perform 2X faster out of the box with Pascal GPUs

Solve larger problems with Unified Memory

Increase application throughput with FP16 and INT8 support

Accelerate graph analytics with the new GPU-accelerated nvGRAPH library

Key Features

Pascal Architecture Support
  • Enhance performance out-of-the-box on Pascal GPUs
  • Simplify programming using Unified Memory, including support for large datasets, concurrent data access, and atomics
  • Optimize Unified Memory performance using new data migration APIs
  • Increase throughput using NVIDIA® NVLink™, a new high-speed interconnect
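
The Unified Memory bullets above can be sketched in a few lines. The example below allocates one managed array, gives the driver a placement hint, and prefetches the data to the GPU before a kernel and back to the CPU afterwards. The names `scale` and `scale_with_prefetch` are hypothetical. The calls use the four-argument `cudaMemPrefetchAsync` and `cudaMemAdvise` signatures from the CUDA 8 era runtime; newer toolkits may expect a different signature, so treat this as a sketch rather than a reference. Error checking is omitted, and on GPUs or platforms without on-demand migration the hint and prefetch calls are expected to return an error that this code ignores.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: multiply every element by `factor`.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: fills a managed array with 1.0f, scales it on the GPU,
// and returns the sum of the result as computed on the CPU.
float scale_with_prefetch(int n, float factor) {
    const size_t bytes = n * sizeof(float);
    float* data = nullptr;
    cudaMallocManaged(&data, bytes);  // one pointer, valid on CPU and GPU

    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // first touched on the CPU

    int device = 0;
    cudaGetDevice(&device);

    // Hint that the data should live on the GPU, then migrate it ahead of
    // the kernel instead of relying on page faults during execution.
    cudaMemAdvise(data, bytes, cudaMemAdviseSetPreferredLocation, device);
    cudaMemPrefetchAsync(data, bytes, device, 0);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(data, n, factor);

    // Migrate the results back before the CPU reads them.
    cudaMemPrefetchAsync(data, bytes, cudaCpuDeviceId, 0);
    cudaDeviceSynchronize();

    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += data[i];

    cudaFree(data);
    return sum;
}
```

The program is still correct without the advise and prefetch calls; they only tell the driver where the data will be needed so that migration can overlap with other work.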
Developer Tools
  • Identify latent system-level bottlenecks using critical path analysis
  • Improve productivity by up to 2x with faster NVCC compile times
  • Tune OpenACC applications and overall host code using new profiling extensions
Libraries
  • Accelerate graph analytics algorithms with nvGRAPH
  • Speed up deep learning applications using native support for FP16 and INT8 and support for batched operations in cuBLAS
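
To give a sense of what native INT8 support means at the instruction level, the sketch below computes a dot product of two INT8 vectors using the `__dp4a` intrinsic, which multiplies four packed 8-bit pairs and accumulates into a 32-bit integer. FP16 is not shown. The names `dot4_int8`, `int8_dot_kernel`, and `int8_dot` are hypothetical. The code assumes the vector length is a positive multiple of 4, falls back to a plain loop when compiled for architectures older than sm_61, and omits error checking.

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Dot product of four packed signed 8-bit values, accumulated into `acc`.
__device__ int dot4_int8(int a, int b, int acc) {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 610)
    return __dp4a(a, b, acc);  // hardware path on sm_61 and newer
#else
    // Portable fallback: unpack each byte as a signed 8-bit value.
    for (int k = 0; k < 4; ++k) {
        int x = static_cast<int8_t>((a >> (8 * k)) & 0xFF);
        int y = static_cast<int8_t>((b >> (8 * k)) & 0xFF);
        acc += x * y;
    }
    return acc;
#endif
}

// Hypothetical kernel: each thread handles one 32-bit word (four INT8 values).
__global__ void int8_dot_kernel(const int* a, const int* b, int words, int* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < words) {
        atomicAdd(out, dot4_int8(a[i], b[i], 0));
    }
}

// Hypothetical host helper: n must be a positive multiple of 4.
int int8_dot(const int8_t* a, const int8_t* b, int n) {
    const int words = n / 4;
    int* d_a = nullptr;
    int* d_b = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_a, n);
    cudaMalloc(&d_b, n);
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_a, a, n, cudaMemcpyHostToDevice);  // bytes are reinterpreted as packed words
    cudaMemcpy(d_b, b, n, cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(int));

    const int threads = 128;
    const int blocks = (words + threads - 1) / threads;
    int8_dot_kernel<<<blocks, threads>>>(d_a, d_b, words, d_out);

    int result = 0;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return result;
}
```

Compiling with a target such as `-arch=sm_61` selects the `__dp4a` path; other targets use the fallback loop and produce the same result. Accumulating in 32 bits is what keeps sums of many 8-bit products from overflowing.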

Customer Quotes

"That's a great work, guys! I like CUDA Toolkit more and more. Hope the bugs I submitted will be fixed by release. I used EA for checking performance of my applications and find the way to optimize them. I found, that having OpenACC marks is very useful. I tested both remote profiling and local one. I helped me. Other elements and counters helped me to find some other rooms to speedup my application. Thanks!"

Alexey Romanenko – Novosibirsk State University

"Shows lots of promise, looks like it is going to be a great library, few more tools and it will be great :-) Also more examples and bit more documentation. Looking forward keep using it"

Vicente Cuellar – Wave Crafters

Additional Resources


CUDA 8 and Beyond

Learn about new features in CUDA 8, NVIDIA’s vision for CUDA and challenges facing the future of parallel software development.

CUDA 8 Performance Overview

Learn how updates to the CUDA Toolkit improve the performance of GPU-accelerated applications.

Developer Tools in CUDA 8

Learn about new profiling capabilities in CUDA 8.

Debugging Tools in CUDA 8

Learn about the latest updates to debugging tools in CUDA 8.

Latest News

PGI Community Edition 17.4 Now Available

PGI compilers and tools are used by scientists and engineers who develop applications for high-performance computing (HPC) systems.

Facebook Training AI Bots to Negotiate with Humans

Researchers at Facebook Artificial Intelligence Research (FAIR) published a paper introducing AI-based dialog agents that can negotiate and compromise.

Predicting Aggressive Prostate Cancer with AI

University of Alberta scientists developed a deep learning-based prostate cancer diagnostic platform that uses only a single drop of blood, which will allow men to bypass the current painful biopsy methods.

AI Turns UI Designs Into Code

Copenhagen-based startup UIzard Technologies trained a neural network to automatically generate code from a graphical user interface screenshot.

Blog: Parallel 4 All

Unified Memory for CUDA Beginners

My previous introductory post, “An Even Easier Introduction to CUDA C++”, introduced the basics of CUDA programming by showing how to write a simple program that allocated two arrays of numbers in memory accessible to the GPU and then added them t

GOAI: Open GPU-Accelerated Data Analytics

Recently, Continuum Analytics, H2O.ai, and MapD announced the formation of the GPU Open Analytics Initiative (GOAI).

Explaining How End-to-End Deep Learning Steers a Self-Driving Car

As part of a complete software stack for autonomous driving, NVIDIA has created a deep-learning-based system, known as PilotNet, which learns to emulate the behavior of human drivers and can be deployed as a self-driving car controller.

CUDA 9 Features Revealed: Volta, Cooperative Groups and More

Today at the GPU Technology Conference NVIDIA announced CUDA 9, the latest version of CUDA’s powerful parallel computing platform and programming model. In this post I’ll provide an overview of the awesome new features of CUDA 9.