What’s New in CUDA

CUDA 10.1 Update 1

CUDA 10.1 Update 1 incorporates new functionality and performance updates to CUDA-X libraries including cuSPARSE, cuBLAS, cuFFT, and nvJPEG.

With CUDA 10.1 Update 1, you get:

  • Improved SpMM/SpMV kernel performance in cuSPARSE for sparse applications in HPC and machine learning
  • Extended data type support and improved heuristics in cuBLAS for HPC and machine learning applications
  • Graph API support in cuFFT to allow use of FFT kernels in CUDA Graphs
  • APIs for stream parsing, memory control, decoding, and multi-channel bitstreams in nvJPEG

Additionally, CUDA 10.1 Update 1 includes updates to the Nsight Systems and Nsight Compute developer tools. CUDA 10.1 Update 1 is available for download today!

See Release Notes for additional details.

CUDA 10.1

CUDA 10.1 includes a new lightweight GEMM library, new functionality and performance updates to existing libraries, and improvements to the CUDA Graphs APIs.

With CUDA 10.1, you get:

  • cuBLASLt, a new lightweight GEMM library with a flexible API and Tensor Core support for INT8 inputs and FP16 CGEMM split-complex matrix multiplication
  • New selective eigensolvers SYEVDX and SYGVDX in cuSOLVER, and performance improvements of up to 1.5X for full-spectrum eigensolvers
  • New encoding and batched decoding functionality in nvJPEG
  • Up to 4X faster performance for a broad set of random number generators in cuRAND
  • Improved performance and support for fork/join kernels in CUDA Graphs APIs

Additionally, CUDA 10.1 includes bug fixes, support for new operating systems, and updates to the Nsight Systems and Nsight Compute developer tools.

“Since CUDA 9.0, Red Hat has collaborated closely with NVIDIA to bring the power of the CUDA development platform across Red Hat’s portfolio of open hybrid cloud technologies. With the release of CUDA 10.1, we’re pleased to continue this collaboration as we work with NVIDIA on bringing additional ease-of-use, scale and choice to users building AI and HPC applications on enterprise-grade, open foundations, including Red Hat Enterprise Linux and Red Hat OpenShift Container Platform.”

Chris Wright, Vice President and Chief Technology Officer at Red Hat, Inc.


CUDA 10

CUDA 10 is the most powerful software development platform for building GPU-accelerated applications. It is built for Turing GPUs and includes performance-optimized libraries, a new asynchronous task-graph programming model, enhanced interoperability between CUDA and graphics APIs, and new developer tools. CUDA 10 also provides all the components needed to build applications for NVIDIA's most powerful server platforms for AI and high-performance computing (HPC) workloads, both on-premises (DGX-2) and in the cloud (HGX-2).

Key Features


  • New GPU architecture: Build and optimize applications for NVIDIA's next-generation Turing GPUs
  • Tensor Cores
  • NVSwitch Fabric
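
Tensor Cores are programmable from CUDA C++ through the warp matrix multiply-accumulate (WMMA) API in `mma.h`. The sketch below is a minimal, unoptimized illustration, assuming a Volta or Turing GPU and compilation with `-arch=sm_70` or newer: one warp multiplies a single 16x16 half-precision tile A (row-major) by a 16x16 tile B (column-major) and accumulates the result in float. The kernel and helper names (`wmma_tile_16x16`, `wmma_gemm_16x16`) are hypothetical, and error handling is kept to a minimum.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <mma.h>

using namespace nvcuda;

// One warp computes C = A * B for a single 16x16x16 tile on Tensor Cores.
// A is row-major half, B is column-major half, C is row-major float.
__global__ void wmma_tile_16x16(const half* a, const half* b, float* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);               // C = 0
    wmma::load_matrix_sync(a_frag, a, 16);           // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C = A * B + C
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

// Hypothetical host helper: converts the float inputs to half, runs the
// kernel with exactly one warp, and copies the float result back.
// Returns 0 on success.
int wmma_gemm_16x16(const float* a_host, const float* b_host, float* c_host) {
    half a_half[256], b_half[256];
    for (int i = 0; i < 256; ++i) {
        a_half[i] = __float2half(a_host[i]);
        b_half[i] = __float2half(b_host[i]);
    }

    half *a_dev = nullptr, *b_dev = nullptr;
    float* c_dev = nullptr;
    cudaMalloc(&a_dev, sizeof(a_half));
    cudaMalloc(&b_dev, sizeof(b_half));
    cudaMalloc(&c_dev, 256 * sizeof(float));
    cudaMemcpy(a_dev, a_half, sizeof(a_half), cudaMemcpyHostToDevice);
    cudaMemcpy(b_dev, b_half, sizeof(b_half), cudaMemcpyHostToDevice);

    wmma_tile_16x16<<<1, 32>>>(a_dev, b_dev, c_dev);  // a single warp
    cudaError_t err = cudaGetLastError();             // launch errors
    if (err == cudaSuccess)                           // execution errors
        err = cudaMemcpy(c_host, c_dev, 256 * sizeof(float),
                         cudaMemcpyDeviceToHost);

    cudaFree(a_dev);
    cudaFree(b_dev);
    cudaFree(c_dev);
    return err == cudaSuccess ? 0 : 1;
}
```

Production GEMMs tile much larger matrices across many warps; libraries such as cuBLAS handle that tiling for you.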


  • CUDA Graphs: A new asynchronous task-graph programming model which enables more efficient kernel launch and execution
  • CUDA/Graphics Interop: New interoperability between CUDA and graphics APIs, including Vulkan and DX12
  • Warp Matrix
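
One way to build a CUDA graph is stream capture: work issued to a stream between `cudaStreamBeginCapture` and `cudaStreamEndCapture` is recorded as graph nodes instead of being executed, and the instantiated graph can then be launched repeatedly with a single call. The following is a minimal sketch assuming CUDA 10.1 or later (which added the capture-mode argument). The `add_delta` kernel and `graph_replay` helper are hypothetical, and the `#if` selects the `cudaGraphInstantiate` signature, which changed in CUDA 12.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: adds `delta` to every element of `data`.
__global__ void add_delta(float* data, int n, float delta) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += delta;
}

// Hypothetical helper: records two dependent kernel launches into a CUDA
// graph with stream capture, then replays the executable graph
// `iterations` times. `host` is updated in place. Returns 0 on success.
int graph_replay(float* host, int n, int iterations) {
    const size_t bytes = (size_t)n * sizeof(float);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float* dev = nullptr;
    cudaStream_t stream;
    cudaGraph_t graph;
    cudaGraphExec_t exec;

    if (cudaMalloc(&dev, bytes) != cudaSuccess) return 1;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, stream);

    // Launches issued to `stream` between Begin/EndCapture are recorded as
    // graph nodes rather than executed.
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    add_delta<<<blocks, threads, 0, stream>>>(dev, n, 1.0f);
    add_delta<<<blocks, threads, 0, stream>>>(dev, n, 2.0f);
    cudaError_t err = cudaStreamEndCapture(stream, &graph);

    if (err == cudaSuccess) {
#if CUDART_VERSION >= 12000
        err = cudaGraphInstantiate(&exec, graph, 0);
#else
        err = cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
#endif
        cudaGraphDestroy(graph);  // the executable graph is an independent copy
    }
    if (err == cudaSuccess) {
        // Each cudaGraphLaunch replays both recorded kernels in order.
        for (int it = 0; it < iterations; ++it) cudaGraphLaunch(exec, stream);
        cudaMemcpyAsync(host, dev, bytes, cudaMemcpyDeviceToHost, stream);
        err = cudaStreamSynchronize(stream);
        cudaGraphExecDestroy(exec);
    }
    cudaStreamDestroy(stream);
    cudaFree(dev);
    return err == cudaSuccess ? 0 : 1;
}
```

Graphs can also be built explicitly with calls such as `cudaGraphAddKernelNode`; capture is simply the least intrusive way to convert existing stream-based code.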


  • nvJPEG: New library for hybrid JPEG processing that provides >2x speedup on single and batched image decoding
  • Performance-Optimized Libraries: Strong FFT performance scaling across 16-GPU systems, acceleration of dense linear algebra routines such as eigensolvers and Cholesky factorization, and Turing-optimized mixed-precision GEMM performance


  • New Developer Tools: New Nsight product family of tools for tracing, profiling, and debugging of CUDA applications (Nsight Systems and Nsight Compute)

Release Highlights

cuFFT 10.0 - Up to 17 TFLOPS on a 3D 1K FFT across 16 GPUs

cuBLAS 10.0 - Up to 90 TFLOPS of GEMM performance

cuSOLVER 10.0 - Up to 4x faster symmetric eigensolver

All library benchmarks use NVIDIA Tesla V100 GPUs (or P100 where specified) and Intel Xeon Gold 6140 (Skylake) 2.3 GHz processors.

Learn More

CUDA 10 Features Revealed

Learn about new features in CUDA 10 including updates to the programming model, computing libraries, and development tools.

Inside Turing

Learn about new technologies and features introduced in the NVIDIA Turing GPU architecture.

Nsight Systems

Learn more about the performance analysis tool designed to provide software optimization insights.

Nsight Compute

Learn more about the interactive CUDA API debugging and kernel profiling tool.

Archived Releases

Volta Architecture Support

  • Execute AI applications faster with Tensor Cores performing 5X faster than Pascal GPUs
  • Scale multi-GPU applications with next-generation NVLink, delivering 2X the throughput of the prior generation
  • Increase GPU utilization with Volta Multi-Process Service (MPS)

Development Tools

  • Identify the source code that triggers unified memory page faults, to guide prefetching and memory-access optimization
  • Profile NVLink efficiently by adding NVLink events to the timeline and color-coding connections
  • Inspect unified memory performance bottlenecks with new event filters based on virtual address, migration reason, and page fault access type
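
When the profiler shows that unified memory pages are being faulted in on demand, a common remedy is to migrate the data ahead of use with `cudaMemPrefetchAsync`. A minimal sketch follows, assuming a Linux system with a Pascal-or-newer GPU (where `cudaDevAttrConcurrentManagedAccess` is set) and the four-argument `cudaMemPrefetchAsync` signature used by CUDA 8 through 12; newer toolkits may differ. The `increment` kernel and `prefetch_increment_sum` helper are hypothetical.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: increments each element in place.
__global__ void increment(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: initializes managed memory on the host, prefetches it
// to the GPU before the kernel and back to the CPU before host reads, so the
// pages migrate in bulk instead of through individual page faults.
// Returns the sum of the updated elements, or -1 on error.
long long prefetch_increment_sum(int n) {
    const size_t bytes = (size_t)n * sizeof(int);
    int device = 0;
    int* data = nullptr;
    cudaStream_t stream;

    cudaGetDevice(&device);
    if (cudaMallocManaged(&data, bytes) != cudaSuccess) return -1;
    for (int i = 0; i < n; ++i) data[i] = i;  // first touch happens on the host
    cudaStreamCreate(&stream);

    // Prefetching is only a performance hint, so its return codes are
    // ignored; without it the kernel still runs, with on-demand migration.
    cudaMemPrefetchAsync(data, bytes, device, stream);
    increment<<<(n + 255) / 256, 256, 0, stream>>>(data, n);
    cudaMemPrefetchAsync(data, bytes, cudaCpuDeviceId, stream);
    cudaError_t err = cudaStreamSynchronize(stream);

    long long sum = -1;
    if (err == cudaSuccess) {
        sum = 0;
        for (int i = 0; i < n; ++i) sum += data[i];  // host reads after sync
    }
    cudaStreamDestroy(stream);
    cudaFree(data);
    return sum;
}
```

Re-profiling after a change like this should show fewer unified memory page-fault events on the timeline.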


  • Speed up high performance computing (HPC) and deep learning apps with new GEMM kernels in cuBLAS
  • Execute image and signal processing apps faster with performance optimizations across multiple GPU configurations in cuFFT and NVIDIA Performance Primitives
  • Solve linear and graph analytics problems common in HPC with new algorithms in cuSOLVER and nvGRAPH

Cooperative Groups

  • Express rich parallel algorithms using groups of threads at any granularity, from sub-warp tiles to warps, blocks, and grids
  • Manage and reuse threads efficiently within an application with new APIs and function primitives
  • Replace warp-synchronous programming with a robust programming model on Kepler and later architectures
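
As a minimal sketch of the Cooperative Groups style, the reduction below partitions each thread block into explicit 32-thread tiles with `cg::tiled_partition<32>` and sums within each tile using the tile's `shfl_down` member, instead of relying on implicit warp-synchronous behavior. The helper names (`warp_sum`, `sum_kernel`, `gpu_sum`) are hypothetical, and one `atomicAdd` per tile is chosen for brevity rather than peak performance.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

// Sums one value per thread across a 32-thread tile using shuffles.
// The complete sum is guaranteed only in the tile's thread_rank() == 0.
__device__ float warp_sum(cg::thread_block_tile<32> warp, float v) {
    for (int offset = 16; offset > 0; offset /= 2)
        v += warp.shfl_down(v, offset);
    return v;
}

// Each block is split into explicit 32-thread tiles; rank 0 of every tile
// adds the tile's partial sum to *out.
__global__ void sum_kernel(const float* in, float* out, int n) {
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> warp = cg::tiled_partition<32>(block);
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? in[i] : 0.0f;  // out-of-range threads contribute 0
    v = warp_sum(warp, v);
    if (warp.thread_rank() == 0) atomicAdd(out, v);
}

// Hypothetical host helper: returns the sum of host[0..n).
float gpu_sum(const float* host, int n) {
    const size_t bytes = (size_t)n * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    float result = 0.0f;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, host, bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(float));
    // The block size is a multiple of 32, so every tile is fully populated.
    sum_kernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Because the tile is an explicit object, the same `warp_sum` could be reused with smaller tiles (for example `tiled_partition<16>`) by adjusting the tile type and starting offset.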

See Release Notes archive for details.
