What’s New in CUDA

CUDA 10

CUDA 10 is the most powerful software development platform for building GPU-accelerated applications. It has been built for Turing GPUs and includes performance-optimized libraries, a new asynchronous task-graph programming model, enhanced interoperability between CUDA and graphics APIs, and new developer tools. CUDA 10 also provides all the components needed to build applications for NVIDIA's most powerful server platforms for AI and high-performance computing (HPC) workloads, both on premises (DGX-2) and in the cloud (HGX-2).

See Release Notes for additional details.


Key Features

TURING AND NEW SYSTEMS

  • New GPU architecture: Build and optimize applications for the next generation of Turing GPUs
  • Tensor Cores
  • NVSwitch Fabric

CUDA PLATFORM

  • CUDA Graphs: A new asynchronous task-graph programming model that enables more efficient kernel launch and execution
  • CUDA/Graphics Interop: New interoperability between CUDA and graphics APIs, including Vulkan and DX12
  • Warp Matrix
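
The task-graph model listed above can be illustrated with stream capture: a sequence of kernel launches is recorded once into a graph, and the whole graph is then replayed with a single launch call. The following is a minimal sketch, not a reference implementation. The kernel `add_one`, the `CHECK` macro, and all sizes are hypothetical names chosen for illustration. The calls are written against the CUDA 10.1 to 11.x runtime signatures; in CUDA 10.0 `cudaStreamBeginCapture` took only the stream argument, and later toolkits changed the parameters of `cudaGraphInstantiate`, so adjust to the toolkit in use.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report the first failing runtime call and exit main.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical kernel for illustration: adds 1.0f to every element.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    const int steps = 8;     // kernel launches recorded into the graph
    const int replays = 10;  // number of times the graph is launched

    float* d = nullptr;
    CHECK(cudaMalloc(&d, n * sizeof(float)));
    CHECK(cudaMemset(d, 0, n * sizeof(float)));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Record the launches into a graph. Nothing executes during capture.
    cudaGraph_t graph;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    for (int s = 0; s < steps; ++s) {
        add_one<<<(n + 255) / 256, 256, 0, stream>>>(d, n);
    }
    CHECK(cudaStreamEndCapture(stream, &graph));

    // Instantiate once, then replay the whole graph with one call per replay.
    cudaGraphExec_t exec;
    CHECK(cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0));
    for (int r = 0; r < replays; ++r) {
        CHECK(cudaGraphLaunch(exec, stream));
    }
    CHECK(cudaStreamSynchronize(stream));

    // Each element should have been incremented steps * replays times.
    float first = 0.0f;
    CHECK(cudaMemcpy(&first, d, sizeof(float), cudaMemcpyDeviceToHost));
    printf("data[0] = %.1f (expected %d)\n", first, steps * replays);

    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d));
    return 0;
}
```

Capturing an existing stream is only one way to build a graph; the runtime also offers explicit node-creation calls. Capture is used here because it needs the fewest changes to ordinary stream-based code.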

LIBRARIES

  • nvJPEG: New library for hybrid JPEG processing that provides >2x speedup on single and batched image decoding
  • Performance-Optimized Libraries: Strong FFT performance scaling across 16-GPU systems, acceleration of dense linear algebra routines such as the symmetric eigensolver and Cholesky factorization, and Turing-optimized mixed-precision GEMM performance

DEVELOPER TOOLS

  • New Developer Tools: The Nsight product family for tracing, profiling, and debugging CUDA applications (Nsight Systems and Nsight Compute)

Release Highlights

cuFFT 10.0 - Up to 17 TFLOPS on a 3D 1K FFT across 16 GPUs

cuBLAS 10.0 - Up to 90 TFLOPS of GEMM performance

cuSOLVER 10.0 - Up to 4x faster symmetric eigensolver

All library benchmarks use NVIDIA Tesla V100 GPUs (or P100 where specified) and Intel Xeon Gold 6140 (Skylake) 2.3 GHz processors.


Learn More

CUDA 10 Features Revealed

Learn about new features in CUDA 10 including updates to the programming model, computing libraries, and development tools.

Inside Turing

Learn about new technologies and features introduced in the NVIDIA Turing GPU architecture.

Nsight Systems

Learn more about the performance analysis tool designed to provide software optimization insights.

Nsight Compute

Learn more about the interactive CUDA API debugging and kernel profiling tool.


Archived Releases

Volta Architecture Support

  • Execute AI applications faster with Tensor Cores performing 5X faster than Pascal GPUs
  • Scale multi-GPU applications with next-generation NVLink delivering 2X the throughput of the prior generation
  • Increase GPU utilization with Volta Multi-Process Service (MPS)

Development Tools

  • Optimize and prefetch memory accesses by identifying the source code that causes page faults in unified memory
  • Profile NVLink efficiently by adding events to the timeline and color-coding connections
  • Inspect unified memory performance bottlenecks with new event filters based on virtual address, migration reason, and page fault access type

Libraries

  • Speed up high performance computing (HPC) and deep learning apps with new GEMM kernels in cuBLAS
  • Execute image and signal processing apps faster with performance optimizations across multiple GPU configurations in cuFFT and NVIDIA Performance Primitives
  • Solve linear and graph analytics problems common in HPC with new algorithms in cuSOLVER and nvGRAPH

Cooperative Groups

  • Express rich parallel algorithms with threads from sub-tiles to warps, blocks and grids
  • Manage and reuse threads efficiently within an application with new API and function primitives
  • Replace warp-synchronous programming with a robust programming model on Kepler architecture and above
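
As one illustration of the points above, the sketch below partitions each thread block into 32-thread tiles and reduces within a tile using the tile's own shuffle primitive instead of relying on implicit warp-synchronous behavior. It is a minimal example under stated assumptions: the kernel name `tile_sum`, the input size, and the launch configuration are hypothetical, and the block size is assumed to be a multiple of 32.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical kernel: sums n integers into *out, one atomicAdd per tile.
__global__ void tile_sum(const int* in, int* out, int n) {
    cg::thread_block block = cg::this_thread_block();
    // Split the block into explicit 32-thread groups.
    cg::thread_block_tile<32> tile = cg::tiled_partition<32>(block);

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Out-of-range threads contribute 0 but still take part in the shuffles.
    int v = (i < n) ? in[i] : 0;

    // Tree reduction inside the tile; shfl_down synchronizes the tile itself.
    for (int offset = tile.size() / 2; offset > 0; offset /= 2) {
        v += tile.shfl_down(v, offset);
    }

    // Rank 0 of each tile now holds that tile's partial sum.
    if (tile.thread_rank() == 0) {
        atomicAdd(out, v);
    }
}

int main() {
    const int n = 1000;                 // deliberately not a multiple of 32
    std::vector<int> host(n, 1);        // all ones, so the sum equals n

    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(int));

    const int threads = 128;            // multiple of the tile size (32)
    const int blocks = (n + threads - 1) / threads;
    tile_sum<<<blocks, threads>>>(d_in, d_out, n);

    int result = 0;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("sum = %d (expected %d)\n", result, n);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The same pattern extends to smaller tiles (for example `tiled_partition<16>`) by changing the template argument, since the loop bound comes from `tile.size()` rather than a hard-coded warp width. Error checking is omitted here for brevity.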

See Release Notes archive for details.
