What’s New in CUDA

CUDA 10

CUDA 10 is the most powerful software development platform for building GPU-accelerated applications. It has been built for Turing GPUs and includes performance-optimized libraries, a new asynchronous task-graph programming model, enhanced interoperability between CUDA and graphics APIs, and new developer tools. CUDA 10 also provides all the components needed to build applications for NVIDIA's most powerful server platforms for AI and high-performance computing (HPC) workloads, both on-premises (DGX-2) and in the cloud (HGX-2).

See Release Notes for additional details.


Key Features

TURING AND NEW SYSTEMS

  • New GPU architecture: Build and optimize applications for the next generation of Turing GPUs
  • Tensor Cores
  • NVSwitch Fabric

CUDA PLATFORM

  • CUDA Graphs: A new asynchronous task-graph programming model that enables more efficient kernel launch and execution
  • CUDA/Graphics Interop: New interoperability between CUDA and graphics APIs, including Vulkan and DirectX 12
  • Warp Matrix
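
To make the task-graph model concrete, here is a minimal sketch of building and launching a CUDA graph with the explicit graph API. The kernel `add_one` and the helper `run_graph_demo` are hypothetical names chosen for illustration, and the five-argument `cudaGraphInstantiate` call follows the CUDA 10 signature (later toolkits changed it), so treat this as an assumption-laden example rather than a reference implementation.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel for illustration: adds 1.0f to every element.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Builds a two-node graph (add_one -> add_one), launches it `launches`
// times, and returns true if every element ends up equal to 2 * launches.
bool run_graph_demo(int n, int launches) {
    float* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemset(d, 0, n * sizeof(float));  // all-zero bytes == 0.0f

    cudaGraph_t graph;
    cudaGraphCreate(&graph, 0);

    // Kernel arguments are copied into the node when it is added.
    void* args[] = { &d, &n };
    cudaKernelNodeParams p = {};
    p.func = reinterpret_cast<void*>(add_one);
    p.gridDim = dim3((n + 255) / 256);
    p.blockDim = dim3(256);
    p.sharedMemBytes = 0;
    p.kernelParams = args;
    p.extra = nullptr;

    // The second node lists the first as a dependency, so it runs after it.
    cudaGraphNode_t first, second;
    cudaGraphAddKernelNode(&first, graph, nullptr, 0, &p);
    cudaGraphAddKernelNode(&second, graph, &first, 1, &p);

    // Instantiate once (CUDA 10 signature), then launch repeatedly.
    cudaGraphExec_t exec;
    bool ok = cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0) == cudaSuccess;

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    for (int k = 0; ok && k < launches; ++k)
        ok = cudaGraphLaunch(exec, stream) == cudaSuccess;
    ok = (cudaStreamSynchronize(stream) == cudaSuccess) && ok;

    // Copy the result back and verify it on the host.
    std::vector<float> host(n);
    ok = (cudaMemcpy(host.data(), d, n * sizeof(float),
                     cudaMemcpyDeviceToHost) == cudaSuccess) && ok;
    for (int i = 0; ok && i < n; ++i)
        if (host[i] != 2.0f * launches) ok = false;

    if (exec) cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(d);
    return ok;
}
```

Calling `run_graph_demo(1024, 3)` from a `main` function exercises the sketch. The graph is defined and instantiated once and then launched several times, which is where the launch-overhead savings are expected to come from.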

LIBRARIES

  • nvJPEG: New library for hybrid JPEG processing that provides >2x speedup on single and batched image decoding
  • Performance-Optimized Libraries: Strong FFT performance scaling across 16-GPU systems, acceleration of dense linear algebra routines such as the eigensolver and Cholesky factorization, and Turing-optimized mixed-precision GEMM performance

DEVELOPER TOOLS

  • New Developer Tools: New Nsight product family of tools for tracing, profiling, and debugging of CUDA applications (Nsight Systems and Nsight Compute)

Release Highlights

cuFFT 10.0 - Up to 17 TFLOPS on a 3D 1K FFT across 16 GPUs

cuBLAS 10.0 - Up to 90 TFLOPS of GEMM performance

cuSOLVER 10.0 - Up to 4x faster symmetric eigensolver


Learn More

Inside Turing

Learn about new technologies and features introduced in the NVIDIA Turing GPU architecture.

Nsight Systems

Learn more about the performance analysis tool designed to provide software optimization insights.

Nsight Compute

Learn more about the interactive CUDA API debugging and kernel profiling tool.


Archived Releases

Volta Architecture Support

  • Execute AI applications faster with Tensor Cores performing 5X faster than Pascal GPUs
  • Scale multi-GPU applications with next-generation NVLink delivering 2X the throughput of the prior generation
  • Increase GPU utilization with Volta Multi-Process Service (MPS)

Development Tools

  • Optimize and prefetch memory accesses by identifying the source code that causes page faults in unified memory
  • Profile NVLink efficiently by adding events to the timeline and color-coding connections
  • Inspect unified memory performance bottlenecks with new event filters based on virtual address, migration reason, and page fault access type

Libraries

  • Speed up high performance computing (HPC) and deep learning apps with new GEMM kernels in cuBLAS
  • Execute image and signal processing apps faster with performance optimizations across multiple GPU configurations in cuFFT and NVIDIA Performance Primitives
  • Solve linear and graph analytics problems common in HPC with new algorithms in cuSOLVER and nvGRAPH

Cooperative Groups

  • Express rich parallel algorithms with groups of threads ranging from sub-warp tiles to warps, blocks, and grids
  • Manage and reuse threads efficiently within an application with new APIs and function primitives
  • Replace warp-synchronous programming with a robust programming model on the Kepler architecture and above
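
As an illustration of the points above, the following sketch partitions each thread block into 32-thread tiles and reduces within a tile using the tile's own shuffle primitive instead of implicit warp-synchronous code. The names `tile_sum`, `sum_kernel`, and `sum_with_tiles` are hypothetical, and the sketch assumes a block size that is a multiple of 32 and `n > 0`.

```cuda
#include <cuda_runtime.h>
#include <cooperative_groups.h>
#include <vector>

namespace cg = cooperative_groups;

// Sums `v` across a 32-thread tile; the total is valid in thread_rank() == 0.
__device__ int tile_sum(cg::thread_block_tile<32> tile, int v) {
    for (int offset = tile.size() / 2; offset > 0; offset /= 2)
        v += tile.shfl_down(v, offset);
    return v;
}

// Each tile reduces its 32 inputs, then its leader adds the partial sum
// to a single global accumulator.
__global__ void sum_kernel(const int* in, int* out, int n) {
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> tile = cg::tiled_partition<32>(block);

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int v = (i < n) ? in[i] : 0;  // out-of-range threads contribute 0

    v = tile_sum(tile, v);
    if (tile.thread_rank() == 0) atomicAdd(out, v);
}

// Host helper: sums n ones on the GPU and returns the result (-1 on error).
int sum_with_tiles(int n) {
    std::vector<int> host(n, 1);
    int* d_in = nullptr;
    int* d_out = nullptr;
    int result = -1;

    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(int));

    const int threads = 256;  // multiple of the 32-thread tile size
    const int blocks = (n + threads - 1) / threads;
    sum_kernel<<<blocks, threads>>>(d_in, d_out, n);

    // This blocking copy also waits for the kernel to finish.
    if (cudaMemcpy(&result, d_out, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        result = -1;

    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Because the input is all ones, `sum_with_tiles(1000)` should return 1000 when called from a `main` function. Making the tile an explicit object keeps the set of participating threads visible in the code, which is the main design point of the group-based model.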

See Release Notes archive for details.

Latest News

NVIDIA Turing SDKs Now Available

NVIDIA’s Turing architecture is one of the biggest leaps in computer graphics in 20 years. Here’s a look at the latest developer software releases to take advantage of this cutting-edge GPU.

TensorRT 5 RC Now Available

At GTC Japan, NVIDIA announced the latest version of TensorRT, its high-performance deep learning inference optimizer and runtime.

Major Companies in Japan Select Jetson AGX Xavier

At GTC Japan in Tokyo, NVIDIA Founder and CEO Jensen Huang announced that leading Japanese companies FANUC, Komatsu, Musashi Seimitsu, and Kawada Technologies will adopt Jetson AGX Xavier in their next-generation autonomous machines.

Yamaha Motor Selects NVIDIA Jetson AGX Xavier

Yamaha Motor just announced that it selected the NVIDIA Jetson AGX Xavier platform as the development system to power its upcoming lineup of autonomous machines in agriculture, logistics, marine products, and last-mile transportation.

Blogs: Parallel ForAll

Turing Multi-View Rendering in VRWorks

Virtual reality displays continue to evolve and now include advanced configurations such as canted HMDs with non-coplanar displays. Other headsets offer ultra-wide fields-of-view as well as other novel configurations.

Turing Variable Rate Shading in VRWorks

NVIDIA Turing GPUs enable a new, easily implemented rendering technique, Variable Rate Shading (VRS). VRS increases rendering performance and quality by applying varying amounts of processing power to different areas of the image.

Video Series: Shiny Pixels and Beyond: Real-Time Raytracing at SEED

SEED, Electronic Arts’ “Search for Extraordinary Experiences Division”, walks through what they learned about real-time ray tracing when they built the impressive “PICA PICA”

Introduction to Turing Mesh Shaders

The Turing architecture introduces a new programmable geometric shading pipeline through the use of mesh shaders.