The NVIDIA® CUDA® Toolkit provides a comprehensive development environment for C and C++ developers building GPU-accelerated applications. The CUDA Toolkit includes a compiler for NVIDIA GPUs, math libraries, and tools for debugging and optimizing the performance of your applications. You’ll also find programming guides, user manuals, API references, and other documentation to help you quickly start accelerating your applications with GPUs.

Dramatically simplify parallel programming

Unified Memory
  • Simplifies programming by enabling applications to access CPU and GPU memory without the need to manually copy data. Read more about unified memory.
Drop-in Libraries
  • Automatically accelerate applications’ BLAS and FFTW calculations by up to 8X by simply replacing the existing CPU libraries with the GPU-accelerated equivalents.
Multi-GPU scaling
  • cuBLAS-XT is a new BLAS GPU library that automatically scales performance across up to eight GPUs in a single node, delivering over nine teraflops of double-precision performance per node and supporting larger workloads than ever before (up to 512GB). The redesigned FFT GPU library scales up to two GPUs in a single node, allowing larger transform sizes and higher throughput.
All developers can download CUDA 6 today.
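To make the Unified Memory item above concrete, here is a minimal sketch, assuming a GPU and toolkit version that support managed memory. The kernel `add_one` and the helper `unified_memory_demo` are hypothetical names chosen for illustration; `cudaMallocManaged`, `cudaDeviceSynchronize`, and `cudaFree` are CUDA runtime calls. The same pointer is written by the CPU, updated by the GPU, and read back by the CPU, with no explicit `cudaMemcpy`.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Kernel: add 1.0f to every element of the array.
__global__ void add_one(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper (not part of the toolkit): fills a managed buffer on the
// CPU, increments it on the GPU, then sums it on the CPU.
// Returns the sum, or -1.0f if a CUDA call fails.
float unified_memory_demo(int n) {
    float *data = NULL;
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) return -1.0f;

    // The CPU writes directly through the managed pointer.
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    // The GPU reads and writes through the same pointer.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_one<<<blocks, threads>>>(data, n);

    // Wait for the kernel to finish before the CPU touches the data again.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(data);
        return -1.0f;
    }

    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += data[i];

    cudaFree(data);
    return sum;
}
```

The synchronization step matters in this sketch: the kernel launch is asynchronous, so the CPU should not read the managed buffer until the GPU has finished with it.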

Easier development and new capabilities

  • Support for 64-bit ARM-based systems
  • Support for Microsoft Visual Studio 2013 (VC12)
  • cuFFT callbacks for higher-performance custom processing of input or output data
  • Improved debugging for CUDA Fortran applications
  • BSR sparse matrix format in cuSPARSE routines
  • Application Replay mode in both the Visual Profiler and the command-line nvprof tool
  • Updated CUDA Occupancy Calculator API that suggests occupancy-maximizing kernel launch configurations
  • New “nvprune” utility to remove portions of object files for specified GPU architectures
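As an illustration of the occupancy item above, here is a minimal sketch, assuming the API in question includes the runtime function `cudaOccupancyMaxPotentialBlockSize`. The kernel `scale` and the helpers `suggested_block_size` and `launch_scale` are hypothetical names used only for this example. The runtime is asked for a block size that maximizes occupancy for a given kernel, and that value is then used for the launch instead of a hand-picked constant.

```cuda
#include <cuda_runtime.h>

// Example kernel: multiply every element by a constant factor.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns the block size the runtime suggests for
// `scale`, or -1 if the query fails.
int suggested_block_size() {
    int min_grid_size = 0;  // smallest grid that can reach full occupancy
    int block_size = 0;     // suggested threads per block
    cudaError_t err = cudaOccupancyMaxPotentialBlockSize(
        &min_grid_size, &block_size, scale,
        0,   // no dynamic shared memory in this kernel
        0);  // no upper limit on the block size
    return err == cudaSuccess ? block_size : -1;
}

// Hypothetical helper: launches `scale` on a device buffer of n elements
// using the suggested block size. Returns 0 on success, -1 on failure.
int launch_scale(float *d_data, float factor, int n) {
    int block_size = suggested_block_size();
    if (block_size <= 0) return -1;
    int grid_size = (n + block_size - 1) / block_size;  // cover all n elements
    scale<<<grid_size, block_size>>>(d_data, factor, n);
    return cudaDeviceSynchronize() == cudaSuccess ? 0 : -1;
}
```

The suggested value depends on the kernel's register and shared memory usage and on the GPU it runs on, so it is best queried at run time rather than hard-coded.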
Registered developers can now download CUDA 6.5 RC. Become a CUDA Registered Developer today.
Members of the CUDA Registered Developer Program are notified of the latest developments, get access to pre-release software, and can report issues and file bugs.
Learn More

Learn more about the GPU-accelerated libraries and development tools included in the CUDA Toolkit

GPU-Accelerated Libraries
  • cuFFT – Fast Fourier Transforms Library
  • cuBLAS – Complete BLAS library
  • cuSPARSE – Sparse Matrix library
  • cuRAND – Random Number Generator
  • NPP – Thousands of Performance Primitives for Image & Video Processing
  • Thrust – Templated Parallel Algorithms & Data Structures
  • CUDA Math Library of high performance math routines
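To give a feel for how the libraries listed above are used, here is a minimal Thrust sketch. The helper name `sort_and_sum` is hypothetical; `thrust::device_vector`, `thrust::host_vector`, `thrust::sort`, and `thrust::reduce` are part of Thrust. It copies data to the GPU, sorts it there, computes the sum, and copies the sorted values back, all without writing a kernel by hand.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>

// Hypothetical helper: sorts `values` on the GPU (ascending, in place from the
// caller's point of view) and returns the sum of its elements.
int sort_and_sum(thrust::host_vector<int> &values) {
    // Constructing a device_vector from a host_vector copies host -> device.
    thrust::device_vector<int> d_values = values;

    // Both algorithms run on the GPU because the iterators refer to device memory.
    thrust::sort(d_values.begin(), d_values.end());
    int total = thrust::reduce(d_values.begin(), d_values.end(), 0);

    // Assigning back copies device -> host.
    values = d_values;
    return total;
}
```

Called on a host vector such as {3, 1, 4, 1}, the helper leaves the vector in ascending order and returns the sum of its elements.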
Development Tools

In addition to all the tools, libraries and documentation in the CUDA Toolkit, you’ll find hundreds of source code samples in the NVIDIA GPU Computing SDK.

If you develop applications in languages other than C or C++, please review the Getting Started page for a language solution that meets your needs. The CUDA Toolkit complements and fully supports programming with OpenACC directives.


The latest version of the CUDA Toolkit is always available at

NVIDIA GPU Computing Registered Developers get early access to the next CUDA Toolkit release, and access to NVIDIA’s online bug reporting and feature request system.