The NVIDIA® CUDA® Toolkit provides a comprehensive development environment for C and C++ developers building GPU-accelerated applications. The CUDA Toolkit includes a compiler for NVIDIA GPUs, math libraries, and tools for debugging and optimizing the performance of your applications. You’ll also find programming guides, user manuals, API references, and other documentation to help you quickly start accelerating your applications with GPUs.

Dramatically simplify parallel programming

*New* 64-bit ARM Support
  • Develop or recompile your applications to run on 64-bit ARM systems with NVIDIA GPUs.
Unified Memory
  • Lets applications access CPU and GPU memory without manually copying data.
Drop-in Libraries and cuFFT callbacks
  • Automatically accelerate applications’ BLAS and FFTW calculations.
  • Use cuFFT callbacks for higher performance.
Multi-GPU scaling
  • cublasXT, a new BLAS GPU library, automatically scales performance across up to 8 GPUs in a single node and supports larger workloads.
  • The redesigned FFT GPU library scales up to 2 GPUs in a single node, allowing larger transform sizes and higher throughput.
*New* Improved Tools Support
  • Support for Microsoft Visual Studio 2013
  • Improved debugging for CUDA Fortran
  • Replay feature in Visual Profiler and nvprof
  • nvprune utility to optimize the size of object files
Download the CUDA 6.5 Toolkit today.

Check out the CUDA 7 Features and Overview Webinar Recording and the Thrust 1.8 in CUDA 7 Webinar Recording.

Productivity and Performance Improvements

C++11 support makes it easier for C++ developers to accelerate their applications
  • Write less code with ‘auto’ and lambda expressions, especially when using the Thrust template library.
New cuSOLVER library of dense and sparse direct solvers
  • Significant acceleration for Computer Vision, CFD, Computational Chemistry, and Linear Optimization applications.
  • Key LAPACK dense solvers 3-6x faster than MKL.
    • Dense solvers include Cholesky, LU, SVD and QR
  • Sparse direct solvers 2-14x faster than CPU-only equivalents.
    • Sparse solvers include direct solvers and eigensolvers
Runtime Compilation enables highly optimized kernels to be generated at runtime.
  • Improve performance by removing conditional logic and only evaluating special cases when necessary.
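To give a feel for the C++11 point above, here is a minimal host-only sketch of a SAXPY-style update written with ‘auto’ and a lambda. It is an illustration under stated assumptions, not code from the toolkit: the function name `saxpy` is hypothetical, and `std::transform` from the standard library stands in for `thrust::transform`, which follows the same iterator-based calling convention. Whether a lambda can be passed to a Thrust algorithm that runs on the device depends on the toolkit version and compiler flags, so check the Thrust documentation for your release.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: returns a * x[i] + y[i] for each element.
// Host-only stand-in; std::transform is used here in place of thrust::transform.
// Assumes y has at least as many elements as x.
std::vector<float> saxpy(float a, const std::vector<float>& x, std::vector<float> y) {
    // 'auto' deduces the closure type; the lambda captures 'a' by value and
    // replaces the hand-written functor struct that C++03 code would need.
    auto op = [a](float xi, float yi) { return a * xi + yi; };

    // Binary transform: reads x and y element-wise, writes the result back into y.
    std::transform(x.begin(), x.end(), y.begin(), y.begin(), op);
    return y;
}
```

Without C++11, the same operation needs a separate functor struct with a constructor and an `operator()`, which is the boilerplate the bullet refers to.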
Registered developers can now download CUDA 7 RC. Become a CUDA Registered Developer today.
Members of the CUDA Registered Developer Program are notified of the latest developments, get access to pre-release software, and can report issues and file bugs.
Learn More

Learn more about the GPU-accelerated libraries and development tools included in the CUDA Toolkit

GPU-Accelerated Libraries
  • cuFFT – Fast Fourier Transforms Library
  • cuBLAS – Complete BLAS library
  • cuSPARSE – Sparse Matrix library
  • cuRAND – Random Number Generator
  • NPP – Thousands of Performance Primitives for Image & Video Processing
  • Thrust – Templated Parallel Algorithms & Data Structures
  • CUDA Math Library of high performance math routines
Development Tools

If you develop applications in languages other than C or C++, please review the Getting Started Page for a language solution that meets your needs.  The CUDA Toolkit complements and fully supports programming with OpenACC directives.


The latest version of the CUDA Toolkit is always available at

NVIDIA GPU Computing Registered Developers get early access to the next CUDA Toolkit release, and access to NVIDIA’s online bug reporting and feature request system.