The NVIDIA® CUDA® Toolkit provides a comprehensive development environment for C and C++ developers building GPU-accelerated applications. The CUDA Toolkit includes a compiler for NVIDIA GPUs, math libraries, and tools for debugging and optimizing the performance of your applications. You’ll also find programming guides, user manuals, API references, and other documentation to help you quickly get started accelerating your applications with GPUs.

Learn more about CUDA Toolkit 8.0:

If you find any issues, please file a bug (requires membership in the NVIDIA Developer Program).

New in CUDA 8

Pascal Architecture Support
  • Enhance performance out-of-the-box on Pascal GPUs
  • Simplify programming using Unified Memory including support for large datasets, concurrent data access and atomics
  • Optimize Unified Memory performance using new data migration APIs
  • Increase throughput using NVIDIA® NVLink, a new high-speed interconnect
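
As a rough illustration of the Unified Memory items above, the following sketch allocates managed memory, gives the driver a placement hint, and prefetches the data to the GPU before a kernel runs. It assumes the `cudaMemAdvise` and `cudaMemPrefetchAsync` signatures introduced with CUDA 8, which take an integer device ID; later toolkit versions may differ. The `scale` kernel and the `run_prefetch_demo` helper are hypothetical names used only for this example, and the hints are treated as best-effort because GPUs older than Pascal may reject them.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: multiplies each element by a factor.
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns true if the kernel ran and produced the
// expected values.
bool run_prefetch_demo(size_t n) {
    const int device = 0;  // assumption: use the first GPU
    if (cudaSetDevice(device) != cudaSuccess) return false;

    float *data = nullptr;
    const size_t bytes = n * sizeof(float);
    if (cudaMallocManaged(&data, bytes) != cudaSuccess) return false;

    // Initialize on the CPU; the same pointer is valid on host and device.
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;

    // Data migration hints (new in CUDA 8). They are optional: if the GPU
    // does not support them, the calls fail and the code still works.
    cudaMemAdvise(data, bytes, cudaMemAdviseSetPreferredLocation, device);
    cudaMemPrefetchAsync(data, bytes, device, 0);
    cudaGetLastError();  // clear any error left by an unsupported hint

    const unsigned int threads = 256;
    const unsigned int blocks =
        static_cast<unsigned int>((n + threads - 1) / threads);
    scale<<<blocks, threads>>>(data, n, 2.0f);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(data); return false; }

    // Prefetch the results back to host memory, again as a best-effort hint.
    cudaMemPrefetchAsync(data, bytes, cudaCpuDeviceId, 0);
    cudaGetLastError();

    // Wait for the kernel and any migration before touching data on the CPU.
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(data); return false; }

    bool ok = true;
    for (size_t i = 0; i < n; ++i) {
        if (data[i] != 2.0f) { ok = false; break; }
    }
    cudaFree(data);
    return ok;
}

int main() {
    const bool ok = run_prefetch_demo(1 << 20);
    std::printf("%s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

Without the hints, the program still produces the same result; pages are then migrated on demand when the kernel first touches them, which is typically slower than an explicit prefetch.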
Developer Tools
  • Identify latent system-level bottlenecks using critical path analysis
  • Improve productivity with up to 2x faster NVCC compile times
  • Tune OpenACC applications and overall host code using new profiling extensions
Libraries
  • Accelerate graph analytics algorithms with nvGRAPH
  • Speed up deep learning applications using native support for FP16 and INT8, and support for batched operations in cuBLAS


Learn more about the GPU-accelerated libraries and development tools included in the CUDA Toolkit

If you develop applications in languages other than C or C++, please review the Getting Started Page for a language solution that meets your needs.  The CUDA Toolkit complements and fully supports programming with OpenACC directives.


The latest version of the CUDA Toolkit is always available at

CUDA Registered Developers get early access to the next CUDA Toolkit release, and access to NVIDIA’s online bug reporting and feature request system.