The NVIDIA® CUDA® Toolkit provides a comprehensive development environment for C and C++ developers building GPU-accelerated applications. The CUDA Toolkit includes a compiler for NVIDIA GPUs, math libraries, and tools for debugging and optimizing the performance of your applications. You’ll also find programming guides, user manuals, API reference, and other documentation to help you get started quickly accelerating your application with GPUs.

To learn more about CUDA Toolkit 7.5, check out:

If you find any issues, please file a bug (requires membership of the Accelerated Computing Developer Program).

New in CUDA 7.5

16-bit floating point (FP16) data format
  • Store up to 2x larger datasets in GPU memory
  • Reduce memory bandwidth requirements by up to 2x
  • New mixed-precision cublasSgemmEx() routine supports 2x larger matrices
New cuSPARSE GEMVI routines
  • Optimized dense matrix x sparse vector routines - ideal for Natural Language Processing
Instruction-level profiling helps pinpoint performance bottlenecks
  • Quickly identify the specific lines of source code limiting the performance of GPU code
  • Apply advanced performance optimizations more easily
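
To make the "dense matrix x sparse vector" operation from the list above concrete, here is a minimal CPU reference sketch in plain C++. It is not the cuSPARSE API: the `SparseVector` struct and the `dense_times_sparse` function are hypothetical names introduced only for illustration, and the row-major storage layout is an assumption. The actual GEMVI routines run on the GPU with their own signatures.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container (not a cuSPARSE type): a sparse vector that
// stores only its nonzero entries as parallel index/value arrays.
struct SparseVector {
    std::vector<std::size_t> indices;  // positions of the nonzero entries
    std::vector<float> values;         // values at those positions
};

// Hypothetical CPU reference: computes y = A * x, where A is a dense
// rows x cols matrix stored row-major (an assumption) and x is sparse.
// Only the columns of A that line up with a nonzero of x are read.
std::vector<float> dense_times_sparse(const std::vector<float>& A,
                                      std::size_t rows, std::size_t cols,
                                      const SparseVector& x) {
    std::vector<float> y(rows, 0.0f);
    for (std::size_t k = 0; k < x.indices.size(); ++k) {
        const std::size_t col = x.indices[k];
        const float xv = x.values[k];
        for (std::size_t r = 0; r < rows; ++r) {
            y[r] += A[r * cols + col] * xv;
        }
    }
    return y;
}
```

The work scales with the number of nonzeros in the vector rather than its full length, which is presumably why the operation suits workloads with very sparse inputs, such as the Natural Language Processing use case mentioned above.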

The CUDA Toolkit is now available for all developers.


Members of the CUDA Registered Developer Program are notified of the latest developments, get access to pre-release software, and can report issues and bugs.

Learn more about the GPU-accelerated libraries and development tools included in the CUDA Toolkit

If you develop applications in languages other than C or C++, please review the Getting Started Page for a language solution that meets your needs.  The CUDA Toolkit complements and fully supports programming with OpenACC directives.

Review the latest CUDA performance report to learn how much you could accelerate your code.


The latest version of the CUDA Toolkit is always available at

CUDA Registered Developers get early access to the next CUDA Toolkit release, and access to NVIDIA’s online bug reporting and feature request system.