Last Updated: 02/08/2010
CUDA 2.3 Downloads
Release Highlights
- The CUFFT Library now supports double-precision transforms and includes
significant performance improvements for single-precision transforms as
well. See the CUDA Toolkit release notes for details.
- The CUDA-GDB hardware debugger and the CUDA Visual Profiler are now
included in the CUDA Toolkit installer, and CUDA-GDB is now available
for all supported Linux distributions.
- Each GPU in an SLI group is now enumerated individually, so compute
applications can take advantage of multi-GPU performance even when
SLI is enabled for graphics.
- The 64-bit versions of the CUDA Toolkit now support compiling 32-bit
applications. Note that the installation location of the libraries
has changed, so developers on 64-bit Linux must update their
LD_LIBRARY_PATH to contain /usr/local/cuda/lib64 (64-bit libraries)
or /usr/local/cuda/lib (32-bit libraries).
- New fp16/fp32 conversion intrinsics allow data to be stored in fp16
format while computation is performed in fp32. The fp16 format suits
applications that need a wider numerical range than a 16-bit integer
provides but less precision than fp32, and it halves memory footprint
and bandwidth consumption relative to fp32 storage.
- The Visual Profiler includes several enhancements:
  - All memory transfer API calls are now reported
  - Support for profiling multiple contexts per GPU
  - Synchronized clocks for requested start time on the CPU and start/end
    times on the GPU for all kernel launches and memory transfers
  - Global memory load and store efficiency metrics for GPUs with
    compute capability 1.2 and higher
- The CUDA Driver for MacOS now has its own installer and is available
separately from the CUDA Toolkit.
- Support for major Linux distros, MacOS X, and Windows:
  - MacOS X 10.5.6 and later (32-bit)
  - Windows XP/Vista/7 with Visual Studio 8 (VC2005 SP1) and 9 (VC2008)
  - Fedora 10, RHEL 4.7 & 5.3, SLED 10.2 & 11.0, OpenSUSE 11.1, and Ubuntu 8.10 & 9.04
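The double-precision CUFFT support listed in the highlights above can be exercised with a complex-to-complex (`CUFFT_Z2Z`) plan. This is a minimal sketch, not taken from the release: it assumes a GPU with compute capability 1.3 or higher (needed for double precision), linking with `-lcufft`, and a hypothetical helper name `fft_constant_signal`. It transforms a constant signal, whose forward DFT puts the value n in bin 0 and approximately zero elsewhere.

```cuda
// Build (assumption): nvcc cufft_z2z.cu -lcufft -o cufft_z2z
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: forward double-precision FFT of an all-ones signal of length n.
// Writes the n complex outputs to out; returns 0 on success, -1 on error.
static int fft_constant_signal(int n, cufftDoubleComplex* out) {
    for (int i = 0; i < n; ++i) { out[i].x = 1.0; out[i].y = 0.0; }
    size_t bytes = n * sizeof(cufftDoubleComplex);

    cufftDoubleComplex* d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return -1;
    cudaMemcpy(d, out, bytes, cudaMemcpyHostToDevice);

    cufftHandle plan;
    // CUFFT_Z2Z: double-complex input and output; batch of 1.
    if (cufftPlan1d(&plan, n, CUFFT_Z2Z, 1) != CUFFT_SUCCESS) { cudaFree(d); return -1; }
    cufftResult r = cufftExecZ2Z(plan, d, d, CUFFT_FORWARD);  // in-place transform
    cudaError_t e = cudaMemcpy(out, d, bytes, cudaMemcpyDeviceToHost);

    cufftDestroy(plan);
    cudaFree(d);
    return (r == CUFFT_SUCCESS && e == cudaSuccess) ? 0 : -1;
}

int main() {
    const int n = 8;
    cufftDoubleComplex out[n];
    if (fft_constant_signal(n, out) != 0) { fprintf(stderr, "CUFFT/CUDA error\n"); return 1; }
    // A constant input concentrates all energy in bin 0; other bins are ~0.
    for (int i = 0; i < n; ++i) printf("bin %d: %.3f %+.3fi\n", i, out[i].x, out[i].y);
    return 0;
}
```

Single-precision code uses the same pattern with `cufftComplex`, `CUFFT_C2C`, and `cufftExecC2C`.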
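Because each GPU in an SLI group is enumerated individually, a compute application sees every board through the ordinary device-enumeration calls. The sketch below is illustrative only (the helper `list_devices` is hypothetical) and uses the standard runtime calls `cudaGetDeviceCount` and `cudaGetDeviceProperties`. In the CUDA 2.x era a host thread held a single context, so multi-GPU codes typically ran one host thread per device and called `cudaSetDevice` in each.

```cuda
// Build (assumption): nvcc list_gpus.cu -o list_gpus
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print every CUDA device the runtime enumerates.
// Returns the device count, or -1 if enumeration fails.
static int list_devices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return -1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        printf("device %d: %s, compute capability %d.%d, %zu MiB\n",
               dev, prop.name, prop.major, prop.minor,
               (size_t)(prop.totalGlobalMem >> 20));
    }
    return count;
}

int main() {
    int n = list_devices();
    // Work can then be assigned per device with cudaSetDevice(dev).
    return n < 0 ? 1 : 0;
}
```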
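The fp16 storage / fp32 compute pattern from the conversion-intrinsics item can be sketched as three small kernels. This is a hedged sketch written against the modern `cuda_fp16.h` header, where the storage type is `__half`; in CUDA 2.3 the same intrinsics (`__float2half_rn`, `__half2float`) operated on `unsigned short` storage instead. The helper `fp16_scale` and the kernel names are hypothetical.

```cuda
// Build (assumption): nvcc fp16_scale.cu -o fp16_scale
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Pack fp32 input into fp16 storage (round to nearest even).
__global__ void pack_half(const float* in, __half* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = __float2half_rn(in[i]);
}

// Load fp16, do the arithmetic in fp32, store the result back as fp16.
__global__ void scale_half(__half* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = __half2float(data[i]);
        data[i] = __float2half_rn(v * factor);
    }
}

// Unpack fp16 storage back to fp32 for the host.
__global__ void unpack_half(const __half* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = __half2float(in[i]);
}

// Hypothetical helper: scale n values kept in fp16 on the device.
// Returns 0 on success, -1 on a CUDA error.
static int fp16_scale(const float* in, float* out, int n, float factor) {
    float* d_f = nullptr;
    __half* d_h = nullptr;  // half the bytes of the fp32 buffer
    if (cudaMalloc(&d_f, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_h, n * sizeof(__half)) != cudaSuccess) { cudaFree(d_f); return -1; }
    cudaMemcpy(d_f, in, n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256, blocks = (n + threads - 1) / threads;
    pack_half<<<blocks, threads>>>(d_f, d_h, n);
    scale_half<<<blocks, threads>>>(d_h, n, factor);
    unpack_half<<<blocks, threads>>>(d_h, d_f, n);

    cudaError_t err = cudaMemcpy(out, d_f, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_h);
    cudaFree(d_f);
    return err == cudaSuccess ? 0 : -1;
}

int main() {
    const float in[4] = {1.0f, 1.5f, -2.0f, 0.25f};
    float out[4];
    if (fp16_scale(in, out, 4, 2.0f) != 0) { fprintf(stderr, "CUDA error\n"); return 1; }
    for (int i = 0; i < 4; ++i) printf("%g -> %g\n", in[i], out[i]);
    return 0;
}
```

The sample inputs and results are all exactly representable in fp16, so no rounding is visible here; values such as 0.1 would show the reduced precision.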
New CUDA SDK code samples:
- A new pitchLinearTexture code sample that shows how to efficiently
texture from pitch linear memory.
- A new PTXJIT code sample illustrating how to use cuModuleLoadDataEx()
to load PTX source from memory instead of from a file.
- Two new code samples for Windows, showing how to use the NVCUVID
library to decode MPEG-2, VC-1, and H.264 content and pass frames
to OpenGL or Direct3D for display.
- Updated code samples showing how to properly align CUDA kernel
function parameters so the same code works on both 32-bit and 64-bit
systems.
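Texturing from pitch-linear memory, as in the pitchLinearTexture sample above, means allocating a padded 2D buffer with `cudaMallocPitch` and binding a 2D texture to it with that pitch. The SDK sample predates texture objects and used texture references; this sketch instead assumes the texture-object API (CUDA 5.0 and later, compute capability 3.0 and later). The kernel `copy_via_texture` and helper `texture_copy` are hypothetical names.

```cuda
// Build (assumption): nvcc pitch_tex.cu -o pitch_tex
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Read each texel through the texture and write it to a dense output image.
__global__ void copy_via_texture(cudaTextureObject_t tex, float* out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);  // sample texel centres
}

// Hypothetical helper: round-trip a width x height float image through a
// pitch-linear texture. Returns 0 on success, -1 on error.
static int texture_copy(const float* in, float* out, int width, int height) {
    float* d_img = nullptr;
    size_t pitch = 0;  // row stride in bytes, chosen by the runtime
    if (cudaMallocPitch((void**)&d_img, &pitch, width * sizeof(float), height) != cudaSuccess) return -1;
    cudaMemcpy2D(d_img, pitch, in, width * sizeof(float),
                 width * sizeof(float), height, cudaMemcpyHostToDevice);

    cudaResourceDesc res;
    memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypePitch2D;
    res.res.pitch2D.devPtr = d_img;
    res.res.pitch2D.desc = cudaCreateChannelDesc<float>();
    res.res.pitch2D.width = width;
    res.res.pitch2D.height = height;
    res.res.pitch2D.pitchInBytes = pitch;

    cudaTextureDesc td;
    memset(&td, 0, sizeof(td));
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;      // no interpolation
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;                  // coordinates in texels

    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &res, &td, nullptr) != cudaSuccess) { cudaFree(d_img); return -1; }

    float* d_out = nullptr;
    cudaMalloc(&d_out, width * height * sizeof(float));
    dim3 block(16, 16), grid((width + 15) / 16, (height + 15) / 16);
    copy_via_texture<<<grid, block>>>(tex, d_out, width, height);
    cudaError_t err = cudaMemcpy(out, d_out, width * height * sizeof(float), cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFree(d_out);
    cudaFree(d_img);
    return err == cudaSuccess ? 0 : -1;
}

int main() {
    float in[12], out[12];
    for (int i = 0; i < 12; ++i) in[i] = (float)i;
    if (texture_copy(in, out, 4, 3) != 0) { fprintf(stderr, "CUDA error\n"); return 1; }
    for (int i = 0; i < 12; ++i) printf("%g ", out[i]);
    printf("\n");
    return 0;
}
```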
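The PTXJIT sample listed above loads PTX from a string in memory with `cuModuleLoadDataEx()`, so no .ptx file is read at run time. The sketch below is a hedged illustration, not the SDK sample itself: the hand-written PTX (including its `.version 6.0` / `.target sm_50` lines) is an assumption you may need to adjust for your toolkit and GPU, it launches with `cuLaunchKernel` (added after CUDA 2.3, which used `cuParamSet*`/`cuLaunchGrid`), and the helper `run_ptx` is hypothetical.

```cuda
// Build (assumption): nvcc ptxjit.cu -lcuda -o ptxjit
#include <cstdint>
#include <cstdio>
#include <cuda.h>

// Hand-written PTX for: set_value(unsigned int* out, unsigned int value) { *out = value; }
static const char* kPtx = R"ptx(
.version 6.0
.target sm_50
.address_size 64

.visible .entry set_value(
    .param .u64 set_value_param_0,
    .param .u32 set_value_param_1
)
{
    .reg .b32 %r<2>;
    .reg .b64 %rd<3>;
    ld.param.u64 %rd1, [set_value_param_0];
    ld.param.u32 %r1, [set_value_param_1];
    cvta.to.global.u64 %rd2, %rd1;
    st.global.u32 [%rd2], %r1;
    ret;
}
)ptx";

// Hypothetical helper: JIT the PTX, run it once, return the stored value via result.
static int run_ptx(unsigned int value, unsigned int* result) {
    char log[4096] = {0};
    CUjit_option opts[] = {CU_JIT_ERROR_LOG_BUFFER, CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES};
    void* vals[] = {log, (void*)(uintptr_t)sizeof(log)};

    CUdevice dev;
    CUcontext ctx;
    if (cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return -1;
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return -1;
    cuCtxSetCurrent(ctx);

    int rc = -1;
    CUmodule mod;
    // Compile the in-memory PTX string for the current device.
    if (cuModuleLoadDataEx(&mod, kPtx, 2, opts, vals) != CUDA_SUCCESS) {
        fprintf(stderr, "PTX JIT failed: %s\n", log);
    } else {
        CUfunction fn;
        CUdeviceptr d_out;
        if (cuModuleGetFunction(&fn, mod, "set_value") == CUDA_SUCCESS &&
            cuMemAlloc(&d_out, sizeof(unsigned int)) == CUDA_SUCCESS) {
            void* args[] = {&d_out, &value};
            if (cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, nullptr, args, nullptr) == CUDA_SUCCESS &&
                cuMemcpyDtoH(result, d_out, sizeof(unsigned int)) == CUDA_SUCCESS)
                rc = 0;
            cuMemFree(d_out);
        }
        cuModuleUnload(mod);
    }
    cuDevicePrimaryCtxRelease(dev);
    return rc;
}

int main() {
    unsigned int r = 0;
    if (run_ptx(42u, &r) != 0) return 1;
    printf("kernel wrote %u\n", r);
    return 0;
}
```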
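Our reading of the parameter-alignment item above is that it concerns the pre-CUDA 4.0 driver API, where the caller passed each kernel argument at an explicit byte offset; a pointer placed at a 4-byte offset works on 32-bit systems but is misaligned on 64-bit ones. This host-only sketch computes aligned offsets for a hypothetical kernel signature; `align_up` and `param_offsets` are illustrative names, not SDK functions. With `<<<...>>>` launches or `cuLaunchKernel`, the toolchain handles this layout for you.

```cuda
// Host-side sketch; compiles with nvcc or any C++11 compiler.
#include <cstddef>
#include <cstdio>

// Round offset up to the next multiple of alignment (a power of two).
static size_t align_up(size_t offset, size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Offsets for a hypothetical kernel(int n, float* in, float scale, float* out),
// placing each parameter at an offset aligned to its own type.
// Returns the total size of the parameter block.
static size_t param_offsets(size_t offsets[4]) {
    size_t off = 0;
    off = align_up(off, alignof(int));    offsets[0] = off; off += sizeof(int);
    off = align_up(off, alignof(float*)); offsets[1] = off; off += sizeof(float*);
    off = align_up(off, alignof(float));  offsets[2] = off; off += sizeof(float);
    off = align_up(off, alignof(float*)); offsets[3] = off; off += sizeof(float*);
    return off;
}

int main() {
    size_t off[4];
    size_t total = param_offsets(off);
    // 64-bit build: 0 8 16 24, total 32. 32-bit build: 0 4 8 12, total 16.
    // Naive packing on 64-bit would put the first pointer at offset 4 (misaligned).
    printf("offsets: %zu %zu %zu %zu, total %zu\n", off[0], off[1], off[2], off[3], total);
    return 0;
}
```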