GPUDirect™ gives third-party devices direct access to CUDA memory

Support for 16-way concurrency allows up to 16 different kernels to run at the same time on Fermi architecture GPUs
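
As a minimal sketch of how an application might expose this concurrency, the example below launches 16 small kernels, each in its own stream. The kernel `fill`, the host function `run_concurrent_fill`, and the constants are hypothetical names chosen for illustration; they are not part of the toolkit. Separate streams only permit the kernels to overlap. Whether they actually run at the same time depends on the GPU and on the resources each kernel uses.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NUM_STREAMS 16   /* matches the 16-way limit quoted above */
#define ELEMS 256        /* elements per kernel; kept small so kernels can overlap */

/* Hypothetical kernel: each launch writes its own value into its own slice. */
__global__ void fill(int *out, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

/* Returns the number of wrong elements, or -1 on a CUDA error. */
int run_concurrent_fill(void)
{
    static int h_buf[NUM_STREAMS * ELEMS];
    int *d_buf = NULL;
    cudaStream_t streams[NUM_STREAMS];

    if (cudaMalloc((void **)&d_buf, sizeof(h_buf)) != cudaSuccess) return -1;
    for (int s = 0; s < NUM_STREAMS; ++s) cudaStreamCreate(&streams[s]);

    /* Launches in different non-default streams have no ordering between
       them, so the hardware is free to run them concurrently. */
    for (int s = 0; s < NUM_STREAMS; ++s)
        fill<<<1, ELEMS, 0, streams[s]>>>(d_buf + s * ELEMS, ELEMS, s);

    /* Wait for every stream to finish before reading the results. */
    for (int s = 0; s < NUM_STREAMS; ++s) cudaStreamSynchronize(streams[s]);

    cudaError_t err = cudaMemcpy(h_buf, d_buf, sizeof(h_buf),
                                 cudaMemcpyDeviceToHost);

    /* Slice s should contain the value s in every element. */
    int wrong = 0;
    for (int i = 0; i < NUM_STREAMS * ELEMS; ++i)
        if (h_buf[i] != i / ELEMS) ++wrong;

    for (int s = 0; s < NUM_STREAMS; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d_buf);
    return err == cudaSuccess ? wrong : -1;
}
```

Calling `run_concurrent_fill()` from a host `main` and checking for a return value of 0 confirms that all 16 launches completed. A profiler timeline is needed to see whether they overlapped.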

Runtime / Driver interoperability enables applications to mix and match use of the CUDA Driver API with the CUDA C Runtime and math libraries via buffer sharing and context migration

New language features added to CUDA C / C++ include:
  Support for printf() in device code
  Support for function pointers and recursion, which makes it easier to port many existing algorithms to Fermi GPUs
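
The sketch below exercises the three language features in one small kernel, assuming a GPU of compute capability 2.0 (Fermi) or higher and a build that targets it. The names `BinaryOp`, `add`, `mul`, `factorial`, `demo`, and `run_language_feature_demo` are hypothetical and exist only for this illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NUM_THREADS 4

typedef int (*BinaryOp)(int, int);

__device__ int add(int a, int b) { return a + b; }
__device__ int mul(int a, int b) { return a * b; }

/* Recursion in device code. */
__device__ int factorial(int n)
{
    return n <= 1 ? 1 : n * factorial(n - 1);
}

__global__ void demo(int *out)
{
    /* Function pointers taken and called in device code. */
    BinaryOp ops[2] = { add, mul };

    int t = threadIdx.x;
    int r = ops[t % 2](factorial(t + 1), 10);

    /* printf() called from device code; output is flushed to the host
       at synchronization points such as the cudaMemcpy below. */
    printf("thread %d: result %d\n", t, r);
    out[t] = r;
}

/* Copies NUM_THREADS results into `results`.
   Returns 0 on success, -1 on a CUDA error. */
int run_language_feature_demo(int *results)
{
    int *d_out = NULL;
    if (cudaMalloc((void **)&d_out, NUM_THREADS * sizeof(int)) != cudaSuccess)
        return -1;

    demo<<<1, NUM_THREADS>>>(d_out);

    cudaError_t err = cudaMemcpy(results, d_out, NUM_THREADS * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : -1;
}
```

Even threads apply `add` and odd threads apply `mul` to `factorial(t + 1)` and 10, so the four results are 11, 20, 16, and 240. The order of the printed lines across threads is not guaranteed.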

Unified Visual Profiler now supports both CUDA C/C++ and OpenCL, and adds support for CUDA Driver API tracing

Math Libraries Performance Improvements, including:
  Improved performance of selected transcendental functions from the log, pow, erf, and gamma families
  Significant improvements in double-precision FFT performance on Fermi-architecture GPUs for 2^n transform sizes
  Streaming API now supported in CUBLAS for overlapping copy and compute operations
  CUFFT Real-to-complex (R2C) and complex-to-real (C2R) optimizations for 2^n data sizes
  Improved performance for GEMV and SYMV subroutines in CUBLAS
  Optimized double-precision implementations of divide and reciprocal routines for the Fermi architecture

New and updated SDK code samples demonstrating how to use:
  Function pointers in CUDA C/C++ kernels
  OpenCL / Direct3D buffer sharing
  Hidden Markov Model in OpenCL
  Microsoft Excel GPGPU example showing how to run an Excel function on the GPU



Note: The developer driver packages below provide baseline support for the widest number of NVIDIA products in the smallest number of installers. More recent production driver packages for developers and end users may be available at www.nvidia.com/drivers.

