The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accelerated version of the complete standard BLAS library that delivers 6x to 17x faster performance than the latest MKL BLAS.
Building on the GPU-accelerated BLAS routines in the cuBLAS library, heterogeneous LAPACK implementations such as CULA Tools and MAGMA are also available.
Key Features
Performance
Up to 1 TFLOPS sustained performance and >6x speedup over Intel MKL

NVIDIA cuBLAS Level 3 routines run significantly faster than their MKL counterparts.
ZGEMM Performance vs Intel MKL

NVIDIA cuBLAS delivers near-peak performance for a wide range of matrix sizes.
cuBLAS Batched GEMM API Improves Performance of Batches of Small Matrices
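To make concrete what a batched GEMM computes, here is a minimal CPU reference sketch in plain C++. It is not cuBLAS code and is not meant to be fast: it only shows the operation, C = alpha * A * B + beta * C, applied independently to every small matrix in a batch, using the column-major storage that BLAS assumes. The function names `gemm_reference` and `gemm_batched_reference` are hypothetical and chosen for this illustration; in cuBLAS the corresponding entry points are the `cublas<t>gemmBatched` functions (for example `cublasSgemmBatched`), which take arrays of device pointers rather than host vectors.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of cuBLAS.
// Computes C = alpha * A * B + beta * C for a single problem.
// All matrices are column-major: A is m x k, B is k x n, C is m x n,
// and lda/ldb/ldc are the leading dimensions (column strides).
void gemm_reference(int m, int n, int k, float alpha,
                    const float* A, int lda,
                    const float* B, int ldb,
                    float beta, float* C, int ldc) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += A[i + p * lda] * B[p + j * ldb];
            }
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
    }
}

// Hypothetical helper, not part of cuBLAS.
// Applies the same GEMM shape and scalars to every problem in the batch.
// A[b], B[b], and C[b] point to the matrices of problem b.
void gemm_batched_reference(int m, int n, int k, float alpha,
                            const std::vector<const float*>& A, int lda,
                            const std::vector<const float*>& B, int ldb,
                            float beta,
                            const std::vector<float*>& C, int ldc) {
    for (std::size_t b = 0; b < C.size(); ++b) {
        gemm_reference(m, n, k, alpha, A[b], lda, B[b], ldb, beta, C[b], ldc);
    }
}
```

Each problem in the batch is independent of the others, which is why a single batched call can keep the GPU busy even when the individual matrices are too small to do so on their own.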

Short presentation using a GNU Octave implementation as an example of using cuBLAS
Availability
The cuBLAS library is freely available as part of the CUDA Toolkit at www.nvidia.com/getcuda.