NVIDIA Developer Zone

cuBLAS

The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accelerated implementation of the complete standard BLAS library that delivers 6x to 17x higher performance than the latest Intel MKL BLAS.
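
cuBLAS follows the standard BLAS conventions, including column-major matrix storage. To illustrate the operation that a Level 3 routine such as DGEMM (exposed in cuBLAS as `cublasDgemm`) performs, here is a minimal plain-C CPU sketch of C = alpha * A * B + beta * C. It is not cuBLAS code. `ref_dgemm` is a hypothetical name, transposes are omitted, and unlike reference BLAS this sketch always reads C, even when beta is zero.

```c
/* Index of element (i, j) in a column-major matrix with leading
 * dimension ld, in the style of the IDX2C macro used in cuBLAS examples. */
#define IDX2C(i, j, ld) (((j) * (ld)) + (i))

/* Hypothetical CPU reference for C = alpha * A * B + beta * C, no transposes.
 * A is m x k, B is k x n, C is m x n; all are stored column-major. */
void ref_dgemm(int m, int n, int k, double alpha,
               const double *A, int lda,
               const double *B, int ldb,
               double beta, double *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double acc = 0.0;
            /* Dot product of row i of A with column j of B. */
            for (int p = 0; p < k; ++p)
                acc += A[IDX2C(i, p, lda)] * B[IDX2C(p, j, ldb)];
            /* Scale the product and blend it with the existing C. */
            C[IDX2C(i, j, ldc)] = alpha * acc + beta * C[IDX2C(i, j, ldc)];
        }
    }
}
```

For example, with A = [1 2; 3 4] stored as {1, 3, 2, 4}, B = [5 6; 7 8] stored as {5, 7, 6, 8}, alpha = 1 and beta = 0, the call `ref_dgemm(2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2)` leaves C = {19, 43, 22, 50}, which is [19 22; 43 50] in column-major order.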

Building on the GPU-accelerated BLAS routines in the cuBLAS library, heterogeneous LAPACK implementations such as CULA Tools and MAGMA are also available.

Key Features

  • Complete support for all 152 standard BLAS routines
  • Single, double, complex, and double complex data types
  • Support for CUDA streams
  • Fortran bindings
  • Support for multiple GPUs and concurrent kernels
  • *New in CUDA 4.1* batched GEMM API, with a >4x speedup vs. MKL
  • *New in CUDA 4.1* 5% to 10% performance improvement over previous CUDA releases for large GEMMs
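
The batched GEMM API targets many small, independent matrix products issued in a single call, which amortizes per-call overhead across problems that are individually too small to keep a GPU busy. Semantically, a batched GEMM is equivalent to a loop of ordinary GEMMs, as in this minimal CPU sketch. `ref_dgemm_batched` is a hypothetical name, not a cuBLAS function; the corresponding cuBLAS routines (for example `cublasDgemmBatched`) take arrays of device pointers and run the whole batch on the GPU.

```c
/* Hypothetical CPU reference for a batched GEMM (not cuBLAS code).
 * For each b in [0, batch_count) it computes, independently,
 *     C[b] = alpha * A[b] * B[b] + beta * C[b]
 * where A[b] is m x k, B[b] is k x n and C[b] is m x n, all column-major
 * with no transposes. Every matrix in the batch has the same sizes. */
void ref_dgemm_batched(int m, int n, int k, double alpha,
                       const double *const A[], int lda,
                       const double *const B[], int ldb,
                       double beta, double *const C[], int ldc,
                       int batch_count)
{
    for (int b = 0; b < batch_count; ++b) {
        const double *a = A[b];
        const double *bm = B[b];
        double *c = C[b];
        for (int j = 0; j < n; ++j) {
            for (int i = 0; i < m; ++i) {
                double acc = 0.0;
                /* Dot product of row i of A[b] with column j of B[b]. */
                for (int p = 0; p < k; ++p)
                    acc += a[p * lda + i] * bm[j * ldb + p];
                c[j * ldc + i] = alpha * acc + beta * c[j * ldc + i];
            }
        }
    }
}
```

Passing arrays of pointers, rather than one strided buffer, lets each matrix in the batch live anywhere in memory, at the cost of building the pointer arrays first.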

Performance 

Figure: Up to 1 TFLOPS sustained performance and >6x speedup over Intel MKL. NVIDIA cuBLAS Level 3 performance delivers significantly faster results than MKL.
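
Sustained-rate figures such as "1 TFLOPS" are conventionally obtained by dividing a GEMM's operation count by its measured run time. A minimal sketch of that arithmetic, assuming the standard count of 2mnk operations for a real m x n x k GEMM (the function names are hypothetical):

```c
/* Floating-point operations in a real m x n x k GEMM: each of the m*n
 * output elements needs k multiplies and k adds, so 2*m*n*k in total.
 * This standard count ignores the lower-order alpha/beta scaling work. */
double gemm_flops(long m, long n, long k)
{
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Sustained rate in TFLOPS (10^12 operations per second),
 * given a measured run time in seconds. */
double gemm_tflops(long m, long n, long k, double seconds)
{
    return gemm_flops(m, n, k) / seconds / 1e12;
}
```

For example, a 10000 x 10000 x 10000 GEMM performs 2 x 10^12 operations, so completing it in 2 seconds corresponds to 1 TFLOPS.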

 

Figure: ZGEMM performance vs. Intel MKL. NVIDIA cuBLAS delivers near-peak performance for a wide range of matrix sizes.
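
ZGEMM is the double-complex variant of GEMM. The following minimal CPU sketch shows its semantics using C99 `double complex`, together with the 8mnk real-operation count conventionally used when reporting ZGEMM rates. `ref_zgemm` and `zgemm_flops` are hypothetical names; the cuBLAS GPU routine is `cublasZgemm`, which uses the `cuDoubleComplex` type instead.

```c
#include <complex.h>

/* Hypothetical CPU reference for the double-complex case of
 * C = alpha * A * B + beta * C (ZGEMM semantics, no transposes),
 * with A m x k, B k x n and C m x n, all stored column-major. */
void ref_zgemm(int m, int n, int k, double complex alpha,
               const double complex *A, int lda,
               const double complex *B, int ldb,
               double complex beta, double complex *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double complex acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += A[p * lda + i] * B[j * ldb + p];
            C[j * ldc + i] = alpha * acc + beta * C[j * ldc + i];
        }
    }
}

/* A complex multiply-add costs 4 real multiplies and 4 real adds,
 * so ZGEMM is conventionally counted as 8*m*n*k real operations. */
double zgemm_flops(long m, long n, long k)
{
    return 8.0 * (double)m * (double)n * (double)k;
}
```

Because each complex operation does four times the real arithmetic of its real counterpart, ZGEMM reaches high operation rates at smaller matrix sizes than DGEMM of the same dimensions.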

 

Figure: cuBLAS batched GEMM API improves performance of batches of small matrices.

A short presentation that uses a GNU Octave implementation as an example of using cuBLAS

Availability

The cuBLAS library is freely available as part of the CUDA Toolkit at www.nvidia.com/getcuda.
For more information on cuBLAS and other CUDA math libraries: