cuBLAS

Dense Linear Algebra on GPUs

The NVIDIA cuBLAS library is a fast GPU-accelerated implementation of the standard basic linear algebra subroutines (BLAS). Using cuBLAS APIs, you can speed up your applications by deploying compute-intensive operations to a single GPU or scale up and distribute work across multi-GPU configurations efficiently.

NVBLAS is a GPU-accelerated version of BLAS that further accelerates BLAS Level-3 routines by dynamically routing BLAS calls to one or more NVIDIA GPUs as well as CPUs in the system through the cuBLAS-XT interface.

Researchers and scientists use cuBLAS to develop GPU-accelerated algorithms in areas including high-performance computing, image analysis, and machine learning.


Performance

cuBLAS runs 10x faster than the latest version of MKL BLAS on common benchmarks.


Scalability

The cuBLAS-XT API scales to multi-GPU configurations, with a nearly linear performance increase as more GPUs are added.
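
One common way to spread a large matrix operation across several GPUs is to cut the matrix into square tiles and hand the tiles to devices in round-robin order. The sketch below only illustrates that idea on the CPU in plain Python. The function name `assign_tiles` and its parameters are hypothetical, and it is not the cuBLAS-XT API or the library's actual scheduler.

```python
def assign_tiles(rows, cols, block_dim, num_devices):
    """Split a rows x cols matrix into square tiles and assign them round-robin.

    Returns a dict mapping a device index to a list of tiles, where each tile
    is (row_start, row_end, col_start, col_end) with exclusive end indices.
    Tiles on the bottom and right edges are clipped to the matrix bounds.
    """
    plan = {device: [] for device in range(num_devices)}
    tile_index = 0
    for r0 in range(0, rows, block_dim):
        for c0 in range(0, cols, block_dim):
            tile = (r0, min(r0 + block_dim, rows), c0, min(c0 + block_dim, cols))
            plan[tile_index % num_devices].append(tile)
            tile_index += 1
    return plan


# A 4 x 6 matrix cut into 2 x 2 tiles yields six tiles, shared by two devices.
plan = assign_tiles(rows=4, cols=6, block_dim=2, num_devices=2)
```

With equally sized tiles, each device receives about the same amount of work, which is what makes near-linear scaling plausible for large, compute-bound problems.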


Key Features

  • Complete support for all 152 standard BLAS routines
  • Single, double, complex, and double complex data types
  • Supports half-precision (FP16) and integer (INT8) matrix multiplication operations
  • Support for multiple GPUs and concurrent kernels
  • Supports CUDA streams for concurrent operations
  • Fortran bindings
  • Batch processing APIs for high performance GEMM operations, LU factorization, and matrix inverse operations
  • Device API that can be called from within your own CUDA kernels
  • New implementation of TRSV (triangular solve), up to 7x faster than the previous implementation
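
To make the GEMM and batched GEMM features above concrete, the following CPU-only sketch shows what the operation computes: C ← αAB + βC on matrices stored in column-major order, which is the layout cuBLAS uses. The names `gemm_colmajor` and `gemm_batched` are hypothetical helpers, not cuBLAS functions. The argument order loosely follows the BLAS GEMM convention but leaves out the transpose options.

```python
def gemm_colmajor(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Reference C <- alpha*A*B + beta*C on column-major flat lists.

    A is m x k, B is k x n, and C is m x n. Element (i, j) of a matrix with
    leading dimension ld is stored at index i + j * ld.
    """
    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i + p * lda] * b[p + j * ldb]
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]
    return c


def gemm_batched(m, n, k, alpha, a_list, lda, b_list, ldb, beta, c_list, ldc):
    """Apply the same-shaped GEMM independently to each (A, B, C) triple."""
    for a, b, c in zip(a_list, b_list, c_list):
        gemm_colmajor(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
    return c_list


# A = [[1, 2], [3, 4]] and the 2 x 2 identity, both stored column by column.
A = [1.0, 3.0, 2.0, 4.0]
I2 = [1.0, 0.0, 0.0, 1.0]

# Single GEMM: C = A * I2.
C = gemm_colmajor(2, 2, 2, 1.0, A, 2, I2, 2, 0.0, [0.0] * 4, 2)

# Batched GEMM: two independent products, A * I2 and A * A.
C_batch = gemm_batched(2, 2, 2, 1.0, [A, A], 2, [I2, A], 2, 0.0,
                       [[0.0] * 4, [0.0] * 4], 2)
```

A reference like this is mainly useful for checking GPU results on small inputs. The batched form matters on a GPU because many small, independent products can be issued together instead of one call at a time.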

Product Resources


Availability

The cuBLAS library is freely available as part of the CUDA Toolkit and OpenACC Toolkit.

Additional Resources