cuBLAS
Dense Linear Algebra on GPUs
The NVIDIA cuBLAS library is a fast GPU-accelerated implementation of the standard Basic Linear Algebra Subroutines (BLAS). Using cuBLAS APIs, you can speed up your applications by offloading compute-intensive operations to a single GPU, or by scaling up and distributing work efficiently across multi-GPU configurations.
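The typical call pattern is to create a library handle, move operands into device memory, and invoke the routine. A minimal SGEMM sketch (sizes and values are illustrative, and error checking is abbreviated):

```c
/* Computes C = alpha*A*B + beta*C for 2x2 matrices in column-major order. */
#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void) {
    const int m = 2, n = 2, k = 2;
    float hA[] = {1, 2, 3, 4};   /* column-major: A = [1 3; 2 4] */
    float hB[] = {5, 6, 7, 8};   /* column-major: B = [5 7; 6 8] */
    float hC[4] = {0};
    const float alpha = 1.0f, beta = 0.0f;

    float *dA, *dB, *dC;
    cudaMalloc((void **)&dA, sizeof(hA));
    cudaMalloc((void **)&dB, sizeof(hB));
    cudaMalloc((void **)&dC, sizeof(hC));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, sizeof(hB), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    /* cuBLAS follows the Fortran BLAS convention: column-major storage. */
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                &alpha, dA, m, dB, k, &beta, dC, m);
    cudaMemcpy(hC, dC, sizeof(hC), cudaMemcpyDeviceToHost);

    printf("C = [%g %g; %g %g]\n", hC[0], hC[2], hC[1], hC[3]);
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Compile with, for example, `nvcc sgemm_demo.cu -lcublas` (file name illustrative).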
NVBLAS is a GPU-accelerated version of BLAS that further accelerates BLAS Level-3 routines by dynamically routing BLAS calls to one or more NVIDIA GPUs as well as CPUs in the system through the cuBLAS-XT interface.
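NVBLAS requires no source changes: an existing binary is pointed at the library (for example via LD_PRELOAD) and configured through a file named by the NVBLAS_CONFIG_FILE environment variable. A minimal configuration sketch, assuming an OpenBLAS fallback at an illustrative path:

```
# nvblas.conf -- referenced via the NVBLAS_CONFIG_FILE environment variable
NVBLAS_LOGFILE nvblas.log
# CPU BLAS used for calls NVBLAS does not route to the GPU (path is an assumption)
NVBLAS_CPU_BLAS_LIB /usr/lib/libopenblas.so
# Route eligible Level-3 calls to GPUs 0 and 1
NVBLAS_GPU_LIST 0 1
NVBLAS_TILE_DIM 2048
NVBLAS_AUTOPIN_MEM_ENABLED
```

An unmodified application can then be launched as, for example, `LD_PRELOAD=libnvblas.so ./my_blas_app` (binary name hypothetical).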
Researchers and scientists use cuBLAS to develop GPU-accelerated algorithms in areas including high-performance computing, image analysis, and machine learning.
Key Features
- Complete support for all 152 standard BLAS routines
- Turing-optimized GEMMs and GEMM extensions for Tensor Cores
- GEMM performance tuned for sizes used in various deep learning models
- API and error logging for debugging and traceability
- Supports single, double, complex, and double complex data types
- Supports half-precision (FP16) and integer (INT8) matrix multiplication operations
- Support for multiple GPUs and concurrent kernels
- Supports CUDA streams for concurrent operations (see the stream sketch after this list)
- Fortran bindings
- Batch processing APIs for high-performance GEMM, LU factorization, and matrix inversion operations (a batched GEMM sketch also follows this list)
- Device API that can be called from within your own CUDA kernels
- Fast implementation of TRSV (triangular solve)
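As a sketch of stream usage: a cuBLAS handle can be bound to a CUDA stream with cublasSetStream, so independent calls issued on different streams may overlap on the device. Buffer sizes and contents below are illustrative, and error checking is omitted:

```c
/* Two independent SGEMMs enqueued on separate CUDA streams. */
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void) {
    const int n = 512;
    const size_t bytes = (size_t)n * n * sizeof(float);
    const float alpha = 1.0f, beta = 0.0f;

    /* Six n-by-n device buffers: A1, B1, C1, A2, B2, C2 (zero-filled demo data). */
    float *d[6];
    for (int i = 0; i < 6; ++i) {
        cudaMalloc((void **)&d[i], bytes);
        cudaMemset(d[i], 0, bytes);
    }

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    cublasHandle_t handle;
    cublasCreate(&handle);

    cublasSetStream(handle, s1);   /* first GEMM goes to stream s1 */
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, d[0], n, d[1], n, &beta, d[2], n);

    cublasSetStream(handle, s2);   /* second GEMM goes to stream s2 */
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, d[3], n, d[4], n, &beta, d[5], n);

    cudaDeviceSynchronize();       /* wait for both streams to drain */

    cublasDestroy(handle);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    for (int i = 0; i < 6; ++i) cudaFree(d[i]);
    return 0;
}
```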
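And a sketch of the batched API: cublasSgemmBatched performs many small, independent GEMMs from a single call, taking device-resident arrays of per-matrix pointers. Sizes and data are again illustrative:

```c
/* Sixteen independent 4x4 SGEMMs launched with one batched call. */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void) {
    const int n = 4, batch = 16;
    const size_t bytes = (size_t)n * n * sizeof(float);
    const float alpha = 1.0f, beta = 0.0f;

    /* One contiguous slab per operand; matrix i starts at offset i*n*n. */
    float *dA, *dB, *dC;
    cudaMalloc((void **)&dA, batch * bytes);
    cudaMalloc((void **)&dB, batch * bytes);
    cudaMalloc((void **)&dC, batch * bytes);

    /* Demo data: fill A and B with ones, so every entry of C should equal n. */
    float *h = (float *)malloc(batch * bytes);
    for (int i = 0; i < batch * n * n; ++i) h[i] = 1.0f;
    cudaMemcpy(dA, h, batch * bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, h, batch * bytes, cudaMemcpyHostToDevice);

    /* The batched API takes *device* arrays of per-matrix pointers. */
    float *hAp[16], *hBp[16], *hCp[16];
    for (int i = 0; i < batch; ++i) {
        hAp[i] = dA + (size_t)i * n * n;
        hBp[i] = dB + (size_t)i * n * n;
        hCp[i] = dC + (size_t)i * n * n;
    }
    float **dAp, **dBp, **dCp;
    cudaMalloc((void **)&dAp, batch * sizeof(float *));
    cudaMalloc((void **)&dBp, batch * sizeof(float *));
    cudaMalloc((void **)&dCp, batch * sizeof(float *));
    cudaMemcpy(dAp, hAp, batch * sizeof(float *), cudaMemcpyHostToDevice);
    cudaMemcpy(dBp, hBp, batch * sizeof(float *), cudaMemcpyHostToDevice);
    cudaMemcpy(dCp, hCp, batch * sizeof(float *), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSgemmBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                       &alpha, (const float *const *)dAp, n,
                       (const float *const *)dBp, n,
                       &beta, dCp, n, batch);

    cudaMemcpy(h, dC, batch * bytes, cudaMemcpyDeviceToHost);
    printf("C[0](0,0) = %g (expected %d)\n", h[0], n);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    cudaFree(dAp); cudaFree(dBp); cudaFree(dCp);
    free(h);
    return 0;
}
```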
Product Resources
Availability
The cuBLAS library is freely available as part of the CUDA Toolkit and OpenACC Toolkit.
Additional Resources
- cuBLAS code samples
- cuBLAS examples by Chrzeszczyk, A. and Chrzeszczyk, J.
- CULA Tools – Heterogeneous LAPACK from EM Photonics
- MAGMA – Heterogeneous LAPACK from University of Tennessee, Knoxville