NVIDIA cuBLAS introduces the cuBLASDx APIs, device-side API extensions for performing BLAS calculations inside your CUDA kernel. Fusing numerical operations decreases latency and improves the performance of your application.

Refer to the cuBLASDx documentation for hardware and software requirements.

TAR local installer instructions (x86):

$ wget https://developer.download.nvidia.com/compute/cublasdx/redist/cublasdx/nvidia-cublasdx-24.01.0.tar.gz
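Once the download finishes, the archive still has to be unpacked. The following is a minimal sketch, assuming the tarball sits in the current directory. The destination directory name `cublasdx` is a hypothetical choice, and the layout inside the archive is not described here, so list the extracted contents before pointing your build at them.

```shell
# Archive name taken from the wget command above.
archive=nvidia-cublasdx-24.01.0.tar.gz
# Hypothetical destination directory; pick any location you prefer.
dest=cublasdx

if [ -f "$archive" ]; then
  mkdir -p "$dest"
  # Unpack the gzip-compressed tarball into the destination directory.
  tar -xzf "$archive" -C "$dest"
  # Show the top-level contents to find the include directory.
  ls "$dest"
else
  echo "archive $archive not found; run the wget step first" >&2
fi
```

Extracting into a dedicated directory keeps the package separate from your sources, which makes it easier to remove or replace when a newer release is downloaded.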

ZIP local installer instructions (x86):