Tag: CUDA 8

Accelerated Computing

Pro Tip: cuBLAS Strided Batched Matrix Multiply

There’s a new computational workhorse in town. For decades, general matrix-matrix multiply, known as GEMM in Basic Linear Algebra Subprograms (BLAS) libraries…
10 MIN READ

Accelerated Computing

Beyond GPU Memory Limits with Unified Memory on Pascal

Unified Memory on NVIDIA Pascal GPUs enables applications to run out-of-the-box with larger memory footprints and achieve great baseline performance.
20 MIN READ

Accelerated Computing

New Compiler Features in CUDA 8

CUDA 8 is one of the most significant updates in the history of the CUDA platform. In addition to Unified Memory and the many new API and library features in…
17 MIN READ