Accelerating ReLU and GeLU Activation Functions, and Batched Sparse GEMM in cuSPARSELt v0.2.0

Today, NVIDIA is announcing the availability of cuSPARSELt version 0.2.0, which improves performance for activation functions, bias vectors, and Batched Sparse GEMM. The software can be downloaded now, free of charge.

Download the cuSPARSELt software.

What’s New?

Support for activation functions and bias vectors:
- ReLU with upper-bound and threshold settings for all kernels.
- GeLU for INT8 I/O, INT32 Tensor Core compute kernels.
Support for Batched Sparse GEMM:
- Single sparse matrix / Multiple dense matrices (Broadcast).
- Multiple sparse and dense matrices.
- Batched bias vector.
Compatibility notes:
- cuSPARSELt no longer requires the nvrtc library.
- Support for Ubuntu 16.04 (gcc-5) is now deprecated and will be removed in a future release.
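The activation and bias options listed above can be pictured with a small CPU reference in Python. This is a hypothetical sketch of an element-wise epilogue, not the cuSPARSELt API: the names `relu`, `gelu`, and `epilogue` are illustrative, and the semantics are assumptions (values at or below the threshold become zero and the rest are clipped to the upper bound; GeLU uses the common tanh approximation; one bias value is added per output row).

```python
import math


def relu(x, upper_bound=math.inf, threshold=0.0):
    """ReLU with an upper bound and a threshold (assumed semantics):
    values at or below `threshold` become 0, others are clipped to `upper_bound`."""
    if x <= threshold:
        return 0.0
    return min(x, upper_bound)


def gelu(x):
    """GeLU via the widely used tanh approximation (assumed; the library may differ)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def epilogue(acc, bias, activation=None, **act_args):
    """Add bias[i] to every element of row i of the accumulator, then apply the activation.

    Assumption: the bias vector has one entry per row of the output matrix.
    """
    out = []
    for i, row in enumerate(acc):
        new_row = []
        for v in row:
            v = v + bias[i]
            if activation is not None:
                v = activation(v, **act_args)
            new_row.append(v)
        out.append(new_row)
    return out
```

For example, `epilogue([[-1.0, 0.5], [2.0, 10.0]], [0.0, 1.0], relu, upper_bound=6.0)` returns `[[0.0, 0.5], [3.0, 6.0]]`: the negative entry is zeroed and 11.0 is clipped to the upper bound.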

For more technical information, see the cuSPARSELt Release Notes.

About cuSPARSELt

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

\(D = \alpha\, \mathrm{op}(A) \cdot \mathrm{op}(B) + \beta\, \mathrm{op}(C)\)

In this equation, \(\mathrm{op}(A)\) and \(\mathrm{op}(B)\) denote in-place operations on the operands, such as transpose or non-transpose.
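A naive dense reference for this equation can help when checking results from the library. The sketch below assumes each `op` is either `'N'` (non-transpose) or `'T'` (transpose) and stores matrices as Python lists of rows; the names `op` and `gemm_reference` are hypothetical, and the sparse operand is treated as an ordinary dense matrix containing zeros rather than in the library's compressed format.

```python
def op(M, trans):
    """Return a copy of M unchanged ('N') or transposed ('T')."""
    if trans == "N":
        return [row[:] for row in M]
    if trans == "T":
        return [list(col) for col in zip(*M)]
    raise ValueError("trans must be 'N' or 'T'")


def gemm_reference(alpha, A, B, beta, C, trans_a="N", trans_b="N", trans_c="N"):
    """Dense reference for D = alpha * op(A) * op(B) + beta * op(C)."""
    opA, opB, opC = op(A, trans_a), op(B, trans_b), op(C, trans_c)
    m, k = len(opA), len(opA[0])
    if len(opB) != k:
        raise ValueError("inner dimensions of op(A) and op(B) differ")
    n = len(opB[0])
    D = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Dot product of row i of op(A) with column j of op(B).
            acc = sum(opA[i][p] * opB[p][j] for p in range(k))
            D[i][j] = alpha * acc + beta * opC[i][j]
    return D
```

With `A = [[1, 0, 2, 0], [0, 3, 0, 4]]`, `B = [[1, 2], [3, 4], [5, 6], [7, 8]]`, `C = [[1, 1], [1, 1]]`, \(\alpha = 1\), and \(\beta = 0.5\), `gemm_reference(1.0, A, B, 0.5, C)` returns `[[11.5, 14.5], [37.5, 44.5]]`.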

The cuSPARSELt APIs provide flexibility in algorithm and operation selection, in the epilogue, and in matrix characteristics, including memory layout, alignment, and data types.
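Memory layout and batching both come down to how an element's offset is computed in a flat buffer. The following sketch is a hypothetical illustration of leading-dimension and batch-stride indexing; the helper names are invented, and the convention that a batch stride of 0 broadcasts a single matrix across the batch (as in the Single sparse matrix / Multiple dense matrices mode) is an assumption to check against the cuSPARSELt documentation.

```python
def element_offset(row, col, ld, order="row"):
    """Offset of element (row, col) in a matrix with leading dimension `ld`.

    Row-major: consecutive columns are adjacent in memory.
    Column-major: consecutive rows are adjacent in memory.
    """
    if order == "row":
        return row * ld + col
    if order == "col":
        return col * ld + row
    raise ValueError("order must be 'row' or 'col'")


def batched_offset(batch, row, col, ld, batch_stride, order="row"):
    """Offset of element (row, col) of matrix `batch` in a strided batch.

    Assumption: batch_stride == 0 makes every batch index read the same
    matrix, which is how a broadcast operand behaves.
    """
    return batch * batch_stride + element_offset(row, col, ld, order)
```

For a row-major matrix with leading dimension 4, element (1, 2) sits at offset 6; with `batch_stride=8`, the same element of batch 2 sits at offset 22, while `batch_stride=0` maps every batch back to offset 6.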

Key Features

NVIDIA Sparse MMA Tensor Core support.
Mixed-precision computation support:
- FP16 I/O, FP32 Tensor Core accumulate.
- BFLOAT16 I/O, FP32 Tensor Core accumulate.
- INT8 I/O, INT32 Tensor Core compute.
- FP32 I/O, TF32 Tensor Core compute.
- TF32 I/O, TF32 Tensor Core compute.
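Sparse MMA Tensor Cores on NVIDIA Ampere GPUs operate on 2:4 structured sparsity: in every group of four consecutive values along the reduction dimension, at most two are non-zero. The sketch below checks and enforces that pattern on a single row using magnitude-based pruning; it is an illustrative CPU helper with hypothetical names, not the library's own pruning routine, and the tie-breaking rule (earlier index wins) is an arbitrary choice.

```python
def is_2_4_sparse(row):
    """True if every group of 4 consecutive values has at most 2 non-zeros."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    return all(
        sum(1 for v in row[g:g + 4] if v != 0) <= 2
        for g in range(0, len(row), 4)
    )


def prune_2_4(row):
    """Keep the two largest-magnitude values in each group of 4 and zero the rest.

    Ties are broken by position: the earlier index is kept.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    out = list(row)
    for g in range(0, len(row), 4):
        group = range(g, g + 4)
        keep = sorted(group, key=lambda i: (-abs(row[i]), i))[:2]
        for i in group:
            if i not in keep:
                out[i] = 0
    return out
```

For example, `prune_2_4([0.1, -3.0, 2.0, 0.5, 1, 1, 1, 1])` returns `[0, -3.0, 2.0, 0, 1, 1, 0, 0]`, which then passes `is_2_4_sparse`.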