cuSPARSELt v0.1.0 Now Available: Arm and Windows Support

Today, NVIDIA is announcing the availability of cuSPARSELt version 0.1.0. The library is available now as a free download for members of the NVIDIA Developer Program.

What’s New

  • Support for Windows 10 (x86_64)
  • Support for Linux Arm
  • SM 8.6 compatibility
  • Support for the TF32 compute type
  • Better performance for SM 8.0 kernels (up to 90% of speed-of-light (SOL) peak)
  • Position-independent sparse operand (sparse A or sparse B)
  • New APIs for compression and pruning
    • Decoupled from cusparseLtMatmulPlan_t
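TF32 keeps the 8-bit exponent of FP32 but only 10 explicit mantissa bits, so FP32 inputs lose precision when they are computed in TF32. The Python sketch below emulates that conversion to show which values survive it. It is illustrative only: `to_tf32` and its helpers are hypothetical names, and round-to-nearest-even is an assumption, not a statement of how cuSPARSELt or the hardware rounds.

```python
import struct

def fp32_to_bits(x: float) -> int:
    """Return the 32-bit IEEE-754 pattern of x after rounding it to FP32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_fp32(b: int) -> float:
    """Interpret the low 32 bits of b as an FP32 value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def to_tf32(x: float) -> float:
    """Emulate FP32 -> TF32 (8-bit exponent, 10-bit mantissa).

    Assumption: round-to-nearest-even on the 13 discarded mantissa bits;
    Inf and NaN are returned unchanged.
    """
    b = fp32_to_bits(x)
    if (b & 0x7F800000) == 0x7F800000:   # exponent all ones: Inf or NaN
        return bits_to_fp32(b)
    lsb = (b >> 13) & 1                  # lowest mantissa bit that TF32 keeps
    b += 0x0FFF + lsb                    # round half to even
    b &= 0xFFFFE000                      # drop the 13 low mantissa bits
    return bits_to_fp32(b)

print(to_tf32(1.0 + 2**-10))  # -> 1.0009765625 (representable in TF32)
print(to_tf32(1.0 + 2**-12))  # -> 1.0 (below TF32 resolution near 1.0)
```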

See the cuSPARSELt Release Notes for more information.

About cuSPARSELt

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

D = \alpha \, \mathrm{op}(A) \cdot \mathrm{op}(B) + \beta \, \mathrm{op}(C)

In this formula, op(A) and op(B) denote in-place operations such as transpose or non-transpose, and α and β are scaling factors.
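To make the semantics of the formula concrete, here is a plain dense reference in Python. It is not cuSPARSELt code and uses neither sparsity nor Tensor Cores; `apply_op` and `matmul_reference` are hypothetical names, and only the "N" (non-transpose) and "T" (transpose) operations are modeled.

```python
from typing import List

Matrix = List[List[float]]  # row-major: M[i][j]

def apply_op(M: Matrix, trans: str) -> Matrix:
    """op(M): 'N' returns a copy of M, 'T' returns its transpose."""
    if trans == "N":
        return [list(row) for row in M]
    if trans == "T":
        return [list(col) for col in zip(*M)]
    raise ValueError(f"unsupported op {trans!r}")

def matmul_reference(alpha: float, A: Matrix, op_a: str,
                     B: Matrix, op_b: str,
                     beta: float, C: Matrix, op_c: str = "N") -> Matrix:
    """Dense reference for D = alpha * op(A) . op(B) + beta * op(C)."""
    a, b, c = apply_op(A, op_a), apply_op(B, op_b), apply_op(C, op_c)
    m, k, n = len(a), len(b), len(b[0])
    # op(A) must be m x k, op(B) k x n, and op(C) m x n
    if any(len(row) != k for row in a) or len(c) != m or any(len(row) != n for row in c):
        raise ValueError("incompatible shapes")
    return [[alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
             for j in range(n)]
            for i in range(m)]

D = matmul_reference(2.0, [[1, 2], [3, 4]], "T", [[1, 0], [0, 1]], "N",
                     1.0, [[1, 1], [1, 1]])
print(D)  # -> [[3.0, 7.0], [5.0, 9.0]]
```

A reference like this is handy for checking the output of an accelerated kernel on small inputs.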

The cuSPARSELt APIs provide flexibility in algorithm and operation selection, the epilogue, and matrix characteristics, including memory layout, alignment, and data types.
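Memory layout and alignment come down to how an element (i, j) maps to an address. The Python sketch below models a dense matrix by its shape, leading dimension, storage order, and base-pointer alignment. `DenseLayout` is a hypothetical class for illustration, not a cuSPARSELt type, and the alignment value is an assumed parameter rather than a documented requirement.

```python
from dataclasses import dataclass

@dataclass
class DenseLayout:
    """Hypothetical dense-matrix description: shape, leading dimension, order."""
    rows: int
    cols: int
    ld: int          # leading dimension, in elements
    order: str       # "row" (row-major) or "col" (column-major)
    alignment: int   # required byte alignment of the base pointer (assumed)

    def validate(self, base_addr: int) -> None:
        if self.order not in ("row", "col"):
            raise ValueError(f"unknown order {self.order!r}")
        # ld must span a full row (row-major) or a full column (column-major)
        min_ld = self.cols if self.order == "row" else self.rows
        if self.ld < min_ld:
            raise ValueError(f"ld={self.ld} < {min_ld}")
        if base_addr % self.alignment != 0:
            raise ValueError(f"address {base_addr:#x} is not {self.alignment}-byte aligned")

    def offset(self, i: int, j: int) -> int:
        """Offset of element (i, j) from the base pointer, in elements."""
        if not (0 <= i < self.rows and 0 <= j < self.cols):
            raise IndexError((i, j))
        return i * self.ld + j if self.order == "row" else i + j * self.ld

# A 2x3 row-major matrix padded to ld=4, and the same shape in column-major order
print(DenseLayout(2, 3, 4, "row", 16).offset(1, 2))  # -> 6
print(DenseLayout(2, 3, 2, "col", 16).offset(1, 2))  # -> 5
```

Padding the leading dimension beyond the logical width is a common way to keep every row (or column) aligned.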

Key features:

  • NVIDIA Sparse MMA Tensor Core support
  • Mixed-precision computation support:
    • FP16 input/output, FP32 Tensor Core accumulate
    • BFLOAT16 input/output, FP32 Tensor Core accumulate
    • INT8 input/output, INT32 Tensor Core compute
    • FP32 input/output, TF32 Tensor Core compute
    • TF32 input/output, TF32 Tensor Core compute
  • Matrix pruning and compression functionalities
  • Auto-tuning functionality (see cusparseLtMatmulSearch())
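NVIDIA Sparse Tensor Cores accelerate 2:4 structured sparsity, in which at most two values in every group of four consecutive elements are nonzero. The Python sketch below shows the idea behind pruning and compression: keep the two largest-magnitude values per group, then store only those values plus their positions. Magnitude pruning is a common heuristic and is assumed here; the function names are hypothetical, and this does not reproduce cuSPARSELt's own pruning routines or its internal compressed format.

```python
from typing import List, Tuple

def prune_2_4(row: List[float]) -> List[float]:
    """Zero all but the two largest-magnitude values in each group of 4."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    out: List[float] = []
    for g in range(0, len(row), 4):
        group = row[g:g + 4]
        # sorted() is stable, so ties keep the lower in-group index
        keep = set(sorted(range(4), key=lambda t: -abs(group[t]))[:2])
        out.extend(v if t in keep else 0.0 for t, v in enumerate(group))
    return out

def compress_2_4(row: List[float]) -> Tuple[List[float], List[int]]:
    """Store 2 values per group of 4 plus each value's position (0-3) in its group."""
    values: List[float] = []
    positions: List[int] = []
    for g in range(0, len(row), 4):
        group = row[g:g + 4]
        nz = [t for t, v in enumerate(group) if v != 0.0]
        if len(nz) > 2:
            raise ValueError(f"group starting at {g} is not 2:4 sparse")
        # pad groups with fewer than 2 nonzeros using explicit zero slots
        pad = [t for t in range(4) if t not in nz][:2 - len(nz)]
        pos = sorted(nz + pad)
        values.extend(group[t] for t in pos)
        positions.extend(pos)  # each position fits in 2 bits when packed
    return values, positions

def decompress_2_4(values: List[float], positions: List[int]) -> List[float]:
    """Inverse of compress_2_4: scatter kept values back into groups of 4."""
    row: List[float] = []
    for g in range(0, len(values), 2):
        group = [0.0] * 4
        for v, t in zip(values[g:g + 2], positions[g:g + 2]):
            group[t] = v
        row.extend(group)
    return row

dense = [0.5, -3.0, 1.0, 2.0, 0.0, 0.1, -0.2, 4.0]
sparse = prune_2_4(dense)
vals, pos = compress_2_4(sparse)
print(sparse)     # -> [0.0, -3.0, 0.0, 2.0, 0.0, 0.0, -0.2, 4.0]
print(vals, pos)  # -> [-3.0, 2.0, -0.2, 4.0] [1, 3, 2, 3]
```

The compressed values take half the space of the dense row, at the cost of a small amount of position metadata per kept value.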
