cuTENSOR
Tensor Linear Algebra on NVIDIA GPUs
NVIDIA cuTENSOR is a GPU-accelerated tensor linear algebra library for tensor contraction, reduction, and elementwise operations. Using cuTENSOR, applications can harness the specialized tensor cores on NVIDIA GPUs for high-performance tensor computations and accelerate deep learning training and inference, computer vision, quantum chemistry, and computational physics workloads.
cuTENSOR 2.0 Available Now
cuTENSOR 2.0 offers new features, such as just-in-time compiled kernels for tensor contraction, that significantly boost performance. The library's APIs have also been made uniform, so new features can be extended to all operations more easily.
cuTENSOR 2.0 is a more efficient and flexible library to accelerate your applications at the intersection of AI and HPC.
Read the cuTENSOR 2.0 migration guide
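The staged, plan-based workflow introduced in 2.0 is sketched below for a single-precision contraction with just-in-time compilation requested via the plan preference. This is a minimal illustration based on the cuTENSOR 2.x API as described in the migration guide, not an authoritative sample: error checking is omitted, and names such as CUTENSOR_JIT_MODE_DEFAULT and CUTENSOR_COMPUTE_DESC_32F should be verified against the headers shipped with your cuTENSOR version.

```cpp
// Sketch (assumed cuTENSOR 2.x API): plan-based contraction
// D[m,n] = alpha * A[m,k] * B[k,n] + beta * C[m,n], with JIT kernels requested.
// Error handling omitted; verify names against your installed cutensor.h.
#include <cutensor.h>
#include <cuda_runtime.h>
#include <cstdint>
#include <vector>

int main()
{
    const int64_t m = 256, n = 256, k = 256;
    std::vector<int32_t> modeA{'m', 'k'}, modeB{'k', 'n'}, modeC{'m', 'n'};
    std::vector<int64_t> extA{m, k}, extB{k, n}, extC{m, n};

    float *A, *B, *C;
    cudaMalloc(&A, sizeof(float) * m * k);
    cudaMalloc(&B, sizeof(float) * k * n);
    cudaMalloc(&C, sizeof(float) * m * n);

    cutensorHandle_t handle;
    cutensorCreate(&handle);

    const uint32_t kAlignment = 128;  // cudaMalloc'd memory satisfies this
    cutensorTensorDescriptor_t descA, descB, descC;
    cutensorCreateTensorDescriptor(handle, &descA, 2, extA.data(), nullptr, CUTENSOR_R_32F, kAlignment);
    cutensorCreateTensorDescriptor(handle, &descB, 2, extB.data(), nullptr, CUTENSOR_R_32F, kAlignment);
    cutensorCreateTensorDescriptor(handle, &descC, 2, extC.data(), nullptr, CUTENSOR_R_32F, kAlignment);

    // Stage 1: describe the operation (C also serves as the output D).
    cutensorOperationDescriptor_t op;
    cutensorCreateContraction(handle, &op,
                              descA, modeA.data(), CUTENSOR_OP_IDENTITY,
                              descB, modeB.data(), CUTENSOR_OP_IDENTITY,
                              descC, modeC.data(), CUTENSOR_OP_IDENTITY,
                              descC, modeC.data(),
                              CUTENSOR_COMPUTE_DESC_32F);

    // Stage 2: express preferences; this is where JIT compilation is requested.
    cutensorPlanPreference_t pref;
    cutensorCreatePlanPreference(handle, &pref, CUTENSOR_ALGO_DEFAULT, CUTENSOR_JIT_MODE_DEFAULT);

    // Stage 3: estimate workspace and build the plan (kernel selection/JIT happens here).
    uint64_t workspaceSize = 0;
    cutensorEstimateWorkspaceSize(handle, op, pref, CUTENSOR_WORKSPACE_DEFAULT, &workspaceSize);
    cutensorPlan_t plan;
    cutensorCreatePlan(handle, &plan, op, pref, workspaceSize);

    void* workspace = nullptr;
    if (workspaceSize > 0) cudaMalloc(&workspace, workspaceSize);

    // Stage 4: execute; the plan can be replayed many times.
    const float alpha = 1.0f, beta = 0.0f;
    cutensorContract(handle, plan, &alpha, A, B, &beta, C, C,
                     workspace, workspaceSize, /*stream=*/0);
    cudaDeviceSynchronize();

    cutensorDestroyPlan(plan);
    cutensorDestroyPlanPreference(pref);
    cutensorDestroyOperationDescriptor(op);
    cutensorDestroyTensorDescriptor(descA);
    cutensorDestroyTensorDescriptor(descB);
    cutensorDestroyTensorDescriptor(descC);
    cutensorDestroy(handle);
    cudaFree(workspace); cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

Because the plan is created once and then executed repeatedly, the cost of kernel selection and any just-in-time compilation is paid up front rather than on every call.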
cuTENSOR Performance
The cuTENSOR library is highly optimized for performance on NVIDIA GPUs with support for DMMA, TF32, and now 3xTF32.
cuTENSOR 2.0 achieves significant performance gains over cuTENSOR 1.7, even before enabling just-in-time compiled kernels.
Just-in-time compiled kernels for tensor contraction enable additional speedups in tensor contraction benchmarks, including rand1000.
cuTENSOR Key Features
- Just-in-time compiled kernels for tensor contraction
- Plan-based multi-stage APIs for all operations (see the reduction sketch after this list)
- Support for tensor descriptors of arbitrary dimensionality
- Support for 3xTF32 compute type
- Support for int64 extents
- Tensor contraction, reduction, and elementwise operations
- Mixed precision support
- Expressive API allowing elementwise operation fusion
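To illustrate the uniform, plan-based API across operation types, the sketch below applies the same descriptor, preference, plan, and execute stages to a sum reduction over one mode. As with the contraction example above, this is a hedged sketch based on the cuTENSOR 2.x interface (cutensorCreateReduction / cutensorReduce); consult the official documentation for the authoritative signatures.

```cpp
// Sketch (assumed cuTENSOR 2.x API): plan-based reduction
// D[m] = alpha * sum_k A[m,k] + beta * C[m]. Error handling omitted.
#include <cutensor.h>
#include <cuda_runtime.h>
#include <cstdint>
#include <vector>

int main()
{
    const int64_t m = 512, k = 1024;
    std::vector<int32_t> modeA{'m', 'k'}, modeC{'m'};
    std::vector<int64_t> extA{m, k}, extC{m};

    float *A, *C;
    cudaMalloc(&A, sizeof(float) * m * k);
    cudaMalloc(&C, sizeof(float) * m);

    cutensorHandle_t handle;
    cutensorCreate(&handle);

    cutensorTensorDescriptor_t descA, descC;
    cutensorCreateTensorDescriptor(handle, &descA, 2, extA.data(), nullptr, CUTENSOR_R_32F, 128);
    cutensorCreateTensorDescriptor(handle, &descC, 1, extC.data(), nullptr, CUTENSOR_R_32F, 128);

    // Modes present in A but absent from the output ('k') are reduced with CUTENSOR_OP_ADD.
    cutensorOperationDescriptor_t op;
    cutensorCreateReduction(handle, &op,
                            descA, modeA.data(), CUTENSOR_OP_IDENTITY,
                            descC, modeC.data(), CUTENSOR_OP_IDENTITY,
                            descC, modeC.data(),
                            CUTENSOR_OP_ADD,
                            CUTENSOR_COMPUTE_DESC_32F);

    // Same staging as for a contraction: preference, workspace estimate, plan.
    cutensorPlanPreference_t pref;
    cutensorCreatePlanPreference(handle, &pref, CUTENSOR_ALGO_DEFAULT, CUTENSOR_JIT_MODE_NONE);

    uint64_t workspaceSize = 0;
    cutensorEstimateWorkspaceSize(handle, op, pref, CUTENSOR_WORKSPACE_DEFAULT, &workspaceSize);
    cutensorPlan_t plan;
    cutensorCreatePlan(handle, &plan, op, pref, workspaceSize);

    void* workspace = nullptr;
    if (workspaceSize > 0) cudaMalloc(&workspace, workspaceSize);

    // Execute the reduction.
    const float alpha = 1.0f, beta = 0.0f;
    cutensorReduce(handle, plan, &alpha, A, &beta, C, C,
                   workspace, workspaceSize, /*stream=*/0);
    cudaDeviceSynchronize();

    cutensorDestroyPlan(plan);
    cutensorDestroyPlanPreference(pref);
    cutensorDestroyOperationDescriptor(op);
    cutensorDestroyTensorDescriptor(descA);
    cutensorDestroyTensorDescriptor(descC);
    cutensorDestroy(handle);
    cudaFree(workspace); cudaFree(A); cudaFree(C);
    return 0;
}
```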