Today, NVIDIA is announcing the availability of cuTENSOR version 1.4, which adds support for up to 64-dimensional tensors and distributed multi-GPU tensor operations, and improves the tensor contraction performance model. The software can be downloaded now free of charge.
Download the cuTENSOR software.
What’s New?
- Supports up to 64-dimensional tensors.
- Supports distributed, multi-GPU tensor operations.
- Improved tensor contraction performance model (i.e., algo `CUTENSOR_ALGO_DEFAULT`).
- Improved performance for tensor contractions that have a large overall contracted dimension (i.e., a parallel reduction was added).
- Improved performance for tensor contractions that have a tiny contracted dimension (<= 8).
- Improved performance for outer-product-like tensor contractions (e.g., `C[a,b,c,d] = A[b,d] * B[a,c]`).
- Additional bug fixes.
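To make the outer-product-like case concrete, here is a minimal pure-Python sketch of what the contraction `C[a,b,c,d] = A[b,d] * B[a,c]` computes. This is only an illustration of the index semantics, not cuTENSOR API code; the dimension sizes and input values are arbitrary example choices.

```python
# Outer-product-like contraction: C[a,b,c,d] = A[b,d] * B[a,c].
# No mode is summed over, so each element of C is a single product
# of one element of A and one element of B.

# Arbitrary example extents for the four modes a, b, c, d.
NA, NB, NC, ND = 2, 3, 4, 5

# A carries modes (b, d); B carries modes (a, c). Values are arbitrary.
A = [[float(b * ND + d) for d in range(ND)] for b in range(NB)]
B = [[float(a * NC + c) for c in range(NC)] for a in range(NA)]

# C[a][b][c][d] = A[b][d] * B[a][c]
C = [[[[A[b][d] * B[a][c] for d in range(ND)]
       for c in range(NC)]
      for b in range(NB)]
     for a in range(NA)]
```

Because nothing is reduced, the result has the combined modes of both inputs; cuTENSOR 1.4 adds a faster path for this pattern.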
For more information, see the cuTENSOR Release Notes.
About cuTENSOR
cuTENSOR is a high-performance CUDA library for tensor primitives; its key features include:
- Extensive mixed-precision support:
  - `FP64` inputs with `FP32` compute.
  - `FP32` inputs with `FP16`, `BF16`, or `TF32` compute.
  - Complex-times-real operations.
- Conjugate (without transpose) support.
- Support for up to 64-dimensional tensors.
- Support for arbitrary data layouts.
- Support for trivially serializable data structures.
- Enhancements to main computational routines:
- Direct (i.e., transpose-free) tensor contractions.
- Tensor reductions (including partial reductions).
- Element-wise tensor operations:
- Support for various activation functions.
- Arbitrary tensor permutations.
- Conversion between different data types.
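Among the routines above, a partial tensor reduction sums over only some modes of a tensor while keeping the rest. A minimal pure-Python sketch of the semantics (again, not cuTENSOR API code; the extents and input values are arbitrary example choices):

```python
# Partial tensor reduction: D[a,c] = sum over b of A[a,b,c].
# Mode b is contracted away; modes a and c survive in the output.

# Arbitrary example extents for modes a, b, c.
NA, NB, NC = 2, 3, 4

# Input tensor with arbitrary example values.
A = [[[float(a + b + c) for c in range(NC)]
      for b in range(NB)]
     for a in range(NA)]

# D[a][c] = sum_b A[a][b][c]
D = [[sum(A[a][b][c] for b in range(NB)) for c in range(NC)]
     for a in range(NA)]
```

cuTENSOR performs such reductions (as well as full reductions over all modes) directly on the GPU, without materializing intermediate transposed copies.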
Learn more
- On Math Libraries, see Recent Developments in NVIDIA Math Libraries (GTC #S31754).
- For the latest on HPC software, see A Deep Dive into the latest HPC software (GTC #S31286).
- Catch up on Tensor Core-Accelerated Math Libraries for Dense and Sparse Linear Algebra in AI and HPC (GTC #CWES1098).
- Read technical details in our cuTENSOR Product Documentation.
Recent Developer posts
- On Fortran enhancements to support Tensor Cores, read Bringing Tensor Cores to Standard Fortran.
- Benefit from A100 acceleration and read Getting Immediate Speedups with NVIDIA A100 TF32.
- To gain AI training benefits, see Accelerating AI Training with NVIDIA TF32 Tensor Cores.