GTC Silicon Valley-2019 ID:S9176:Tensor Core Performance and Precision
Josef Schule(Technische Universität Kaiserslautern)
Learn about using Tensor Cores to perform very fast matrix multiply-accumulate steps like those required in AI training. The key to Tensor Core performance is the use of 16-bit floating point arithmetic, but that causes significant rounding errors. Although algorithms like binomial correction or Karatsuba can reduce rounding errors considerably, they require additional calculations. We'll detail performance of these algorithms based on the Warp Matrix Multiply Accumulate API.