NVIDIA Announces TensorRT 8.2 and Integrations with PyTorch and TensorFlow

Diagram of Torch-TensorRT and TensorFlow-TensorRT.

Today NVIDIA released TensorRT 8.2, with optimizations for billion-parameter NLU models. These include T5 and GPT-2, which are used for translation and text generation, making it possible to run NLU apps in real time.

TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low-latency, high-throughput inference for AI applications. TensorRT is used across several industries, including healthcare, automotive, manufacturing, internet/telecom services, financial services, and energy.

PyTorch and TensorFlow are the most popular deep learning frameworks, with millions of users. The new TensorRT framework integrations provide a simple API in PyTorch and TensorFlow, with powerful FP16 and INT8 optimizations that accelerate inference by up to 6x.

Highlights include:

  • TensorRT 8.2: Optimizations for T5 and GPT-2 run real-time translation and summarization with 21x faster performance compared to CPUs. 
  • TensorRT 8.2: Simple Python API for developers using Windows. 
  • Torch-TensorRT: Integration of PyTorch with TensorRT delivers up to 6x faster performance compared to in-framework inference on GPUs with just one line of code.
  • TensorFlow-TensorRT: Integration of TensorFlow with TensorRT delivers up to 6x faster performance compared to in-framework inference on GPUs with one line of code.
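As a rough illustration of the "one line of code" integrations above, the sketch below wraps each framework's TensorRT entry point in a small helper. The helper names, the default input shape, and the choice of FP16 are assumptions made for this example and do not come from the announcement. The calls follow the public Torch-TensorRT and TF-TRT Python APIs as documented around this release, so newer versions may differ. Both helpers need an NVIDIA GPU and the matching NGC container (or an equivalent install) to actually run.

```python
def optimize_pytorch_module(model, input_shape=(1, 3, 224, 224)):
    """Compile a PyTorch module with Torch-TensorRT, allowing FP16 kernels.

    `input_shape` is a hypothetical default (one 224x224 RGB image);
    replace it with the shape your model expects.
    """
    # Imported lazily so this file can be loaded without the GPU stack installed.
    import torch
    import torch_tensorrt

    model = model.eval().cuda()
    # The single call that hands the module to TensorRT for optimization.
    return torch_tensorrt.compile(
        model,
        inputs=[torch_tensorrt.Input(input_shape)],
        enabled_precisions={torch.float, torch.half},
    )


def optimize_tensorflow_saved_model(saved_model_dir, output_dir):
    """Convert a TensorFlow SavedModel with TF-TRT using FP16 precision.

    Both directory arguments are hypothetical paths supplied by the caller.
    """
    # Imported lazily for the same reason as above.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        precision_mode=trt.TrtPrecisionMode.FP16
    )
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=saved_model_dir,
        conversion_params=params,
    )
    converter.convert()         # replace supported subgraphs with TensorRT ops
    converter.save(output_dir)  # write the optimized SavedModel to disk
    return output_dir
```

With a `torch.nn.Module` named `model` already loaded, usage would look like `trt_model = optimize_pytorch_module(model)`, after which `trt_model` is called like the original module on CUDA tensors. The speedup you actually see depends on the model, precision, and GPU.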


  • Torch-TensorRT is available today in the PyTorch Container from the NGC catalog.
  • TensorFlow-TensorRT is available today in the TensorFlow Container from the NGC catalog.
  • TensorRT is freely available to members of the NVIDIA Developer Program.
  • Learn more on the TensorRT product page.
