TensorRT 8.0: What’s New

TensorRT 8.0 adds transformer optimizations, quantization-aware training (QAT) support for accurate INT8 inference, and sparsity support that takes advantage of the Sparse Tensor Cores on NVIDIA Ampere architecture GPUs.


  • BERT-Large Inference in 1.2 ms with new Transformer Optimizations
  • Achieve accuracy equivalent to FP32 with INT8 precision using Quantization Aware Training
  • Sparsity support for faster inference on Ampere GPUs
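To make two of these features more concrete, here is a minimal sketch in plain Python. It does not use the TensorRT API. It only illustrates, under stated assumptions, the numerical ideas behind them: the simulated ("fake") INT8 quantization step that quantization-aware training typically inserts into a model, and the 2:4 structured sparsity pattern (two nonzero values in every group of four) that Ampere Sparse Tensor Cores accelerate. The function names `fake_quantize_int8` and `prune_2_of_4`, the symmetric per-tensor scale, and the magnitude-based pruning rule are illustrative assumptions, not part of TensorRT.

```python
def fake_quantize_int8(values, scale):
    """Simulate symmetric INT8 quantization: quantize, clamp, dequantize.

    Assumption: one symmetric scale for the whole tensor and the signed
    8-bit range [-128, 127]. Real toolkits may differ in both choices.
    """
    out = []
    for v in values:
        q = round(v / scale)          # snap to the integer grid
        q = max(-128, min(127, q))    # clamp to the signed 8-bit range
        out.append(q * scale)         # map back to floating point
    return out


def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four.

    Assumption: magnitude-based selection, which is one common way to
    produce a 2:4 pattern; it is not the only possible pruning rule.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned


if __name__ == "__main__":
    print(fake_quantize_int8([0.26, -1.0, 100.0], scale=0.5))
    print(prune_2_of_4([0.1, -0.9, 0.3, 0.05, 2.0, -1.0, 0.5, 0.0]))
```

In the first call, the value 100.0 falls outside the representable range at this scale and is clamped, which is the kind of error that training with simulated quantization lets a model adapt to. In the second call, each group of four keeps only its two largest-magnitude weights.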

TensorRT 8.0 is freely available to members of the NVIDIA Developer Program.

You can find additional resources on the NVIDIA Developer Blog or connect with other TensorRT developers on the NVIDIA Developer Forum.

Introductory Resources

Introductory Blog

Learn how to apply TensorRT optimizations and deploy a PyTorch model to GPUs.


Introductory Webinar

Learn more about TensorRT 8.0 features and the tools that simplify the inference workflow.


Pre-Trained Models

Download pre-trained models optimized for TensorRT to get started quickly.


Additional TensorRT Resources

Conversational AI

Image and Video

Recommendation Systems

Ethical AI

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.