TensorRT: What’s New
NVIDIA® TensorRT™ 8.5 includes support for new NVIDIA H100 Tensor Core GPUs and reduced memory consumption for TensorRT optimizer and runtime with CUDA® Lazy Loading.
TensorRT 8.5 GA is a free download for members of the NVIDIA Developer Program.
Torch-TensorRT is now available in the PyTorch container from the NVIDIA NGC™ catalog.
TensorFlow-TensorRT is now available in the TensorFlow container from the NGC catalog.
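CUDA Lazy Loading, mentioned above, is opted into through an environment variable rather than an API call, and it must be set before any CUDA context is created. A minimal sketch (the variable name is the documented CUDA 11.7+ control; whether it helps depends on your CUDA and TensorRT versions):

```python
import os

# CUDA Lazy Loading defers loading GPU kernels until first use, which can
# reduce TensorRT builder and runtime memory consumption. The variable must
# be set before the first CUDA context is created in the process.
os.environ["CUDA_MODULE_LOADING"] = "LAZY"

# Import TensorRT (or any CUDA-using library) only after setting the variable:
# import tensorrt as trt
```

Setting the variable in the shell that launches the process (`export CUDA_MODULE_LOADING=LAZY`) works equally well and avoids ordering concerns inside the script.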
Explore Ways to Get Started With TensorRT
TensorRT
Beginner
- Getting started with NVIDIA TensorRT (Video)
- Introductory Blog
- Getting started notebooks (Jupyter Notebook)
- Quick Start Guide
Intermediate
- Documentation
- Sample codes (C++)
- BERT, EfficientDet inference using TensorRT (Jupyter Notebook)
- Serving model with NVIDIA Triton™ (Blog, Docs)
Expert
- Using Quantization Aware Training (QAT) with TensorRT (Blog)
- PyTorch-Quantization (QAT) Toolkit (Python Code)
- TensorFlow Quantization Toolkit (Blog)
- Sparsity with TensorRT (Blog)
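The quantization resources above (PTQ, QAT, and the quantization toolkits) all rest on the same arithmetic: map floating-point tensors to INT8 with a calibrated scale, run the math in INT8, and dequantize. A minimal numpy sketch of symmetric per-tensor quantization, assuming the common amax/127 scale choice; the helper names are illustrative, not TensorRT API:

```python
import numpy as np

def int8_quantize(x, scale):
    # Symmetric quantization: q = clip(round(x / scale), -128, 127)
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def int8_dequantize(q, scale):
    # Recover an approximation of the original values
    return q.astype(np.float32) * scale

# Calibration picks the scale from the observed dynamic range (amax / 127)
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
scale = np.abs(x).max() / 127.0

q = int8_quantize(x, scale)
x_hat = int8_dequantize(q, scale)  # round-trip error is bounded by scale / 2
```

QAT improves on this by simulating the quantize/dequantize round trip during training, so the network learns weights that tolerate the rounding error.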
Torch-TensorRT
Beginner
- Getting started with NVIDIA Torch-TensorRT (Video)
- Accelerate Inference up to 6x in PyTorch (Blog)
- Object detection with SSD (Jupyter Notebook)
Intermediate
- Documentation
- Post-training quantization with Hugging Face BERT (Jupyter Notebook)
- Quantization-aware training (Jupyter Notebook)
- Serving model with Triton (Blog, Docs)
- Using dynamic shapes (Jupyter Notebook)
TensorFlow-TensorRT
Intermediate
- Documentation (Sample Code)
- Serving model with Triton (Blog, Docs)
- Using dynamic shapes (Jupyter Notebook)
Get Enterprise Support for NVIDIA TensorRT
NVIDIA Enterprise Support for TensorRT, offered with the NVIDIA AI Enterprise software suite, includes:
- Single source of support with service-level agreements for AI deployments
- Security reviews and notifications
- API stability and compatibility across releases
- Access to NVIDIA AI experts
- Long-term support on designated releases
- Customized support-upgrade options

Join the TensorRT and Triton community and stay current on the latest feature updates, bug fixes, and more.
Discover More TensorRT Resources
For Conversational AI
- Real-Time Natural Language Processing With BERT Using TensorRT (Blog)
- Optimizing T5 and GPT-2 for Real-Time Inference With NVIDIA TensorRT (Blog)
- Quantize BERT With PTQ and QAT for INT8 Inference (Sample)
- Automatic Speech Recognition With TensorRT (Notebook)
- How to Deploy Real-Time Text-to-Speech Applications on GPUs Using TensorRT (Blog)
- Natural Language Understanding With BERT (Jupyter Notebook)
- Real-Time Text-to-Speech (Sample)
- Building an RNN Network Layer by Layer (Sample Code)

For Image and Vision
- Optimize Object Detection With EfficientDet and TensorRT 8 (Jupyter Notebook)
- Estimating Depth With ONNX Models and Custom Layers Using NVIDIA TensorRT (Blog)
- Speeding Up Deep Learning Inference Using TensorFlow, ONNX, and TensorRT (Semantic Segmentation Blog)
- Object Detection With SSD, Faster R-CNN Networks (C++ Code Samples)
- Accelerating Inference With Sparsity Using the NVIDIA Ampere Architecture and TensorRT (Blog)
- Achieving FP32 Accuracy in INT8 Using Quantization Aware Training With TensorRT (Blog)
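The sparsity post above refers to the 2:4 structured pattern that NVIDIA Ampere sparse Tensor Cores accelerate: in every contiguous group of four weights, exactly two are zero, so the hardware can skip half the multiplies. A hedged numpy sketch of the pruning step (`prune_2_4` is an illustrative helper, not a TensorRT or toolkit API; real workflows fine-tune after pruning to recover accuracy):

```python
import numpy as np

def prune_2_4(w):
    # 2:4 structured sparsity: in each contiguous group of 4 weights,
    # keep the 2 largest magnitudes and zero the other 2.
    groups = w.reshape(-1, 4).copy()
    smallest = np.argsort(np.abs(groups), axis=1)[:, :2]  # 2 smallest per group
    np.put_along_axis(groups, smallest, 0.0, axis=1)
    return groups.reshape(w.shape)

w = np.array([0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.1], dtype=np.float32)
w_sparse = prune_2_4(w)  # each group of 4 now has exactly 2 nonzeros
```

Because the pattern is regular (two of four, rather than arbitrary zeros), the sparse weights compress to half size plus small metadata, which is what makes the hardware speedup practical.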
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.