TensorRT Getting Started
TensorRT 8.0: What’s New
TensorRT 8.0 is packed with new features: quantization-aware training for accurate INT8 inference, sparsity support that leverages Ampere GPUs, and inference optimizations for transformer-based networks.
- Achieve accuracy equivalent to FP32 with INT8 precision using quantization-aware training
- Sparsity support on Ampere GPUs delivers up to 50% higher throughput
- Up to 2x faster inference for transformer-based networks like BERT with new compiler optimizations
TensorRT 8.0 will be freely available to members of the NVIDIA Developer Program in Q2 2021.
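Quantization-aware training keeps INT8 accuracy close to FP32 by simulating INT8 rounding during the forward pass, so the network learns weights that survive quantization. A minimal sketch of the symmetric fake-quantization step at the heart of the technique (the function name and scale value are illustrative, not TensorRT API):

```python
def fake_quantize(x, scale):
    """Simulate symmetric INT8 quantization: snap x to the nearest
    point on an int8 grid (levels -128..127, spaced by `scale`),
    then map back to float. During QAT this runs in the forward
    pass; gradients bypass the rounding (straight-through estimator)."""
    q = round(x / scale)           # nearest int8 level
    q = max(-128, min(127, q))     # saturate to the int8 range
    return q * scale               # dequantize back to float

# A value already on the grid passes through exactly:
print(fake_quantize(1.0, 0.5))    # 1.0
# A value outside the representable range saturates:
print(fake_quantize(100.0, 0.5))  # 63.5  (127 * 0.5)
```

Training against this simulated rounding is what lets the deployed INT8 engine match FP32 accuracy, as claimed above.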
Learn how to apply TensorRT optimizations and deploy a PyTorch model to GPUs.
Build a sample TensorRT application from scratch that detects common objects in images.
Download pre-trained models optimized for TensorRT to get started quickly.
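A common deployment path for a PyTorch model is export-to-ONNX followed by engine building with TensorRT's bundled `trtexec` tool. The sketch below only assembles the `trtexec` command line, so it can be inspected without a GPU; the file names are illustrative, and `--int8` / `--sparsity=enable` are the flags corresponding to the TensorRT 8.0 features described above:

```python
import subprocess  # needed only if you uncomment the run() call below

def build_trtexec_cmd(onnx_path, engine_path, int8=False, sparsity=False):
    """Assemble a trtexec invocation that builds a serialized engine
    from an ONNX model, optionally enabling INT8 precision and
    structured-sparsity kernels (Ampere GPUs)."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if int8:
        cmd.append("--int8")
    if sparsity:
        cmd.append("--sparsity=enable")
    return cmd

cmd = build_trtexec_cmd("model.onnx", "model.engine", int8=True, sparsity=True)
print(" ".join(cmd))
# trtexec --onnx=model.onnx --saveEngine=model.engine --int8 --sparsity=enable
# subprocess.run(cmd, check=True)  # run only where TensorRT is installed
```

The saved `.engine` file can then be loaded at inference time, for example via `trtexec --loadEngine=model.engine` for benchmarking.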
Additional TensorRT Resources
- Real-Time Natural Language Understanding with BERT Using TensorRT (Blog)
- Automatic Speech Recognition with TensorRT (Notebook)
- Accelerating Real-Time Text-to-Speech with TensorRT (Blog)
- NLU with BERT (Notebook)
- Real Time Text-to-Speech (Sample)
- Neural Machine Translation (NMT) Using A Sequence To Sequence (seq2seq) Model (Sample Code)
- Building An RNN Network Layer By Layer (Sample Code)
Recommender Systems
- Accelerating Wide and Deep with TensorRT (Blog)
- Movie Recommendation Using Neural Collaborative Filtering (NCF) (Sample Code)
- Deep Recommender (Sample Code)
- Intro to Recommenders in TensorRT (Video)
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.