
TensorRT: What’s New

NVIDIA® TensorRT-LLM greatly speeds the optimization of large language models (LLMs). Building on TensorRT™, FasterTransformer, and more, TensorRT-LLM accelerates LLMs with targeted optimizations such as Flash Attention, in-flight batching, and FP8, exposed through an open-source Python API, so developers can get optimal inference performance on GPUs.

NVIDIA TensorRT 8.6 improves cross-compatibility between GPUs and software stacks, making TensorRT more versatile across hardware deployments and upgrades.


TensorRT 8.6 GA is a free download for members of the NVIDIA Developer Program.

Download Now Documentation

Ways to Get Started With NVIDIA TensorRT

TensorRT and TensorRT-LLM are available for free on multiple platforms for development. For mission-critical AI inference with enterprise-grade security, stability, manageability, and support, you can purchase NVIDIA AI Enterprise, an end-to-end AI software platform that includes TensorRT and TensorRT-LLM. Contact sales or apply for a 90-day NVIDIA AI Enterprise evaluation license to get started.


TensorRT

TensorRT is available to download for free as a binary for multiple platforms or as a container on NVIDIA NGC™.


Download Now Pull Container From NGC Documentation
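For the container route, a typical workflow looks like the sketch below. The image name comes from the NGC catalog; the release tag shown is an example, so check the catalog for the tag matching your CUDA driver.

```shell
# Pull the TensorRT container from the NGC catalog
# (tag is illustrative; pick the current release from NGC).
docker pull nvcr.io/nvidia/tensorrt:23.04-py3

# Launch it with GPU access for development work.
docker run --gpus all -it --rm nvcr.io/nvidia/tensorrt:23.04-py3
```

The container bundles the TensorRT libraries, headers, and samples, so no separate binary install is needed inside it.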


TensorRT-LLM

TensorRT-LLM is available for free on GitHub.


Download Now Documentation
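Getting the sources is a standard GitHub clone; the wheel install shown after it is a hedged example of the pip route described in the repository's own documentation (package name and index URL per that documentation).

```shell
# Clone the TensorRT-LLM sources from GitHub.
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM

# Alternatively, install the Python wheel from NVIDIA's package index,
# as documented in the repository.
pip3 install tensorrt_llm --extra-index-url https://pypi.nvidia.com
```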

Ways to Get Started With NVIDIA TensorRT Frameworks

Torch-TensorRT and TensorFlow-TensorRT are available for free as containers on the NGC catalog. For mission-critical AI inference with enterprise-grade security, stability, manageability, and support, you can purchase NVIDIA AI Enterprise. Contact sales or apply for a 90-day NVIDIA AI Enterprise evaluation license to get started.


Torch-TensorRT

Torch-TensorRT is available in the PyTorch container from the NGC catalog.


Pull Container From NGC Documentation
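Inside the PyTorch container, compiling a model with Torch-TensorRT is a one-call workflow. The sketch below assumes an NVIDIA GPU is available; the model (ResNet-50) and input shape are illustrative, not required by the API.

```python
import torch
import torch_tensorrt
import torchvision.models as models

# Any traceable PyTorch model works; ResNet-50 here is illustrative.
model = models.resnet50(weights=None).eval().cuda()

# Compile with Torch-TensorRT, asking for FP16 TensorRT engines
# for the supported layers.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.half)],
    enabled_precisions={torch.half},
)

# Inference uses the regular PyTorch call convention.
x = torch.randn(1, 3, 224, 224, dtype=torch.half, device="cuda")
with torch.no_grad():
    out = trt_model(x)
```

The compiled module is a drop-in replacement for the original, so existing inference code does not need to change.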


TensorFlow-TensorRT

TensorFlow-TensorRT is available in the TensorFlow container from the NGC catalog.


Pull Container From NGC Documentation
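Inside the TensorFlow container, TF-TRT converts a SavedModel offline and writes a new SavedModel with TensorRT-accelerated subgraphs. This is a hedged sketch: the paths are placeholders, and it assumes an NVIDIA GPU and a TF-TRT-enabled TensorFlow build (as shipped in the NGC container).

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert an existing SavedModel; both paths below are placeholders.
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="/models/my_saved_model",
    precision_mode=trt.TrtPrecisionMode.FP16,
)
converter.convert()  # replaces supported subgraphs with TensorRT ops
converter.save("/models/my_saved_model_trt")
```

The converted SavedModel is loaded and served like any other, with unsupported ops falling back to native TensorFlow.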

Explore More TensorRT Resources


Stay up to date on the latest inference news from NVIDIA.

Sign Up