NVIDIA TensorRT 11.x Download

NVIDIA TensorRT is a platform for high-performance deep learning inference.

TensorRT works across all NVIDIA GPUs using the CUDA platform.

Please review the TensorRT online documentation for more information, including the installation guide.

Please review and accept the license agreement to proceed to download the software.

Please download the version compatible with your development environment.

TensorRT 11.2.1 GA Release date 2026/07/30

Announcements

Platform dependency upgrades: Updates internal build dependencies for TensorRT 11.2.1. The CUDA Toolkit baseline remains CUDA 13.3 (unchanged from TensorRT 11.1.0; Debian, RPM, tar, and zip packages ship against CUDA 13.3 update 1). Refer to the TensorRT Support Matrix for per-platform CUDA version pinning and to Prerequisites for installer prerequisites.
GridSample 3D support: Extends GridSample from 2D-only to 3D (rank-5 input) with FP32, FP16, and BF16; linear, nearest, and cubic interpolation; align_corners 0/1; and padding modes zeros, border, and reflection.
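
The align_corners setting changes how normalized grid coordinates in [-1, 1] map to voxel indices. The following is a minimal sketch of that mapping, based on the ONNX GridSample operator definition rather than on TensorRT's internal implementation. The function name unnormalize is a hypothetical helper; for a rank-5 input it would be applied independently along width, height, and depth.

```python
def unnormalize(coord: float, size: int, align_corners: bool) -> float:
    """Map a normalized coordinate in [-1, 1] to a fractional index along one axis."""
    if align_corners:
        # -1 and 1 land on the centers of the first and last elements.
        return (coord + 1.0) / 2.0 * (size - 1)
    # -1 and 1 land on the outer edges of the first and last elements.
    return ((coord + 1.0) * size - 1.0) / 2.0


# One (x, y, z) grid point for a volume with W=4, H=3, D=2 (illustrative sizes).
point = (1.0, 0.0, -1.0)
sizes = (4, 3, 2)
for corners in (True, False):
    index = tuple(unnormalize(c, s, corners) for c, s in zip(point, sizes))
    print(f"align_corners={int(corners)}: {index}")
```

With align_corners 0, the extreme coordinates fall half an element outside the volume, which is where the padding mode (zeros, border, or reflection) comes into play.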

Key features and enhancements in this TensorRT release

ONNX DFT operator support: Adds a cuFFT-based plugin for the ONNX DFT operator, including forward and inverse C2C, R2C, and C2R transforms.
PluginV2 to PluginV3 migration sample: Adds a Python sample that demonstrates how to migrate a TensorRT PluginV2 to PluginV3, and the mappings from V2 to V3 plugin methods. Refer to Plugin API Description and the Sample Explorer.
Improved CMake support: Tar and Zip packages now include CMake configuration files under the cmake directory. Import TensorRT with find_package(TensorRT-<product>), where <product> is one of Enterprise, Automotive, RTX, or SafeInference. Prefer this import path over manually wiring include and library paths in CMake projects.
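
For readers unfamiliar with the transform names in the DFT item above, the sketch below shows what C2C, R2C, and C2R mean, using a naive O(n^2) DFT in plain Python. It illustrates the math only and is not the cuFFT-based plugin. The helper names are hypothetical, and the inverse is scaled by 1/n here as a choice made for this example.

```python
import cmath


def dft_c2c(x, inverse=False):
    """Naive complex-to-complex DFT; the inverse is scaled by 1/n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def dft_r2c(x):
    """Real-to-complex: keep only the n // 2 + 1 non-redundant bins."""
    return dft_c2c([complex(v) for v in x])[: len(x) // 2 + 1]


def dft_c2r(half, n):
    """Complex-to-real: rebuild the full spectrum by conjugate symmetry, then invert."""
    full = list(half) + [half[n - k].conjugate() for k in range(len(half), n)]
    return [v.real for v in dft_c2c(full, inverse=True)]


signal = [1.0, 2.0, 0.0, -1.0, 3.0]
spectrum = dft_r2c(signal)
restored = dft_c2r(spectrum, len(signal))
print(len(spectrum), [round(v, 6) for v in restored])
```

The R2C output is shorter than the input because the spectrum of a real signal is conjugate-symmetric, so the remaining bins carry no extra information.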

Documentation

Online Documentation

TensorRT 11.2.1 GA for x86_64 Architecture

Debian, RPM, and TAR Install Packages for Linux

TensorRT 11.2.1 GA for Linux x86_64 and CUDA 12.0 to 12.9 TAR Package
TensorRT 11.2.1 GA for Ubuntu 22.04 and CUDA 12.0 to 12.9 DEB local repo Package
TensorRT 11.2.1 GA for Ubuntu 24.04 and CUDA 12.0 to 12.9 DEB local repo Package
TensorRT 11.2.1 GA for Debian 12 and CUDA 12.0 to 12.9 DEB local repo Package
TensorRT 11.2.1 GA for RedHat / Rocky Linux 8 and CUDA 12.0 to 12.9 RPM local repo Package
TensorRT 11.2.1 GA for RedHat / Rocky Linux 9 and CUDA 12.0 to 12.9 RPM local repo Package
TensorRT 11.2.1 GA for Linux x86_64 and CUDA 13.0 to 13.3 TAR Package
TensorRT 11.2.1 GA for Ubuntu 22.04 and CUDA 13.0 to 13.3 DEB local repo Package
TensorRT 11.2.1 GA for Ubuntu 24.04 and CUDA 13.0 to 13.3 DEB local repo Package
TensorRT 11.2.1 GA for Ubuntu 26.04 and CUDA 13.0 to 13.3 DEB local repo Package
TensorRT 11.2.1 GA for Debian 12 and CUDA 13.0 to 13.3 DEB local repo Package
TensorRT 11.2.1 GA for RedHat / Rocky Linux 8 and CUDA 13.0 to 13.3 RPM local repo Package
TensorRT 11.2.1 GA for RedHat / Rocky Linux 9 and CUDA 13.0 to 13.3 RPM local repo Package
TensorRT 11.2.1 GA for RedHat / Rocky Linux 10 and CUDA 13.0 to 13.3 RPM local repo Package

Zip Packages for Windows

TensorRT 11.2.1 GA for Windows 10, 11, Server 2022 and CUDA 12.0 to 12.9 ZIP Package
TensorRT 11.2.1 GA for Windows 10, 11, Server 2022 and CUDA 13.0 to 13.3 ZIP Package

TensorRT 11.2.1 GA for ARM SBSA

Debian and TAR Install Packages for Linux

TensorRT 11.2.1 GA for Linux SBSA and CUDA 13.3 TAR Package
TensorRT 11.2.1 GA for Ubuntu 24.04 and CUDA 13.3 DEB local repo Package
TensorRT 11.2.1 GA for Ubuntu 24.04 and CUDA 13.3 DEB cross local repo Package
TensorRT 11.2.1 GA for Ubuntu 26.04 and CUDA 13.3 DEB local repo Package
TensorRT 11.2.1 GA for Ubuntu 26.04 and CUDA 13.3 DEB cross local repo Package
TensorRT 11.2.1 GA for Debian 12 and CUDA 13.3 DEB local repo Package
TensorRT 11.2.1 GA for Debian 12 and CUDA 13.3 DEB cross local repo Package

TensorRT 11.1.0 GA Release date 2026/06/16

Announcements

CUDA 13.3 dependency upgrade: Updates the CUDA Toolkit baseline to CUDA 13.3 across Linux x86-64, Windows x64, and SBSA platforms. Refer to the TensorRT Support Matrix for the per-platform CUDA version pinning and to Required Software for installer prerequisites.
Ubuntu 26.04 support: Adds Ubuntu 26.04 LTS to the supported Linux x86-64 and SBSA platform lists alongside the existing Ubuntu 22.04/24.04 packages. Refer to Debian Package Installation and Tar File Installation for the full Linux x86-64 and SBSA distribution lists and to the TensorRT Support Matrix for the per-platform compiler, glibc, and Python tuples.
Python 3.14 bindings: Extends the Python wheel support matrix to Python 3.14 on supported platforms. Refer to Python Package Index (pip) for installing the Python 3.14 wheel and to the TensorRT Support Matrix for the per-platform Python version table.

Key features and enhancements in this TensorRT release

MoE (Mixture of Experts)

NVFP4 dual-GEMM fusion (gate + up projection) for SM121: Fuses the gate and up projection GEMMs in NVFP4 MoE/MLP blocks on NVIDIA DGX Spark (compute capability 12.1). For more information, refer to the MoE (Mixture of Experts) section.

Performance

Global Performance Tuner: Adds automated end-to-end performance tuning via build-route searching through trtexec to explore internal builder knobs, benchmark candidate engines, and optionally validate accuracy before selecting the fastest valid route. Refer to Global Performance Tuning.
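
The selection rule described above (benchmark candidate engines, optionally validate accuracy, keep the fastest valid one) can be sketched in a few lines. This is a conceptual illustration only: the Candidate type, its fields, and the route labels are hypothetical and do not reflect trtexec options or the tuner's internals.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    route: str                          # hypothetical label for one build route
    latency_ms: float                   # measured latency of the built engine
    accuracy_ok: Optional[bool] = None  # None means accuracy was not validated


def pick_fastest_valid(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the lowest-latency candidate that did not fail validation."""
    valid = [c for c in candidates if c.accuracy_ok is not False]
    return min(valid, key=lambda c: c.latency_ms, default=None)


results = [
    Candidate("route-a", 4.1, True),
    Candidate("route-b", 3.2, False),  # fastest, but failed the accuracy check
    Candidate("route-c", 3.6, True),
]
best = pick_fastest_valid(results)
print(best.route if best else "no valid route")
```

Treating an unvalidated candidate as acceptable mirrors the "optionally validate" wording; a stricter policy would require accuracy_ok to be True.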

Documentation

Online Documentation

TensorRT 11.1.0 GA for x86_64 Architecture

Debian, RPM, and TAR Install Packages for Linux

TensorRT 11.1.0 GA for Linux x86_64 and CUDA 12.0 to 12.9 TAR Package
TensorRT 11.1.0 GA for Ubuntu 22.04 and CUDA 12.0 to 12.9 DEB local repo Package
TensorRT 11.1.0 GA for Ubuntu 24.04 and CUDA 12.0 to 12.9 DEB local repo Package
TensorRT 11.1.0 GA for Debian 12 and CUDA 12.0 to 12.9 DEB local repo Package
TensorRT 11.1.0 GA for RedHat / Rocky Linux 8 and CUDA 12.0 to 12.9 RPM local repo Package
TensorRT 11.1.0 GA for RedHat / Rocky Linux 9 and CUDA 12.0 to 12.9 RPM local repo Package
TensorRT 11.1.0 GA for Linux x86_64 and CUDA 13.0 to 13.3 TAR Package
TensorRT 11.1.0 GA for Ubuntu 22.04 and CUDA 13.0 to 13.3 DEB local repo Package
TensorRT 11.1.0 GA for Ubuntu 24.04 and CUDA 13.0 to 13.3 DEB local repo Package
TensorRT 11.1.0 GA for Ubuntu 26.04 and CUDA 13.0 to 13.3 DEB local repo Package
TensorRT 11.1.0 GA for Debian 12 and CUDA 13.0 to 13.3 DEB local repo Package
TensorRT 11.1.0 GA for RedHat / Rocky Linux 8 and CUDA 13.0 to 13.3 RPM local repo Package
TensorRT 11.1.0 GA for RedHat / Rocky Linux 9 and CUDA 13.0 to 13.3 RPM local repo Package
TensorRT 11.1.0 GA for RedHat / Rocky Linux 10 and CUDA 13.0 to 13.3 RPM local repo Package

Zip Packages for Windows

TensorRT 11.1.0 GA for Windows 10, 11, Server 2022 and CUDA 12.0 to 12.9 ZIP Package
TensorRT 11.1.0 GA for Windows 10, 11, Server 2022 and CUDA 13.0 to 13.3 ZIP Package

TensorRT 11.1.0 GA for ARM SBSA

Debian and TAR Install Packages for Linux

TensorRT 11.1.0 GA for Linux SBSA and CUDA 13.3 TAR Package
TensorRT 11.1.0 GA for Ubuntu 24.04 and CUDA 13.3 DEB local repo Package
TensorRT 11.1.0 GA for Ubuntu 24.04 and CUDA 13.3 DEB cross local repo Package
TensorRT 11.1.0 GA for Ubuntu 26.04 and CUDA 13.3 DEB local repo Package
TensorRT 11.1.0 GA for Ubuntu 26.04 and CUDA 13.3 DEB cross local repo Package
TensorRT 11.1.0 GA for Debian 12 and CUDA 13.3 DEB local repo Package
TensorRT 11.1.0 GA for Debian 12 and CUDA 13.3 DEB cross local repo Package

TensorRT 11.0.0 GA Release date 2026/05/26

Announcements

RHEL 10 / Rocky Linux 10 support: RPM and tar packages are now available for Red Hat Enterprise Linux 10 and Rocky Linux 10.
New Migration Guide content: The Migration Guide now includes a complete TensorRT 10.x to 11.x migration path, with chapters covering the C++ API, Python API, trtexec, Safety Runtime, IEngineInspector JSON output changes, and platform-specific guidance for NVIDIA DriveOS and Jetson/JetPack.
Strongly typed networks are now the default: createNetworkV2() produces a strongly typed network by default in TensorRT 11.0.0. Weak typing is no longer supported. The optimizer infers intermediate tensor types from the network input types and operator specifications and adheres to them strictly. See Strongly Typed Networks and the NVIDIA TensorRT Migration Guide for the upgrade path from weak typing.
Package naming change: Tar and zip archive filenames have been restructured in TensorRT 11.x. Update any download scripts or CI pipelines that reference the old naming convention.

Format	Filename pattern
Tar (10.x)	TensorRT-<version>.<os>.<arch>-gnu.cuda-<cuda_version>.tar.gz
Tar (11.x)	TensorRT-<product>-<product_version>-<os>-<arch>-cuda-<cuda_version>-Release-external.tar.zst
Zip (10.x)	TensorRT-<version>.<os>.<arch>.cuda-<cudaver>.zip
Zip (11.x)	TensorRT-<product>-<product_version>-<os>-<arch>-cuda-<cuda_version>-Release-external.zip
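
When updating download scripts, it can help to generate 11.x names from the pattern and to flag leftover 10.x-style names. The following is a minimal sketch that assumes the patterns in the table above. The field values in the example (Enterprise, 11.2.1, Linux, x86_64, 13.3) and the 10.x-style sample name are placeholders for illustration, not confirmed filenames, and the regular expression is only a heuristic.

```python
import re

# Heuristic match for the 10.x patterns: version right after "TensorRT-" and ".cuda-".
OLD_STYLE = re.compile(r"^TensorRT-\d[\w.\-]*\.cuda-[\w.]+\.(tar\.gz|zip)$")


def archive_name_11x(product, product_version, os_name, arch, cuda_version, ext):
    """Fill in the 11.x pattern; ext is 'tar.zst' or 'zip'."""
    if ext not in ("tar.zst", "zip"):
        raise ValueError(f"unexpected extension: {ext}")
    return (
        f"TensorRT-{product}-{product_version}-{os_name}-{arch}"
        f"-cuda-{cuda_version}-Release-external.{ext}"
    )


def uses_old_naming(filename):
    """True if the name looks like a 10.x tar or zip archive."""
    return OLD_STYLE.match(filename) is not None


# Placeholder values; check the actual package listing for real field values.
new_name = archive_name_11x("Enterprise", "11.2.1", "Linux", "x86_64", "13.3", "tar.zst")
old_name = "TensorRT-10.0.0.6.Linux.x86_64-gnu.cuda-12.4.tar.gz"
print(new_name)
print(uses_old_naming(old_name), uses_old_naming(new_name))
```

Note that the tar archives also change compression format, from .tar.gz to .tar.zst, so extraction steps in a pipeline may need updating along with the filenames.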

The TensorRT static libraries have been removed. If you are using the static libraries for building your application, migrate to building your application with the shared libraries. The following library files have been removed in TensorRT 11.0.
- libnvinfer_static.a
- libnvinfer_plugin_static.a
- libnvinfer_lean_static.a
- libnvinfer_dispatch_static.a
- libnvinfer_vc_plugin_static.a
- libnvonnxparser_static.a
- libonnx_proto.a
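
One way to find build files that still reference the removed archives is to scan them for the names listed above. The following is a minimal sketch; the function name and the sample link line are hypothetical.

```python
REMOVED_STATIC_LIBS = (
    "libnvinfer_static.a",
    "libnvinfer_plugin_static.a",
    "libnvinfer_lean_static.a",
    "libnvinfer_dispatch_static.a",
    "libnvinfer_vc_plugin_static.a",
    "libnvonnxparser_static.a",
    "libonnx_proto.a",
)


def find_removed_libs(text):
    """Return the removed static library names that appear in text, in list order."""
    return [name for name in REMOVED_STATIC_LIBS if name in text]


# Hypothetical link line taken from a build script.
link_line = "g++ app.o /opt/trt/lib/libnvinfer_static.a /opt/trt/lib/libonnx_proto.a -lcudart"
print(find_removed_libs(link_line))
```

This only catches references by filename. Linker flags such as -lnvinfer_static would need a separate check.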

Key features and enhancements in this TensorRT release

Transformer Inference

Ragged batching for IAttention and IKVCacheUpdateLayer: IAttention now supports packed (ragged) query and key/value tensors via setQueryForm and setKeyValueForm with the kPACKED_NHD layout, allowing variable-length sequences to be concatenated end-to-end without padding to the longest sequence in the batch. Per-sequence lengths are supplied via setQueryLengths and setKeyValueLengths. IKVCacheUpdateLayer similarly supports packed updates via setUpdateForm and setUpdateLengths. For more information, refer to the Fused Attention section.
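
The idea behind packed (ragged) batching can be shown without any TensorRT calls: sequences are concatenated along the token axis, and a lengths array records where each one ends. The sketch below is conceptual. The helper names are hypothetical, and it does not model the kPACKED_NHD tensor layout beyond the token-axis concatenation.

```python
from itertools import accumulate


def pack(sequences):
    """Concatenate variable-length sequences and return (packed, lengths)."""
    packed = [token for seq in sequences for token in seq]
    lengths = [len(seq) for seq in sequences]
    return packed, lengths


def unpack(packed, lengths):
    """Split a packed list back into per-sequence lists using the lengths."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return [packed[s:e] for s, e in zip(starts, ends)]


batch = [[11, 12, 13, 14, 15], [21], [31, 32]]
packed, lengths = pack(batch)
padded_slots = len(batch) * max(lengths)  # slots needed when padding to the longest
print(len(packed), padded_slots, lengths)
```

The gap between the packed token count and the padded slot count is the work that padding would otherwise spend on filler positions.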

MoE (Mixture of Experts)

Backend performance improvements for MoE inference: Builds on the MoE inference capability introduced in TensorRT 10.16 with significant backend optimizations that close the performance gap between TensorRT's out-of-the-box MoE inference and specialized external MoE kernels, particularly on Blackwell (SM10x and SM110) hardware. The previous guidance to keep token counts low (seqLen ≤ 16) no longer applies; MoE inference now performs well across a much broader range of token counts. For more information, refer to the MoE (Mixture of Experts) section.

Multi-Device Inference

Multi-Device Inference is now fully supported: In TensorRT 10.16, Multi-Device Inference was a preview feature that required manually enabling the PreviewFeature::kMULTIDEVICE_RUNTIME_10_16 flag in the builder config. Starting in TensorRT 11.0, this feature is fully supported and the flag is no longer needed. For more details, refer to the Multi-Device Inference section.
Expanded Distributed Collective Operations: Adds the AllToAll, Gather, and Scatter collective operations to IDistCollectiveLayer for distributed workloads.
Improved NCCL Library Discovery: Adds an automatic fallback when loading the NCCL library, which improves compatibility across environments. The runtime now checks for libnccl.so.2 first and falls back to libnccl.so, preventing initialization failures caused by requiring a single specific filename.
New Context-Parallel Attention Python Sample: Adds a Python sample (attention-mdtrt) that demonstrates context-parallel attention, which splits KV sequences across GPUs. For more information, refer to the Working with Transformers section.
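
The fallback order described for NCCL library discovery above can be mimicked in user code, for example to check ahead of deployment which filename would resolve on a host. This sketch is not TensorRT's loader. It uses Python's ctypes, and the loader parameter is a hypothetical hook added so that the logic can be exercised without NCCL installed.

```python
import ctypes

NCCL_CANDIDATES = ("libnccl.so.2", "libnccl.so")  # order given in the release note


def load_first_available(names=NCCL_CANDIDATES, loader=ctypes.CDLL):
    """Try each library name in order; return (name, handle) for the first that loads."""
    errors = []
    for name in names:
        try:
            return name, loader(name)
        except OSError as exc:
            errors.append(f"{name}: {exc}")
    raise OSError("no NCCL library found; tried " + "; ".join(errors))


def fake_loader(name):
    """Stand-in loader that pretends only the unversioned name exists."""
    if name != "libnccl.so":
        raise OSError("not found")
    return object()


name, _handle = load_first_available(loader=fake_loader)
print(name)
```

Calling load_first_available() with the default loader attempts a real dlopen of each name, so it raises OSError on hosts where neither file is on the library search path.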

API Enhancements

Internal Library Path API: Added nvinfer1::setInternalLibraryPath C++ API to set the path for internal builder resource libraries (libnvinfer_builder_resource_*.so) when they are not in the system path. For more information, refer to the Set Internal Library Path API section.
IAttention causal mask orientation control: Introduced the CausalMaskKind enum and IAttention::setCausalKind / IAttention::getCausalKind APIs to let users specify causal mask alignment (for example, kUPPER_LEFT or kNONE) without providing an explicit mask tensor. This enables clearer and more flexible configuration of causal masking behavior in addAttention.
API Change Tracking: To view API changes between releases, refer to the TensorRT GitHub repository and use the compare tool.
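
To make the causal mask orientation mentioned above concrete, the sketch below builds the boolean mask that an upper-left-aligned causal setting would imply. It assumes that kUPPER_LEFT follows the common convention in which query position i may attend to key positions 0 through i. That interpretation and the helper name are assumptions; consult the API reference for the exact semantics.

```python
def causal_mask(num_queries, num_keys, upper_left=True):
    """Return mask[i][j] == True where query i may attend to key j.

    upper_left=True anchors the causal triangle at the top-left corner;
    upper_left=False stands in for "no causal mask" (everything allowed).
    """
    if not upper_left:
        return [[True] * num_keys for _ in range(num_queries)]
    return [[j <= i for j in range(num_keys)] for i in range(num_queries)]


for row in causal_mask(3, 5):
    print("".join("x" if allowed else "." for allowed in row))
```

When the query and key lengths differ, as in this 3-by-5 example, the choice of anchor corner determines which keys each query can see, which is why the orientation is worth setting explicitly.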

Open-Source Components

New TopK V3 plugin for large K values: A new IPluginV3 TopK plugin has been added to the TensorRT OSS components. The plugin ports the AIR TopK kernel from TensorRT-LLM and supports significantly larger K values than the native TensorRT kernel for the ONNX TopK operator. In 11.0.0, the plugin is OSS-only. The core TensorRT builder does not automatically fall back to it for large K values, so you must add the plugin to your network explicitly. Automatic fallback in the core TensorRT path may be added in a subsequent release.
For changes to TensorRT open-source components, including samples, plugins, and parsers, refer to the TensorRT GitHub Releases page.

Documentation

Rewritten Best Practices and Benchmarking guide: The Best Practices landing page now frames performance work as a measure-then-optimize feedback loop, and the Performance Benchmarking chapter has been rewritten to cover both ONNX-TRT (trtexec) and Torch-TRT workflows side by side in synchronized tabs. New or expanded coverage includes benchmarking basics, ModelOpt quantization, dynamic shapes, CUDA graphs, real input values, layer information and per-layer runtime, serialized engines and timing caches, built-in TensorRT profiling, and reading the Nsight Systems Timeline View.

Documentation

Online Documentation

TensorRT 11.0.0 GA for x86_64 Architecture

Debian, RPM, and TAR Install Packages for Linux

TensorRT 11.0.0 GA for Linux x86_64 and CUDA 12.0 to 12.9 TAR Package
TensorRT 11.0.0 GA for Ubuntu 22.04 and CUDA 12.0 to 12.9 DEB local repo Package
TensorRT 11.0.0 GA for Ubuntu 24.04 and CUDA 12.0 to 12.9 DEB local repo Package
TensorRT 11.0.0 GA for Debian 12 and CUDA 12.0 to 12.9 DEB local repo Package
TensorRT 11.0.0 GA for RedHat / Rocky Linux 8 and CUDA 12.0 to 12.9 RPM local repo Package
TensorRT 11.0.0 GA for RedHat / Rocky Linux 9 and CUDA 12.0 to 12.9 RPM local repo Package
TensorRT 11.0.0 GA for Linux x86_64 and CUDA 13.0 to 13.2 TAR Package
TensorRT 11.0.0 GA for Ubuntu 22.04 and CUDA 13.0 to 13.2 DEB local repo Package
TensorRT 11.0.0 GA for Ubuntu 24.04 and CUDA 13.0 to 13.2 DEB local repo Package
TensorRT 11.0.0 GA for Debian 12 and CUDA 13.0 to 13.2 DEB local repo Package
TensorRT 11.0.0 GA for RedHat / Rocky Linux 8 and CUDA 13.0 to 13.2 RPM local repo Package
TensorRT 11.0.0 GA for RedHat / Rocky Linux 9 and CUDA 13.0 to 13.2 RPM local repo Package
TensorRT 11.0.0 GA for RedHat / Rocky Linux 10 and CUDA 13.0 to 13.2 RPM local repo Package

Zip Packages for Windows

TensorRT 11.0.0 GA for Windows 10, 11, Server 2022 and CUDA 12.0 to 12.9 ZIP Package
TensorRT 11.0.0 GA for Windows 10, 11, Server 2022 and CUDA 13.0 to 13.2 ZIP Package

TensorRT 11.0.0 GA for ARM SBSA

Debian and TAR Install Packages for Linux

TensorRT 11.0.0 GA for Linux SBSA and CUDA 13.2 TAR Package
TensorRT 11.0.0 GA for Ubuntu 24.04 and CUDA 13.2 DEB local repo Package
TensorRT 11.0.0 GA for Ubuntu 24.04 and CUDA 13.2 DEB cross local repo Package
TensorRT 11.0.0 GA for Debian 12 and CUDA 13.2 DEB local repo Package
TensorRT 11.0.0 GA for Debian 12 and CUDA 13.2 DEB cross local repo Package

TensorRT is also available on the following NVIDIA GPU platforms:

NVIDIA NIM for developing AI-powered enterprise applications and deploying AI models in production
NVIDIA GPU Cloud (NGC) TensorRT Container for cloud deployment
NVIDIA JetPack for Jetson Orin embedded platforms
NVIDIA DRIVE® Install for the NVIDIA DRIVE autonomous driving platform (access requires membership in the NVIDIA DRIVE Developer Program)

Ethical AI

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.