NVIDIA TensorRT 11.x Download
NVIDIA TensorRT is a platform for high-performance deep learning inference.
TensorRT works across all NVIDIA GPUs using the CUDA platform.
Please review TensorRT online documentation for more information, including the installation guide.
Please review and accept the license agreement to proceed to download the software.
Please download the version compatible with your development environment.
Announcements
- RHEL 10 / Rocky Linux 10 support: RPM and tar packages are now available for Red Hat Enterprise Linux 10 and Rocky Linux 10.
- New Migration Guide content: The Migration Guide now includes a complete TensorRT 10.x to 11.x migration path, with chapters covering the C++ API, Python API, trtexec, Safety Runtime, IEngineInspector JSON output changes, and platform-specific guidance for NVIDIA DriveOS and Jetson/JetPack.
- Strongly typed networks are now the default: createNetworkV2() produces a strongly typed network by default in TensorRT 11.0.0. Weak typing is no longer supported. The optimizer infers intermediate tensor types from the network input types and operator specifications and adheres to them strictly. See Strongly Typed Networks and the NVIDIA TensorRT Migration Guide for the upgrade path from weak typing.
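- Package naming change: Tar and zip archive filenames have been restructured in TensorRT 11.x, and the tar archive now uses the .tar.zst extension instead of .tar.gz. Update any download scripts or CI pipelines that reference the old naming convention. The filename pattern table at the end of this section compares the 10.x and 11.x names.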
- Static libraries removed: The TensorRT static libraries have been removed. If you build your application against the static libraries, migrate to the shared libraries. The following library files were removed in TensorRT 11.0:
  - libnvinfer_static.a
  - libnvinfer_plugin_static.a
  - libnvinfer_lean_static.a
  - libnvinfer_dispatch_static.a
  - libnvinfer_vc_plugin_static.a
  - libnvonnxparser_static.a
  - libonnx_proto.a
| Format | Filename pattern |
|---|---|
| Tar (10.x) | TensorRT-<version>.<os>.<arch>-gnu.cuda-<cuda_version>.tar.gz |
| Tar (11.x) | TensorRT-<product>-<product_version>-<os>-<arch>-cuda-<cuda_version>-Release-external.tar.zst |
| Zip (10.x) | TensorRT-<version>.<os>.<arch>.cuda-<cuda_version>.zip |
| Zip (11.x) | TensorRT-<product>-<product_version>-<os>-<arch>-cuda-<cuda_version>-Release-external.zip |
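When auditing download scripts or CI pipelines for the old naming convention, it can help to classify archive names mechanically. The following is a minimal sketch that turns the four table rows into regular expressions. The helper name `classify_archive` is hypothetical, and the sketch assumes that 11.x fields never contain a hyphen and that 10.x OS and architecture fields never contain a dot; neither assumption is stated on this page.

```python
import re
from typing import Optional

# Dotted numeric field such as "10.0.0.1" or "12.9".
_NUM = r"\d+(?:\.\d+)*"

# One regex per row of the filename pattern table.
# Assumption: 11.x fields contain no hyphens; 10.x <os>/<arch> contain no dots.
ARCHIVE_PATTERNS = {
    "tar-10.x": re.compile(
        rf"^TensorRT-{_NUM}\.[^.]+\.[^.]+-gnu\.cuda-{_NUM}\.tar\.gz$"
    ),
    "zip-10.x": re.compile(
        rf"^TensorRT-{_NUM}\.[^.]+\.[^.]+\.cuda-{_NUM}\.zip$"
    ),
    # <product>-<product_version>-<os>-<arch> are four hyphen-separated fields.
    "tar-11.x": re.compile(
        r"^TensorRT(?:-[^-]+){4}-cuda-[^-]+-Release-external\.tar\.zst$"
    ),
    "zip-11.x": re.compile(
        r"^TensorRT(?:-[^-]+){4}-cuda-[^-]+-Release-external\.zip$"
    ),
}


def classify_archive(filename: str) -> Optional[str]:
    """Return the table row a filename matches, or None if it matches no row."""
    for label, pattern in ARCHIVE_PATTERNS.items():
        if pattern.match(filename):
            return label
    return None


def is_legacy_name(filename: str) -> bool:
    """True when a filename still follows a 10.x naming pattern."""
    label = classify_archive(filename)
    return label is not None and label.endswith("10.x")
```

A script could run `is_legacy_name` over every archive name it references and flag the matches for an update. The field values used to exercise the patterns should come from the actual download links, not from guesses.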
Key features and enhancements in this TensorRT release
Transformer Inference
- Ragged batching for IAttention and IKVCacheUpdateLayer: IAttention now supports packed (ragged) query and key/value tensors via setQueryForm and setKeyValueForm with the kPACKED_NHD layout, allowing variable-length sequences to be concatenated end-to-end without padding to the longest sequence in the batch. Per-sequence lengths are supplied via setQueryLengths and setKeyValueLengths. IKVCacheUpdateLayer similarly supports packed updates via setUpdateForm and setUpdateLengths. For more information, refer to the Fused Attention section.
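To make the packed (ragged) layout concrete, here is a plain-Python illustration of the data arrangement only. It does not call the TensorRT API and does not model attention heads or head dimensions; the function names are hypothetical. Sequences are concatenated end-to-end and the per-sequence lengths are kept alongside, which is the information the text says is supplied through the length setters.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def pack_sequences(seqs: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate variable-length sequences end-to-end without padding.

    Returns the packed token list and the per-sequence lengths.
    """
    packed = [token for seq in seqs for token in seq]
    lengths = [len(seq) for seq in seqs]
    return packed, lengths


def sequence_offsets(lengths: Sequence[int]) -> List[int]:
    """Start offset of each sequence in the packed buffer; last entry is the total."""
    return [0, *accumulate(lengths)]


def unpack_sequences(packed: Sequence[int], lengths: Sequence[int]) -> List[List[int]]:
    """Recover the original sequences from the packed buffer and the lengths."""
    offsets = sequence_offsets(lengths)
    return [list(packed[start:end]) for start, end in zip(offsets, offsets[1:])]


def padded_token_count(lengths: Sequence[int]) -> int:
    """Tokens a padded batch would hold: batch size times the longest sequence."""
    return len(lengths) * max(lengths, default=0)


batch = [[1, 2, 3], [4], [5, 6]]
packed, lengths = pack_sequences(batch)
saved = padded_token_count(lengths) - len(packed)  # padding slots avoided
```

For the three sequences above, a padded batch would hold nine token slots while the packed form holds six, so the saving grows with the spread between the shortest and longest sequence.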
MoE (Mixture of Experts)
- Backend performance improvements for MoE inference: Builds on the MoE inference capability introduced in TensorRT 10.16 with significant backend optimizations that close the performance gap between TensorRT's out-of-the-box MoE inference and specialized external MoE kernels, particularly on Blackwell (SM10x and SM110) hardware. The previous guidance to keep token counts low (seqLen ≤ 16) no longer applies; MoE inference now performs well across a much broader range of token counts. For more information, refer to the MoE (Mixture of Experts) section.
Multi-Device Inference
- Multi-Device Inference is now fully supported: In TensorRT 10.16, Multi-Device Inference was a preview feature that required manually enabling the PreviewFeature::kMULTIDEVICE_RUNTIME_10_16 flag in the builder config. Starting in TensorRT 11.0, this feature is fully supported and the flag is no longer needed. For more details, refer to the Multi-Device Inference section.
- Expanded Distributed Collective Operations: Optimized distributed workloads by introducing new collective operations to the IDistCollectiveLayer, specifically adding AllToAll, Gather, and Scatter.
- Improved NCCL Library Discovery: Implemented an automatic fallback mechanism for loading the NCCL library to increase environment compatibility and deployment flexibility. The runtime now checks for libnccl.so.2 before seamlessly falling back to libnccl.so, preventing initialization failures caused by strictly requiring a single specific filename.
- New Context-Parallel Attention Python Sample: Added a new Python sample (attention-mdtrt) that demonstrates context-parallel attention that splits KV sequences across GPUs. For more information, refer to the Working with Transformers section.
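The NCCL discovery order described above (libnccl.so.2 first, then libnccl.so) follows a common try-in-order loading pattern. The sketch below shows that general pattern with Python's `ctypes`; it is not TensorRT's implementation, and the function name `load_first_available` and its `loader` parameter are hypothetical, added so the logic can be exercised without NCCL installed.

```python
import ctypes
from typing import Any, Callable, Sequence, Tuple

# Candidate names in the order given in the release note.
NCCL_CANDIDATES = ("libnccl.so.2", "libnccl.so")


def load_first_available(
    names: Sequence[str] = NCCL_CANDIDATES,
    loader: Callable[[str], Any] = ctypes.CDLL,
) -> Tuple[str, Any]:
    """Try each library name in order and return (name, handle) for the first success.

    Raises OSError listing every failed attempt if none of the names load.
    """
    errors = []
    for name in names:
        try:
            return name, loader(name)
        except OSError as exc:
            # Remember why this candidate failed, then fall through to the next.
            errors.append(f"{name}: {exc}")
    raise OSError("could not load any candidate library: " + "; ".join(errors))
```

Collecting the per-candidate errors before raising keeps the failure message useful when neither filename is present on the system.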
API Enhancements
- Internal Library Path API: Added nvinfer1::setInternalLibraryPath C++ API to set the path for internal builder resource libraries (libnvinfer_builder_resource_*.so) when they are not in the system path. For more information, refer to the Set Internal Library Path API section.
- IAttention causal mask orientation control: Introduced the CausalMaskKind enum and IAttention::setCausalKind / IAttention::getCausalKind APIs to let users specify causal mask alignment (for example, kUPPER_LEFT or kNONE) without providing an explicit mask tensor. This enables clearer and more flexible configuration of causal masking behavior in addAttention.
- API Change Tracking: To view API changes between releases, refer to the TensorRT GitHub repository and use the compare tool.
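As an illustration of what a causal mask alignment setting replaces, the sketch below builds the boolean mask that would otherwise be passed as an explicit tensor. It assumes the usual meaning of upper-left alignment, where the causal triangle is anchored at the top-left corner of the query-by-key score matrix so that query position i may attend key positions 0 through i. Whether kUPPER_LEFT matches this exactly should be confirmed in the TensorRT API reference; the function names here are hypothetical and no TensorRT API is called.

```python
from typing import List


def causal_mask_upper_left(query_len: int, key_len: int) -> List[List[bool]]:
    """Mask with the causal triangle anchored at the top-left corner.

    Entry [q][k] is True when query position q may attend key position k,
    which under this assumed convention means k <= q.
    """
    return [[k <= q for k in range(key_len)] for q in range(query_len)]


def no_causal_mask(query_len: int, key_len: int) -> List[List[bool]]:
    """Mask that allows every query position to attend every key position."""
    return [[True] * key_len for _ in range(query_len)]


mask = causal_mask_upper_left(query_len=2, key_len=3)
# Row 0 sees only key 0; row 1 sees keys 0 and 1.
```

When the query and key lengths differ, the choice of anchor corner changes which keys each query can see, which is why an explicit alignment setting is useful.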
Open-Source Components
- New TopK V3 plugin for large K values: A new IPluginV3 TopK plugin has been added to the TensorRT OSS components. The plugin ports the AIR TopK kernel from TensorRT-LLM and supports significantly larger K values than the native TensorRT kernel for the ONNX TopK operator. In 11.0.0, the plugin is OSS-only. The core TensorRT builder does not automatically fall back to it for large K values, so you must add the plugin to your network explicitly. Automatic fallback in the core TensorRT path may be added in a subsequent release.
- For changes to TensorRT open-source components, including samples, plugins, and parsers, refer to the TensorRT GitHub Releases page.
Documentation
- Rewritten Best Practices and Benchmarking guide: The Best Practices landing page now frames performance work as a measure-then-optimize feedback loop, and the Performance Benchmarking chapter has been rewritten to cover both ONNX-TRT (trtexec) and Torch-TRT workflows side by side in synchronized tabs. New or expanded coverage includes benchmarking basics, ModelOpt quantization, dynamic shapes, CUDA graphs, real input values, layer information and per-layer runtime, serialized engines and timing caches, built-in TensorRT profiling, and reading the Nsight Systems Timeline View.
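The measure-then-optimize loop depends on repeatable measurements. The following is a generic latency-measurement sketch in plain Python, not the method used by trtexec or Torch-TRT: it runs warm-up iterations, times the remaining iterations, and reports mean, median, and nearest-rank 99th-percentile latency. The `benchmark` function and its parameters are hypothetical, and for GPU inference the callable is assumed to block until the device has finished its work.

```python
import math
import statistics
import time
from typing import Callable, Dict


def benchmark(
    fn: Callable[[], object],
    warmup: int = 10,
    iterations: int = 100,
    timer: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Time `fn` after warm-up runs and summarize per-call latency."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")

    # Warm-up runs are executed but not timed.
    for _ in range(warmup):
        fn()

    latencies = []
    for _ in range(iterations):
        start = timer()
        fn()
        latencies.append(timer() - start)

    latencies.sort()
    # Nearest-rank percentile, clamped to the valid index range.
    p99_index = min(len(latencies) - 1, max(0, math.ceil(0.99 * len(latencies)) - 1))
    return {
        "mean": statistics.fmean(latencies),
        "median": statistics.median(latencies),
        "p99": latencies[p99_index],
        "iterations": float(iterations),
    }
```

Taking the same summary before and after each change, with the same warm-up and iteration counts, gives the comparison that the feedback loop relies on.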
Please download the version compatible with your development environment using the links below.
TensorRT 11.0.0 GA for x86_64 Architecture
Debian, RPM, and TAR Install Packages for Linux
- TensorRT 11.0.0 GA for Linux x86_64 and CUDA 12.0 to 12.9 TAR Package
- TensorRT 11.0.0 GA for Ubuntu 22.04 and CUDA 12.0 to 12.9 DEB local repo Package
- TensorRT 11.0.0 GA for Ubuntu 24.04 and CUDA 12.0 to 12.9 DEB local repo Package
- TensorRT 11.0.0 GA for Debian 12 and CUDA 12.0 to 12.9 DEB local repo Package
- TensorRT 11.0.0 GA for Red Hat / Rocky Linux 8 and CUDA 12.0 to 12.9 RPM local repo Package
- TensorRT 11.0.0 GA for Red Hat / Rocky Linux 9 and CUDA 12.0 to 12.9 RPM local repo Package
- TensorRT 11.0.0 GA for Linux x86_64 and CUDA 13.0 to 13.2 TAR Package
- TensorRT 11.0.0 GA for Ubuntu 22.04 and CUDA 13.0 to 13.2 DEB local repo Package
- TensorRT 11.0.0 GA for Ubuntu 24.04 and CUDA 13.0 to 13.2 DEB local repo Package
- TensorRT 11.0.0 GA for Debian 12 and CUDA 13.0 to 13.2 DEB local repo Package
- TensorRT 11.0.0 GA for Red Hat / Rocky Linux 8 and CUDA 13.0 to 13.2 RPM local repo Package
- TensorRT 11.0.0 GA for Red Hat / Rocky Linux 9 and CUDA 13.0 to 13.2 RPM local repo Package
- TensorRT 11.0.0 GA for Red Hat / Rocky Linux 10 and CUDA 13.0 to 13.2 RPM local repo Package
Zip Packages for Windows
- TensorRT 11.0.0 GA for Windows 10, 11, Server 2022 and CUDA 12.0 to 12.9 ZIP Package
- TensorRT 11.0.0 GA for Windows 10, 11, Server 2022 and CUDA 13.0 to 13.2 ZIP Package
TensorRT 11.0.0 GA for ARM SBSA
Debian and TAR Install Packages for Linux
- TensorRT 11.0.0 GA for Linux SBSA and CUDA 13.2 TAR Package
- TensorRT 11.0.0 GA for Ubuntu 24.04 and CUDA 13.2 DEB local repo Package
- TensorRT 11.0.0 GA for Ubuntu 24.04 and CUDA 13.2 DEB cross local repo Package
- TensorRT 11.0.0 GA for Debian 12 and CUDA 13.2 DEB local repo Package
- TensorRT 11.0.0 GA for Debian 12 and CUDA 13.2 DEB cross local repo Package
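To pick the right package family from the lists above, the installed CUDA version has to fall inside one of the listed ranges. The sketch below encodes only the ranges shown on this page (12.0 to 12.9 and 13.0 to 13.2 for x86_64, 13.2 for SBSA); the dictionary keys and function names are hypothetical, and the sketch does not check the operating system or distribution.

```python
from typing import Optional, Tuple

Version = Tuple[int, int]

# CUDA ranges per architecture, copied from the package lists on this page.
CUDA_RANGES = {
    "x86_64": [((12, 0), (12, 9)), ((13, 0), (13, 2))],
    "sbsa": [((13, 2), (13, 2))],
}


def parse_cuda_version(text: str) -> Version:
    """Parse "13.2" or "13.2.1" into (major, minor); a missing minor counts as 0."""
    parts = text.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def matching_cuda_range(arch: str, cuda_version: str) -> Optional[Tuple[Version, Version]]:
    """Return the (low, high) range that contains the CUDA version, or None."""
    version = parse_cuda_version(cuda_version)
    for low, high in CUDA_RANGES.get(arch, []):
        if low <= version <= high:
            return low, high
    return None
```

A result of None means that no package listed on this page covers that combination, for example CUDA 13.0 on SBSA.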
TensorRT is also available on the following NVIDIA GPU platforms:
- NVIDIA NIM for developing AI-powered enterprise applications and deploying AI models in production
- NVIDIA GPU Cloud (NGC) TensorRT Container for cloud deployment
- NVIDIA JetPack for Jetson Orin embedded platforms
- NVIDIA DRIVE® Install for the NVIDIA DRIVE autonomous driving platform (access requires membership in the NVIDIA DRIVE Developer Program)
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.