NVIDIA TensorRT for RTX
TensorRT for RTX brings optimized AI inference and cutting-edge acceleration to developers using NVIDIA RTX GPUs. It offers peak performance for PC AI workloads such as CNNs, transformers, speech models, and diffusion models, and it is engineered to be lean, at under 200 MB. Engine build times are fast, typically 15 to 30 seconds. Engines built with TensorRT for RTX are portable across GPUs and operating systems, enabling build-once, deploy-anywhere workflows.
TensorRT for RTX supports NVIDIA GeForce and RTX GPUs from the Turing family all the way to Blackwell and beyond. SDKs are available for both Windows and Linux development.
Please review the TensorRT for RTX documentation for more information, and visit our GitHub for samples and demos.
Available Versions
TensorRT for RTX 1.5 CUDA 12.9 (Linux x86_64)
TensorRT for RTX 1.5 CUDA 13.2 (Linux x86_64)
TensorRT for RTX 1.5 CUDA 13.2 (Linux aarch64)
TensorRT for RTX 1.5 CUDA 12.9 (Windows)
TensorRT for RTX 1.5 CUDA 13.2 (Windows)
Notable changes in this TensorRT for RTX release:
- DGX Spark / Linux SBSA (Experimental)
  - New experimental build for NVIDIA DGX Spark (NVIDIA GB10, compute capability 12.1) and ARM64 Linux SBSA platforms (Ubuntu 22.04 / 24.04).
- CUDA 13.3 Support
  - Compatible with NVIDIA CUDA 13.3, with continued support for CUDA 12.9 Update 1.
- Qwen3.5 Support
  - Qwen3.5 dense models are supported through Windows ML with the TensorRT-RTX execution provider.
- Operator Support
  - Added support for the RoiAlign ONNX operator.
- GPU Latency Optimizations
  - Faster GEMV kernels for dynamic input shapes, reduced CPU overhead between kernel launches, expanded kernel fusion coverage for dynamic shapes, and improved just-in-time kernel generation for additional convolution variants and runtime fusion patterns. Convolution performance was also improved on the NVIDIA GB10 Architecture.
- Stability and Accuracy Fixes
  - Resolved YOLO ONNX model builds on Turing, fixed FP16 dynamic-shape execution-context errors affecting models such as Stable Diffusion XL UNet and DaVinci Resolve SpeedWarp, fixed a dynamic-shape accuracy regression, enabled BF16 depthwise convolutions and deconvolutions, and enabled 3D deconvolutions with groups and padding.
- For more details, please refer to the full Release Notes.
Available Versions
TensorRT for RTX 1.1 (Windows)
TensorRT for RTX 1.1 (Linux)
Notable changes in this TensorRT-RTX release:
- Added the IRuntime::getEngineValidity() API to programmatically and efficiently check whether a TensorRT-RTX engine file is valid on the current system or needs to be rebuilt due to incompatibilities in the software version, compute capability, and so on.
- Compilation time has been greatly improved, particularly for models with many memory-bound kernels. On average, a 1.5x improvement is observed across a variety of model architectures.
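The validity check described above lends itself to a "reuse if valid, otherwise rebuild" pattern for cached engine files. The following is a minimal Python sketch of that pattern only. It does not call the TensorRT for RTX API: `is_valid` and `build` are hypothetical callables standing in for a validity check (such as one backed by IRuntime::getEngineValidity()) and for an engine build, and the actual signature of that API should be taken from the documentation.

```python
from pathlib import Path
from typing import Callable, Tuple


def load_or_rebuild(
    engine_path: Path,
    is_valid: Callable[[bytes], bool],  # hypothetical stand-in for a validity check
    build: Callable[[], bytes],         # hypothetical stand-in for an engine build
) -> Tuple[bytes, bool]:
    """Return (engine_bytes, rebuilt).

    Reuses the cached engine file when it exists and passes the validity
    check; otherwise builds a new engine and overwrites the cached file.
    """
    if engine_path.exists():
        blob = engine_path.read_bytes()
        if is_valid(blob):
            return blob, False  # cached engine is usable on this system

    # Missing or incompatible engine: rebuild and refresh the cache.
    blob = build()
    engine_path.write_bytes(blob)
    return blob, True
```

An application would typically run this once at startup, so that a driver update or GPU change triggers a rebuild instead of a load failure.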
Available Versions
TensorRT for RTX 1.0 (Windows)
TensorRT for RTX 1.0 (Linux)
This TensorRT-RTX release includes the following key features and enhancements compared to NVIDIA TensorRT:
- Reduced binary size of under 200 MB for improved download speed and disk footprint when included in consumer applications.
- Optimization is split into a hardware-agnostic "ahead-of-time" (AOT) phase and a hardware-specific "just-in-time" (JIT) phase in order to improve user experience. End-to-end engine compilation completes in under 30 seconds.
- Improved adaptivity to real-system resources for applications where AI features run in the background alongside other workloads, for example graphics.
- Focused improvement on portability and deployment while still delivering industry-leading performance.
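To illustrate the AOT/JIT split listed above, here is a toy Python model of the idea: one hardware-agnostic artifact is produced once, then specialized on each target GPU. Everything in it is hypothetical (`PortableEngine`, `aot_compile`, `JitRuntime`, the string "kernels"), and the per-device cache is an assumption made for the sketch. It is not the TensorRT for RTX API and does not reflect how the real compiler works internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class PortableEngine:
    """Hypothetical hardware-agnostic artifact produced by the AOT phase."""
    model_name: str
    graph: Tuple[str, ...]


def aot_compile(model_name: str, layers: Sequence[str]) -> PortableEngine:
    # AOT phase: runs once, with no knowledge of the target GPU.
    return PortableEngine(model_name, tuple(layers))


@dataclass
class JitRuntime:
    """Hypothetical per-device runtime that performs the JIT phase."""
    compute_capability: str  # e.g. "75" for Turing, "120" for Blackwell RTX
    jit_compiles: int = 0
    _cache: Dict[Tuple[str, str], List[str]] = field(default_factory=dict)

    def load(self, engine: PortableEngine) -> List[str]:
        # JIT phase: specialize the portable graph for this device.
        # Caching the result is an assumption of this sketch.
        key = (engine.model_name, self.compute_capability)
        if key not in self._cache:
            self.jit_compiles += 1
            self._cache[key] = [
                f"{layer}@sm_{self.compute_capability}" for layer in engine.graph
            ]
        return self._cache[key]


# Build once, then load the same artifact on two different GPUs.
engine = aot_compile("demo", ["conv", "relu"])
turing = JitRuntime("75")
blackwell = JitRuntime("120")
turing_kernels = turing.load(engine)
blackwell_kernels = blackwell.load(engine)
```

The point of the split is that only the second step depends on the installed GPU, so the same shipped artifact can serve every supported device.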
Ethical AI
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.