NVIDIA TensorRT

High-performance deep learning inference for production deployment

NVIDIA TensorRT™ is a high-performance neural network inference engine for production deployment of deep learning applications. TensorRT can be used to rapidly optimize, validate, and deploy trained neural networks for inference to hyperscale data centers, embedded platforms, or automotive product platforms.

Developers can use TensorRT to deliver fast inference with INT8 or FP16 optimized precision, which significantly reduces latency for real-time services such as streaming video categorization in the cloud or object detection and segmentation on embedded and automotive platforms. With TensorRT, developers can focus on developing novel AI-powered applications rather than on performance tuning for inference deployment. The TensorRT runtime delivers optimal inference performance, meeting even the most demanding throughput requirements.


TensorRT 1.0 (previously known as GIE) is available for download as part of the NVIDIA Deep Learning SDK. Sign up for the release candidate program to learn more and download.

What's New in TensorRT 2

TensorRT 2 uses INT8 optimized precision to deliver 3x more throughput while using 61% less memory on applications that require high-accuracy inference.
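This page does not show how the INT8 path is enabled; as a rough illustration only, the fragment below sketches requesting INT8 mode on the builder before an engine is built, assuming the builder-level INT8 controls (platformHasFastInt8, setInt8Mode, setInt8Calibrator) exposed by the C++ API in released TensorRT versions, plus a user-supplied calibrator that feeds representative input batches.

    #include "NvInfer.h"

    using namespace nvinfer1;

    // Illustrative sketch: request INT8 optimization on a builder before
    // buildCudaEngine() is called. INT8 requires a calibrator that supplies
    // representative input batches so TensorRT can choose quantization ranges.
    void enableInt8(IBuilder& builder, IInt8Calibrator& calibrator)
    {
        if (!builder.platformHasFastInt8())
            return;                        // no fast INT8 path on this GPU; keep FP32/FP16

        builder.setInt8Mode(true);         // quantize eligible layers to INT8
        builder.setInt8Calibrator(&calibrator);
    }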


TensorRT 2.0 with INT8 support is now available for pre-release testing through the TensorRT 2.0 access program. Apply today to learn more and sign up for the program.


Key Features

  • Generate optimized, deployment-ready models for inference
  • Optimize and deploy widely used neural network layers such as convolution, fully connected, LRN, pooling, activation, softmax, concat, and deconvolution layers
  • Support for Caffe prototxt network descriptor files (see the sketch after this list)
  • Deploy neural networks in full (FP32) or reduced precision (INT8, FP16)
  • Define and implement unique functionality using the custom layer API
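
The features above correspond to a small C++ builder workflow. The following is a minimal, illustrative sketch of importing a Caffe prototxt/caffemodel pair and building an optimized engine, assuming the nvinfer1/nvcaffeparser1 C++ API covered in the Parallel Forall post linked under Learn More; the file names and the "prob" output blob are placeholders for an actual network.

    #include <iostream>

    #include "NvInfer.h"
    #include "NvCaffeParser.h"

    using namespace nvinfer1;
    using namespace nvcaffeparser1;

    // Minimal logger required by the TensorRT builder.
    class Logger : public ILogger
    {
        void log(Severity severity, const char* msg) override
        {
            if (severity != Severity::kINFO)
                std::cout << msg << std::endl;
        }
    } gLogger;

    // Import a trained Caffe model and build an optimized inference engine.
    ICudaEngine* buildEngineFromCaffe(const char* deployFile, const char* modelFile)
    {
        IBuilder* builder = createInferBuilder(gLogger);
        INetworkDefinition* network = builder->createNetwork();

        // Parse the network descriptor (prototxt) and trained weights (caffemodel).
        ICaffeParser* parser = createCaffeParser();
        const IBlobNameToTensor* blobNameToTensor =
            parser->parse(deployFile, modelFile, *network, DataType::kFLOAT);

        // Tell TensorRT which blob is the network output ("prob" is a placeholder).
        network->markOutput(*blobNameToTensor->find("prob"));

        // Optimization parameters: batch size, scratch workspace, optional FP16.
        builder->setMaxBatchSize(1);
        builder->setMaxWorkspaceSize(16 << 20);
        if (builder->platformHasFastFp16())
            builder->setHalf2Mode(true);   // build reduced-precision (FP16) kernels

        ICudaEngine* engine = builder->buildCudaEngine(*network);

        // The parser, network, and builder are not needed once the engine is built.
        parser->destroy();
        network->destroy();
        builder->destroy();
        return engine;
    }

At inference time the engine is used through an execution context (or serialized for deployment without the parser); the Parallel Forall post referenced under Learn More walks through that runtime phase.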

Learn More

For a technical overview of TensorRT 1.0 (previously known as GIE) with instructions on how to use it for production deployment, please refer to the following Parallel Forall blog post: Production Deep Learning with NVIDIA GPU Inference Engine