High-performance deep learning inference for production deployment

NVIDIA TensorRT™ is a high-performance neural network inference engine for production deployment of deep learning applications. TensorRT can be used to rapidly optimize, validate, and deploy trained neural networks for inference to hyperscale data centers, embedded platforms, or automotive product platforms.

Developers can use TensorRT to deliver fast inference with INT8- or FP16-optimized precision, which significantly reduces latency, as demanded by real-time services such as streaming video categorization in the cloud or object detection and segmentation on embedded and automotive platforms. With TensorRT, developers can focus on developing novel AI-powered applications rather than on performance tuning for inference deployment. The TensorRT runtime delivers inference performance that meets even the most demanding throughput requirements.

TensorRT is available as a free download to members of the NVIDIA Developer Program. If you are not already a member, clicking "Download" will ask you to join the program.


TensorRT 2 - Early Access

Deploy faster, more responsive deep learning applications with TensorRT to deliver an improved user experience at reduced cost. With FP16- and INT8-optimized precision, TensorRT delivers up to 3x more throughput while using 61% less memory on applications that rely on high-accuracy inference.

TensorRT 2 with INT8 support is now available for pre-release testing through the TensorRT 2 Early Access program. To test this version and provide feedback, please use the "Join now" button below to learn more and apply for the program.

Join Now


Key Features

  • Generate optimized, deployment-ready models for inference
  • Optimize and deploy widely used neural network layers, such as convolution, fully connected, LRN, pooling, activation, softmax, concat, and deconvolution layers
  • Support for Caffe prototxt network descriptor files
  • Deploy neural networks in full (FP32) or reduced (FP16, INT8) precision
  • Define and implement unique functionality using the custom layer API
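The Caffe import and engine-build workflow behind these features can be sketched in C++ with TensorRT's API of this era. This is a minimal, hedged sketch, not a complete application: the file names (`deploy.prototxt`, `net.caffemodel`), the output blob name `"prob"`, and the batch-size and workspace values are placeholder assumptions for your own network.

```cpp
// Minimal sketch: parse a Caffe model with TensorRT's Caffe parser and
// build an optimized inference engine. Assumes TensorRT (with the Caffe
// parser) is installed; file and blob names below are placeholders.
#include <iostream>
#include "NvInfer.h"
#include "NvCaffeParser.h"

using namespace nvinfer1;
using namespace nvcaffeparser1;

// TensorRT reports build and runtime messages through a user-supplied logger.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO)
            std::cerr << msg << std::endl;
    }
} gLogger;

int main()
{
    // Create the builder and an empty network definition.
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();

    // Populate the network from the Caffe prototxt descriptor and weights.
    ICaffeParser* parser = createCaffeParser();
    const IBlobNameToTensor* blobs = parser->parse(
        "deploy.prototxt", "net.caffemodel", *network, DataType::kFLOAT);

    // Tell TensorRT which blob is the network output ("prob" is a placeholder).
    network->markOutput(*blobs->find("prob"));

    // Build an optimized engine; batch size and workspace are assumptions.
    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 20);
    ICudaEngine* engine = builder->buildCudaEngine(*network);

    // The engine can now be serialized for deployment or used for
    // inference via an execution context.
    parser->destroy();
    network->destroy();
    builder->destroy();
    engine->destroy();
    return 0;
}
```

For reduced-precision deployment, the builder exposes the FP16/INT8 modes mentioned above; the blog post linked under "Learn More" walks through the full workflow, including running inference with the built engine.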

Learn More

For a technical overview of TensorRT with instructions on how to use it for production deployment, please refer to the following Parallel ForAll blog post: Deploying Deep Neural Networks with NVIDIA TensorRT