GTC 2020: PyTorch-TensorRT: Accelerating Inference in PyTorch with TensorRT

Josh Park, NVIDIA | Naren Dasan, NVIDIA

GTC 2020

TensorRT is a deep-learning inference optimizer and runtime for optimizing networks for GPUs and the NVIDIA Deep Learning Accelerator (DLA). The typical procedure for optimizing a model with TensorRT is to first convert the trained model to an intermediate format, such as ONNX, and then parse that file with a TensorRT parser. This works well for networks built from common architectures and common operators. However, given the rapid pace of model development, a DL framework such as TensorFlow sometimes has ops that TensorRT does not support. One solution is to implement TensorRT plugins for these ops. Another is to use a tool like TF-TRT, which converts the supported subgraphs to TensorRT and runs the rest with TensorFlow's own implementations. We'll demonstrate the same capability for PyTorch with our new tool, PTH-TRT. We'll also show how it leverages the composability of the PyTorch API so users can reuse their TensorRT-compatible networks inside larger, more complex ones.
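To make the "convert the supported subgraphs, fall back for the rest" idea concrete, here is a minimal sketch in plain Python. It is not the PTH-TRT or TF-TRT implementation. It assumes a simplified model that is a linear chain of ops (real converters partition a full graph), and the op names, the `Segment` type, the `partition` function, and the `min_segment_size` parameter are all hypothetical. `min_segment_size` is similar in spirit to a minimum-segment-size setting: runs of supported ops too short to be worth converting stay in the framework.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Segment:
    backend: str            # "tensorrt" or "framework" (hypothetical labels)
    ops: Tuple[str, ...]    # ops executed by that backend, in order


def partition(ops: List[str], supported: Set[str],
              min_segment_size: int = 1) -> List[Segment]:
    """Split a linear chain of ops into maximal per-backend segments.

    Assumption: the model is a simple chain; real tools work on a graph.
    """
    # Pass 1: group consecutive ops that would run on the same backend.
    runs: List[Tuple[str, List[str]]] = []
    for op in ops:
        backend = "tensorrt" if op in supported else "framework"
        if runs and runs[-1][0] == backend:
            runs[-1][1].append(op)
        else:
            runs.append((backend, [op]))

    # Pass 2: demote TensorRT runs shorter than min_segment_size to the
    # framework, merging them with neighboring framework segments.
    segments: List[Segment] = []
    for backend, run_ops in runs:
        if backend == "tensorrt" and len(run_ops) < min_segment_size:
            backend = "framework"
        if segments and segments[-1].backend == backend:
            segments[-1] = Segment(backend, segments[-1].ops + tuple(run_ops))
        else:
            segments.append(Segment(backend, tuple(run_ops)))
    return segments


if __name__ == "__main__":
    # "custom_nms" stands in for an op the converter cannot handle.
    chain = ["conv", "relu", "custom_nms", "conv", "add", "relu"]
    trt_ops = {"conv", "relu", "add"}
    for seg in partition(chain, trt_ops, min_segment_size=3):
        print(seg.backend, seg.ops)
```

With `min_segment_size=3`, the leading `conv`/`relu` pair is too short to convert, so it stays in the framework alongside `custom_nms`, while the trailing `conv`/`add`/`relu` run becomes a single TensorRT segment. Grouping maximal runs keeps the number of hand-offs between the framework and TensorRT low, since each boundary adds overhead at inference time.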