After clicking “Watch Now” you will be prompted to login or join.
PyTorch-TensorRT: Accelerating Inference in PyTorch with TensorRT
Josh Park, NVIDIA | Naren Dasan, NVIDIA
GTC 2020
TensorRT is a deep-learning inference optimizer and runtime to optimize networks for GPUs and the NVIDIA Deep Learning Accelerator (DLA). Typically, the procedure to optimize models with TensorRT is to first convert a trained model to an intermediary format, such as ONNX, and then parse the file with a TensorRT parser. This works well for networks using common architectures and common operators; however, with the rapid pace of model development, sometimes a DL framework like Tensorflow has ops that are not supported in TensorRT. One solution is to implement plugins for these ops. Another is to use a tool like TF-TRT, which will convert supportable subgraphs to TensorRT and use Tensorflow implementations for the rest. We'll demonstrate the same ability with PyTorch with our new tool PTH-TRT, as well leveraging the PyTorch API's great composability features to allow users to reuse their TensorRT-compatible networks in larger, more complex ones.