NVIDIA TensorRT is a high-performance deep learning inference library for production environments. Power efficiency and speed of response are two key metrics for deployed deep learning applications, because they directly affect the user experience and the cost of the service provided. Tensor RT automatically optimizes trained neural networks for run-time performance, delivering up to 16x higher energy