Optimizing TensorRT Conversion for Real-Time Inference on Autonomous Vehicles
Dheeraj Peri, NVIDIA | Josh Park, NVIDIA | Zejia Zheng, Zoox | Jeff Pyke, Zoox
GTC 2020
TensorRT optimizes neural-network computation for deployment on GPUs, but not all operations are supported, and reduced-precision inference speeds up computation at the risk of accuracy regressions. We'll introduce the Zoox TensorRT conversion pipeline, which addresses both problems. TensorRT compatibility checks are invoked in the early stages of neural-network training so that incompatible ops are discovered before time and resources are wasted on full-scale training. Inference accuracy checks can be invoked at each layer to identify operations that do not tolerate reduced-precision computation. Detailed profiling reveals unnecessary computations that TensorRT does not optimize away but that simple code changes during graph construction can eliminate. With this pipeline, we've successfully provided TensorRT conversion support for neural networks performing various perception tasks on the Zoox autonomous driving platform.
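To make the first two stages concrete, here is a minimal, framework-agnostic sketch of what an early op-compatibility check and a per-layer reduced-precision accuracy check might look like. All names, the op list, and the error threshold are illustrative assumptions, not Zoox's actual pipeline; a real implementation would query the target TensorRT version's operator support matrix and run the layer outputs through the actual FP32 and FP16 engines.

```python
import numpy as np

# Illustrative subset of ops TensorRT converts natively (assumed for this
# sketch); a real check would consult the installed TensorRT version.
SUPPORTED_OPS = {"Conv", "Relu", "MaxPool", "MatMul", "Add", "Softmax"}

def find_unsupported_ops(graph_ops, supported=SUPPORTED_OPS):
    """Early compatibility check: flag ops TensorRT cannot convert, so
    incompatibilities surface before full-scale training begins."""
    return sorted(set(graph_ops) - supported)

def layer_fp16_regression(fp32_out, fp16_out, rel_tol=1e-2):
    """Per-layer accuracy check: compare a layer's full-precision output
    with its reduced-precision output and report the max relative error
    and whether it exceeds the (assumed) tolerance."""
    fp32 = np.asarray(fp32_out, dtype=np.float32)
    fp16 = np.asarray(fp16_out, dtype=np.float32)
    rel_err = float(np.max(np.abs(fp32 - fp16) / (np.abs(fp32) + 1e-12)))
    return rel_err, rel_err > rel_tol

# A graph containing an unconvertible op is caught immediately.
print(find_unsupported_ops(["Conv", "Relu", "GridSample"]))  # ['GridSample']

# Simulate FP16 rounding on one layer's activations.
x = np.linspace(0.1, 10.0, 8, dtype=np.float32)
err, regressed = layer_fp16_regression(x, x.astype(np.float16))
print(regressed)  # False: FP16 rounding error here is well under 1%
```

Running the compatibility check per training job, rather than at deployment time, is what keeps incompatible architectures from reaching full-scale training; the per-layer comparison then localizes precision regressions to individual ops instead of reporting a single end-to-end accuracy drop.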