NVIDIA TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. You can import trained models from all major deep learning frameworks into TensorRT and easily create highly efficient inference engines that can be incorporated into larger applications and services. This video demonstrates the steps involved in accelerating the inference performance of a recommendation system with TensorRT.
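
As a rough illustration of that import-and-build workflow, the minimal sketch below parses an ONNX export of a trained model and builds a serialized TensorRT engine with the TensorRT Python API. It assumes a recent TensorRT 8.x release, a model file at `model.onnx`, and FP16 support on the target GPU; the workflow shown in the video itself may differ.

```python
import tensorrt as trt

# Log TensorRT warnings and errors to stderr.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
# ONNX models use explicit batch dimensions.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, TRT_LOGGER)

# "model.onnx" is a placeholder for an ONNX export of the trained model.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse the ONNX model")

config = builder.create_builder_config()
# Allow up to 1 GiB of scratch workspace during optimization.
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)
# Enable FP16 kernels where the target GPU supports them.
config.set_flag(trt.BuilderFlag.FP16)

# Optimize the network and serialize the resulting engine to disk.
serialized_engine = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(serialized_engine)
```

At inference time, the saved engine would be deserialized with `trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(...)` and executed through an execution context, so the optimization cost is paid once at build time rather than on every request.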
