Wide and Deep Recommender Inference on GPU
Alec Gunny, NVIDIA | Chirayu Garg, NVIDIA
We'll discuss using GPUs to accelerate so-called "wide and deep" models in the recommendation inference setting. Machine learning-powered recommender systems permeate modern online platforms. Wide and deep models have become a popular choice for recommendation problems because they are more expressive than traditional machine learning models and are easy to train and deploy with TensorFlow. We'll demonstrate simple APIs that convert trained canned TensorFlow estimators to TensorRT executable engines and deploy them for inference using NVIDIA's TensorRT Inference Server. The generated TensorRT engines can also be configured for reduced-precision computation that leverages the Tensor Cores in NVIDIA GPUs. Finally, we'll show how to integrate these served models into an optimized inference pipeline that exploits request-level features shared across batches of items to minimize network traffic and fully leverage GPU acceleration.
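To illustrate the idea of sharing request-level features across a batch of items, here is a minimal Python sketch. It is not the speakers' implementation and does not use TensorFlow, TensorRT, or the inference server; the function names (`build_batch`, `values_sent`) and the feature names are hypothetical, chosen only for illustration. It assumes the client sends the request-level features once and the serving side attaches them to every candidate item before scoring.

```python
from typing import Any, Dict, List

Features = Dict[str, Any]


def build_batch(request: Features, items: List[Features]) -> List[Features]:
    """Serving-side step (hypothetical): attach the single copy of the
    request-level features to every candidate item to form the model batch."""
    return [{**request, **item} for item in items]


def values_sent(request: Features, items: List[Features], shared: bool) -> int:
    """Count feature values crossing the network for one recommendation request."""
    item_values = sum(len(item) for item in items)
    if shared:
        # Request-level features travel once, item features once per item.
        return len(request) + item_values
    # Naive layout: request-level features are repeated for every item.
    return len(request) * len(items) + item_values


# Hypothetical example: one user request scored against 1000 candidate items.
request = {"user_id": 42, "device": "mobile", "hour": 18}
items = [{"item_id": i, "category": i % 5} for i in range(1000)]

batch = build_batch(request, items)
naive = values_sent(request, items, shared=False)
shared = values_sent(request, items, shared=True)
print(f"rows in batch: {len(batch)}, values sent naive: {naive}, shared: {shared}")
```

In this toy setting the saving grows with the number of request-level features and the number of candidate items per request. A real pipeline would presumably do the expansion on the GPU rather than in Python dictionaries.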