GTC 2020: Optimizing Recommendation System Inference Performance Based on GPU
Xiaowei Shen, Alibaba | Wei Zhang, Alibaba Group
We'll present a GPU-based system that speeds up recommendation system inference. Neural network-based recommendation models are widely applied to personalization and recommendation tasks at large internet companies, such as e-commerce and social media companies. Alibaba's recommendation system deploys wide and deep learning models for product recommendation tasks. The more products and users a model needs to rank, the larger its feature length and batch size become. The computational cost grows accordingly, to the point that a traditional CPU-based inference implementation cannot meet the queries-per-second (QPS) and latency requirements of recommendation tasks. With model quantization and graph transformation, we can speed up inference by 3.9x compared with a baseline GPU implementation.