Deploying your Models to GPU with ONNX Runtime for Inferencing in Cloud and Edge Endpoints
Manash Goswami, Microsoft | Kundana Palagiri, Microsoft
GTC 2020
Models are mostly trained for deployment in high-powered data centers, not for low-power, low-bandwidth, compute-constrained edge devices. There is a need to accelerate ML model execution with GPUs to speed up performance. GPUs are used in the cloud, and now increasingly on the edge, and the number of edge devices that need ML model execution is exploding, with more than 5 billion total endpoints expected by 2022. ONNX Runtime is the inference engine for accelerating your ONNX models on GPU across cloud and edge. We'll discuss how to build your AI application using AML Notebooks and Visual Studio, use prebuilt or custom containers, and, with ONNX Runtime, run the same application code across cloud GPUs and edge devices such as the Azure Stack Edge with T4 and low-powered devices like the Jetson Nano. We'll also demonstrate distribution strategies for those models, using hosted services like Azure IoT Edge. You'll take away an understanding of the various tradeoffs for moving ML to the edge, and how to optimize for a variety of specific scenarios.
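To illustrate the "same application code on cloud and edge" idea described above, here is a minimal sketch of ONNX Runtime inference in Python. The model file name and input shape are placeholders, not from the session; the snippet shows how the execution-provider list lets identical code use CUDA when a GPU is present and fall back to CPU otherwise.

```python
# Minimal sketch (model path and input shape are assumed for illustration):
# load an ONNX model and run it with ONNX Runtime, preferring the GPU
# execution provider when available and falling back to CPU otherwise.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder model file
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape

# The same call runs unchanged whether the session resolved to GPU or CPU.
outputs = session.run(None, {input_name: dummy_input})
print("Active providers:", session.get_providers())
print("Output shape:", outputs[0].shape)
```

On an edge device such as a Jetson Nano, the same code can be pointed at a different provider list (for example, TensorRT or CPU) without changing the application logic.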