Deploying a Model for Inference at Production Scale
A lot of love goes into building a machine-learning model. Challenges range from identifying the variables to predict, to experimenting to find the best model architecture, to sampling the right training data. But what good is a model if no one can access it?
Enter the NVIDIA Triton Inference Server. NVIDIA Triton helps data scientists and system administrators turn the same machines used to train models into web servers that serve model predictions. While a GPU is not required, NVIDIA Triton can take advantage of multiple installed GPUs to process large batches of requests quickly.
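To give a feel for what serving a prediction looks like, here is a minimal sketch of a client request against a running Triton server using the tritonclient Python library. The server URL, model name (my_model), and tensor names (input__0, output__0) are assumptions for a hypothetical local deployment, not part of the course materials:

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server assumed to be running locally on the
# default HTTP port (8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build an input batch matching the model's expected shape and dtype
# (shape and dtype here are placeholders for a hypothetical model).
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input__0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

# Request the named output tensor and run a synchronous inference call.
infer_output = httpclient.InferRequestedOutput("output__0")
response = client.infer(model_name="my_model",
                        inputs=[infer_input],
                        outputs=[infer_output])
print(response.as_numpy("output__0").shape)
```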
To get hands-on practice with a live server, the NVIDIA Deep Learning Institute (DLI) is offering a 4-hour, self-paced course titled Deploying a Model for Inference at Production Scale.
MLOps Overview
NVIDIA Triton was created with Machine Learning Operations, or MLOps, in mind. MLOps is a relatively new field that evolved from development operations, or DevOps, to focus on scaling and maintaining machine-learning models in a production environment. NVIDIA Triton is equipped with features such as model versioning for easy rollbacks, and it exposes server metrics such as latency and request count in a format that Prometheus can scrape.
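As a rough sketch of that Prometheus integration, the snippet below scrapes Triton's metrics endpoint and filters for the request-count and latency metrics mentioned above. It assumes a local server with metrics enabled on the default metrics port (8002):

```python
import requests

# Triton serves Prometheus-format metrics over plain HTTP on port 8002
# by default; in production, a Prometheus server would scrape this URL.
metrics = requests.get("http://localhost:8002/metrics", timeout=5).text

# Print only the request-success-count and request-latency metric lines.
for line in metrics.splitlines():
    if line.startswith(("nv_inference_request_success",
                        "nv_inference_request_duration_us")):
        print(line)
```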
Course Information
This course pairs an introduction to MLOps with hands-on practice on a live NVIDIA Triton Inference Server.
Learning objectives include:
- Deploying neural networks from a variety of frameworks onto a live NVIDIA Triton Server.
- Measuring GPU usage and other metrics with Prometheus.
- Sending asynchronous requests to maximize throughput (see the sketch after this list).
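To make the last objective concrete, here is a minimal sketch of issuing asynchronous requests with the tritonclient HTTP client. Because the client does not block on each call, the server can overlap and batch work across requests in flight; as before, the URL, model name, and tensor names are assumptions, and the course walks through its own workflow:

```python
import numpy as np
import tritonclient.http as httpclient

# A concurrency value > 1 lets the HTTP client keep several requests
# in flight at once instead of waiting on each response.
client = httpclient.InferenceServerClient(url="localhost:8000",
                                          concurrency=8)

pending = []
for _ in range(32):
    batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
    infer_input = httpclient.InferInput("input__0", list(batch.shape), "FP32")
    infer_input.set_data_from_numpy(batch)
    # async_infer returns immediately with a handle to the in-flight request.
    pending.append(client.async_infer("my_model", inputs=[infer_input]))

# Collect results only after all requests have been issued.
results = [req.get_result().as_numpy("output__0") for req in pending]
print(len(results), "responses received")
```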
Upon completion, developers will be able to deploy their own models on an NVIDIA Triton Server.
For additional hands-on training, visit the NVIDIA Deep Learning Institute.