Generative AI is revolutionizing how organizations across all industries are leveraging data to increase productivity, advance personalized customer engagement, and foster innovation. Given its tremendous value, enterprises are looking for tools and expertise that help them integrate this new technology into their business operations and strategies effectively and reliably.
NVIDIA and Microsoft are working together to provide enterprises with a comprehensive solution for building, optimizing, and deploying AI applications, including generative AI, using NVIDIA AI on Azure Machine Learning (Azure ML).
This week at Microsoft Ignite, NVIDIA and Microsoft announced two additional milestones, bringing new capabilities to Azure ML for managing production AI and developing generative AI applications.
- NVIDIA NeMo, a framework for building and customizing generative AI models, and NVIDIA AI Foundation Models, including the new NVIDIA Nemotron-3 8B family of models, are available in the Azure Machine Learning Model Catalog.
- NVIDIA Triton Inference Server, which scales AI in production, is generally available with Azure ML-managed endpoints.
In June, we published a post explaining the NVIDIA AI Enterprise software integration with Azure Machine Learning, and how to get started. This post provides updates to the progress made by the NVIDIA and Azure teams, explains the benefits of the two new integrations, and steps for accessing them.
NeMo Framework integration in Azure Machine Learning Model Catalog
LLMs are gaining significant attention due to their ability to perform a variety of tasks, such as text summarization, language translation, and text generation. Open-source and proprietary LLMs available as model weights or APIs are pretrained on a large corpus of data using different generative AI frameworks.
Custom LLMs, tailored for domain-specific insights using generative AI frameworks, are also finding increased traction in the enterprise domain.
NeMo is an end-to-end, cloud-native enterprise framework for developers to build, customize, and deploy generative AI models with billions of parameters. It includes training and inferencing frameworks, guard-railing toolkits, data curation tools, and pretrained models, offering enterprises an easy, cost-effective, and fast way to adopt generative AI.
For a secure, optimized full-stack solution designed to accelerate enterprises with support, security, and API stability, NeMo is available as part of NVIDIA AI Enterprise. Azure ML customers can now customize, optimize, and deploy through the user interface in a no-code flow. Customers can also get support directly from NVIDIA on their generative AI projects, including best practices for performance and accuracy.
With the availability of the NVIDIA Nemotron-3 8B family of foundational models and NeMo Framework within the Azure Machine Learning Model Catalog, users can now access, customize, and deploy these models out of the box. The framework offers a choice of several customization techniques and is optimized for at-scale inference of models for language and image applications.
Triton Inference Server Integration in Azure ML-managed endpoint
Triton Inference Server is a multi-framework, open-source software that optimizes inference for multiple query types, such as real-time, batch, and streaming. It also supports model ensembles and is included with NVIDIA AI Enterprise. Triton is compatible with various machine learning frameworks, such as TensorFlow, ONNX Runtime, PyTorch, and NVIDIA TensorRT. It can be used for CPU and GPU workloads.
Triton is available on Azure ML, which provides dynamic batching, concurrent execution, and optimal model configuration. Additionally, it offers enterprise-grade security, manageability, and API stability through NVIDIA AI Enterprise within Azure ML.
Azure ML-managed endpoints make it easy for enterprises to monitor, deploy, and scale AI models, reducing the complexity of setting up and managing your own AI infrastructure.
The GA release is based on production branches, exclusively available with NVIDIA AI Enterprise. Production branches provide stability and security for applications built on NVIDIA AI, offering 9 months of support, API stability, and monthly fixes for software vulnerabilities. Learn more about production branches.
Get started with Triton Inference Server
Deploying your model in Triton Inference Server on an Azure ML-managed endpoint is simple. Watch the following video and follow the 5 steps listed for guidance.
- In Azure Machine Learning, go to Models and register your model in Triton format. Confirm that the type is “Triton.”
- Go to Endpoints and hit the “Create” button to create a real-time online endpoint. Select “Triton” server for deployment.
- Configure your deployment parameters and hit the “Next” button. In the Environment section, the environment and scoring scripts are preselected. Go ahead and hit the “Next” button.
- Confirm the model and environment and click the “Create” button to deploy for model inference.
- Review the test page.
Get started with NVIDIA on Azure Machine Learning
Combining NVIDIA AI Enterprise and Azure Machine Learning creates powerful GPU-accelerated computing and a comprehensive cloud-based machine learning platform, enabling businesses to develop and deploy AI models more efficiently. Enterprises can take advantage of cloud resources and the performance benefits of NVIDIA GPUs and software with this synergy.
Visit NVIDIA at Microsoft Ignite to learn more about the latest innovations and collaborations propelling AI and cloud computing into the future.
Gain insights into groundbreaking NVIDIA solutions by participating in our sponsor sessions or stop by our EMU demo space #311. Check out the NVIDIA showcase page for more information.
Ready to get started now? Check out NVIDIA AI Foundation Models and NeMo Framework in the Azure Machine Learning Model Catalog and NVIDIA Triton Inference Server on Azure ML endpoints.