
NVIDIA NIM for Developers

NVIDIA NIM™ provides containers to self-host GPU-accelerated inferencing microservices for pretrained and customized AI models across clouds, data centers, RTX™ AI PCs, and workstations. NIM microservices expose industry-standard APIs for simple integration into AI applications, development frameworks, and workflows. Built on pre-optimized inference engines from NVIDIA and the community, including NVIDIA® TensorRT™ and TensorRT-LLM, NIM microservices optimize response latency and throughput for each combination of foundation model and GPU.

Try NVIDIA-Hosted APIs
Get Started With NIM
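The industry-standard APIs follow the OpenAI chat-completions schema, so a few lines of stdlib Python are enough to call a hosted model. A minimal sketch, assuming an API key from the NVIDIA API catalog and `meta/llama3-8b-instruct` as an example model ID (swap in any catalog model):

```python
import json
import urllib.request

# NVIDIA-hosted NIM endpoints expose an OpenAI-compatible chat API.
API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"

def build_request(prompt: str, model: str = "meta/llama3-8b-instruct") -> dict:
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
        "max_tokens": 256,
    }

def chat(prompt: str, api_key: str) -> str:
    """Send the prompt to the hosted endpoint and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (needs a key from the NVIDIA API catalog):
#   chat("What is a NIM microservice?", api_key="nvapi-...")
```

Because the schema is OpenAI-compatible, the official `openai` client library can be pointed at the same endpoint by overriding its base URL.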


How It Works

NVIDIA NIM simplifies the journey from experimentation to deployment of AI applications by providing enthusiasts, developers, and AI builders with pre-optimized models and industry-standard APIs for building powerful AI agents, copilots, chatbots, and assistants. Built on robust foundations, including inference engines like TensorRT, TensorRT-LLM, and PyTorch, NIM is engineered to facilitate seamless AI inferencing for the latest AI foundation models on NVIDIA GPUs, from cloud or data center to PC.


NVIDIA NIM inference microservices stack diagram

Introductory Blog

Learn about NIM architecture, key features, and components.

Documentation

Access guides, reference information, and release notes for running NIM on your infrastructure.

Introductory Video

Learn how to deploy NIM on your infrastructure using a single command.

Deployment Guide

Get step-by-step instructions for self-hosting NIM on any NVIDIA accelerated infrastructure.


Build With NVIDIA NIM

Optimized Model Performance

Improve AI application performance and efficiency with accelerated engines from NVIDIA and the community, including TensorRT, TensorRT-LLM, and more—prebuilt and optimized for low-latency, high-throughput inferencing on specific NVIDIA GPU systems.

Run AI Models Anywhere

Maintain security and control of applications and data with prebuilt microservices that can be deployed on NVIDIA GPUs anywhere, from RTX AI PCs and workstations to data centers and the cloud. Download NIM inference microservices for self-hosted deployment, or take advantage of dedicated endpoints on Hugging Face to spin up instances in your preferred cloud.

Customize AI Models for Your Use Case

Improve accuracy for specific use cases by deploying NIM inference microservices for models fine-tuned with your own data.

Maximize Operationalization and Scale

Get detailed observability metrics for dashboarding, and access Helm charts and guides for scaling NIM on Kubernetes.
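The observability metrics are exported in the Prometheus text exposition format; the exact endpoint path and metric names vary by microservice, so the names below are illustrative stand-ins. A minimal sketch of pulling samples out of a scrape body:

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Map metric sample names (with labels) to their values.

    Minimal parser: skips HELP/TYPE comment lines and ignores optional
    trailing timestamps and label values containing spaces.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue
    return samples

# Illustrative scrape body; real metric names depend on the microservice.
SAMPLE_SCRAPE = """\
# HELP num_requests_running Number of requests currently running.
# TYPE num_requests_running gauge
num_requests_running 2
gpu_cache_usage_perc 0.35
"""
```

In practice you would point Prometheus at the container's metrics endpoint and chart these values in Grafana rather than parsing by hand; the sketch just shows what the exported data looks like.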


NVIDIA NIM Examples and Blueprints

Explore Generative AI Applications Using RAG, Agents, and More With Standard APIs

Get started prototyping your AI application with NIM, hosted in the NVIDIA API catalog. Using generative AI examples from GitHub, see how easy it is to deploy a retrieval-augmented generation (RAG) pipeline for chat Q&A using hosted endpoints, agentic AI workflows, vision AI NIM microservices, and more. Developers get 1,000 free inference credits on any of the available models to begin developing their applications, or they can download models to their own cloud or PC.

Explore RAG LLM Generative AI Examples
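The core of a RAG pipeline is the retrieval step: embed the question, rank stored passages by similarity, and ground the generation prompt in the top hits. In the GitHub examples the embeddings come from an embedding NIM and the answer from an LLM NIM; in this sketch the vectors are toy stand-ins so the control flow is visible:

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def retrieve(query_vec: list[float],
             index: list[tuple[str, list[float]]],
             k: int = 1) -> list[str]:
    """Return the k passages whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question: str, passages: list[str]) -> str:
    """Ground the question in retrieved context before sending it to an LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

A production pipeline would replace the toy vectors with embedding-NIM calls and a vector database, and send the built prompt to a chat endpoint, but the shape of the flow is the same.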

Jump-Start Development With Blueprints

NVIDIA Blueprints are reference workflows for canonical generative AI use cases. Enterprises can build and operationalize custom AI applications, creating data-driven AI flywheels, using Blueprints along with NIM microservices and the NeMo framework, all part of the NVIDIA AI Enterprise platform. Blueprints also include partner microservices, one or more AI agents, reference code, customization documentation, and a Helm chart for deployment.

Explore NVIDIA Blueprints

Deploy NIM on Cloud via Hugging Face

Simplify and accelerate the deployment of generative AI models on Hugging Face with NIM. With just a few clicks, deploy optimized models like Llama 3 on your preferred cloud platform.

Deploy NIM on Hugging Face

Get Started With NVIDIA NIM

Explore different options for experimenting, building, and deploying optimized AI applications using the latest models with NVIDIA NIM.


Try

Begin building your AI application with NVIDIA-hosted NIM APIs.

Visit the NVIDIA API Catalog

Build

Get a head start on development with sample applications built with NIM and partner microservices. NVIDIA Blueprints can be deployed in one click with NVIDIA Launchables, downloaded for local deployment on PCs and workstations, or used for development in your data center or private cloud.

Explore NVIDIA Blueprints

Deploy

Deploy at scale for testing and development through the NVIDIA Developer Program, or move from pilot to production with the assurance of security, API stability, and support with NVIDIA AI Enterprise.

Run NVIDIA NIM anywhere

NVIDIA NIM Learning Library


More Resources


Community


Training and Certification


Inception for Startups


Tech Blogs


Ethical AI

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.

Learn about the latest NVIDIA NIM models, applications, and tools.

Sign Up