NVIDIA NIM for Developers

NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of accelerated inference microservices that allows organizations to run AI models on NVIDIA GPUs anywhere—in the cloud, in the data center, and on workstations and PCs. Using industry-standard APIs, developers can deploy AI models with NIM in just a few lines of code. NIM containers integrate seamlessly with the Kubernetes (K8s) ecosystem, enabling efficient orchestration and management of containerized AI applications. Accelerate the development of your AI applications today with NIM.

Try APIs
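Because NIM exposes an OpenAI-compatible HTTP API, "a few lines of code" really is a small JSON POST to a chat-completions route. A minimal sketch using only the Python standard library—the base URL and model name below are illustrative placeholders, not guaranteed endpoints:

```python
import json
import urllib.request

def build_chat_request(base_url, model, prompt, api_key=None):
    """Assemble an OpenAI-compatible chat-completions request for a NIM endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    headers = {"Content-Type": "application/json"}
    if api_key:  # hosted endpoints require a key; a local NIM may not
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

# To actually send it (requires a reachable NIM endpoint):
# with urllib.request.urlopen(build_chat_request(
#         "http://localhost:8000", "meta/llama3-8b-instruct", "Hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same request shape works against NVIDIA-hosted endpoints or a self-hosted container, which is the point of the standard API: only the base URL and key change.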

How It Works

NVIDIA NIM helps overcome the challenges of building AI applications. It provides developers with industry-standard APIs for building powerful copilots, chatbots, and AI assistants, while making it easy for IT and DevOps teams to self-host AI models in their own managed environments. Built on robust foundations, including inference engines such as NVIDIA Triton™ Inference Server, TensorRT™, TensorRT-LLM, and PyTorch, NIM is engineered for seamless AI inferencing at scale.

Watch Video

[Diagram: NVIDIA NIM inference microservices stack]

Introductory Blog

Learn about NIM’s architecture, key features, and components.


Documentation

Access guides, API reference information, and release notes.

Introductory Webinar

Learn key considerations for deploying and scaling generative AI in production using NIM.

Deployment Guide

Get step-by-step instructions for self-hosting NIM on any NVIDIA accelerated infrastructure.

Why Develop With NVIDIA NIM?

Simplify Development

Build AI applications with industry-standard APIs and with libraries for popular large language model (LLM) development frameworks, making it easy to integrate AI models into your application.

Take AI Models With You

Maintain security and control of generative AI applications and data with prebuilt, cloud-native microservices that can be deployed on NVIDIA infrastructure anywhere—workstation, data center, or cloud.

Experience Optimized Performance

Get optimized inference engines from NVIDIA and the community, including TensorRT, TensorRT-LLM, Triton Inference Server, and more, that improve application efficiency and deliver low-latency, high-throughput inference.

Use Custom AI Models

Easily customize NIM by deploying models fine-tuned to deliver the best accuracy for your specific use case.

Build for Production

Leverage enterprise-grade software with dedicated feature branches and rigorous validation processes to ensure your applications are ready for production deployment.


Build RAG Applications With Standard APIs

Get started prototyping your AI application with NIM hosted in the NVIDIA API catalog. Using generative AI examples from GitHub, see how to easily deploy a retrieval-augmented generation (RAG) pipeline for chat Q&A using hosted endpoints. Developers can get 1,000 inference credits free on any of the available models to begin developing their application.

Explore RAG LLM Generative AI Examples
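The RAG pattern behind these examples is straightforward to sketch: retrieve the passages most relevant to the question, then prepend them to the prompt sent to the hosted model. A toy illustration with a naive keyword-overlap retriever—the documents and scoring here are placeholders; the GitHub examples use an embedding model and a vector database instead:

```python
def retrieve(question, documents, k=2):
    """Rank documents by naive word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(question, documents):
    """Prepend retrieved context so the model answers from the documents."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The resulting prompt string is what gets sent to a hosted NIM endpoint as the user message; swapping the toy retriever for an embedding-based one changes nothing downstream.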

Self-Host AI Models as a Service

Using a single optimized container, you can easily deploy NIM in under five minutes on accelerated NVIDIA GPU systems in the cloud, in the data center, or on workstations and PCs. Follow these simple instructions to deploy a NIM container and build an application using connectors from leading developer tools.

Deploy Generative AI Anywhere With NVIDIA NIM
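After launching the container (typically a `docker run` with your NGC API key and a published port), it helps to confirm the service is up before sending traffic. A small readiness poll, sketched under the assumptions that the service listens on localhost port 8000 and exposes a `/v1/health/ready` route—check your NIM's documentation for the exact host, port, and path:

```python
import time
import urllib.error
import urllib.request

def wait_until_ready(base_url, timeout=300, interval=5):
    """Poll the NIM readiness endpoint until it returns HTTP 200 or we time out."""
    deadline = time.monotonic() + timeout
    url = f"{base_url}/v1/health/ready"
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError):
            pass  # container still starting; retry after a short pause
        time.sleep(interval)
    return False

# Example (assumes a NIM container was started locally):
# if wait_until_ready("http://localhost:8000"):
#     print("NIM is ready to serve requests")
```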

Get Started With NVIDIA NIM

Choose from several options to build and deploy optimized AI applications using the latest AI models with NVIDIA NIM.



Begin building your AI application with NVIDIA-hosted NIM APIs.

Visit the NVIDIA API Catalog


Join the NVIDIA Developer Program to get free access to NIM for research, development, and testing (expected availability July 2024).

Join the NVIDIA Developer Program and Get Notified About NIM Availability


Move from pilot to production with the assurance of security, API stability, and support with NVIDIA AI Enterprise.

Request a Free 90-Day NVIDIA AI Enterprise License

NVIDIA NIM Learning Library

Getting Started Blog

Learn how to use NIM microservices APIs across the most popular generative AI application frameworks like Haystack, LangChain, and LlamaIndex.

Hands-On Lab

Through NVIDIA LaunchPad, explore how to get started with NVIDIA NIM on any infrastructure in just five minutes.


Documentation

Learn more about high-performance features, applications, architecture, and release notes for NVIDIA NIM for LLMs.

More Resources

Ethical AI

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.

Learn about the latest NVIDIA NIM models, applications, and tools.

Sign Up