Generative AI / LLMs

Access to NVIDIA NIM Now Available Free to Developer Program Members

An image showing NVIDIA NIM.

The ability to use simple APIs to integrate pretrained AI foundation models into products and experiences has significantly increased developer usage of LLM endpoints and application development frameworks. NVIDIA NIM enables developers and engineering teams to rapidly deploy their own AI model endpoints for the secure development of accelerated generative AI applications using popular development tools and frameworks.

Developers said they want easier access to NIM for development purposes, so we’re excited to provide free access to downloadable NIM microservices for development, testing, and research to over 5M NVIDIA Developer Program members. Members of the program are provided comprehensive resources, training, tools, and a community of experts that help build accelerated applications and solutions.

In this post, we cover a brief technical overview of NIM microservices, highlight some microservices available for download and self-hosted deployment, and provide hands-on resources to get started.

What are NIM microservices?

NIM provides containers used to self-host GPU-accelerated microservices for pretrained and customized AI models across clouds, data centers, and workstations. They can be deployed with a single command, and automatically expose industry-standard APIs for quick integration into applications, development frameworks, and workflows. One example is the OpenAI API specification for large language model (LLM)-based NIM microservices.

Optimized inference engines built with NVIDIA TensorRT and NVIDIA TensorRT-LLM deliver low response latency and high throughput. At runtime, NIM microservices select the optimal inference engine for each combination of foundation model, GPU, and system. NIM containers also provide standard observability data feeds and built-in support for autoscaling with Kubernetes on NVIDIA GPUs. For more information about NIM features and architecture, see the NVIDIA NIM for LLMs documentation.

Download NIM microservices for any use case

While anyone can sign up to the NVIDIA API Catalog for free credits to access models through NVIDIA-hosted NIM endpoints, members of the NVIDIA Developer program get free access to the latest downloadable NIM microservices, including Meta’s Llama 3.1 8B, Mistral AI’s compact Mistral 7B Instruct, and many more.

Developer program members can use NIM microservices on up to two nodes, or 16 GPUs. When ready to use NIM in production, organizations can sign up for a free 90-day NVIDIA AI Enterprise license. For more information, see the FAQ.

Get started with downloadable NIM microservices

In the NVIDIA API Catalog, select a microservice, and choose Build with this NIM to download your NIM microservice and get an API key for the container.

If you’re not yet a program member, you will get the opportunity to join – just look for the Developer Program option. For more information, see Getting Started and A Simple Guide to Deploying Generative AI with NVIDIA NIM.

Video 1. How to Deploy NVIDIA NIM in 5 Minutes

If you’d like to get hands-on experience with a NIM microservice on managed infrastructure with simple deployment, try the NVIDIA Brev Launchable using your NVIDIA API key to quickly provision a GPU, download the Llama 3.1 NIM microservice, and interact with it through a Jupyter notebook or a set of endpoints. Hosted NIM microservices are also available on Hugging Face. Both hosted solutions are priced at an hourly rate.

Video 2. Fine-Tune Llama 3.1 and Deploy Using NVIDIA NIM Directly from Your Laptop

For more information, see the following resources:

To engage with NVIDIA and the NIM microservices community, see the NVIDIA NIM developer forum. We look forward to hearing from you and can’t wait to see what you build!

Related resources

Discuss (6)

Tags