NVIDIA NeMo Retriever

NVIDIA NeMo Retriever is a collection of generative AI microservices that provides world-class information retrieval with high accuracy and maximum data privacy, generating context-aware responses and insights in real time from large corpora of data. Built with NVIDIA NIM™, NeMo Retriever lets developers flexibly combine fine-tuned retrieval microservices with community or custom models to build AI query engines with scalable document ingestion and advanced retrieval-augmented generation (RAG), connecting applications to varied types of data wherever it resides.

As part of NVIDIA AI Enterprise, NeMo Retriever is designed for production-ready information retrieval pipelines and provides the security, stability, and support required by enterprises running their business on AI.

Try Now
Documentation
Forum


How NeMo Retriever Works

NeMo Retriever provides developers with the components needed to build data ingestion and information retrieval pipelines. The ingestion pipeline extracts structured and unstructured data, including text, charts, and tables, from documents, converts it to text, and filters out duplicate chunks.
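The duplicate-filtering step of ingestion can be sketched in a few lines. This is an illustrative example only, not the actual NeMo Retriever implementation; `dedupe_chunks` and its whitespace/case normalization are assumptions made for the sketch.

```python
import hashlib

def dedupe_chunks(chunks):
    """Drop chunks whose normalized text has already been seen.

    Illustrative sketch -- a real ingestion pipeline may use more
    sophisticated near-duplicate detection than exact-hash matching.
    """
    seen = set()
    unique = []
    for chunk in chunks:
        # Normalize whitespace and case so trivially different copies collide.
        key = hashlib.sha256(" ".join(chunk.lower().split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique

chunks = [
    "NeMo Retriever provides ingestion pipelines.",
    "NeMo  Retriever provides ingestion   pipelines.",  # duplicate after normalization
    "Reranking improves retrieval accuracy.",
]
print(dedupe_chunks(chunks))  # keeps 2 of the 3 chunks
```

Exact-hash matching only catches verbatim repeats; fuzzier techniques (shingling, embedding similarity) would be needed for near-duplicates.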

The extracted data is fed into a retrieval pipeline. A NeMo Retriever embedding NIM converts the chunks into embeddings and stores them in a vector database, accelerated by NVIDIA cuVS for faster indexing and search.

When a user submits a query, relevant information is retrieved from the repository of ingested knowledge. A NeMo Retriever embedding NIM embeds the query, and the resulting vector is used to retrieve the most relevant chunks from the vector database via vector similarity search.
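Vector similarity search itself reduces to a nearest-neighbor lookup over embeddings. The sketch below uses a toy bag-of-letters "embedding" and brute-force cosine similarity purely to show the mechanics; a production pipeline would call a NeMo Retriever embedding NIM and use a cuVS-accelerated vector database instead.

```python
import math

def embed(text):
    # Toy stand-in for an embedding model: a 26-dim bag-of-letters vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha() and ch.isascii():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, store, k=2):
    # Brute-force scan; a vector database replaces this with an ANN index.
    q = embed(query)
    return sorted(store, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

docs = [
    "vector databases store embeddings",
    "the weather is sunny today",
    "databases index vectors",
]
print(top_k("vector database", docs, k=2))
```

The off-topic sentence scores lowest and is excluded from the top-2 results, which is the behavior the real embedding model provides with far higher semantic fidelity.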

A NeMo Retriever reranking NIM evaluates and reranks the results to ensure the most accurate and useful chunks are used to augment the prompt. With the most pertinent information at hand, the LLM NIM generates a response that’s informed, accurate, and contextually relevant. Various LLM NIM microservices from the NVIDIA API catalog can enable additional capabilities such as synthetic data generation, and developers can use NVIDIA NeMo to fine-tune them to better align with specific use-case requirements.
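The rerank-then-augment step can be sketched as follows. The word-overlap `score_fn` here is an assumption for illustration only; in the real pipeline, the relevance score for each query/passage pair would come from a NeMo Retriever reranking NIM.

```python
def rerank(query, candidates, score_fn):
    # Order candidate chunks by relevance score, highest first.
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)

def build_prompt(query, top_chunks):
    # Augment the user query with the highest-ranked chunks as context.
    context = "\n\n".join(top_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

def word_overlap(query, passage):
    # Toy relevance score (shared-word count); stands in for a reranking model.
    return len(set(query.lower().split()) & set(passage.lower().split()))

candidates = ["gpu memory bandwidth", "cpu cache sizes", "memory hierarchy basics"]
ranked = rerank("gpu memory", candidates, word_overlap)
prompt = build_prompt("gpu memory", ranked[:2])
print(ranked)  # most relevant chunk first
```

Reranking is valuable because embedding similarity alone is approximate: a cross-encoder style reranker scores each query/passage pair jointly, so the chunks that actually reach the LLM prompt are the most pertinent ones.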

A flowchart showing how NVIDIA NeMo Retriever works

The NVIDIA NeMo Retriever collection of NIM microservices is used to build optimized ingestion and retrieval pipelines for highly accurate information retrieval at scale.

Introductory Blog

Understand the function of embedding and reranking models in information retrieval pipelines, top considerations such as cost and stability, and how to select the right NIM microservices to use.

Read Blog

Introductory Webinar

Learn how to improve the accuracy and scalability of text retrieval for production-ready generative AI pipelines—and deploy where your data resides. 

Watch Webinar

Guide for Embedding

Steps for getting access to state-of-the-art text embedding models for enterprise semantic search applications, delivering accurate answers quickly at scale through text retriever APIs for embedding.

Read Documentation

Guide for Reranking

A getting-started guide for creating robust copilots, chatbots, and AI assistants from start to finish with text retriever NIM APIs for reranking.

Read Documentation

World-Class Information Retrieval Performance

NeMo Retriever enables enterprise-scale document ingestion and search, with faster retrieval at higher throughput and better accuracy with fewer incorrect answers.

2X Throughput for Fast Retrieval

Multilingual Text Embedding Model

A graph showing NeMo Retriever achieving 2X throughput for fast information retrieval

The test used the NVIDIA NeMo Retriever embedding NIM microservice (NV-EmbedQA-Mistral7B-v2) on 1x H100 SXM with 512-token passages, batch size 64, and 3 concurrent client requests to achieve 2X throughput.
NIM Off: FP16, P90 latency: ~3.8s
NIM On: FP8, P90 latency: ~1.8s

30% Fewer Incorrect Answers

Text Embedding and Reranking

A graph showing NeMo Retriever achieving high accuracy with 30% fewer incorrect answers

The test compared an alternative embedder (BERT-Large) without the benefits of NIM to the NVIDIA NeMo Retriever embedding and reranking NIM microservices (NV-EmbedQA-E5-v5 + NV-RerankQA-Mistral-4B-v3), which achieved 30% fewer incorrect answers (Recall@5).

Ways to Get Started With NVIDIA NeMo Retriever

Use the right tools and technologies to build and deploy generative AI applications that require secure and accurate information retrieval to generate real-time business insights for organizations across every industry. 


Try

Experience NeMo Retriever NIM microservices through a UI-based portal for exploring and prototyping with NVIDIA-managed endpoints, available for free in the NVIDIA API catalog and deployable anywhere.

Try the Retriever NIM Microservices

Experience

Access NVIDIA-hosted infrastructure and guided hands-on labs that include step-by-step instructions and examples, available for free on NVIDIA Launchpad.

Access Hands-on Labs

Build

Jump-start building your AI solutions with NVIDIA Blueprints, customizable reference applications, available on the NVIDIA API catalog.

Try the Blueprint

Deploy

Get a free license to try NVIDIA AI Enterprise in production for 90 days using your existing infrastructure.

Request a 90-Day License 

Starter Kits

Start building information retrieval pipelines and generative AI applications for multimodal data ingestion, embedding, reranking, retrieval-augmented generation, and agentic workflows by accessing tutorials, notebooks, blogs, forums, reference code, comprehensive documentation, and more.

Starter Kits by Use Case

Build a Pipeline for Multimodal PDF Data Extraction With RAG

Build an enterprise-scale pipeline that can ingest and extract highly accurate insights contained in text, graphs, charts, and tables within massive volumes of PDF documents.

Build AI Virtual Assistants for Enhanced Customer Service

Build enhanced virtual assistants that are more personalized and secure by leveraging RAG, NeMo Retriever, NIM microservices, and the latest AI agent-building methodologies.

Develop LLM-Powered AI Chatbots With RAG Using NeMo Retriever

Build AI chatbots powered by LLMs that can accurately answer questions about your enterprise data using NeMo Retriever and NIM microservices.

Starter Kits by Stages in the Retriever Pipeline

Ingestion

Rapidly ingest massive volumes of data, simultaneously extracting text, graphs, charts, and tables for highly accurate retrieval.

Embedding

Boost text question-answering retrieval performance, providing high-quality embeddings for many downstream NLP tasks.

Reranking

Further enhance retrieval performance with a fine-tuned reranker that finds the most relevant passages to provide as context when querying an LLM.


NVIDIA NeMo Retriever Learning Library


More Resources


Explore the Community

Get Training and Certification

Accelerate Your Startup


Ethical AI

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.

Stay up to date on the latest generative AI news from NVIDIA.

Sign Up