NVIDIA NeMo Retriever
NVIDIA NeMo™ Retriever is a collection of generative AI microservices that provide world-class information retrieval with high accuracy and maximum data privacy, generating context-aware responses and insights in real time from large corpora of data. Built with NVIDIA NIM™, NeMo Retriever lets developers flexibly combine fine-tuned retrieval microservices with community or custom models to build AI query engines with scalable document ingestion and advanced retrieval-augmented generation (RAG), connecting applications to varied types of data wherever it resides.
As part of NVIDIA AI Enterprise, NeMo Retriever is designed for production-ready information retrieval pipelines and provides the security, stability, and support required by enterprises running their business on AI.
How NeMo Retriever Works
NeMo Retriever provides developers with the components needed to build data ingestion and information retrieval pipelines. The ingestion pipeline extracts structured and unstructured data from documents, including text, charts, and tables, converts it to text, and filters it to remove duplicate chunks.
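The duplicate-filtering step in the ingestion pipeline can be sketched roughly as follows. This is a minimal, self-contained illustration, not the NeMo Retriever implementation: it catches exact duplicates (after whitespace and case normalization) by hashing each chunk, whereas a production pipeline may use fuzzier near-duplicate detection.

```python
import hashlib

def dedupe_chunks(chunks: list[str]) -> list[str]:
    """Drop duplicate chunks before indexing.

    Normalizing whitespace and case before hashing catches trivially
    re-extracted copies of the same passage. Order of first occurrence
    is preserved.
    """
    seen: set[str] = set()
    unique: list[str] = []
    for chunk in chunks:
        # Collapse whitespace and lowercase, then hash the normalized text.
        normalized = " ".join(chunk.lower().split())
        key = hashlib.sha256(normalized.encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique
```

For example, `dedupe_chunks(["GPU memory", "gpu  memory", "CPU cache"])` keeps only the first spelling of the repeated chunk.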
The extracted data is fed into a retrieval pipeline. A NeMo Retriever embedding NIM converts the chunks into embeddings and stores them in a vector database, accelerated by NVIDIA cuVS, for faster indexing and search.
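The embed-and-index step can be sketched with a toy in-memory store. Everything here is a stand-in under stated assumptions: `embed` is a deterministic hash-based placeholder for an embedding NIM call, and `VectorStore` substitutes a NumPy dot product for a cuVS-accelerated vector database, purely to show the shape of the flow.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for an embedding NIM call: a deterministic
    pseudo-random unit vector derived from the text's hash."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

class VectorStore:
    """Minimal in-memory stand-in for a cuVS-accelerated vector database."""
    def __init__(self) -> None:
        self.vectors: list[np.ndarray] = []
        self.chunks: list[str] = []

    def add(self, chunk: str) -> None:
        self.chunks.append(chunk)
        self.vectors.append(embed(chunk))

    def search(self, query: str, k: int = 3) -> list[str]:
        # On unit vectors, cosine similarity reduces to a dot product.
        sims = np.stack(self.vectors) @ embed(query)
        top = np.argsort(sims)[::-1][:k]
        return [self.chunks[i] for i in top]
```

Ingestion calls `add` per chunk; query time calls `search` to get the top-k candidate chunks for reranking.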
When a user submits a query, relevant information is retrieved from the repository of ingested knowledge. A NeMo Retriever embedding NIM embeds the query, and the resulting vector is used to retrieve the most relevant chunks from the vector database via similarity search.
A NeMo Retriever reranking NIM evaluates and reranks the results to ensure the most accurate and useful chunks are used to augment the prompt. With the most pertinent information at hand, the LLM NIM generates a response that’s informed, accurate, and contextually relevant. Various LLM NIM microservices from the NVIDIA API catalog can enable additional capabilities, such as synthetic data generation, and developers can fine-tune them with NVIDIA NeMo to better align with specific use-case requirements.
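The rerank-and-augment step can be sketched as below. This is a hedged illustration, not the NIM API: a real reranking model is typically a cross-encoder that scores each (query, passage) pair jointly, so the term-overlap `score` here is only a self-contained placeholder for that call.

```python
def rerank(query: str, passages: list[str], top_n: int = 2) -> list[str]:
    """Toy stand-in for a reranking NIM: score each (query, passage)
    pair and return the top_n passages by descending relevance.
    Term overlap substitutes for a cross-encoder model score."""
    q_terms = set(query.lower().split())

    def score(passage: str) -> float:
        p_terms = set(passage.lower().split())
        return len(q_terms & p_terms) / max(len(p_terms), 1)

    return sorted(passages, key=score, reverse=True)[:top_n]

def build_prompt(query: str, passages: list[str]) -> str:
    """Augment the user query with the reranked chunks as context."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The reranked passages, rather than the raw similarity-search hits, become the context block in the prompt sent to the LLM NIM.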
The NVIDIA NeMo Retriever collection of NIM microservices is used to build optimized ingestion and retrieval pipelines for highly accurate information retrieval at scale.
Introductory Blog
Understand the function of embedding and reranking models in information retrieval pipelines, top considerations such as cost and stability, and how to select the right NIM microservices.
Introductory Webinar
Learn how to improve the accuracy and scalability of text retrieval for production-ready generative AI pipelines—and deploy where your data resides.
Guide for Embedding
Steps for getting access to state-of-the-art models for enterprise semantic search applications, using text retriever NIM APIs for embedding to deliver accurate answers quickly at scale.
Guide for Reranking
A getting-started guide for creating robust copilots, chatbots, and AI assistants from start to finish with text retriever NIM APIs for reranking.
World-Class Information Retrieval Performance
NeMo Retriever enables enterprise-scale document ingestion and search, delivering faster retrieval with higher throughput and better accuracy with fewer incorrect answers.
2X Throughput for Fast Retrieval
Multilingual Text Embedding Model
NIM Off: FP16, P90 latency: ~3.8s
NIM On: FP8, P90 latency: ~1.8s
30% Fewer Incorrect Answers
Text Embedding and Reranking
Ways to Get Started With NVIDIA NeMo Retriever
Use the right tools and technologies to build and deploy generative AI applications that require secure and accurate information retrieval to generate real-time business insights for organizations across every industry.
Try
Experience NeMo Retriever NIM microservices through a UI-based portal for exploring and prototyping with NVIDIA-managed endpoints, available for free through the NVIDIA API catalog and deployable anywhere.
Experience
Access NVIDIA-hosted infrastructure and guided hands-on labs that include step-by-step instructions and examples, available for free on NVIDIA Launchpad.
Build
Jump-start building your AI solutions with NVIDIA Blueprints, customizable reference applications, available on the NVIDIA API catalog.
Deploy
Get a free license to try NVIDIA AI Enterprise in production for 90 days using your existing infrastructure.
Starter Kits
Start building information retrieval pipelines and generative AI applications for multimodal data ingestion, embedding, reranking, retrieval-augmented generation, and agentic workflows by accessing tutorials, notebooks, blogs, forums, reference code, comprehensive documentation, and more.
Starter Kits by Use Case
Build a Pipeline for Multimodal PDF Data Extraction With RAG
Build an enterprise-scale pipeline that can ingest and extract highly accurate insights contained in text, graphs, charts, and tables within massive volumes of PDF documents.
Build AI Virtual Assistants for Enhanced Customer Service
Build enhanced virtual assistants that are more personalized and secure by leveraging RAG, NeMo Retriever, NIM microservices, and the latest AI agent-building methodologies.
Develop LLM-Powered AI Chatbots With RAG Using NeMo Retriever
Build AI chatbots powered by LLMs that can accurately answer questions about your enterprise data using NeMo Retriever and NIM microservices.
Starter Kits by Stages in the Retriever Pipeline
Ingestion
Rapidly ingest massive volumes of data and extract text, graphs, charts, and tables at the same time for highly accurate retrieval.
Embedding
Boost text question-and-answer retrieval performance, providing high-quality embeddings for many downstream NLP tasks.
Reranking
Enhance retrieval performance further with a fine-tuned reranker that finds the most relevant passages to provide as context when querying an LLM.
NVIDIA NeMo Retriever Learning Library
More Resources
Ethical AI
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.