NVIDIA NeMo Retriever
NVIDIA NeMo™ Retriever is a collection of generative AI microservices that provide world-class information retrieval with high accuracy and maximum data privacy, generating context-aware responses and insights in real time from large corpora of data. Built with NVIDIA NIM™, NeMo Retriever lets developers flexibly combine fine-tuned retrieval microservices with community or custom models to build AI query engines with scalable document ingestion and advanced retrieval-augmented generation (RAG), connecting applications to varied types of data wherever it resides.
As part of NVIDIA AI Enterprise, NeMo Retriever is designed for production-ready information retrieval pipelines and provides the security, stability, and support required by enterprises running their business on AI.
How NeMo Retriever Works
NeMo Retriever provides developers with the components needed to build data ingestion and information retrieval pipelines. The ingestion pipeline extracts structured and unstructured data from documents, including text, charts, and tables, converts it to text, and filters it to remove duplicate chunks.
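The duplicate-filtering step in the ingestion pipeline can be sketched roughly as follows. This is a minimal, self-contained illustration, not the NeMo Retriever implementation: it catches exact duplicates (after whitespace and case normalization) by hashing each chunk, whereas a production pipeline may use fuzzier near-duplicate detection.

```python
import hashlib

def dedupe_chunks(chunks: list[str]) -> list[str]:
    """Drop duplicate chunks before indexing.

    Normalizing whitespace and case before hashing catches trivially
    re-extracted copies of the same passage. Order of first occurrence
    is preserved.
    """
    seen: set[str] = set()
    unique: list[str] = []
    for chunk in chunks:
        # Collapse whitespace and lowercase, then hash the normalized text.
        normalized = " ".join(chunk.lower().split())
        key = hashlib.sha256(normalized.encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique
```

For example, `dedupe_chunks(["GPU memory", "gpu  memory", "CPU cache"])` keeps only the first spelling of the repeated chunk.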
The extracted data is fed into a retrieval pipeline. A NeMo Retriever embedding NIM converts the chunks into embeddings and stores them in a vector database, accelerated by NVIDIA cuVS, for faster indexing and search.
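The embed-and-index step can be sketched with a toy in-memory store. Everything here is a stand-in under stated assumptions: `embed` is a deterministic hash-based placeholder for an embedding NIM call, and `VectorStore` substitutes a NumPy dot product for a cuVS-accelerated vector database, purely to show the shape of the flow.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for an embedding NIM call: a deterministic
    pseudo-random unit vector derived from the text's hash."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

class VectorStore:
    """Minimal in-memory stand-in for a cuVS-accelerated vector database."""
    def __init__(self) -> None:
        self.vectors: list[np.ndarray] = []
        self.chunks: list[str] = []

    def add(self, chunk: str) -> None:
        self.chunks.append(chunk)
        self.vectors.append(embed(chunk))

    def search(self, query: str, k: int = 3) -> list[str]:
        # On unit vectors, cosine similarity reduces to a dot product.
        sims = np.stack(self.vectors) @ embed(query)
        top = np.argsort(sims)[::-1][:k]
        return [self.chunks[i] for i in top]
```

Ingestion calls `add` per chunk; query time calls `search` to get the top-k candidate chunks for reranking.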
When a user submits a query, relevant information is retrieved from the repository of ingested knowledge. A NeMo Retriever embedding NIM embeds the query, and the resulting vector is used to retrieve the most relevant chunks from the vector database via similarity search.
A NeMo Retriever reranking NIM evaluates and reranks the results to ensure the most accurate and useful chunks are used to augment the prompt. With the most pertinent information at hand, the LLM NIM generates a response that’s informed, accurate, and contextually relevant. Various LLM NIM microservices from the NVIDIA API catalog can enable additional capabilities, such as synthetic data generation, and developers can fine-tune them with NVIDIA NeMo to better align with specific use-case requirements.
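The rerank-and-augment step can be sketched as below. This is a hedged illustration, not the NIM API: a real reranking model is typically a cross-encoder that scores each (query, passage) pair jointly, so the term-overlap `score` here is only a self-contained placeholder for that call.

```python
def rerank(query: str, passages: list[str], top_n: int = 2) -> list[str]:
    """Toy stand-in for a reranking NIM: score each (query, passage)
    pair and return the top_n passages by descending relevance.
    Term overlap substitutes for a cross-encoder model score."""
    q_terms = set(query.lower().split())

    def score(passage: str) -> float:
        p_terms = set(passage.lower().split())
        return len(q_terms & p_terms) / max(len(p_terms), 1)

    return sorted(passages, key=score, reverse=True)[:top_n]

def build_prompt(query: str, passages: list[str]) -> str:
    """Augment the user query with the reranked chunks as context."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The reranked passages, rather than the raw similarity-search hits, become the context block in the prompt sent to the LLM NIM.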
The NVIDIA NeMo Retriever collection of NIM microservices is used to build optimized ingestion and retrieval pipelines for highly accurate information retrieval at scale.
Introductory Blog
Understand the function of embedding and reranking models in information retrieval pipelines, top considerations such as cost and stability, and how to select the right NIM microservices.
Introductory Webinar
Learn how to improve the accuracy and scalability of text retrieval for production-ready generative AI pipelines—and deploy where your data resides.
Guide for Embedding
Steps for getting access to state-of-the-art models for enterprise semantic search applications, using text retriever NIM APIs for embedding to deliver accurate answers quickly at scale.
Guide for Reranking
A getting-started guide for creating robust copilots, chatbots, and AI assistants from start to finish with text retriever NIM APIs for reranking.
World-Class Information Retrieval Performance
NeMo Retriever enables enterprise-scale document ingestion and search, delivering faster retrieval with higher throughput and better accuracy with fewer incorrect answers.
2X Throughput for Fast Retrieval
Multilingual Text Embedding Model
NIM Off: FP16, P90 latency: ~3.8s
NIM On: FP8, P90 latency: ~1.8s
30% Fewer Incorrect Answers
Text Embedding and Reranking
Ways to Get Started With NVIDIA NeMo Retriever
Use the right tools and technologies to build and deploy generative AI applications that require secure and accurate information retrieval to generate real-time business insights for organizations across every industry.
Try
Experience NeMo Retriever NIM microservices through a UI-based portal for exploring and prototyping with NVIDIA-managed endpoints, available for free through the NVIDIA API catalog and deployable anywhere.
Experience
Access NVIDIA-hosted infrastructure and guided hands-on labs that include step-by-step instructions and examples, available for free on NVIDIA Launchpad.
Build
Jump-start building your AI solutions with NVIDIA Blueprints, customizable reference applications, available on the NVIDIA API catalog.
Deploy
Get a free license to try NVIDIA AI Enterprise in production for 90 days using your existing infrastructure.
Starter Kits
Start building information retrieval pipelines and generative AI applications for multimodal data ingestion, embedding, reranking, retrieval-augmented generation, and agentic workflows by accessing tutorials, notebooks, blogs, forums, reference code, comprehensive documentation, and more.
Starter Kits by Use Case
Build a Pipeline for Multimodal PDF Data Extraction With RAG
Build an enterprise-scale pipeline that can ingest and extract highly accurate insights contained in text, graphs, charts, and tables within massive volumes of PDF documents.
Build AI Virtual Assistants for Enhanced Customer Service
Build enhanced virtual assistants that are more personalized and secure by leveraging RAG, NeMo Retriever, NIM microservices, and the latest AI agent-building methodologies.
Develop LLM-Powered AI Chatbots With RAG Using NeMo Retriever
Build AI chatbots powered by LLMs that can accurately answer questions about your enterprise data using NeMo Retriever and NIM microservices.
Starter Kits by Stages in the Retriever Pipeline
Ingestion
Rapidly ingest massive volumes of data and extract text, graphs, charts, and tables at the same time for highly accurate retrieval.
Embedding
Boost text question-and-answer retrieval performance, providing high-quality embeddings for many downstream NLP tasks.
Reranking
Enhance retrieval performance further with a fine-tuned reranker that finds the most relevant passages to provide as context when querying an LLM.
NVIDIA NeMo Retriever Learning Library
More Resources
Ethical AI
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.