NVIDIA NeMo™ Retriever is a collection of generative AI microservices that provide world-class information retrieval with high accuracy and maximum data privacy, generating context-aware responses and insights in real time from large corpora of data. Built with NVIDIA NIM™, NeMo Retriever lets developers flexibly combine fine-tuned NeMo Retriever microservices with community or custom models to build AI query engines with scalable document ingestion and advanced retrieval-augmented generation (RAG), connecting applications to varied types of data wherever the data resides. As part of NVIDIA AI Enterprise, NeMo Retriever is designed for production-ready information retrieval pipelines and provides the security, stability, and support required by enterprises running their business on AI.

How NeMo Retriever Works

NeMo Retriever provides developers with the components needed to build data ingestion and information retrieval pipelines. The ingestion pipeline extracts structured and unstructured data, including text, charts, documents, and tables, converting it to text and filtering it to avoid duplicate chunks.
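
The chunking and duplicate-filtering step described above can be illustrated with a minimal sketch. It does not use any NeMo Retriever API: `chunk_text` and `dedupe_chunks` are hypothetical helper names, the word-count chunk size is an arbitrary assumption, and exact-match hashing stands in for whatever filtering a real ingestion pipeline applies.

```python
import hashlib

def chunk_text(text, max_words=8):
    """Split text into chunks of at most max_words words (hypothetical helper)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def dedupe_chunks(chunks):
    """Drop chunks whose case- and whitespace-normalized text was already seen."""
    seen = set()
    unique = []
    for chunk in chunks:
        normalized = " ".join(chunk.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique

# Text as it might look after extraction from documents and tables (made-up data).
extracted = [
    "Quarterly revenue grew in all regions.",
    "quarterly  revenue grew in all regions.",  # duplicate up to case and spacing
    "The table lists revenue by region.",
]
chunks = [c for doc in extracted for c in chunk_text(doc)]
unique_chunks = dedupe_chunks(chunks)
print(unique_chunks)
```

Hashing the normalized text keeps the filter cheap and order-preserving; near-duplicate detection would need a fuzzier comparison than this sketch shows.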



The extracted data is fed into a retrieval pipeline. A NeMo Retriever embedding NIM converts the chunks into embeddings and stores them in a vector database, accelerated by NVIDIA cuVS, for faster indexing and search.
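
As a rough illustration of the embed-and-store step, the sketch below turns chunks into unit-length vectors and keeps them in an in-memory list. Everything here is a stand-in: `toy_embed` is a hashed bag-of-words function, not the NeMo Retriever embedding NIM, and `InMemoryVectorStore` is a hypothetical class, not a cuVS-accelerated vector database.

```python
import hashlib
import math
import re

DIM = 64  # toy dimensionality, chosen arbitrarily for this sketch

def toy_embed(text, dim=DIM):
    """Hashed bag-of-words embedding, L2-normalized (stand-in for a real model)."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # md5 gives a hash that is stable across Python processes.
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

class InMemoryVectorStore:
    """Hypothetical minimal store: a list of (embedding, chunk) rows."""
    def __init__(self):
        self.rows = []

    def add(self, chunk):
        self.rows.append((toy_embed(chunk), chunk))

store = InMemoryVectorStore()
for chunk in ["Revenue grew in all regions.", "The table lists revenue by region."]:
    store.add(chunk)
print(len(store.rows), "chunks indexed with", DIM, "dimensions each")
```

Normalizing at insert time means a later similarity search can compare vectors with a plain dot product.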



When a user submits a query, relevant information is retrieved from the repository of ingested knowledge. A NeMo Retriever embedding NIM embeds the user query, and the resulting embedding is used to retrieve the most relevant chunks from the vector database through vector similarity search.
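
Vector similarity search itself can be sketched in a few lines. The example below assumes the query and the chunks have already been embedded; the three-dimensional vectors and the `top_k` helper are made up for illustration, and a brute-force cosine scan stands in for the approximate nearest-neighbor search a production vector database would run.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec, index, k=2):
    """Return the k (score, chunk) pairs most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), chunk) for vec, chunk in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hand-written toy embeddings; a real pipeline would get these from a model.
index = [
    ([0.9, 0.1, 0.0], "Revenue grew in all regions."),
    ([0.1, 0.9, 0.0], "The office relocated in March."),
    ([0.7, 0.0, 0.3], "Regional revenue is listed in the table."),
]
query_vec = [1.0, 0.0, 0.1]  # pretend embedding of a revenue-related question

hits = top_k(query_vec, index, k=2)
for score, chunk in hits:
    print(f"{score:.3f}  {chunk}")
```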



A NeMo Retriever reranking NIM evaluates and reranks the results to ensure the most accurate and useful chunks are used to augment the prompt. With the most pertinent information at hand, the LLM NIM generates a response that’s informed, accurate, and contextually relevant. Various LLM NIM microservices from the NVIDIA API catalog can be used to enable additional capabilities, such as synthetic data generation, and developers can use NVIDIA NeMo to fine-tune them to better align with specific use-case requirements.
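
The rerank-then-augment flow can be sketched as follows. This is an illustration under stated assumptions, not the reranking NIM: `toy_rerank` scores candidates by simple term overlap with the query, where a real reranker would use a trained model, and `build_prompt` is a hypothetical helper whose prompt template is invented for this example.

```python
import re

def tokens(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_rerank(query, candidates, top_n=2):
    """Order candidates by the fraction of query terms they contain (toy scorer)."""
    q = tokens(query)
    scored = [(len(q & tokens(c)) / len(q) if q else 0.0, c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_n]]

def build_prompt(query, context_chunks):
    """Assemble an augmented prompt from the reranked chunks (invented template)."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

query = "How did revenue change by region?"
candidates = [  # chunks as returned by the similarity search, in retrieval order
    "The office relocated in March.",
    "Revenue grew in every region last quarter.",
    "The table lists revenue by region.",
]
best = toy_rerank(query, candidates, top_n=2)
prompt = build_prompt(query, best)
print(prompt)
```

The resulting prompt string is what would be sent to an LLM; the generation call is omitted because it depends on the serving endpoint in use.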



The NVIDIA NeMo Retriever collection of NIM microservices is used to build optimized ingestion and retrieval pipelines for highly accurate information retrieval at scale.

