How Retrieval-Augmented Generation Works
RAG enhances large language models (LLMs) by retrieving the most relevant and current information from external knowledge sources. Before a RAG pipeline can answer user queries, data must be ingested into the knowledge base.
Data Ingestion: Multimodal, structured, and unstructured data is extracted from various formats and converted to text so it can be filtered, chunked, and fed into the retrieval pipeline.
Data Retrieval: Extracted data is passed to an embedding model to create knowledge embeddings that go into a vector database. When a user submits a query, the system embeds the query, retrieves relevant data from the vector database, reranks the results, and sends them to the LLM to return the most accurate and context-aware responses.
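The two stages above (ingest and index, then embed, retrieve, and generate) can be sketched end to end. This is a toy illustration, not NVIDIA code: the bag-of-words "embeddings" and in-memory list stand in for a real embedding model and vector database, and the final LLM call is reduced to assembling a prompt.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real pipeline uses a neural embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Ingestion: chunk documents and index their embeddings (the "vector database").
chunks = [
    "RAG retrieves relevant context from a knowledge base.",
    "Embedding models map text to dense vectors.",
    "GPUs accelerate vector search at scale.",
]
index = [(c, embed(c)) for c in chunks]

# Retrieval: embed the query, rank chunks by similarity, send top hits to the LLM.
query = "How does RAG find relevant context?"
q = embed(query)
ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
top_context = ranked[0][0]
prompt = f"Context: {top_context}\n\nQuestion: {query}"
```

In production, the reranking step described above would reorder the candidate chunks with a second, more precise model before the prompt is assembled.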
Explore RAG Tools and Technologies
NVIDIA NeMo Retriever
NVIDIA NeMo™ Retriever is a collection of generative AI microservices for ingestion, extraction, embedding, and reranking. It enables developers to build pipelines that deliver real-time business insights with high accuracy and strong data privacy.
NVIDIA NeMo Evaluator
NVIDIA NeMo Evaluator is an enterprise-grade microservice that provides industry-standard benchmarking of AI models and end-to-end RAG pipelines, along with synthetic data generation. By assessing embedding, retrieval, and generation models, developers can ensure that each component of their RAG application performs optimally.
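NeMo Evaluator's API is not shown here; as a generic illustration of component-level retrieval assessment, recall@k (a standard retrieval metric, not a NeMo-specific one) measures what fraction of the truly relevant chunks appear in the top-k retrieved results:

```python
def recall_at_k(retrieved, relevant, k):
    # Fraction of relevant items that appear in the top-k retrieved results.
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

retrieved = ["doc3", "doc1", "doc7", "doc2"]  # ranked output of the retriever
relevant = {"doc1", "doc2"}                   # ground-truth relevant documents
print(recall_at_k(retrieved, relevant, 3))    # doc1 found, doc2 missed -> 0.5
```

Tracking a metric like this separately for the embedding and reranking stages shows which component is limiting end-to-end answer quality.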
NVIDIA cuVS
NVIDIA cuVS is an open-source library for GPU-accelerated vector search and data clustering. It enables higher throughput, lower latency, and faster index build times, and improves the efficiency of semantic search within pipelines and applications such as information retrieval or RAG.
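The cuVS API itself is not reproduced here. As a point of reference, the sketch below shows the exact brute-force nearest-neighbor search that GPU-accelerated approximate indexes are measured against, using only NumPy; the array shapes and top-k selection mirror what a vector-search library computes:

```python
import numpy as np

rng = np.random.default_rng(0)
database = rng.standard_normal((1000, 64)).astype(np.float32)  # indexed vectors
queries = rng.standard_normal((5, 64)).astype(np.float32)      # query vectors

# Exact (brute-force) search by L2 distance. Libraries such as cuVS replace this
# with GPU-accelerated approximate indexes that trade a little recall for much
# higher throughput and faster index builds on large collections.
dists = np.linalg.norm(queries[:, None, :] - database[None, :, :], axis=-1)
top_k = np.argsort(dists, axis=1)[:, :10]  # indices of the 10 nearest vectors per query
```

Brute force is O(queries × database size), which is why approximate GPU indexes matter once collections reach millions of vectors.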
Llama 3.1 NIM Microservices
Llama 3.1 NIM microservices leverage customizable LLMs to improve the helpfulness of generated responses, refine retrieval results over multiple sources and languages, understand regional nuances, and more.
Mistral AI NIM Microservices
Mistral AI NIM microservices provide LLMs with state-of-the-art reasoning, knowledge, and code capabilities, delivering superior accuracy for agentic applications, multilingual tasks, GPU-accelerated text-embedding generation, and more.
Retrieval NIM Microservices
Retrieval NIM microservices offer embedding and reranking models that connect chat-based LLMs to proprietary enterprise data, identifying the most relevant chunks across diverse business data to improve the accuracy of responses.
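The embed-then-rerank pattern these microservices implement can be illustrated with a toy two-stage search. The scoring functions below are hypothetical stand-ins (not NIM APIs): a cheap overlap score plays the role of the bi-encoder embedding model, and a costlier phrase-aware score plays the role of the cross-encoder reranker:

```python
def first_stage_score(query, chunk):
    # Cheap recall-oriented score (stand-in for an embedding/bi-encoder model).
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c)

def rerank_score(query, chunk):
    # Costlier precision-oriented score (stand-in for a cross-encoder reranker):
    # reward exact phrase containment on top of plain term overlap.
    bonus = 2 if query.lower() in chunk.lower() else 0
    return first_stage_score(query, chunk) + bonus

chunks = [
    "Quarterly revenue rose 12 percent on data center growth.",
    "The report covers quarterly revenue and operating margin.",
    "Employee headcount grew in the data center business unit.",
]
query = "data center growth"

# Stage 1: retrieve a broad candidate set; Stage 2: rerank the candidates.
candidates = sorted(chunks, key=lambda c: first_stage_score(query, c), reverse=True)[:2]
best = max(candidates, key=lambda c: rerank_score(query, c))
```

Both stages rank by relevance; the point is that the fast first stage keeps the candidate set small enough for the slower, more accurate reranker to be affordable.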
Explore NVIDIA AI Blueprints That Use RAG
NVIDIA AI Blueprints are reference workflows for generative AI use cases built with NVIDIA NIM™ microservices. With these blueprints, developers can build production-ready AI applications that empower employees with real-time insights, connecting them to AI query engines to enable transformational efficiency and productivity gains.
Multimodal PDF Data Extraction for Enterprise RAG
Ingest and extract highly accurate insights contained in text, graphs, charts, and tables within massive volumes of enterprise data.
AI Virtual Assistants for Customer Service
Develop secure, context-aware virtual assistants that meet the unique needs of your business and enhance customer service operations.
AI Chatbots Using RAG
Build fully functional RAG-based AI chatbots that can accurately answer questions about your enterprise data and generate valuable insights.