How Retrieval-Augmented Generation Works
RAG enhances large language models (LLMs) by retrieving the most relevant and current information from external knowledge sources. Before a user can query a RAG pipeline, data must be ingested into the knowledge base.
Data Extraction: Multimodal, structured, and unstructured data is extracted from various formats and converted to text so it can be filtered, chunked, and fed into the retrieval pipeline.
Data Retrieval: Extracted data is passed to an embedding model to create knowledge embeddings, which are stored in a vector database. When a user submits a query, the system embeds the query, retrieves the most relevant data from the vector database, reranks the results, and sends them to the LLM, which uses the retrieved context to generate an accurate, context-aware response (see the sketch below).
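The end-to-end flow can be captured in a few lines of Python. The sketch below is a minimal, self-contained illustration only: embed_texts is a random placeholder standing in for a real embedding model, the "vector database" is a brute-force cosine search over a NumPy array, and the final LLM call is left as a comment. A production pipeline would swap in a deployed embedding model, a real vector database, and a reranking model.

```python
# Minimal RAG sketch: chunk -> embed -> index -> retrieve -> generate.
# embed_texts() is a random placeholder for a real embedding model, and the
# "vector database" is a brute-force cosine search over a NumPy array.
import numpy as np

def embed_texts(texts, dim=384):
    """Placeholder embedding model; swap in a real one (e.g., a deployed endpoint)."""
    rng = np.random.default_rng(abs(hash(tuple(texts))) % (2**32))
    return rng.standard_normal((len(texts), dim)).astype(np.float32)

def chunk(document, size=300):
    """Split extracted text into fixed-size character chunks."""
    return [document[i:i + size] for i in range(0, len(document), size)]

# --- Ingestion: extract, chunk, embed, and index the knowledge base ---
documents = ["...text extracted from PDFs, tables, slides...",
             "...more extracted text..."]
chunks = [c for doc in documents for c in chunk(doc)]
index = embed_texts(chunks)                            # one row per chunk
index /= np.linalg.norm(index, axis=1, keepdims=True)  # normalize for cosine similarity

# --- Retrieval: embed the query and rank chunks by cosine similarity ---
query = "What does the report say about quarterly revenue?"
q = embed_texts([query])[0]
q /= np.linalg.norm(q)
top_k = np.argsort(index @ q)[::-1][:4]                # a reranker would refine this list
context = "\n".join(chunks[i] for i in top_k)

# --- Generation: the LLM answers grounded in the retrieved context ---
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# answer = llm.generate(prompt)   # call the deployed LLM of your choice here
```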
Explore RAG Technology
NVIDIA NeMo Retriever
NVIDIA NeMo™ Retriever is a collection of generative AI microservices for extraction, embedding, and reranking that enable developers to build RAG pipelines that generate business insights in real time with high accuracy and maximum data privacy.
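As an illustration of how a NeMo Retriever embedding microservice is typically called: NIM endpoints are OpenAI-compatible, so the standard openai Python client works. The base URL, the model name nvidia/nv-embedqa-e5-v5, and the input_type parameter below are assumptions based on NVIDIA's hosted API catalog and may differ in your deployment.

```python
# Sketch: embedding a query with a NeMo Retriever embedding microservice.
# The endpoint, model name, and extra parameters are assumptions; check the
# documentation of the microservice you actually deploy.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",   # or your local NIM URL
    api_key=os.environ.get("NVIDIA_API_KEY"),
)

response = client.embeddings.create(
    model="nvidia/nv-embedqa-e5-v5",                   # example embedding model
    input=["What were the key findings in the latest earnings report?"],
    encoding_format="float",
    extra_body={"input_type": "query", "truncate": "NONE"},  # use "passage" at ingest time
)
query_embedding = response.data[0].embedding           # vector used to search the index
```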
NVIDIA NeMo Agent Toolkit
NVIDIA NeMo Agent toolkit is an open-source library for framework-agnostic profiling, evaluation, and optimization of AI agent systems. By exposing hidden bottlenecks and costs, it helps enterprises scale agentic systems efficiently while maintaining reliability.
NVIDIA cuVS
NVIDIA cuVS is an open-source library for GPU-accelerated vector search and data clustering. It enables higher throughput, lower latency, and faster index build times, and improves the efficiency of semantic search within pipelines and applications such as information retrieval or RAG.
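A minimal sketch of GPU-accelerated approximate nearest-neighbor search with cuVS's CAGRA index is shown below. The module paths and parameter names follow the library's published examples but may vary between cuVS versions, and the random vectors stand in for real document and query embeddings.

```python
# Sketch: GPU-accelerated ANN search with cuVS CAGRA (requires a CUDA GPU,
# plus the cupy and cuvs packages). Random vectors stand in for embeddings.
import cupy as cp
from cuvs.neighbors import cagra

dataset = cp.random.random_sample((100_000, 384), dtype=cp.float32)  # chunk embeddings
queries = cp.random.random_sample((8, 384), dtype=cp.float32)        # query embeddings

# Build the CAGRA graph index on the GPU.
index = cagra.build(cagra.IndexParams(metric="sqeuclidean"), dataset)

# Search: returns the k nearest neighbors for each query as device arrays.
distances, neighbors = cagra.search(cagra.SearchParams(), index, queries, k=5)
print(cp.asarray(neighbors))   # indices of the top-5 chunks per query
```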
NVIDIA NeMo Curator
NVIDIA NeMo Curator is a framework that provides prebuilt, accelerated pipelines for processing multimodal data at scale, which is necessary to improve the performance of RAG systems.
NVIDIA NeMo Customizer
NVIDIA NeMo Customizer is a high-performance, scalable microservice that simplifies fine-tuning and alignment of generative AI models, including embedding models for domain-specific use cases, making it easier to adopt generative AI across industries.
NVIDIA NeMo Evaluator
NVIDIA NeMo Evaluator is an SDK and microservice for evaluating generative AI models, RAG pipelines, and agents with 100+ benchmarks and custom metrics across any environment.
NVIDIA NIM
NVIDIA NIM™ is a set of easy-to-use microservices designed to accelerate the deployment of generative AI models across any cloud or data center.
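Because NIM microservices expose an OpenAI-compatible API, a deployed LLM can be queried with the standard openai client, as in the sketch below. The port and model name are assumptions that depend on how the microservice was started; in a RAG pipeline, the retrieved chunks would be inserted into the prompt as context.

```python
# Sketch: querying a locally deployed NIM LLM, which serves an OpenAI-compatible
# API. The port and model name depend on how the microservice was started.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

completion = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",   # example model served by the NIM
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": "Context: ...retrieved chunks...\n\nQuestion: ..."},
    ],
    temperature=0.2,
)
print(completion.choices[0].message.content)
```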
NVIDIA Nemotron
NVIDIA Nemotron™ is a family of the most open and efficient multimodal models, with open datasets and recipes for building agentic AI.
Explore NVIDIA AI Blueprints That Use RAG
NVIDIA Blueprints are reference workflows for AI use cases built with NVIDIA Nemotron foundation models and NIM and NeMo microservices. With these blueprints, developers can build production-ready agentic AI applications that connect employees to AI query engines, delivering real-time insights that drive transformational gains in efficiency and productivity.
Enterprise RAG
Connect secure, scalable, reliable AI applications to your company’s internal enterprise data using industry-leading NeMo Retriever embedding and reranking models for information retrieval at scale.
Streaming Data to RAG
Unlock dynamic, context-aware insights from streaming sources like radio signals and other sensor data.
AI Agent for Video Search and Summarization
Ingest massive volumes of live or archived videos, and extract insights for summarization and interactive Q&A.
Biomedical AI-Q Research Agent
Improve the efficiency and accuracy of various clinical development processes, including R&D, literature review, protocol generation, clinical trial screening, and pharmacovigilance.
AI Assistants for Customer Service
Develop secure, context-aware virtual assistants that meet the unique needs of your business, and enhance customer service operations.
