Retrieval Augmented Generation (RAG) is an architectural approach that combines an information retrieval component with a large language model (LLM) to improve the quality and relevance of the LLM's outputs. A typical RAG pipeline works as follows:
The user provides an input query or prompt.
An information retrieval system searches a knowledge base (e.g., Wikipedia or an organization's internal documents) and retrieves documents or passages relevant to the query.
The retrieved context is concatenated with the original query and fed into the LLM as an augmented prompt.
The LLM generates an output response conditioned on both its pre-training data and the retrieved context, allowing it to provide more accurate, up-to-date and relevant information.
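The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a specific library's API: the knowledge base is a toy list, the retriever is a word-overlap scorer standing in for a real sparse or dense retriever (e.g. BM25 or an embedding index), and generate() is a placeholder for an actual LLM call.

```python
# Minimal RAG pipeline sketch. KNOWLEDGE_BASE, retrieve(), and
# generate() are illustrative stand-ins, not a real framework's API.

KNOWLEDGE_BASE = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "RAG combines a retriever with a large language model.",
    "Python 3.12 was released in October 2023.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Score passages by word overlap with the query (a stand-in
    for a real retriever such as BM25 or a vector index)."""
    q_words = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda passage: len(q_words & set(passage.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_augmented_prompt(query: str, passages: list[str]) -> str:
    """Concatenate the retrieved context with the original query."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

def generate(prompt: str) -> str:
    """Placeholder for the LLM call (e.g. an API request); here it
    just echoes the augmented prompt so the pipeline runs end to end."""
    return f"[LLM response conditioned on]\n{prompt}"

query = "Where is the Eiffel Tower located?"
prompt = build_augmented_prompt(query, retrieve(query))
print(generate(prompt))
```

In a real system, retrieve() would query a search engine or vector database, and generate() would call the model; the augmented-prompt step in the middle is the essence of RAG.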
The key benefits of RAG include:
Providing up-to-date and accurate responses by grounding the LLM's output in external knowledge sources rather than relying solely on its training data.
Reducing hallucinations (fabricated outputs) by constraining responses to factual retrieved context.
Enabling domain-specific and relevant responses tailored to an organization's proprietary data.
Being far cheaper than retraining or fine-tuning an LLM whenever its knowledge needs to be updated.
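The last benefit follows from where the knowledge lives: updating a RAG system means re-indexing documents, not touching model weights. A small sketch, with an assumed in-memory index (a real system would use a search or vector database):

```python
# Sketch: knowledge updates in RAG touch only the retrieval index.
# The index structure here is an illustrative assumption.

index: dict[str, set[str]] = {}  # passage -> its lowercase word set

def add_document(passage: str) -> None:
    """Add or refresh a document. Only the index changes; the LLM's
    weights are untouched, so no retraining is required."""
    index[passage] = set(passage.lower().split())

def search(query: str) -> str:
    """Return the passage with the highest word overlap."""
    q = set(query.lower().split())
    return max(index, key=lambda p: len(q & index[p]))

# New facts become retrievable the moment they are indexed.
add_document("The 2024 Olympic Games were held in Paris.")
print(search("where were the 2024 olympic games held"))
```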
RAG allows LLMs to bypass the need for continuous retraining by retrieving the latest information at runtime, making it well-suited for applications like question-answering that require current and trustworthy knowledge.