How does RAG (Retrieval Augmented Generation) work?¶

Info

Want to know what RAG (Retrieval Augmented Generation) is? Read this article.

Retrieval Augmented Generation (RAG) is a way to give LLMs access to knowledge that is not part of their training data. RAG works by retrieving relevant information and then passing it to the LLM along with the user's query. But how exactly?

RAG Pipeline¶

Here's how RAG works in detail:

First, your data is preprocessed, usually by splitting it into smaller chunks. Each chunk is then converted into an embedding (a numeric vector that captures its meaning) and stored in a vector database.
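As a rough illustration of this indexing step, here is a minimal sketch using only the Python standard library. The names `chunk_text`, `toy_embed`, and `vector_store` are made up for this example, the word-based chunk size and overlap are arbitrary, and the hashed bag-of-words vector is only a stand-in for a real embedding model. A plain list plays the role of the vector database.

```python
import hashlib
import math


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into chunks of `chunk_size` words, sharing `overlap` words."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Assumption: hashed bag of words instead of a real embedding model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # Map each word to a fixed slot so equal words land in the same place.
        slot = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[slot] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # L2-normalize to unit length


document = (
    "RAG retrieves relevant text before the model answers. "
    "Each document is split into chunks. "
    "Every chunk is turned into an embedding and stored."
)

chunks = chunk_text(document)
# In-memory stand-in for a vector database: (embedding, original chunk) pairs.
vector_store = [(toy_embed(chunk), chunk) for chunk in chunks]
```

The overlap keeps a sentence that straddles a chunk boundary from being cut off in both neighbouring chunks. In a real system you would likely chunk by tokens, sentences, or sections instead of raw word counts.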

When a user submits a query, the system converts the query into an embedding as well. This query embedding is then used to search the vector database for the most similar chunk embeddings.
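To make "most similar" concrete, the sketch below ranks stored embeddings by cosine similarity, one common choice of similarity measure (a given vector database may use a different metric or an approximate search). The three-dimensional vectors, the `store` contents, and the helper names `cosine_similarity` and `top_k` are hand-written assumptions for illustration. Real embeddings come from a model and have hundreds of dimensions or more.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k (score, text) pairs with the highest similarity."""
    scored = [(cosine_similarity(query_vec, vec), text) for vec, text in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical pre-computed embeddings, invented for this example.
store = [
    ([0.9, 0.1, 0.0], "Cats are small domesticated felines."),
    ([0.1, 0.9, 0.0], "Python is a programming language."),
    ([0.8, 0.2, 0.1], "Kittens are young cats."),
]

query_vec = [1.0, 0.0, 0.0]  # pretend this is the embedding of "What is a cat?"
results = top_k(query_vec, store, k=2)

for score, text in results:
    print(f"{score:.3f}  {text}")
```

This brute-force scan compares the query against every stored vector, which is fine for small collections. Vector databases typically use index structures to avoid scanning everything.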

The most relevant chunks are retrieved from the vector database and passed to the LLM together with the user's query.
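"Passing chunks to the LLM" usually just means putting them into the prompt. The sketch below shows one possible way to do that. The function name `build_prompt` and the exact prompt wording are assumptions for this example, not a required format.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks and the user's question into one prompt."""
    # Number the chunks so the model (and you) can tell them apart.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = [
    "Cats are small domesticated felines.",
    "Kittens are young cats.",
]
prompt = build_prompt("What is a kitten?", retrieved)
print(prompt)
```

The resulting string is what gets sent to the LLM in place of the bare question.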

The LLM then generates a response based on the retrieved chunks.

Whole RAG Pipeline¶

Here's the whole process in one picture!

Your documents are split into pieces and turned into embeddings.
When someone asks a question, it gets turned into an embedding too.
The system finds the pieces of your documents that best match the question.
The AI uses these pieces to give you a helpful answer.
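The four steps above can be sketched end to end in a few lines. Everything here is a simplification made for illustration: the class name `TinyRAG` is made up, `embed` is a toy word-count vector instead of a real embedding model, sentences serve as chunks, a Python list replaces the vector database, and `echo_llm` is a placeholder that returns the prompt instead of calling an LLM.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Assumption: sparse word counts as a toy embedding."""
    cleaned = text.lower().replace("?", " ").replace(".", " ")
    return Counter(cleaned.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)  # missing words count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyRAG:
    def __init__(self, generate: Callable[[str], str]):
        self.generate = generate  # any function that maps prompt -> answer
        self.index: list[tuple[Counter, str]] = []

    def add_document(self, text: str) -> None:
        # Step 1: split into pieces (here: sentences) and embed each one.
        for piece in text.split("."):
            piece = piece.strip()
            if piece:
                self.index.append((embed(piece), piece))

    def retrieve(self, question: str, k: int = 2) -> list[str]:
        # Steps 2 and 3: embed the question, then rank pieces by similarity.
        q = embed(question)
        ranked = sorted(
            self.index, key=lambda item: cosine(q, item[0]), reverse=True
        )
        return [text for _, text in ranked[:k]]

    def answer(self, question: str, k: int = 2) -> str:
        # Step 4: hand the best pieces and the question to the LLM.
        context = "\n".join(self.retrieve(question, k))
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return self.generate(prompt)


def echo_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns the prompt unchanged."""
    return prompt


rag = TinyRAG(generate=echo_llm)
rag.add_document(
    "The Eiffel Tower is in Paris. Mount Fuji is in Japan. "
    "The Nile flows through Egypt."
)
print(rag.answer("Where is the Eiffel Tower?", k=1))
```

Because `generate` is just a function argument, you can swap `echo_llm` for a call to whichever LLM you use without touching the rest of the pipeline.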

Implement your first RAG System¶

Now that you understand how RAG works, let's implement your first RAG system! Read the next tutorial to learn how to build a RAG system from scratch.