Build a RAG from scratch

This tutorial walks you through building a RAG (Retrieval-Augmented Generation) system from scratch, without relying on high-level RAG frameworks.

High-level libraries let you build a demo faster, but they hide how things work under the hood. This guide shows you how RAG works step by step, which will help you understand the fundamental concepts behind the RAG pipeline.

Note

I highly recommend reading How does RAG work? before working through this tutorial.

In this tutorial, we won't use a vector database or chunk the data ourselves, so we can focus on the core concepts of RAG.

Install the required packages

First, let's install the required packages:

%pip install transformers openai torch torchvision --quiet

Set up the OpenAI API key

Next, we'll set up our OpenAI API key, since we'll be using an OpenAI model for this tutorial. You can substitute any other LLM you prefer. If you haven't created an OpenAI account yet, you can do it here.

import os

os.environ["OPENAI_API_KEY"] = "sk-proj-..."

Define Data Source

Now, let's define our corpus of documents. The data is already pre-chunked so we can focus on the core concepts of RAG; chunking will be covered in more detail in another tutorial.

corpus_of_documents = [
    "At a YC event, Brian Chesky gave a memorable talk where he challenged conventional wisdom about running large companies. As Airbnb grew, he received advice to 'hire good people and give them room' which proved disastrous, leading him to develop his own management approach inspired by Steve Jobs.",
    "Many successful founders at the event reported similar experiences - following traditional management advice had damaged their companies instead of helping them grow. This raised questions about why everyone was giving founders the wrong advice.",
    "The answer emerged: founders were being told how to run companies like professional managers, not founders. There are two distinct modes of running a company: founder mode and manager mode. Most assumed scaling meant switching to manager mode.",
    "Founder mode remains largely undocumented - there are no specific books about it, and business schools don't acknowledge its existence. What we know comes from individual founders' experiments and experiences.",
    "The traditional manager approach treats parts of the org chart as black boxes, avoiding 'micromanagement' by delegating completely to direct reports. In practice, this often means hiring professional fakers who can damage the company.",
    "Founders report feeling gaslit both by advisors pushing manager mode and by employees when implementing it. This is a rare case where founders should trust their instincts despite widespread disagreement.",
    "Founder mode breaks the principle that CEOs should only engage via direct reports. Skip-level meetings become normal rather than unusual, opening up many new organizational possibilities.",
    "Steve Jobs' example of running annual retreats for Apple's 100 most important people (not necessarily highest-ranking) demonstrates an unconventional approach that could make big companies feel like startups.",
    "While founders can't run a 2000-person company exactly like a 20-person startup, the extent and nature of delegation in founder mode will vary by company and situation, making it more complex than manager mode.",
    "Early evidence suggests founder mode works better than manager mode, based on examples of founders who've found their way toward it, even when their methods were considered eccentric.",
    "The premise that founders must run their companies as managers has been accepted even in Silicon Valley. The dismay of founders who tried this approach and their success in finding alternatives suggests another way exists.",
    "Business education and literature focus almost exclusively on manager mode, leaving a gap in understanding how founders can effectively run larger companies while maintaining their unique advantages.",
    "Brian Chesky's success at Airbnb, demonstrated by their exceptional free cash flow margin, suggests that founder mode can produce superior results when properly implemented.",
    "The insight about different modes of company operation came from observing patterns in founder experiences, particularly their consistent struggles with conventional management wisdom.",
    "Founder mode may involve more direct engagement across organizational levels, breaking traditional management hierarchies while maintaining necessary delegation structures.",
    "The skills and approaches that work for professional managers may be fundamentally different from what works for founders, suggesting the need for distinct operational frameworks.",
    "The success of founders who rejected traditional management advice indicates that founder mode, while less understood, might be more effective for scaling companies.",
    "The lack of understanding about founder mode represents both a challenge and an opportunity - founders have achieved success despite following suboptimal advice.",
    "The story suggests that many successful founders may have independently discovered aspects of founder mode while being viewed as unconventional or difficult.",
    "The potential for improved company performance once founder mode is better understood and documented could lead to significant changes in how fast-growing companies are managed."
]

Define Embedding Function

Next, we'll set up our embedding function. The embedder turns text into embeddings using a BERT model pulled from Hugging Face. BERT is an open-source model from Google.

from transformers import AutoTokenizer, AutoModel
import torch

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def embedder(chunk):
    """
    Embed a single chunk of text into a fixed-size vector.
    """
    # Tokenize the text for the BERT model
    tokens = tokenizer(chunk, return_tensors="pt", padding=True, truncation=True)

    # Run the model without tracking gradients (inference only)
    with torch.no_grad():
        model_output = model(**tokens)

    # Use the [CLS] token's hidden state as the sentence embedding
    embeddings = model_output.last_hidden_state[:, 0, :]
    embed = embeddings[0].numpy()
    return embed
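
As a quick sanity check, you can embed a short sentence and inspect the result. The snippet below is just illustrative, but with bert-base-uncased the output should be a NumPy array with 768 dimensions (the model's hidden size):

# Embed a sample sentence and inspect the resulting vector
sample_embedding = embedder("Founder mode is a different way of running a company.")

print(type(sample_embedding))   # <class 'numpy.ndarray'>
print(sample_embedding.shape)   # (768,) for bert-base-uncased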

Embed the Data Source

Now, let's embed our corpus. This code converts each chunk we defined earlier into an embedding and stores each (embedding, chunk) pair, so every piece of original text keeps its corresponding embedding.

embedded_data_source = []

for chunk in corpus_of_documents:
    # Embed the chunk
    embedding = embedder(chunk)

    # Map the embedding to the original chunk text
    embedded_data_source.append((embedding, chunk))
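
If you want to verify the mapping, each entry in embedded_data_source is an (embedding, chunk) tuple, so you can check the list length and peek at the first pair:

# Each entry pairs a 768-dimensional embedding with its original text
print(len(embedded_data_source))     # 20, one entry per chunk in the corpus

first_embedding, first_chunk = embedded_data_source[0]
print(first_embedding.shape)         # (768,)
print(first_chunk[:80])              # preview of the first chunk's text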

Define Search Function

Now, let's define our search function. This function finds the top k chunks most similar to the query: it embeds the query, then compares that embedding to each embedding in the data source using cosine similarity.

import numpy as np

def search(query, data_source, k=5):
    """
    Search function to find top k similar chunks to the query
    """
    # Compute embedding for the query
    query_embedding = embedder(query)

    # Normalize the query embedding
    query_norm = np.linalg.norm(query_embedding)

    # Compute cosine similarities between query embedding and data source embeddings
    similarities = []
    for embedding, chunk in data_source:
        # Normalize the data source embedding
        embedding_norm = np.linalg.norm(embedding)

        # Compute cosine similarity
        similarity = np.dot(query_embedding, embedding) / (query_norm * embedding_norm)
        similarities.append((similarity, chunk))

    # Sort the similarities in descending order
    similarities.sort(reverse=True, key=lambda x: x[0])

    # Get the top k chunks
    top_k = similarities[:k]

    return top_k
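
The loop above is easy to follow, but the same computation can be expressed as one matrix operation. As an optional variation (not required for the rest of the tutorial), here is a vectorized sketch that stacks the stored embeddings into a matrix and computes all cosine similarities at once:

def search_vectorized(query, data_source, k=5):
    """
    Vectorized variant of search(): same cosine similarity, one matrix product.
    """
    query_embedding = embedder(query)

    # Stack all stored embeddings into a (num_chunks, hidden_size) matrix
    embeddings = np.stack([embedding for embedding, _ in data_source])
    chunks = [chunk for _, chunk in data_source]

    # Normalize the rows and the query, then cosine similarity is a dot product
    embeddings_norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query_norm = query_embedding / np.linalg.norm(query_embedding)
    similarities = embeddings_norm @ query_norm

    # Indices of the top k scores, highest first
    top_indices = np.argsort(similarities)[::-1][:k]
    return [(similarities[i], chunks[i]) for i in top_indices]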

Test the Search Function

Now, let's test our search function. We'll use the query "explain what is founder mode" and find the top 5 chunks that are most similar to the query.

user_query = "explain what is founder mode"
top_k_chunks = search(user_query, embedded_data_source, k=5)

for similarity, chunk in top_k_chunks:
    print(f"Similarity: {similarity:.4f}, Chunk: {chunk}")

Extract the Retrieved Chunks

Now, let's extract the retrieved chunks. Once the most similar chunks have been found, we only need their text; the similarity scores were just used for ranking.

retrieved_chunks = []

# Extract only the text from each of the top_k_chunks
for similarity, chunk in top_k_chunks:
    retrieved_chunks.append(chunk)

print(retrieved_chunks)

Define Base Prompt

Now, let's define our base prompt. This prompt will be used to generate the response for our LLM.

base_prompt = """You are an AI assistant for RAG. Your task is to understand the user question, and provide an answer using the provided contexts.

Your answers are correct, high-quality, and written by a domain expert. If the provided context does not contain the answer, simply state, "The provided context does not have the answer."

User question: {user_query}

Contexts:
{chunks_information}
"""

Format the Prompt

Now, let's format the prompt by replacing {user_query} with the user's query and {chunks_information} with the retrieved chunks we extracted earlier.

# Fill the placeholders in the base prompt
prompt = base_prompt.format(user_query=user_query, chunks_information="\n".join(retrieved_chunks))

print(prompt)

Generate Response

Finally, let's use OpenAI's API to generate a response.

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0,
    messages=[
        {"role": "system", "content": prompt},
    ],
)

print(response.choices[0].message.content)

This completes the RAG (Retrieval-Augmented Generation) process from scratch!
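
If you want to reuse the pipeline, the steps above can be chained into a single helper. This is just a convenience sketch that wires together the pieces defined in this tutorial (search, base_prompt, and the OpenAI client), using the same model and parameters as above:

def answer(user_query, data_source, k=5):
    """
    Run the full RAG pipeline: retrieve top-k chunks, build the prompt, generate an answer.
    """
    # Retrieve the most similar chunks for the query
    top_k_chunks = search(user_query, data_source, k=k)
    retrieved_chunks = [chunk for _, chunk in top_k_chunks]

    # Fill the base prompt with the query and the retrieved context
    prompt = base_prompt.format(
        user_query=user_query,
        chunks_information="\n".join(retrieved_chunks),
    )

    # Ask the LLM to answer using only the provided context
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{"role": "system", "content": prompt}],
    )
    return response.choices[0].message.content

print(answer("explain what is founder mode", embedded_data_source))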

Note

This is just a demonstration to help you understand how RAG works. In the next chapters, we'll dive deeper into each component of RAG and teach you how to build a production-ready RAG system.

What's Next?

Now that you understand the basic building blocks of RAG, you're ready to dive deeper! The next chapters cover RAG Components, helping you build even better systems.