Contextual Chunk Headers (CCH) in Simple RAG
Retrieval-Augmented Generation (RAG) improves the factual accuracy of language models by retrieving relevant external knowledge before generating a response. However, standard chunking often strips away the surrounding context a chunk needs, making retrieval less effective.
Contextual Chunk Headers (CCH) enhance RAG by attaching high-level context (such as document titles or section headers) to each chunk before embedding it. This improves retrieval quality and helps prevent out-of-context responses.
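As a quick illustration of the idea, the sketch below shows what a context-enhanced chunk could look like if the header were prepended directly to the chunk text before embedding; the header and body strings here are made up for illustration. Note that this notebook takes a slightly different route: it embeds the header and the chunk text separately and averages their similarities at search time.

# Illustrative only: a hypothetical header prepended to a hypothetical chunk body.
header = "Understanding Artificial Intelligence: Core Concepts"
chunk_body = "Machine learning enables systems to learn patterns from data without being explicitly programmed..."

contextual_chunk = f"{header}\n\n{chunk_body}"  # this combined string would be embedded as one unit
print(contextual_chunk)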
Steps in this Notebook:
- Data Ingestion: Load and preprocess the text data.
- Chunking with Contextual Headers: Extract section titles and prepend them to chunks.
- Embedding Creation: Convert context-enhanced chunks into numerical representations.
- Semantic Search: Retrieve relevant chunks based on a user query.
- Response Generation: Use a language model to generate a response from retrieved text.
- Evaluation: Assess response accuracy using a scoring system.
Setting Up the Environment
We begin by importing necessary libraries.
import os
import numpy as np
import json
from openai import OpenAI
import fitz  # PyMuPDF
from tqdm import tqdm
Extracting Text from the PDF
We extract the raw text from a PDF. Descriptive headers for each chunk are generated later with an LLM.
def extract_text_from_pdf(pdf_path):
    # Open the PDF and concatenate the text of every page.
    mypdf = fitz.open(pdf_path)
    all_text = ""
    for page_num in range(mypdf.page_count):
        page = mypdf[page_num]
        text = page.get_text("text")
        all_text += text
    mypdf.close()
    return all_text
Setting Up the OpenAI API Client
We initialize the OpenAI client to generate embeddings and responses.
client = OpenAI(
    base_url="https://api.studio.nebius.com/v1/",
    api_key=os.getenv("OPENAI_API_KEY")
)
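The base_url points the client at Nebius AI Studio's OpenAI-compatible endpoint, so the key read from OPENAI_API_KEY must be valid for that service. A small sanity check (not part of the original notebook) can fail fast if the variable is unset:

if not os.getenv("OPENAI_API_KEY"):
    raise EnvironmentError("Set the OPENAI_API_KEY environment variable before running this notebook.")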
Chunking Text with Contextual Headers
To improve retrieval, we generate a descriptive header for each chunk using an LLM.
def generate_chunk_header(chunk, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the LLM for a concise title that summarizes the chunk.
    system_prompt = "Generate a concise and informative title for the given text."
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": chunk}
        ]
    )
    return response.choices[0].message.content.strip()
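A quick way to sanity-check the header generator is to call it on a small hand-written paragraph (the sample text below is hypothetical):

sample_text = (
    "Neural networks are composed of layers of interconnected nodes. "
    "Each connection has a weight that is adjusted during training."
)
print(generate_chunk_header(sample_text))  # e.g. a short title about neural network basics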
def chunk_text_with_headers(text, n, overlap):
    # Split the text into overlapping chunks of n characters and
    # generate a contextual header for each one.
    chunks = []
    for i in range(0, len(text), n - overlap):
        chunk = text[i:i + n]
        header = generate_chunk_header(chunk)
        chunks.append({"header": header, "text": chunk})
    return chunks
Extracting and Chunking Text from a PDF File
Now, we load the PDF, extract text, and split it into chunks.
pdf_path = "data/AI_Information.pdf"
extracted_text = extract_text_from_pdf(pdf_path)
text_chunks = chunk_text_with_headers(extracted_text, 1000, 200)
print("Sample Chunk:")
print("Header:", text_chunks[0]['header'])
print("Content:", text_chunks[0]['text'])
Creating Embeddings for Headers and Text
We create embeddings for both headers and text to improve retrieval accuracy.
def create_embeddings(text, model="BAAI/bge-en-icl"):
    # Embed a single string and return its vector.
    response = client.embeddings.create(
        model=model,
        input=text
    )
    return response.data[0].embedding
embeddings = []
for chunk in tqdm(text_chunks, desc="Generating embeddings"):
    text_embedding = create_embeddings(chunk["text"])
    header_embedding = create_embeddings(chunk["header"])
    embeddings.append({
        "header": chunk["header"],
        "text": chunk["text"],
        "embedding": text_embedding,
        "header_embedding": header_embedding
    })
Performing Semantic Search
We implement cosine similarity to find the most relevant text chunks for a user query.
def cosine_similarity(vec1, vec2):
    # Cosine similarity: dot product divided by the product of the vector norms.
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
def semantic_search(query, chunks, k=5):
    # Embed the query, score each chunk by the average of its text and
    # header similarities, and return the top-k chunks.
    query_embedding = create_embeddings(query)
    similarities = []
    for chunk in chunks:
        sim_text = cosine_similarity(np.array(query_embedding), np.array(chunk["embedding"]))
        sim_header = cosine_similarity(np.array(query_embedding), np.array(chunk["header_embedding"]))
        avg_similarity = (sim_text + sim_header) / 2
        similarities.append((chunk, avg_similarity))
    similarities.sort(key=lambda x: x[1], reverse=True)
    return [x[0] for x in similarities[:k]]
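Averaging the two scores weights the header and the body equally. Because headers are much shorter than chunk bodies, a weighted combination is a reasonable variant to experiment with; the helper below is a hypothetical alternative, not part of the original notebook.

def combined_similarity(query_embedding, chunk, header_weight=0.3):
    # Weighted blend of header and text similarity; header_weight is a tunable assumption.
    sim_text = cosine_similarity(np.array(query_embedding), np.array(chunk["embedding"]))
    sim_header = cosine_similarity(np.array(query_embedding), np.array(chunk["header_embedding"]))
    return header_weight * sim_header + (1 - header_weight) * sim_text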
Running a Query on Extracted Chunks
with open('data/val.json') as f:
    data = json.load(f)

query = data[0]['question']
top_chunks = semantic_search(query, embeddings, k=2)

print("Query:", query)
for i, chunk in enumerate(top_chunks):
    print(f"Header {i+1}: {chunk['header']}")
    print(f"Content:\n{chunk['text']}\n")
Generating a Response Based on Retrieved Chunks
system_prompt = "You are an AI assistant that strictly answers based on the given context. If the answer cannot be derived directly from the provided context, respond with: 'I do not have enough information to answer that.'"
def generate_response(system_prompt, user_message, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Send the system prompt and user message to the chat model and
    # return the full API response object.
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ]
    )
    return response
user_prompt = "\n".join([f"Header: {chunk['header']}\nContent:\n{chunk['text']}" for chunk in top_chunks])
user_prompt = f"{user_prompt}\nQuestion: {query}"
ai_response = generate_response(system_prompt, user_prompt)
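Since generate_response returns the full API response object, the model's answer itself lives in the first choice's message content:

print(ai_response.choices[0].message.content)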
Evaluating the AI Response
We compare the AI response with the expected answer and assign a score.
evaluate_system_prompt = """You are an intelligent evaluation system.
Assess the AI assistant's response based on the provided context.
- Assign a score of 1 if the response is very close to the true answer.
- Assign a score of 0.5 if the response is partially correct.
- Assign a score of 0 if the response is incorrect.
Return only the score (0, 0.5, or 1)."""
true_answer = data[0]['ideal_answer']
evaluation_prompt = f"""
User Query: {query}
AI Response: {ai_response}
True Answer: {true_answer}
{evaluate_system_prompt}
"""
evaluation_response = generate_response(evaluate_system_prompt, evaluation_prompt)
print("Evaluation Score:", evaluation_response.choices[0].message.content)