Contextual Chunk Headers (CCH) in Simple RAG
Retrieval-Augmented Generation (RAG) improves the factual accuracy of language models by retrieving relevant external knowledge before generating a response. However, standard chunking often strips away the surrounding context a chunk needs, making retrieval less effective.
Contextual Chunk Headers (CCH) enhance RAG by attaching high-level context (such as document titles or section headers) to each chunk before embedding it. This improves retrieval quality and helps prevent out-of-context responses.
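As a quick illustration of the idea, the sketch below shows what a context-enhanced chunk could look like if the header were prepended directly to the chunk text before embedding; the header and body strings here are made up for illustration. Note that this notebook takes a slightly different route: it embeds the header and the chunk text separately and averages their similarities at search time.

# Illustrative only: a hypothetical header prepended to a hypothetical chunk body.
header = "Understanding Artificial Intelligence: Core Concepts"
chunk_body = "Machine learning enables systems to learn patterns from data without being explicitly programmed..."

contextual_chunk = f"{header}\n\n{chunk_body}"  # this combined string would be embedded as one unit
print(contextual_chunk)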
Steps in this Notebook:
- Data Ingestion: Load and preprocess the text data.
- Chunking with Contextual Headers: Extract section titles and prepend them to chunks.
- Embedding Creation: Convert context-enhanced chunks into numerical representations.
- Semantic Search: Retrieve relevant chunks based on a user query.
- Response Generation: Use a language model to generate a response from retrieved text.
- Evaluation: Assess response accuracy using a scoring system.
Setting Up the Environment
We begin by importing necessary libraries.
import os
import numpy as np
import json
from openai import OpenAI
import fitz  # PyMuPDF
from tqdm import tqdm
Extracting Text from the PDF
We extract the raw text from a PDF. Descriptive headers for each chunk are generated later with an LLM.
def extract_text_from_pdf(pdf_path):
    # Open the PDF and concatenate the text of every page.
    mypdf = fitz.open(pdf_path)
    all_text = ""
    for page_num in range(mypdf.page_count):
        page = mypdf[page_num]
        text = page.get_text("text")
        all_text += text
    mypdf.close()
    return all_text
Setting Up the OpenAI API Client
We initialize the OpenAI client to generate embeddings and responses.
client = OpenAI(
    base_url="https://api.studio.nebius.com/v1/",
    api_key=os.getenv("OPENAI_API_KEY")
)
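The base_url points the client at Nebius AI Studio's OpenAI-compatible endpoint, so the key read from OPENAI_API_KEY must be valid for that service. A small sanity check (not part of the original notebook) can fail fast if the variable is unset:

if not os.getenv("OPENAI_API_KEY"):
    raise EnvironmentError("Set the OPENAI_API_KEY environment variable before running this notebook.")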
Chunking Text with Contextual Headers
To improve retrieval, we generate a descriptive header for each chunk using an LLM.
def generate_chunk_header(chunk, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Ask the LLM for a concise title that summarizes the chunk.
    system_prompt = "Generate a concise and informative title for the given text."
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": chunk}
        ]
    )
    return response.choices[0].message.content.strip()
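A quick way to sanity-check the header generator is to call it on a small hand-written paragraph (the sample text below is hypothetical):

sample_text = (
    "Neural networks are composed of layers of interconnected nodes. "
    "Each connection has a weight that is adjusted during training."
)
print(generate_chunk_header(sample_text))  # e.g. a short title about neural network basics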
def chunk_text_with_headers(text, n, overlap):
    # Split the text into overlapping chunks of n characters and
    # generate a contextual header for each one.
    chunks = []
    for i in range(0, len(text), n - overlap):
        chunk = text[i:i + n]
        header = generate_chunk_header(chunk)
        chunks.append({"header": header, "text": chunk})
    return chunks
Extracting and Chunking Text from a PDF File
Now, we load the PDF, extract text, and split it into chunks.
pdf_path = "data/AI_Information.pdf"
extracted_text = extract_text_from_pdf(pdf_path)
text_chunks = chunk_text_with_headers(extracted_text, 1000, 200)
print("Sample Chunk:")
print("Header:", text_chunks[0]['header'])
print("Content:", text_chunks[0]['text'])
Creating Embeddings for Headers and Text
We create embeddings for both headers and text to improve retrieval accuracy.
def create_embeddings(text, model="BAAI/bge-en-icl"):
    # Embed a single string and return its vector.
    response = client.embeddings.create(
        model=model,
        input=text
    )
    return response.data[0].embedding
embeddings = []
for chunk in tqdm(text_chunks, desc="Generating embeddings"):
    text_embedding = create_embeddings(chunk["text"])
    header_embedding = create_embeddings(chunk["header"])
    embeddings.append({
        "header": chunk["header"],
        "text": chunk["text"],
        "embedding": text_embedding,
        "header_embedding": header_embedding
    })
Performing Semantic Search
We implement cosine similarity to find the most relevant text chunks for a user query.
def cosine_similarity(vec1, vec2):
    # Cosine similarity: dot product divided by the product of the vector norms.
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
def semantic_search(query, chunks, k=5):
    # Embed the query, score each chunk by the average of its text and
    # header similarities, and return the top-k chunks.
    query_embedding = create_embeddings(query)
    similarities = []
    for chunk in chunks:
        sim_text = cosine_similarity(np.array(query_embedding), np.array(chunk["embedding"]))
        sim_header = cosine_similarity(np.array(query_embedding), np.array(chunk["header_embedding"]))
        avg_similarity = (sim_text + sim_header) / 2
        similarities.append((chunk, avg_similarity))
    similarities.sort(key=lambda x: x[1], reverse=True)
    return [x[0] for x in similarities[:k]]
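Averaging the two scores weights the header and the body equally. Because headers are much shorter than chunk bodies, a weighted combination is a reasonable variant to experiment with; the helper below is a hypothetical alternative, not part of the original notebook.

def combined_similarity(query_embedding, chunk, header_weight=0.3):
    # Weighted blend of header and text similarity; header_weight is a tunable assumption.
    sim_text = cosine_similarity(np.array(query_embedding), np.array(chunk["embedding"]))
    sim_header = cosine_similarity(np.array(query_embedding), np.array(chunk["header_embedding"]))
    return header_weight * sim_header + (1 - header_weight) * sim_text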
Running a Query on Extracted Chunks
with open('data/val.json') as f:
    data = json.load(f)

query = data[0]['question']
top_chunks = semantic_search(query, embeddings, k=2)

print("Query:", query)
for i, chunk in enumerate(top_chunks):
    print(f"Header {i+1}: {chunk['header']}")
    print(f"Content:\n{chunk['text']}\n")
Generating a Response Based on Retrieved Chunks
system_prompt = "You are an AI assistant that strictly answers based on the given context. If the answer cannot be derived directly from the provided context, respond with: 'I do not have enough information to answer that.'"
def generate_response(system_prompt, user_message, model="meta-llama/Llama-3.2-3B-Instruct"):
    # Send the system prompt and user message to the chat model and
    # return the full API response object.
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ]
    )
    return response
user_prompt = "\n".join([f"Header: {chunk['header']}\nContent:\n{chunk['text']}" for chunk in top_chunks])
user_prompt = f"{user_prompt}\nQuestion: {query}"
ai_response = generate_response(system_prompt, user_prompt)
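Since generate_response returns the full API response object, the model's answer itself lives in the first choice's message content:

print(ai_response.choices[0].message.content)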
Evaluating the AI Response
We compare the AI response with the expected answer and assign a score.
evaluate_system_prompt = """You are an intelligent evaluation system.
Assess the AI assistant's response based on the provided context.
- Assign a score of 1 if the response is very close to the true answer.
- Assign a score of 0.5 if the response is partially correct.
- Assign a score of 0 if the response is incorrect.
Return only the score (0, 0.5, or 1)."""
true_answer = data[0]['ideal_answer']
evaluation_prompt = f"""
User Query: {query}
AI Response: {ai_response}
True Answer: {true_answer}
{evaluate_system_prompt}
"""
evaluation_response = generate_response(evaluate_system_prompt, evaluation_prompt)
print("Evaluation Score:", evaluation_response.choices[0].message.content)