Agentic RAG
Beyond simple retrieve-then-generate: agents that intelligently decide when, what, and how to retrieve, then critique and correct their own retrieval.
The RAG Evolution
BASIC RAG          AGENTIC RAG        SELF-RAG           CORRECTIVE RAG
──────────────     ──────────────     ──────────────     ──────────────
   Query              Query              Query              Query
     │                  │                  │                  │
     ▼                  ▼                  ▼                  ▼
┌─────────┐        ┌─────────┐        ┌─────────┐        ┌─────────┐
│ ALWAYS  │        │ DECIDE  │        │ DECIDE  │        │ ALWAYS  │
│RETRIEVE │        │IF NEEDED│        │IF NEEDED│        │RETRIEVE │
└────┬────┘        └────┬────┘        └────┬────┘        └────┬────┘
     │                  │                  │                  │
     ▼                  ▼                  ▼                  ▼
┌─────────┐        ┌─────────┐        ┌─────────┐        ┌─────────┐
│ Vector  │        │ Multiple│        │Retrieve │        │ GRADE   │
│ Search  │        │ Tools   │        │+ Grade  │        │ EACH    │
└────┬────┘        └────┬────┘        │Relevance│        │DOCUMENT │
     │                  │             └────┬────┘        └────┬────┘
     ▼                  ▼                  ▼                  ▼
┌─────────┐        ┌─────────┐        ┌─────────┐        ┌─────────┐
│GENERATE │        │GENERATE │        │Generate │        │ CORRECT │
└─────────┘        └─────────┘        │+ Self-  │        │ AMBIG./ │
                                      │Critique │        │ INCORR. │
                                      └────┬────┘        └────┬────┘
                                           │                  │
                                           ▼                  ▼
                                      ┌─────────┐        ┌─────────┐
                                      │ Revise  │        │GENERATE │
                                      │if Needed│        └─────────┘
                                      └─────────┘
GRAPH RAG
──────────────
      Query
        │
        ├───────────────┐
        ▼               ▼
   ┌─────────┐     ┌─────────┐
   │ Extract │     │ Vector  │
   │Entities │     │ Search  │
   └────┬────┘     └────┬────┘
        │               │
        ▼               │
   ┌─────────┐          │
   │ Graph   │          │
   │Traversal│          │
   └────┬────┘          │
        │               │
        └───────┬───────┘
                ▼
           ┌─────────┐
           │ COMBINE │
           │ Context │
           └────┬────┘
                ▼
           ┌─────────┐
           │GENERATE │
           └─────────┘

| Approach | When to Retrieve | Quality Control | Best For |
|---|---|---|---|
| Basic RAG | Always | None | Simple Q&A |
| Agentic RAG | Agent decides | Tool selection | Varied queries |
| Self-RAG | Agent decides | Self-critique | Accuracy critical |
| Corrective RAG | Always | Grade + correct | Noisy retrieval |
| Graph RAG | Always (dual) | Structured + semantic | Entity-rich domains |
RAG approach comparison
1. Basic RAG (Baseline)
The simplest RAG architecture: always retrieve, then generate. No intelligence about whether retrieval is needed or if retrieved documents are relevant.
# Basic RAG: Always retrieve, then generate
function basicRAG(query, vectorStore, llm):
# Step 1: Embed the query
queryEmbedding = embedModel.encode(query)
# Step 2: Retrieve relevant documents
documents = vectorStore.search(
embedding: queryEmbedding,
topK: 5
)
# Step 3: Build context from retrieved docs
context = formatDocuments(documents)
# Step 4: Generate response with context
prompt = """
Use the following context to answer the question.
If the context doesn't contain the answer, say "I don't know."
Context:
{context}
Question: {query}
"""
response = llm.generate(prompt)
return response
# Limitations:
# - Always retrieves, even for simple questions
# - No quality check on retrieved documents
# - Single retrieval pass (may miss info)
# - No reasoning about what to retrieve

from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
def create_basic_rag(documents: list[str], collection_name: str = "docs"):
"""Create a basic RAG pipeline."""
# Initialize embeddings
embeddings = OpenAIEmbeddings()
# Create vector store
vectorstore = Chroma.from_texts(
texts=documents,
embedding=embeddings,
collection_name=collection_name
)
# Create retriever
retriever = vectorstore.as_retriever(
search_type="similarity",
search_kwargs={"k": 5}
)
# Create QA chain
llm = ChatOpenAI(model="gpt-4", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff", # Simple: stuff all docs into context
retriever=retriever,
return_source_documents=True
)
return qa_chain
# Usage
qa = create_basic_rag(my_documents)
result = qa.invoke({"query": "What is the refund policy?"})
print(result["result"])
print(f"Sources: {[doc.metadata for doc in result['source_documents']]}")] using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.VectorData;
using OpenAI;
public class BasicRagPipeline
{
private readonly IVectorStore _vectorStore;
private readonly IChatClient _chatClient;
private const string CollectionName = "documents";
public BasicRagPipeline(string apiKey)
{
var openAI = new OpenAIClient(apiKey);
// Initialize chat client
_chatClient = openAI
.GetChatClient("gpt-4o")
.AsIChatClient();
        // Initialize embedding generator for the vector store
        // (assumed: the store/collection is wired to this generator so that
        // text is embedded automatically on upsert and search)
        var embeddingGenerator = openAI
            .GetEmbeddingClient("text-embedding-3-small")
            .AsIEmbeddingGenerator();
        _vectorStore = new InMemoryVectorStore();
}
public async Task IndexDocumentsAsync(
IEnumerable<(string Id, string Text)> documents,
CancellationToken ct = default)
{
var collection = _vectorStore.GetCollection<string, DocumentRecord>(CollectionName);
await collection.CreateCollectionIfNotExistsAsync(ct);
foreach (var (id, text) in documents)
{
var record = new DocumentRecord { Id = id, Text = text };
await collection.UpsertAsync(record, ct);
}
}
public async Task<string> QueryAsync(
string query,
int topK = 5,
CancellationToken ct = default)
{
// Step 1: Retrieve relevant documents
var collection = _vectorStore.GetCollection<string, DocumentRecord>(CollectionName);
var results = await collection.VectorizedSearchAsync(query, topK, ct);
// Step 2: Build context
var context = string.Join("\n\n", results.Select(r => r.Record.Text));
// Step 3: Generate response
var prompt = $@"Use the following context to answer the question.
If the context doesn't contain the answer, say ""I don't know.""
Context:
{context}
Question: {query}";
var response = await _chatClient.GetResponseAsync(prompt, ct);
return response.Text;
}
}
public class DocumentRecord
{
[VectorStoreRecordKey]
public string Id { get; set; } = "";
[VectorStoreRecordData]
public string Text { get; set; } = "";
[VectorStoreRecordVector(1536)]
public ReadOnlyMemory<float> Embedding { get; set; }
}

Limitations
- Always retrieves, even for questions the model could answer directly
- No quality check on the retrieved documents
- Single retrieval pass, so missed information stays missed
- No reasoning about what to retrieve
2. Agentic RAG
An agent with retrieval tools that decides when retrieval is needed, which tool to use, and what query to formulate:
# Agentic RAG: Agent decides when and what to retrieve
class AgenticRAG:
tools: [
searchDocuments(query, filters), # Vector search
lookupEntity(entityName), # Knowledge base lookup
webSearch(query), # External search
noRetrieval() # Answer from knowledge
]
function answer(question):
# Agent reasons about retrieval strategy
while not hasAnswer:
thought = llm.reason(
question: question,
previousSteps: history,
availableTools: tools
)
if thought.needsRetrieval:
# Agent formulates retrieval query (may differ from question)
retrievalQuery = thought.formulatedQuery
results = executeRetrieval(thought.selectedTool, retrievalQuery)
# Agent evaluates results
evaluation = llm.evaluate(
question: question,
retrievedInfo: results
)
if evaluation.sufficient:
hasAnswer = true
elif evaluation.needsMoreInfo:
# Refine and retrieve again
history.append(results)
else:
# Try different retrieval strategy
continue
else:
# Agent can answer without retrieval
hasAnswer = true
return llm.generateAnswer(question, history)
# Key differences from basic RAG:
# 1. Agent DECIDES whether to retrieve
# 2. Agent FORMULATES the retrieval query
# 3. Agent EVALUATES retrieval results
# 4. Agent can do MULTIPLE retrieval rounds
# 5. Agent can use DIFFERENT retrieval tools

from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from typing import TypedDict, Annotated
import operator
# Define state
class AgentState(TypedDict):
messages: Annotated[list, operator.add]
retrieved_docs: list
needs_retrieval: bool
# Define tools
@tool
def search_documents(query: str, max_results: int = 5) -> str:
"""Search the document store for relevant information."""
results = vector_store.similarity_search(query, k=max_results)
return "
".join([doc.page_content for doc in results])
@tool
def search_web(query: str) -> str:
"""Search the web for current information."""
# Implementation with web search API
return web_search_api.search(query)
@tool
def lookup_entity(entity_name: str) -> str:
"""Look up specific entity in knowledge base."""
return knowledge_base.get(entity_name, "Entity not found")
# Create the agent
llm = ChatOpenAI(model="gpt-4").bind_tools([
search_documents, search_web, lookup_entity
])
def should_retrieve(state: AgentState) -> str:
"""Decide if we need to retrieve or can answer."""
last_message = state["messages"][-1]
if hasattr(last_message, "tool_calls") and last_message.tool_calls:
return "retrieve"
return "answer"
def call_model(state: AgentState) -> dict:
"""Have the agent reason about what to do."""
messages = state["messages"]
# Add system prompt for RAG behavior
system = """You are a helpful assistant with access to retrieval tools.
IMPORTANT: Before answering, consider:
1. Can you answer this from your knowledge? If yes, just respond.
2. Does this need current/specific information? Use search_web.
3. Does this need document lookup? Use search_documents.
4. Is this about a specific entity? Use lookup_entity.
Be strategic about retrieval - don't retrieve if unnecessary."""
response = llm.invoke([{"role": "system", "content": system}] + messages)
return {"messages": [response]}
def generate_answer(state: AgentState) -> dict:
"""Generate final answer based on retrieved context."""
messages = state["messages"]
docs = state.get("retrieved_docs", [])
if docs:
context = "
".join(docs)
messages = messages + [
{"role": "system", "content": f"Context from retrieval:
{context}"}
]
response = llm.invoke(messages)
return {"messages": [response]}
# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.add_node("retrieve", ToolNode([search_documents, search_web, lookup_entity]))
workflow.add_node("answer", generate_answer)
workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", should_retrieve, {
"retrieve": "retrieve",
"answer": "answer"
})
workflow.add_edge("retrieve", "agent") # Loop back after retrieval
workflow.add_edge("answer", END)
app = workflow.compile()
# Usage
result = app.invoke({
"messages": [{"role": "user", "content": "What's our Q3 revenue?"}],
"retrieved_docs": [],
"needs_retrieval": False
})

using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using System.ComponentModel;
using OpenAI;
public class AgenticRagPipeline
{
    private readonly AIAgent _agent;
    private readonly IVectorStore _vectorStore;
    private readonly IWebSearchService _webSearchService;  // assumed web search abstraction
    private readonly IKnowledgeBase _knowledgeBase;        // assumed knowledge base abstraction
public AgenticRagPipeline(string apiKey)
{
var chatClient = new OpenAIClient(apiKey)
.GetChatClient("gpt-4o")
.AsIChatClient();
// Create agent with retrieval tools
_agent = chatClient.CreateAIAgent(
name: "RAGAgent",
instructions: @"You are a helpful assistant with retrieval tools.
Before answering, consider:
1. Can you answer from knowledge? Just respond.
2. Need current info? Use SearchWeb.
3. Need document lookup? Use SearchDocuments.
4. About specific entity? Use LookupEntity.
Be strategic - don't retrieve unnecessarily.",
tools: [
AIFunctionFactory.Create(SearchDocuments),
AIFunctionFactory.Create(SearchWeb),
AIFunctionFactory.Create(LookupEntity)
]
);
}
[Description("Search documents for relevant information")]
private async Task<string> SearchDocuments(
[Description("Search query")] string query,
[Description("Maximum results")] int maxResults = 5)
{
var collection = _vectorStore.GetCollection<string, DocumentRecord>("documents");
var results = await collection.VectorizedSearchAsync(query, maxResults);
return string.Join("\n\n", results.Select(r => r.Record.Text));
}
[Description("Search the web for current information")]
private async Task<string> SearchWeb(
[Description("Search query")] string query)
{
return await _webSearchService.SearchAsync(query);
}
[Description("Look up a specific entity in the knowledge base")]
private async Task<string> LookupEntity(
[Description("Entity name")] string entityName)
{
return await _knowledgeBase.GetAsync(entityName) ?? "Entity not found";
}
public async Task<string> QueryAsync(
string question,
CancellationToken ct = default)
{
var thread = _agent.GetNewThread();
// Agent will automatically use tools as needed
        var response = await _agent.RunAsync(question, thread, cancellationToken: ct);
        return response.Text;
}
}

Key Capabilities
1. Retrieval Decision - the agent decides IF retrieval is needed
2. Query Formulation - the agent rewrites the query for better retrieval (see the sketch after this list)
3. Tool Selection - the agent chooses the right retrieval tool
4. Iterative Retrieval - the agent can retrieve multiple times
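Query formulation deserves a closer look, since it is what separates agentic retrieval from forwarding the user's words verbatim to the vector store. A minimal sketch, assuming an OpenAI-style client; the function name, prompt wording, and model choice are illustrative, not part of any framework:

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def formulate_retrieval_query(question: str, failed_queries: list[str] | None = None) -> str:
    """Rewrite a conversational question into a keyword-dense search query."""
    avoid = f"\nAvoid these queries, they already failed: {failed_queries}" if failed_queries else ""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Rewrite this question as a short, keyword-dense search query.\n"
                       'Return JSON: {"query": "..."}\n'
                       f"Question: {question}{avoid}",
        }],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)["query"]

# "What did we decide about the EU data residency thing?" might become
# "EU data residency decision policy"

Passing previously failed queries back in is what makes the iterative-retrieval loop converge instead of retrying the same search.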
3. Self-RAG
Self-RAG (Asai et al., 2023) adds self-reflection: the model critiques its own retrieval decisions and generation quality:
# Self-RAG: Model critiques its own retrieval and generation
class SelfRAG:
function answer(question):
# Step 1: Decide if retrieval is needed
retrievalDecision = llm.generate(
prompt: "Given this question, do I need to retrieve information? [Yes/No]",
question: question
)
if retrievalDecision == "No":
# Generate without retrieval
response = llm.generate(question)
return selfCritique(question, response, [])
# Step 2: Retrieve documents
documents = retrieve(question)
# Step 3: For each document, assess relevance
relevantDocs = []
for doc in documents:
isRelevant = llm.generate(
prompt: "Is this document relevant to the question? [Relevant/Irrelevant]",
question: question,
document: doc
)
if isRelevant == "Relevant":
relevantDocs.append(doc)
# Step 4: Generate response with relevant docs
response = llm.generate(
prompt: question,
context: relevantDocs
)
# Step 5: Self-critique the response
return selfCritique(question, response, relevantDocs)
function selfCritique(question, response, sources):
# Check if response is supported by sources
supportScore = llm.generate(
prompt: "Is this response fully supported by the sources? [Fully/Partially/No]",
response: response,
sources: sources
)
# Check if response is useful
usefulnessScore = llm.generate(
prompt: "How useful is this response? [5/4/3/2/1]",
question: question,
response: response
)
if supportScore == "No" or usefulnessScore < 3:
# Regenerate with feedback
return regenerateWithCritique(question, response, supportScore, usefulnessScore)
return {
response: response,
supported: supportScore,
usefulness: usefulnessScore,
sources: sources
}

from dataclasses import dataclass
from enum import Enum
class RetrievalDecision(Enum):
YES = "yes"
NO = "no"
class RelevanceScore(Enum):
RELEVANT = "relevant"
IRRELEVANT = "irrelevant"
class SupportScore(Enum):
FULLY_SUPPORTED = "fully_supported"
PARTIALLY_SUPPORTED = "partially_supported"
NOT_SUPPORTED = "not_supported"
@dataclass
class SelfRAGResponse:
answer: str
sources: list[str]
support_score: SupportScore
usefulness_score: int
retrieval_used: bool
class SelfRAG:
def __init__(self, client, retriever, model: str = "gpt-4"):
self.client = client
self.retriever = retriever
self.model = model
def query(self, question: str) -> SelfRAGResponse:
# Step 1: Decide if retrieval is needed
retrieval_decision = self._decide_retrieval(question)
if retrieval_decision == RetrievalDecision.NO:
answer = self._generate_without_context(question)
return self._self_critique(question, answer, [], False)
# Step 2: Retrieve documents
documents = self.retriever.search(question, k=5)
# Step 3: Filter by relevance
relevant_docs = self._filter_relevant(question, documents)
if not relevant_docs:
# Fall back to generation without context
answer = self._generate_without_context(question)
return self._self_critique(question, answer, [], True)
# Step 4: Generate with relevant context
answer = self._generate_with_context(question, relevant_docs)
# Step 5: Self-critique
return self._self_critique(question, answer, relevant_docs, True)
def _decide_retrieval(self, question: str) -> RetrievalDecision:
response = self.client.chat.completions.create(
model=self.model,
messages=[{
"role": "user",
"content": f"""Given this question, do you need to retrieve external information to answer accurately?
Question: {question}
Consider:
- Is this about specific facts, data, or recent events? -> Retrieve
- Is this about general knowledge or reasoning? -> No retrieval
- Is this about personal opinions or hypotheticals? -> No retrieval
Answer with just: YES or NO"""
}]
)
answer = response.choices[0].message.content.strip().upper()
return RetrievalDecision.YES if "YES" in answer else RetrievalDecision.NO
def _filter_relevant(self, question: str, documents: list[str]) -> list[str]:
relevant = []
for doc in documents:
response = self.client.chat.completions.create(
model=self.model,
messages=[{
"role": "user",
"content": f"""Is this document relevant to answering the question?
Question: {question}
Document: {doc[:500]}...
Answer with just: RELEVANT or IRRELEVANT"""
}]
)
if "RELEVANT" in response.choices[0].message.content.upper():
relevant.append(doc)
        return relevant
    # --- generation helpers referenced above ---
    def _generate_without_context(self, question: str) -> str:
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": question}]
        )
        return response.choices[0].message.content
    def _generate_with_context(self, question: str, docs: list[str]) -> str:
        context = "\n---\n".join(docs)
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": f"Context:\n{context}"},
                {"role": "user", "content": question}
            ]
        )
        return response.choices[0].message.content
    def _regenerate_with_feedback(
        self, question: str, answer: str, sources: list[str], problem: str
    ) -> str:
        context = "\n---\n".join(sources) if sources else "(none)"
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{
                "role": "user",
                "content": f"Your previous answer was {problem}. Rewrite it so every claim "
                           f"is grounded in the sources.\nSources:\n{context}\n"
                           f"Question: {question}\nPrevious answer: {answer}"
            }]
        )
        return response.choices[0].message.content
def _self_critique(
self,
question: str,
answer: str,
sources: list[str],
retrieval_used: bool
) -> SelfRAGResponse:
# Check support
support_score = self._check_support(answer, sources)
# Check usefulness
usefulness_score = self._check_usefulness(question, answer)
# Regenerate if quality is low
if support_score == SupportScore.NOT_SUPPORTED and sources:
answer = self._regenerate_with_feedback(
question, answer, sources, "not supported by sources"
)
support_score = self._check_support(answer, sources)
if usefulness_score < 3:
answer = self._regenerate_with_feedback(
question, answer, sources, "not useful enough"
)
usefulness_score = self._check_usefulness(question, answer)
return SelfRAGResponse(
answer=answer,
sources=sources,
support_score=support_score,
usefulness_score=usefulness_score,
retrieval_used=retrieval_used
)
def _check_support(self, answer: str, sources: list[str]) -> SupportScore:
if not sources:
return SupportScore.FULLY_SUPPORTED # No sources to contradict
sources_text = "
---
".join(sources)
response = self.client.chat.completions.create(
model=self.model,
messages=[{
"role": "user",
"content": f"""Is this answer supported by the source documents?
Answer: {answer}
Sources:
{sources_text}
Respond with:
- FULLY_SUPPORTED: All claims in the answer are backed by sources
- PARTIALLY_SUPPORTED: Some claims are backed, others are not
- NOT_SUPPORTED: The answer contradicts or goes beyond the sources"""
}]
)
content = response.choices[0].message.content.upper()
if "FULLY" in content:
return SupportScore.FULLY_SUPPORTED
elif "PARTIALLY" in content:
return SupportScore.PARTIALLY_SUPPORTED
return SupportScore.NOT_SUPPORTED
def _check_usefulness(self, question: str, answer: str) -> int:
response = self.client.chat.completions.create(
model=self.model,
messages=[{
"role": "user",
"content": f"""Rate how useful this answer is for the question (1-5):
Question: {question}
Answer: {answer}
5 = Perfectly answers the question
4 = Good answer with minor gaps
3 = Adequate but could be better
2 = Partially helpful
1 = Not helpful
Respond with just the number."""
}]
)
        try:
            return int(response.choices[0].message.content.strip()[0])
        except (ValueError, IndexError):
            return 3

Self-RAG Reflection Tokens
- [Retrieve] - Should I retrieve? (Yes/No)
- [IsRel] - Is this document relevant? (Relevant/Irrelevant)
- [IsSup] - Is the response supported? (Fully/Partially/No)
- [IsUse] - Is the response useful? (5/4/3/2/1)
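The original Self-RAG fine-tunes the model to emit these tokens inline during generation; with an off-the-shelf model you can approximate the same signal by asking for the tags explicitly and parsing them out. A sketch of that approximation (only the token vocabulary comes from the paper; the prompt suffix and parser are assumptions):

import re

# Prompt suffix asking an off-the-shelf model to append Self-RAG-style tags
REFLECTION_SUFFIX = """
After your answer, append exactly:
[IsSup: Fully|Partially|No]  (is the answer supported by the context?)
[IsUse: 1-5]                 (how useful is the answer?)"""

def parse_reflection(output: str) -> tuple[str, str | None, int | None]:
    """Split generated text into (answer, support_grade, usefulness_score)."""
    support = re.search(r"\[IsSup:\s*(Fully|Partially|No)\]", output)
    useful = re.search(r"\[IsUse:\s*([1-5])\]", output)
    answer = re.split(r"\[IsSup:", output)[0].strip()
    return (
        answer,
        support.group(1) if support else None,
        int(useful.group(1)) if useful else None,
    )

# Retry generation (or re-retrieve) when support is "No" or usefulness < 3,
# mirroring the regeneration step in the SelfRAG class above.

The advantage over the two-call critique in the class above is cost: one generation pass yields both the answer and its self-assessment.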
4. Corrective RAG (CRAG)
CRAG (Yan et al., 2024) focuses on evaluating and correcting retrieval quality before generation:
# Corrective RAG (CRAG): Evaluate and correct retrieval quality
class CorrectiveRAG:
function answer(question):
# Step 1: Initial retrieval
documents = retrieve(question)
# Step 2: Evaluate each document's relevance
evaluations = []
for doc in documents:
score = evaluateRelevance(question, doc)
evaluations.append({ doc: doc, score: score })
# Step 3: Determine action based on evaluation
relevantDocs = filter(evaluations, score == "Correct")
ambiguousDocs = filter(evaluations, score == "Ambiguous")
irrelevantDocs = filter(evaluations, score == "Incorrect")
if allRelevant(evaluations):
# All documents are relevant - use them directly
action = "CORRECT"
context = relevantDocs
elif allIrrelevant(evaluations):
# All documents are irrelevant - use web search
action = "INCORRECT"
webResults = webSearch(question)
context = webResults
else:
# Mixed relevance - combine strategies
action = "AMBIGUOUS"
webResults = webSearch(question)
context = relevantDocs + refineDocuments(ambiguousDocs) + webResults
# Step 4: Generate with corrected context
return generate(question, context)
function evaluateRelevance(question, document):
# Three-way classification
prompt = """
Evaluate if this document is relevant to the question.
Question: {question}
Document: {document}
- CORRECT: Document directly helps answer the question
- INCORRECT: Document is not relevant at all
- AMBIGUOUS: Document is partially relevant or tangential
Respond with: CORRECT, INCORRECT, or AMBIGUOUS
"""
return llm.generate(prompt)
function refineDocuments(documents):
# Extract only the relevant portions
refined = []
for doc in documents:
relevantParts = llm.extract(
prompt: "Extract only the parts relevant to the question",
document: doc
)
refined.append(relevantParts)
        return refined

import json
from dataclasses import dataclass
from enum import Enum
class RelevanceGrade(Enum):
CORRECT = "correct" # Directly relevant
INCORRECT = "incorrect" # Not relevant
AMBIGUOUS = "ambiguous" # Partially relevant
class RetrievalAction(Enum):
USE_RETRIEVED = "use_retrieved"
USE_WEB = "use_web"
COMBINE = "combine"
@dataclass
class GradedDocument:
content: str
grade: RelevanceGrade
confidence: float
class CorrectiveRAG:
def __init__(self, client, retriever, web_search, model: str = "gpt-4"):
self.client = client
self.retriever = retriever
self.web_search = web_search
self.model = model
def query(self, question: str) -> str:
# Step 1: Initial retrieval
documents = self.retriever.search(question, k=5)
# Step 2: Grade each document
graded_docs = [
self._grade_document(question, doc)
for doc in documents
]
# Step 3: Determine corrective action
action = self._determine_action(graded_docs)
# Step 4: Build context based on action
if action == RetrievalAction.USE_RETRIEVED:
context = self._build_context_from_docs(graded_docs)
elif action == RetrievalAction.USE_WEB:
web_results = self.web_search.search(question)
context = self._format_web_results(web_results)
else: # COMBINE
# Use correct docs + refined ambiguous + web
correct_docs = [d for d in graded_docs if d.grade == RelevanceGrade.CORRECT]
ambiguous_docs = [d for d in graded_docs if d.grade == RelevanceGrade.AMBIGUOUS]
refined = self._refine_ambiguous(question, ambiguous_docs)
web_results = self.web_search.search(question)
            context = (
                self._build_context_from_docs(correct_docs) +
                "\n" + refined +
                "\n" + self._format_web_results(web_results)
            )
# Step 5: Generate answer
return self._generate(question, context)
def _grade_document(self, question: str, document: str) -> GradedDocument:
response = self.client.chat.completions.create(
model=self.model,
messages=[{
"role": "user",
"content": f"""Grade this document's relevance to the question.
Question: {question}
Document:
{document[:1000]}
Grades:
- CORRECT: Directly helps answer the question
- INCORRECT: Not relevant to the question
- AMBIGUOUS: Partially relevant or tangential
Respond with JSON: {{"grade": "...", "confidence": 0.0-1.0, "reason": "..."}}"""
}],
response_format={"type": "json_object"}
)
data = json.loads(response.choices[0].message.content)
return GradedDocument(
content=document,
grade=RelevanceGrade(data["grade"].lower()),
confidence=data.get("confidence", 0.5)
)
def _determine_action(self, graded_docs: list[GradedDocument]) -> RetrievalAction:
correct = sum(1 for d in graded_docs if d.grade == RelevanceGrade.CORRECT)
incorrect = sum(1 for d in graded_docs if d.grade == RelevanceGrade.INCORRECT)
ambiguous = sum(1 for d in graded_docs if d.grade == RelevanceGrade.AMBIGUOUS)
total = len(graded_docs)
if correct / total >= 0.6:
return RetrievalAction.USE_RETRIEVED
elif incorrect / total >= 0.8:
return RetrievalAction.USE_WEB
else:
return RetrievalAction.COMBINE
def _refine_ambiguous(
self,
question: str,
ambiguous_docs: list[GradedDocument]
) -> str:
if not ambiguous_docs:
return ""
docs_text = "
---
".join([d.content for d in ambiguous_docs])
response = self.client.chat.completions.create(
model=self.model,
messages=[{
"role": "user",
"content": f"""Extract only the parts of these documents that are relevant to the question.
Question: {question}
Documents:
{docs_text}
Return only the relevant excerpts, removing irrelevant content."""
}]
)
        return response.choices[0].message.content
    def _build_context_from_docs(self, docs: list[GradedDocument]) -> str:
        # Join graded documents into a single context block
        return "\n---\n".join(d.content for d in docs)
    def _format_web_results(self, results: list[str]) -> str:
        # Assumes the web search client returns a list of text snippets
        return "\n---\n".join(results)
def _generate(self, question: str, context: str) -> str:
response = self.client.chat.completions.create(
model=self.model,
messages=[
{"role": "system", "content": f"Context:
{context}"},
{"role": "user", "content": question}
]
)
        return response.choices[0].message.content

          Retrieved Documents
                  │
                  ▼
        ┌───────────────────┐
        │  GRADE EACH       │
        │  DOCUMENT         │
        │                   │
        │  Correct?         │
        │  Incorrect?       │
        │  Ambiguous?       │
        └─────────┬─────────┘
                  │
      ┌───────────┼───────────┐
      │           │           │
      ▼           ▼           ▼
 All Correct All Incorrect  Mixed
      │           │           │
      ▼           ▼           ▼
 ┌───────┐    ┌───────┐  ┌───────────┐
 │ USE   │    │ WEB   │  │ COMBINE   │
 │ DOCS  │    │SEARCH │  │ Correct + │
 └───────┘    └───────┘  │ Refined + │
                         │ Web       │
                         └───────────┘

Key Innovation
CRAG grades every retrieved document before generation and corrects the context when retrieval falls short: it keeps correct documents, refines ambiguous ones down to their relevant parts, and falls back to web search when the corpus has nothing useful.
5. Graph RAG
Graph RAG combines vector search with knowledge graph traversal for structured + semantic retrieval:
# Graph RAG: Combine knowledge graphs with vector retrieval
class GraphRAG:
vectorStore: VectorDB # For semantic search
knowledgeGraph: Neo4j # For structured relationships
function answer(question):
# Step 1: Extract entities from question
entities = extractEntities(question)
# Step 2: Retrieve from both sources
# Vector retrieval for semantic similarity
vectorResults = vectorStore.search(question, topK = 5)
# Graph traversal for related entities
graphResults = []
for entity in entities:
# Find entity in graph
node = knowledgeGraph.findNode(entity)
if node:
# Get related nodes (neighbors, paths)
related = knowledgeGraph.traverse(
startNode: node,
maxDepth: 2,
relationTypes: ["related_to", "part_of", "caused_by"]
)
graphResults.append(related)
# Step 3: Combine and deduplicate
combinedContext = merge(vectorResults, graphResults)
# Step 4: Build structured context
context = formatContext(
semanticDocs: vectorResults,
entityRelations: graphResults,
entities: entities
)
# Step 5: Generate with structured knowledge
return llm.generate(question, context)
function extractEntities(question):
# Use NER or LLM to extract entities
return llm.generate(
prompt: "Extract named entities (people, places, concepts) from: " + question
)
function formatContext(semanticDocs, entityRelations, entities):
context = "## Relevant Documents
"
for doc in semanticDocs:
context += "- " + doc.summary + "
"
context += "
## Entity Relationships
"
for entity in entities:
relations = entityRelations.get(entity, [])
context += f"### {entity}
"
for rel in relations:
context += f"- {rel.type}: {rel.target}
"
return context from neo4j import GraphDatabase
from dataclasses import dataclass
@dataclass
class EntityRelation:
source: str
relation: str
target: str
properties: dict
class GraphRAG:
def __init__(
self,
client,
vector_store,
neo4j_uri: str,
neo4j_auth: tuple,
model: str = "gpt-4"
):
self.client = client
self.vector_store = vector_store
self.graph = GraphDatabase.driver(neo4j_uri, auth=neo4j_auth)
self.model = model
def query(self, question: str) -> str:
# Step 1: Extract entities
entities = self._extract_entities(question)
# Step 2: Vector retrieval
vector_results = self.vector_store.similarity_search(question, k=5)
# Step 3: Graph retrieval
graph_results = self._graph_retrieval(entities)
# Step 4: Build combined context
context = self._build_context(
question, entities, vector_results, graph_results
)
# Step 5: Generate answer
return self._generate(question, context)
def _extract_entities(self, question: str) -> list[str]:
response = self.client.chat.completions.create(
model=self.model,
messages=[{
"role": "user",
"content": f"""Extract named entities from this question.
Include: people, organizations, products, concepts, locations.
Question: {question}
Return as JSON: {{"entities": ["entity1", "entity2", ...]}}"""
}],
response_format={"type": "json_object"}
)
data = json.loads(response.choices[0].message.content)
return data.get("entities", [])
def _graph_retrieval(self, entities: list[str]) -> dict[str, list[EntityRelation]]:
results = {}
with self.graph.session() as session:
for entity in entities:
# Find entity and its relationships
query = """
MATCH (e)-[r]-(related)
WHERE e.name =~ $pattern OR e.label =~ $pattern
RETURN e.name as source,
type(r) as relation,
related.name as target,
properties(r) as props
LIMIT 20
"""
pattern = f"(?i).*{entity}.*"
records = session.run(query, pattern=pattern)
relations = [
EntityRelation(
source=record["source"],
relation=record["relation"],
target=record["target"],
properties=record["props"] or {}
)
for record in records
]
if relations:
results[entity] = relations
return results
def _build_context(
self,
question: str,
entities: list[str],
vector_results: list,
graph_results: dict[str, list[EntityRelation]]
) -> str:
parts = []
# Semantic documents
if vector_results:
parts.append("## Relevant Documents")
for i, doc in enumerate(vector_results, 1):
parts.append(f"{i}. {doc.page_content[:300]}...")
        # Entity relationships from graph
        if graph_results:
            parts.append("\n## Entity Knowledge Graph")
            for entity, relations in graph_results.items():
                parts.append(f"\n### {entity}")
                for rel in relations[:10]:  # Limit relations per entity
                    parts.append(f"- {rel.relation} -> {rel.target}")
                    if rel.properties:
                        props = ", ".join(f"{k}={v}" for k, v in rel.properties.items())
                        parts.append(f"  ({props})")
        # Extracted entities for reference
        parts.append(f"\n## Detected Entities: {', '.join(entities)}")
        return "\n".join(parts)
def _generate(self, question: str, context: str) -> str:
response = self.client.chat.completions.create(
model=self.model,
messages=[
{"role": "system", "content": f"""You have access to both document search results and a knowledge graph.
Use both sources to provide a comprehensive answer.
{context}"""},
{"role": "user", "content": question}
]
)
return response.choices[0].message.content
def close(self):
        self.graph.close()

When to Use Graph RAG
| Use Case | Why Graph RAG Helps |
|---|---|
| Multi-hop questions | Graph traversal connects related entities |
| Entity-rich domains | Structured relationships improve precision |
| Reasoning about relationships | "Who reports to whom?" needs graph structure |
| Combining structured + unstructured | Graph for facts, vectors for context |
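Multi-hop questions are where the graph side earns its keep: a single variable-length Cypher path can answer "how is X connected to Y?" instead of chaining several retrieval rounds. A sketch using the same neo4j driver as above; the Entity label, connection details, and hop limit are assumptions about your schema:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def find_connection(source: str, target: str, max_hops: int = 3):
    """Return the entity/relation chain linking two entities, if any."""
    # Cypher does not allow parameters inside the *..N hop bound,
    # so the (trusted, integer) limit is interpolated directly.
    query = f"""
    MATCH path = shortestPath(
        (a:Entity {{name: $source}})-[*..{max_hops}]-(b:Entity {{name: $target}})
    )
    RETURN [n IN nodes(path) | n.name] AS entities,
           [r IN relationships(path) | type(r)] AS relations
    """
    with driver.session() as session:
        record = session.run(query, source=source, target=target).single()
        return record.data() if record else None

# find_connection("Acme Corp", "Jane Doe") might return
# {"entities": ["Acme Corp", "Widgets Inc", "Jane Doe"],
#  "relations": ["ACQUIRED", "FOUNDED_BY"]}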
Graph Construction
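The GraphRAG class above assumes the graph already exists. A minimal sketch of building one with the same tools, using LLM triple extraction plus MERGE upserts; the Entity label and the RELATED relationship with a type property are illustrative schema choices, not a standard:

import json
from openai import OpenAI
from neo4j import GraphDatabase

client = OpenAI()
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def extract_triples(text: str) -> list[dict]:
    """Ask the LLM for (source, relation, target) triples found in a document."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Extract factual relationships from this text as JSON:\n"
                       '{"triples": [{"source": "...", "relation": "...", "target": "..."}]}\n\n'
                       + text,
        }],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content).get("triples", [])

def upsert_triples(triples: list[dict]) -> None:
    """MERGE so re-ingesting a document never duplicates nodes or edges."""
    with driver.session() as session:
        for t in triples:
            # Relationship types cannot be parameterized in Cypher, so the
            # relation is stored as a property on a generic RELATED edge.
            session.run(
                """
                MERGE (a:Entity {name: $source})
                MERGE (b:Entity {name: $target})
                MERGE (a)-[:RELATED {type: $relation}]->(b)
                """,
                source=t["source"], relation=t["relation"], target=t["target"],
            )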
Evaluation Metrics
| Metric | What it Measures | How to Calculate |
|---|---|---|
| Answer Accuracy | Is the answer correct? | Human eval or exact match |
| Faithfulness | Is answer grounded in retrieved docs? | NLI or LLM-as-judge |
| Relevance | Are retrieved docs relevant? | Precision@K, NDCG |
| Retrieval Efficiency | How often is retrieval needed? | % queries requiring retrieval |
| Latency | Time to answer | Wall clock time |
| Hallucination Rate | Unsupported claims in answer | Manual or NLI checking |
Key metrics for RAG evaluation
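Faithfulness is usually the first metric teams automate, since it catches hallucination directly. A sketch of an LLM-as-judge check that scores each claim against the retrieved context; the prompt wording and claim-splitting step are assumptions, not a standard benchmark:

import json
from openai import OpenAI

client = OpenAI()

def faithfulness_score(answer: str, context: str, model: str = "gpt-4o") -> float:
    """Fraction of the answer's atomic claims supported by the retrieved context."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"""Break the answer into atomic claims, then mark each claim
SUPPORTED or UNSUPPORTED by the context alone (not world knowledge).
Return JSON: {{"claims": [{{"claim": "...", "supported": true}}]}}

Context:
{context}

Answer:
{answer}"""
        }],
        response_format={"type": "json_object"},
    )
    claims = json.loads(response.choices[0].message.content).get("claims", [])
    if not claims:
        return 1.0  # vacuously faithful: the answer made no claims
    return sum(c["supported"] for c in claims) / len(claims)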
Choosing an Approach
Use Basic RAG when:
- All queries need document lookup
- Simple, single-turn Q&A
- Latency is critical
Use Agentic RAG when:
- Queries vary (some need retrieval, some don't)
- Multiple retrieval sources available
- Complex multi-step reasoning needed
Use Self-RAG when:
- Accuracy is paramount
- You need to minimize hallucinations
- Quality > latency
Use Corrective RAG when:
- Retrieval quality varies
- Mixed document quality in corpus
- Web fallback is acceptable
Use Graph RAG when:
- Data has clear entity relationships
- Multi-hop reasoning required
- You have or can build a knowledge graph
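These criteria can be collapsed into a first-pass router. A sketch, where the signals and thresholds are assumptions you would measure and tune per deployment:

from enum import Enum

class RagApproach(Enum):
    BASIC = "basic"
    AGENTIC = "agentic"
    SELF = "self_rag"
    CORRECTIVE = "corrective"
    GRAPH = "graph"

def choose_approach(
    has_knowledge_graph: bool,
    needs_multi_hop: bool,
    retrieval_precision: float,   # measured Precision@K on a sample of queries
    accuracy_critical: bool,
    latency_budget_ms: int,
    mixed_query_types: bool,
) -> RagApproach:
    """Encode the guidance above as an ordered set of rules."""
    if has_knowledge_graph and needs_multi_hop:
        return RagApproach.GRAPH
    if retrieval_precision < 0.6:            # noisy corpus: grade and correct
        return RagApproach.CORRECTIVE
    if accuracy_critical and latency_budget_ms > 5000:
        return RagApproach.SELF              # critique loops trade latency for accuracy
    if mixed_query_types:
        return RagApproach.AGENTIC
    return RagApproach.BASIC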