Complete Guide to Using LangChain and LangGraph for RAG
Baljeet Dogra
Building RAG systems requires choosing the right framework. LangChain provides powerful abstractions for building RAG pipelines, while LangGraph enables complex, stateful workflows. This comprehensive guide shows you how to use both frameworks effectively for RAG applications.
LangChain vs LangGraph: Understanding the Difference
Before diving into RAG implementation, it's crucial to understand when to use LangChain versus LangGraph. Both are powerful frameworks, but they serve different purposes.
LangChain: The Foundation
LangChain is a framework for building applications with LLMs. It provides:
- Chains: Sequential workflows (like RetrievalQA chains)
- Components: Document loaders, text splitters, embeddings, vector stores
- Agents: LLM-powered decision-making systems
- Best for: Linear workflows, simple RAG pipelines, quick prototyping
LangGraph: Stateful Workflows
LangGraph extends LangChain with graph-based, stateful workflows. It provides:
- State Management: Maintain state across multiple steps
- Conditional Routing: Dynamic paths based on conditions
- Cycles & Loops: Iterative workflows with feedback
- Best for: Complex RAG workflows, multi-step reasoning, agentic systems
When to Use Each Framework
Use LangChain When:
- Building simple, linear RAG pipelines
- Quick prototyping and experimentation
- Straightforward retrieval → generation flow
- You need standard RAG components
- Learning RAG fundamentals
Use LangGraph When:
- Building complex, multi-step RAG workflows
- Need conditional logic and routing
- Implementing iterative retrieval
- Building agentic RAG systems
- Need state persistence across steps
What is RAG?
Retrieval-Augmented Generation (RAG) combines the power of information retrieval with language models. Instead of relying solely on a model's training data, RAG systems retrieve relevant documents from a knowledge base and use them as context to generate more accurate, up-to-date, and domain-specific answers.
RAG solves several critical problems:
- Knowledge limitations: LLMs can't know everything, especially recent or proprietary information
- Hallucinations: RAG provides factual context, reducing made-up information
- Domain expertise: Use your own documents, databases, or knowledge bases
- Cost efficiency: Avoid fine-tuning by using retrieval instead
The RAG Pipeline: Step by Step
A RAG system consists of six main components. Understanding each step is crucial for building effective systems:
RAG Pipeline Overview
1. Data Loading: Load documents from various sources (PDFs, text files, CSVs, etc.)
2. Document Chunking: Split documents into smaller, manageable chunks
3. Embeddings: Convert text chunks into numerical vector representations
4. Vector Stores: Store and index the embeddings for fast retrieval
5. Retrieval: Find relevant document chunks based on user queries
6. Generation: Use retrieved context with an LLM to generate answers
1. Data Loading
The first step in building a RAG system is loading your documents. LangChain provides loaders for various file formats, making it easy to ingest data from different sources.
Purpose
Load documents from various file formats so they can be processed by your RAG pipeline. LangChain supports many formats out of the box.
Common Loaders
- PyPDFLoader: For PDF documents
- TextLoader: For plain text files
- CSVLoader: For CSV data files
- DirectoryLoader: For loading multiple files from a directory
- WebBaseLoader: For loading content from web pages
Example Code
```python
from langchain.document_loaders import PyPDFLoader

# Load a PDF document
loader = PyPDFLoader("file.pdf")
documents = loader.load()

# Each document contains page content and metadata
print(f"Loaded {len(documents)} pages")
print(documents[0].page_content[:200])  # First 200 chars
```
2. Document Chunking
Documents are often too large to process efficiently. Chunking splits documents into smaller, manageable pieces that can be embedded and retrieved more accurately.
Purpose
Split documents into smaller, manageable chunks to improve retrieval accuracy. Smaller chunks are easier to embed, search, and provide more precise context to the LLM.
Common Splitter: RecursiveCharacterTextSplitter
This splitter tries to split on paragraph boundaries, then sentences, then words, ensuring chunks are semantically meaningful.
Key Parameters
- chunk_size: Maximum size of each chunk (in characters or tokens)
- chunk_overlap: Number of characters to overlap between chunks (maintains context)
Example Code
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Create a text splitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # Each chunk max 500 characters
    chunk_overlap=50   # 50 characters overlap between chunks
)

# Split documents into chunks
chunks = splitter.split_documents(documents)

print(f"Created {len(chunks)} chunks")
print(f"First chunk: {chunks[0].page_content[:100]}...")
```
Best Practice: Use chunk_size of 500-1000 characters for most use cases. Overlap of 10-20% helps maintain context across chunk boundaries.
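To see why overlap matters, here is a minimal fixed-window splitter in plain Python. This is only an illustration of the overlap mechanic, not how RecursiveCharacterTextSplitter actually works (which splits on paragraph, sentence, and word boundaries):

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Naive fixed-window splitter: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so consecutive chunks share an overlap."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=3)
print(chunks[0])  # abcdefghij
print(chunks[1])  # hijklmnopq  <- starts with the last 3 chars of chunk 1
```

The shared characters at each boundary mean a sentence cut in half by one chunk is still intact in its neighbor, which is exactly what the 10-20% overlap recommendation buys you.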
3. Embeddings
Embeddings convert text into numerical vectors that capture semantic meaning. Similar texts have similar vectors, enabling semantic search.
Purpose
Convert text chunks into numerical vector representations that can be stored and searched. Embeddings capture semantic meaning, allowing you to find documents similar in meaning, not just exact text matches.
Popular Embeddings
- OpenAIEmbeddings: High-quality embeddings from OpenAI (requires API key)
- HuggingFaceEmbeddings: Free, open-source embeddings from Hugging Face
- SentenceTransformerEmbeddings: Based on sentence transformers, good for semantic search
Example Code
```python
from langchain.embeddings import OpenAIEmbeddings

# Initialize embeddings
embeddings = OpenAIEmbeddings()

# Generate embeddings for text
text = "What is LangChain?"
embedding = embeddings.embed_query(text)

print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
```
Note: OpenAIEmbeddings requires an API key. For free alternatives, use HuggingFaceEmbeddings or SentenceTransformerEmbeddings.
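Whichever embedding model you choose, similarity between vectors is typically measured with cosine similarity. A plain-Python sketch of the idea, using made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of three sentences
cat = [0.9, 0.1, 0.0]     # "The cat sat on the mat"
kitten = [0.8, 0.2, 0.1]  # "A kitten rested on the rug"
stock = [0.0, 0.1, 0.9]   # "The stock market fell today"

print(cosine_similarity(cat, kitten))  # high: related meaning
print(cosine_similarity(cat, stock))   # low: unrelated meaning
```

This is what "semantic search" means in practice: the cat/kitten pair scores far higher than cat/stock even though the sentences share no words.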
4. Vector Stores
Vector stores are databases optimized for storing and searching vector embeddings. They enable fast similarity search to find relevant documents.
Purpose
Store and retrieve vector embeddings efficiently. Vector stores use specialized indexing algorithms (like approximate nearest neighbor search) to find similar vectors quickly, even with millions of documents.
Popular Options
Chroma
Open-source, lightweight, easy to use. Great for prototyping and small to medium datasets.
FAISS
Facebook's library, very fast, good for large-scale applications. In-memory or disk-based.
Pinecone
Managed service, scalable, production-ready. Good for enterprise applications.
Weaviate
Open-source vector database with GraphQL API. Supports hybrid search.
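Conceptually, all of these stores do the same job: index vectors and return the ones nearest to a query vector. A brute-force plain-Python sketch of that job (real stores like FAISS use approximate nearest-neighbor indexes so this scales to millions of vectors):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class ToyVectorStore:
    """Brute-force store: compare the query against every stored vector."""
    def __init__(self):
        self.entries = []  # (vector, text) pairs

    def add(self, vector, text):
        self.entries.append((vector, text))

    def search(self, query_vector, k=2):
        ranked = sorted(self.entries, key=lambda e: cosine(query_vector, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = ToyVectorStore()
store.add([1.0, 0.0], "doc about cats")
store.add([0.9, 0.1], "doc about kittens")
store.add([0.0, 1.0], "doc about stocks")

print(store.search([1.0, 0.1], k=2))  # the two cat-related docs
```

The choice between Chroma, FAISS, Pinecone, and Weaviate is mostly about how this search is indexed, scaled, and hosted, not about what it computes.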
Example Code
```python
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

# Create embeddings
embeddings = OpenAIEmbeddings()

# Create vector store from documents
vectorstore = FAISS.from_documents(
    chunks,      # Your document chunks
    embeddings   # Embedding model
)

# Save the vector store
vectorstore.save_local("faiss_index")

# Load it later (recent versions require opting in to pickle deserialization)
vectorstore = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)
```
5. Retrieval
Retrievers convert vector stores into searchable interfaces. They find the most relevant document chunks based on user queries.
Purpose
Convert a vector store into a retriever and fetch relevant document chunks based on user queries. Retrievers use similarity search to find the most semantically similar documents to the query.
Retrieval Strategies
- similarity: Returns top k most similar documents (default)
- similarity_score_threshold: Returns documents above a similarity threshold
- mmr: Maximum Marginal Relevance - balances similarity and diversity
Example Code
```python
# Create a retriever from vector store
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}  # Return top 3 most similar documents
)

# Retrieve relevant documents
query = "What is LangChain?"
docs = retriever.invoke(query)

print(f"Retrieved {len(docs)} documents")
for i, doc in enumerate(docs):
    print(f"\nDocument {i+1}:")
    print(doc.page_content[:200])
```
Tip: Start with k=3-5 documents. Too few may miss context, too many can confuse the LLM with irrelevant information.
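The mmr strategy trades relevance against diversity: each pick maximizes similarity to the query minus similarity to documents already selected. A simplified plain-Python sketch of that greedy loop, operating directly on vectors (the lambda_mult name mirrors LangChain's parameter; everything else here is illustrative):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr_select(query, candidates, k=2, lambda_mult=0.5):
    """Greedy Maximum Marginal Relevance: each pick balances relevance
    to the query against redundancy with already-selected vectors."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(doc):
            relevance = cosine(query, doc)
            redundancy = max((cosine(doc, s) for s in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query = [1.0, 0.0]
candidates = [[0.95, 0.05], [0.94, 0.06], [0.7, 0.7]]  # first two are near-duplicates
picks = mmr_select(query, candidates, k=2, lambda_mult=0.3)
print(picks)  # picks the most relevant vector, then the diverse one, skipping the near-duplicate
```

With a low lambda_mult (diversity-heavy), the second pick skips the near-duplicate in favor of the more distinct vector, which is the behavior you want when your corpus contains many similar chunks.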
6. RAG Pipeline
The final step combines retrieval with generation. LangChain's RetrievalQA chain handles this automatically, retrieving relevant context and passing it to the LLM.
Purpose
Combine retrieval with a language model to answer questions based on retrieved context. This is Retrieval-Augmented Generation—the LLM uses retrieved documents as context to generate accurate, grounded answers.
Component: RetrievalQA
RetrievalQA is a LangChain chain that automates the RAG process: it takes a query, retrieves relevant documents, formats them as context, and passes everything to the LLM.
Chain Types
- stuff: Passes all retrieved documents to LLM in one go (simple, works for small contexts)
- map_reduce: Processes each document separately, then combines results (good for large contexts)
- refine: Iteratively refines answer by processing documents sequentially
Example Code
```python
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Initialize LLM
llm = OpenAI(temperature=0)

# Create RAG chain
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True  # Return source docs for citation
)

# Ask a question (use invoke: .run() returns only the answer string and
# errors out when return_source_documents=True)
result = qa.invoke({"query": "Explain RAG"})
print(result["result"])  # The answer
print(f"\nSources: {len(result['source_documents'])} documents")
```
Building RAG with LangChain: Complete Example
Here's a complete example that brings all components together using LangChain:
```python
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Step 1: Load documents
loader = PyPDFLoader("document.pdf")
documents = loader.load()

# Step 2: Split into chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)
chunks = splitter.split_documents(documents)

# Step 3: Create embeddings
embeddings = OpenAIEmbeddings()

# Step 4: Create vector store
vectorstore = FAISS.from_documents(chunks, embeddings)

# Step 5: Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Step 6: Create RAG chain
llm = OpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever
)

# Query the system
answer = qa.invoke({"query": "What is the main topic of this document?"})
print(answer["result"])
```
Building Advanced RAG with LangGraph
While LangChain is perfect for simple RAG pipelines, LangGraph excels when you need complex workflows with conditional logic, state management, and iterative processes. Let's explore how to build advanced RAG systems with LangGraph.
Why Use LangGraph for RAG?
LangGraph enables RAG workflows that go beyond simple retrieval → generation:
- Iterative Retrieval: Refine queries and retrieve multiple times based on initial results
- Conditional Routing: Route to different retrieval strategies based on query type
- State Persistence: Maintain conversation history and context across multiple queries
- Multi-Step Reasoning: Break complex queries into sub-queries and combine results
- Self-Correction: Evaluate answers and re-retrieve if quality is insufficient
LangGraph RAG Architecture
A LangGraph RAG workflow is defined as a state graph where nodes represent steps and edges define the flow. The state maintains context throughout the execution.
Basic LangGraph RAG Example
```python
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from typing import TypedDict, Annotated

# Define the state
class GraphState(TypedDict):
    messages: Annotated[list, add_messages]
    documents: list
    answer: str

# Initialize components
llm = ChatOpenAI(model="gpt-4", temperature=0)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Define nodes
def retrieve(state: GraphState):
    """Retrieve relevant documents"""
    last_message = state["messages"][-1].content
    docs = retriever.invoke(last_message)
    return {"documents": docs}

def generate(state: GraphState):
    """Generate answer using retrieved context"""
    docs = state["documents"]
    context = "\n\n".join([doc.page_content for doc in docs])
    question = state["messages"][-1].content
    prompt = f"""Answer the question using the following context:

{context}

Question: {question}
Answer:"""
    response = llm.invoke([HumanMessage(content=prompt)])
    # add_messages appends the response to state["messages"] automatically
    return {"answer": response.content, "messages": [response]}

# Build the graph
workflow = StateGraph(GraphState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("generate", generate)

# Define edges
workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "generate")
workflow.add_edge("generate", END)

# Compile and run
app = workflow.compile()
result = app.invoke({"messages": [HumanMessage(content="What is RAG?")]})
print(result["answer"])
```
Advanced LangGraph RAG: Iterative Retrieval
One powerful pattern is iterative retrieval—if the initial answer quality is low, refine the query and retrieve again. This requires conditional routing based on evaluation.
Iterative RAG with Quality Check
```python
def evaluate_answer(state: GraphState):
    """Routing function: decide whether the answer quality is sufficient"""
    answer = state["answer"]
    # Simple heuristic: check if answer mentions "I don't know" or is too short
    if "don't know" in answer.lower() or len(answer) < 50:
        return "retry"
    return "end"

def refine_query(state: GraphState):
    """Refine the query based on initial results"""
    # The first message is the original user question; messages[-1] is now
    # the model's answer appended by the generate node
    original_query = state["messages"][0].content
    docs = state["documents"]
    # Extract key terms from retrieved docs to refine query
    doc_terms = " ".join([doc.page_content[:100] for doc in docs])
    refined = f"{original_query} Context: {doc_terms[:200]}"
    # add_messages appends the refined query to the message history
    return {"messages": [HumanMessage(content=refined)]}

# Build advanced graph
workflow = StateGraph(GraphState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("generate", generate)
workflow.add_node("refine_query", refine_query)

workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "generate")

# Conditional edge based on evaluation
workflow.add_conditional_edges(
    "generate",
    evaluate_answer,
    {
        "retry": "refine_query",
        "end": END
    }
)
workflow.add_edge("refine_query", "retrieve")  # Loop back

app = workflow.compile()
```
Multi-Step RAG with Query Decomposition
For complex queries requiring multiple pieces of information, LangGraph can decompose the query, retrieve for each sub-query, and combine results.
Query Decomposition Pattern
```python
# Note: for this pattern, GraphState needs one extra field:
#   sub_queries: list

def decompose_query(state: GraphState):
    """Break complex query into sub-queries"""
    query = state["messages"][-1].content
    decomposition_prompt = f"""Break this query into 2-3 simpler sub-queries:

Query: {query}

Return only the sub-queries, one per line."""
    response = llm.invoke([HumanMessage(content=decomposition_prompt)])
    sub_queries = [q.strip() for q in response.content.split("\n") if q.strip()]
    return {"sub_queries": sub_queries}

def retrieve_for_subquery(state: GraphState):
    """Retrieve documents for each sub-query"""
    sub_queries = state.get("sub_queries", [])
    all_docs = []
    for query in sub_queries:
        docs = retriever.invoke(query)
        all_docs.extend(docs)
    # Deduplicate by page content
    seen = set()
    unique_docs = []
    for doc in all_docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique_docs.append(doc)
    return {"documents": unique_docs}

def synthesize_answer(state: GraphState):
    """Combine information from multiple retrievals"""
    docs = state["documents"]
    original_query = state["messages"][-1].content
    context = "\n\n".join([doc.page_content for doc in docs])
    prompt = f"""Answer this complex question by synthesizing information from multiple sources:

{context}

Question: {original_query}

Comprehensive Answer:"""
    response = llm.invoke([HumanMessage(content=prompt)])
    return {"answer": response.content, "messages": [response]}

# Multi-step workflow
workflow = StateGraph(GraphState)
workflow.add_node("decompose", decompose_query)
workflow.add_node("retrieve_multi", retrieve_for_subquery)
workflow.add_node("synthesize", synthesize_answer)

workflow.set_entry_point("decompose")
workflow.add_edge("decompose", "retrieve_multi")
workflow.add_edge("retrieve_multi", "synthesize")
workflow.add_edge("synthesize", END)
```
Conditional Routing: Different Strategies for Different Queries
LangGraph allows routing to different retrieval strategies based on query characteristics. For example, factual questions might use dense retrieval, while complex reasoning might use hybrid search.
Query-Type Based Routing
```python
def route_query(state: GraphState):
    """Routing function: classify the query and return the branch name"""
    query = state["messages"][-1].content
    classification_prompt = f"""Classify this query as one of:
- "factual": Simple factual question
- "complex": Requires reasoning or multiple steps
- "comparison": Comparing multiple items

Query: {query}
Type:"""
    response = llm.invoke([HumanMessage(content=classification_prompt)])
    query_type = response.content.strip().lower()
    if "factual" in query_type:
        return "dense_retrieval"
    elif "complex" in query_type or "comparison" in query_type:
        return "hybrid_retrieval"
    else:
        return "dense_retrieval"  # Default

def dense_retrieve(state: GraphState):
    """Standard semantic search"""
    query = state["messages"][-1].content
    docs = retriever.invoke(query)
    return {"documents": docs}

def hybrid_retrieve(state: GraphState):
    """Diversity-aware search using MMR"""
    query = state["messages"][-1].content
    # Search options are set when creating the retriever, not passed to invoke()
    mmr_retriever = vectorstore.as_retriever(
        search_type="mmr",
        search_kwargs={"k": 5, "fetch_k": 10}
    )
    docs = mmr_retriever.invoke(query)
    return {"documents": docs}

# Routing workflow
workflow = StateGraph(GraphState)
workflow.add_node("dense_retrieve", dense_retrieve)
workflow.add_node("hybrid_retrieve", hybrid_retrieve)
workflow.add_node("generate", generate)

# route_query is a routing function, not a node: it only picks the entry branch
workflow.set_conditional_entry_point(
    route_query,
    {
        "dense_retrieval": "dense_retrieve",
        "hybrid_retrieval": "hybrid_retrieve"
    }
)
workflow.add_edge("dense_retrieve", "generate")
workflow.add_edge("hybrid_retrieve", "generate")
workflow.add_edge("generate", END)
```
Key Differences: LangChain vs LangGraph for RAG
LangChain RAG
- Linear workflow: Load → Chunk → Embed → Store → Retrieve → Generate
- Single retrieval pass
- No state management between queries
- Simple conditional logic via chain types
- Best for: Standard Q&A, document search, simple chatbots
LangGraph RAG
- Graph-based workflow with nodes and edges
- Multiple retrieval passes with refinement
- State persists across steps
- Complex conditional routing and loops
- Best for: Multi-step reasoning, agentic systems, complex queries, iterative refinement
Best Practices for RAG Systems
1. Optimize Chunk Size
Chunk size affects retrieval quality. Too small = fragmented context. Too large = irrelevant information. Start with 500-1000 characters and adjust based on your documents.
2. Use Appropriate Overlap
Overlap between chunks (10-20%) helps maintain context across boundaries. This is especially important for documents where concepts span multiple chunks.
3. Choose the Right Embedding Model
OpenAI embeddings are high quality but cost money. HuggingFace embeddings are free and often sufficient. Test both to see what works for your use case.
4. Filter Retrieved Documents
Use similarity score thresholds to filter out irrelevant documents. Only pass highly relevant context to the LLM to improve answer quality.
5. Add Metadata Filtering
Store metadata (source, date, category) with your documents. Use metadata filters to retrieve documents from specific sources or time periods.
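The underlying idea is simple to see in plain Python (LangChain vector stores expose it through a filter entry in search_kwargs, with the exact filter syntax varying by store; the documents below are made up for illustration):

```python
# Toy documents with metadata, standing in for chunks in a vector store
documents = [
    {"text": "Q3 revenue grew 12%", "metadata": {"source": "report_2024.pdf", "year": 2024}},
    {"text": "Q3 revenue grew 8%",  "metadata": {"source": "report_2023.pdf", "year": 2023}},
    {"text": "Onboarding checklist", "metadata": {"source": "hr_wiki.md", "year": 2024}},
]

def filter_by_metadata(docs, **conditions):
    """Keep only documents whose metadata matches every condition."""
    return [d for d in docs if all(d["metadata"].get(k) == v for k, v in conditions.items())]

recent = filter_by_metadata(documents, year=2024)
print([d["text"] for d in recent])  # both 2024 documents

reports = filter_by_metadata(documents, source="report_2023.pdf")
print([d["text"] for d in reports])  # only the 2023 report
```

In a real store the filter is applied alongside the vector search, so you pay for similarity comparisons only on documents that pass the metadata conditions.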
6. Monitor and Evaluate
Track retrieval quality, answer accuracy, and user feedback. Use evaluation metrics to continuously improve your RAG system.
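A simple starting metric is retrieval hit rate: the fraction of test queries whose known-relevant document appears in the retrieved set. A minimal sketch over hand-labeled examples (the queries, document ids, and fake retriever here are all made up for illustration):

```python
def hit_rate(test_cases, retrieve_fn, k=3):
    """Fraction of queries whose expected document id is among the top-k results."""
    hits = 0
    for query, expected_doc_id in test_cases:
        retrieved_ids = retrieve_fn(query)[:k]
        if expected_doc_id in retrieved_ids:
            hits += 1
    return hits / len(test_cases)

# Stand-in retriever: returns ranked document ids for a query
def fake_retrieve(query):
    index = {
        "refund policy": ["doc_refunds", "doc_shipping", "doc_faq"],
        "shipping time": ["doc_faq", "doc_refunds", "doc_contact"],
    }
    return index.get(query, [])

test_cases = [
    ("refund policy", "doc_refunds"),   # hit: ranked first
    ("shipping time", "doc_shipping"),  # miss: not retrieved
]
print(hit_rate(test_cases, fake_retrieve, k=3))  # 0.5
```

Swap fake_retrieve for a function that calls your real retriever and returns document ids, and you have a regression test you can run after every change to chunking, embeddings, or k.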
Common Challenges and Solutions
Challenge: Irrelevant Retrievals
Solution: Adjust chunk size, use better embeddings, implement re-ranking, or add metadata filters.
Challenge: Context Window Limits
Solution: Use map_reduce or refine chain types, or implement a two-stage retrieval (coarse then fine).
Challenge: Stale Information
Solution: Implement periodic re-indexing, use versioned vector stores, or add timestamp-based filtering.
Challenge: Multi-hop Reasoning
Solution: Use multi-step retrieval, implement query decomposition, or use graph-based retrieval for complex relationships.
Choosing the Right Framework
The choice between LangChain and LangGraph depends on your RAG requirements:
Start with LangChain if:
- You're building your first RAG system
- Your queries are straightforward
- You need a simple, linear workflow
- You want quick prototyping
- Your use case is standard Q&A
Upgrade to LangGraph if:
- You need iterative retrieval
- Queries require multi-step reasoning
- You want conditional routing
- You need state management
- You're building agentic systems
The Bottom Line
RAG is a powerful technique for building AI applications that can answer questions using your own data. LangChain provides the essential components and simple chains for straightforward RAG pipelines, while LangGraph enables complex, stateful workflows with conditional logic and iterative processes.
The key to successful RAG systems is understanding each component and how they work together. Start with LangChain to master the fundamentals—load your documents, chunk them appropriately, create embeddings, store them in a vector database, and retrieve relevant chunks for your queries. As your requirements grow more complex, leverage LangGraph's graph-based architecture for advanced patterns like iterative retrieval, query decomposition, and conditional routing.
Whether you're building a document Q&A system, a knowledge base assistant, or a domain-specific chatbot, RAG provides a practical way to leverage LLMs with your own data. Master both frameworks, and you'll be able to build production-ready RAG systems that deliver accurate, contextually relevant answers—from simple queries to complex, multi-step reasoning tasks.
Ready to Build RAG Systems?
If you're looking to implement RAG systems for your organization or need guidance on building production-ready retrieval-augmented generation applications, I can help you design and deploy effective RAG solutions.