AI Engineering

Complete Guide to Using LangChain and LangGraph for RAG

Baljeet Dogra
15 min read

Building RAG systems requires choosing the right framework. LangChain provides powerful abstractions for building RAG pipelines, while LangGraph enables complex, stateful workflows. This comprehensive guide shows you how to use both frameworks effectively for RAG applications.

LangChain vs LangGraph: Understanding the Difference

Before diving into RAG implementation, it's crucial to understand when to use LangChain versus LangGraph. Both are powerful frameworks, but they serve different purposes.

LangChain: The Foundation

LangChain is a framework for building applications with LLMs. It provides:

  • Chains: Sequential workflows (like RetrievalQA chains)
  • Components: Document loaders, text splitters, embeddings, vector stores
  • Agents: LLM-powered decision-making systems
  • Best for: Linear workflows, simple RAG pipelines, quick prototyping

LangGraph: Stateful Workflows

LangGraph extends LangChain with graph-based, stateful workflows. It provides:

  • State Management: Maintain state across multiple steps
  • Conditional Routing: Dynamic paths based on conditions
  • Cycles & Loops: Iterative workflows with feedback
  • Best for: Complex RAG workflows, multi-step reasoning, agentic systems

When to Use Each Framework

Use LangChain When:

  • Building simple, linear RAG pipelines
  • Quick prototyping and experimentation
  • A straightforward retrieval → generation flow
  • You need standard RAG components
  • Learning RAG fundamentals

Use LangGraph When:

  • Building complex, multi-step RAG workflows
  • You need conditional logic and routing
  • Implementing iterative retrieval
  • Building agentic RAG systems
  • You need state persistence across steps

What is RAG?

Retrieval-Augmented Generation (RAG) combines the power of information retrieval with language models. Instead of relying solely on a model's training data, RAG systems retrieve relevant documents from a knowledge base and use them as context to generate more accurate, up-to-date, and domain-specific answers.

RAG solves several critical problems:

  • Knowledge limitations: LLMs can't know everything, especially recent or proprietary information
  • Hallucinations: RAG provides factual context, reducing made-up information
  • Domain expertise: Use your own documents, databases, or knowledge bases
  • Cost efficiency: Avoid fine-tuning by using retrieval instead

The RAG Pipeline: Step by Step

A RAG system consists of six main components. Understanding each step is crucial for building effective systems:

RAG Pipeline Overview

  1. Data Loading: Load documents from various sources (PDFs, text files, CSVs, etc.)
  2. Document Chunking: Split documents into smaller, manageable chunks
  3. Embeddings: Convert text chunks into numerical vector representations
  4. Vector Stores: Store and index the embeddings for fast retrieval
  5. Retrieval: Find relevant document chunks based on user queries
  6. Generation: Use retrieved context with an LLM to generate answers

1. Data Loading

The first step in building a RAG system is loading your documents. LangChain provides loaders for various file formats, making it easy to ingest data from different sources.

Purpose

Load documents from various file formats so they can be processed by your RAG pipeline. LangChain supports many formats out of the box.

Common Loaders

  • PyPDFLoader: For PDF documents
  • TextLoader: For plain text files
  • CSVLoader: For CSV data files
  • DirectoryLoader: For loading multiple files from a directory
  • WebBaseLoader: For loading content from web pages

Example Code

from langchain_community.document_loaders import PyPDFLoader  # "langchain.document_loaders" in older versions

# Load a PDF document
loader = PyPDFLoader("file.pdf")
documents = loader.load()

# Each document contains page content and metadata
print(f"Loaded {len(documents)} pages")
print(documents[0].page_content[:200])  # First 200 chars

2. Document Chunking

Documents are often too large to process efficiently. Chunking splits documents into smaller, manageable pieces that can be embedded and retrieved more accurately.

Purpose

Split documents into smaller, manageable chunks to improve retrieval accuracy. Smaller chunks are easier to embed, search, and provide more precise context to the LLM.

Common Splitter: RecursiveCharacterTextSplitter

This splitter tries paragraph breaks first, then line breaks, then spaces, then individual characters, keeping chunks as semantically meaningful as possible.

Key Parameters

  • chunk_size: Maximum size of each chunk (in characters or tokens)
  • chunk_overlap: Number of characters to overlap between chunks (maintains context)

Example Code

from langchain_text_splitters import RecursiveCharacterTextSplitter  # "langchain.text_splitter" in older versions

# Create a text splitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,      # Each chunk max 500 characters
    chunk_overlap=50      # 50 characters overlap between chunks
)

# Split documents into chunks
chunks = splitter.split_documents(documents)

print(f"Created {len(chunks)} chunks")
print(f"First chunk: {chunks[0].page_content[:100]}...")

Best Practice: Use chunk_size of 500-1000 characters for most use cases. Overlap of 10-20% helps maintain context across chunk boundaries.
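The interplay of chunk_size and chunk_overlap is easy to see in a framework-free sketch (plain Python, illustrative only): each chunk starts chunk_size − chunk_overlap characters after the previous one, so neighbouring chunks share a tail.

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50):
    # Each chunk starts (chunk_size - chunk_overlap) characters after
    # the previous one, so adjacent chunks share an overlap region
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "".join(chr(65 + i % 26) for i in range(1200))  # dummy 1200-char document
chunks = split_with_overlap(text)

print(len(chunks))                        # 3
print(chunks[0][-50:] == chunks[1][:50])  # True: shared overlap region
```

RecursiveCharacterTextSplitter does the same accounting, but snaps the boundaries to paragraph, line, and word breaks instead of cutting mid-word.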

3. Embeddings

Embeddings convert text into numerical vectors that capture semantic meaning. Similar texts have similar vectors, enabling semantic search.

Purpose

Convert text chunks into numerical vector representations that can be stored and searched. Embeddings capture semantic meaning, allowing you to find documents similar in meaning, not just exact text matches.
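What "similar vectors" means can be made concrete with a toy sketch (plain Python; the 3-d vectors are made up, real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" for related and unrelated concepts
v_cat = [0.9, 0.1, 0.0]
v_kitten = [0.8, 0.2, 0.1]
v_car = [0.0, 0.1, 0.9]

# Related concepts score higher than unrelated ones
print(cosine_similarity(v_cat, v_kitten) > cosine_similarity(v_cat, v_car))  # True
```

Cosine similarity is the comparison most vector stores use under the hood, which is why semantically related chunks surface even without exact keyword matches.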

Popular Embeddings

  • OpenAIEmbeddings: High-quality embeddings from OpenAI (requires API key)
  • HuggingFaceEmbeddings: Free, open-source embeddings from Hugging Face
  • SentenceTransformerEmbeddings: Based on sentence transformers, good for semantic search

Example Code

from langchain_openai import OpenAIEmbeddings  # "langchain.embeddings" in older versions

# Initialize embeddings
embeddings = OpenAIEmbeddings()

# Generate embeddings for text
text = "What is LangChain?"
embedding = embeddings.embed_query(text)

print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

Note: OpenAIEmbeddings requires an API key. For free alternatives, use HuggingFaceEmbeddings or SentenceTransformerEmbeddings.

4. Vector Stores

Vector stores are databases optimized for storing and searching vector embeddings. They enable fast similarity search to find relevant documents.

Purpose

Store and retrieve vector embeddings efficiently. Vector stores use specialized indexing algorithms (like approximate nearest neighbor search) to find similar vectors quickly, even with millions of documents.
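The core operation can be sketched without any library (plain Python, toy 2-d vectors; the document names are illustrative): an exhaustive nearest-neighbour scan, which is exactly what approximate indexes speed up.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, index, k=2):
    # Exhaustive scan over every stored vector; FAISS, Chroma, etc.
    # replace this with approximate nearest-neighbor indexes so the
    # lookup stays fast even with millions of documents
    return sorted(index, key=lambda item: euclidean(query, item[1]))[:k]

# Toy index of (doc_id, embedding) pairs
index = [
    ("doc_a", [1.0, 0.0]),
    ("doc_b", [0.9, 0.1]),
    ("doc_c", [0.0, 1.0]),
]

top = nearest([1.0, 0.0], index)
print([name for name, _ in top])  # ['doc_a', 'doc_b']
```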

Popular Options

Chroma

Open-source, lightweight, easy to use. Great for prototyping and small to medium datasets.

FAISS

Facebook's library, very fast, good for large-scale applications. In-memory or disk-based.

Pinecone

Managed service, scalable, production-ready. Good for enterprise applications.

Weaviate

Open-source vector database with GraphQL API. Supports hybrid search.

Example Code

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Create embeddings
embeddings = OpenAIEmbeddings()

# Create vector store from documents
vectorstore = FAISS.from_documents(
    chunks,           # Your document chunks
    embeddings        # Embedding model
)

# Save the vector store
vectorstore.save_local("faiss_index")

# Load it later (recent versions require opting in to pickle deserialization)
vectorstore = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)

5. Retrieval

Retrievers convert vector stores into searchable interfaces. They find the most relevant document chunks based on user queries.

Purpose

Convert a vector store into a retriever and fetch relevant document chunks based on user queries. Retrievers use similarity search to find the most semantically similar documents to the query.

Retrieval Strategies

  • similarity: Returns top k most similar documents (default)
  • similarity_score_threshold: Returns documents above a similarity threshold
  • mmr: Maximum Marginal Relevance - balances similarity and diversity

Example Code

# Create a retriever from vector store
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}  # Return top 3 most similar documents
)

# Retrieve relevant documents
query = "What is LangChain?"
docs = retriever.invoke(query)

print(f"Retrieved {len(docs)} documents")
for i, doc in enumerate(docs):
    print(f"\nDocument {i+1}:")
    print(doc.page_content[:200])

Tip: Start with k=3-5 documents. Too few may miss context, too many can confuse the LLM with irrelevant information.

6. Generation: The RAG Chain

The final step combines retrieval with generation. LangChain's RetrievalQA chain handles this automatically, retrieving relevant context and passing it to the LLM.

Purpose

Combine retrieval with a language model to answer questions based on retrieved context. This is Retrieval-Augmented Generation—the LLM uses retrieved documents as context to generate accurate, grounded answers.

Component: RetrievalQA

RetrievalQA is a LangChain chain that automates the RAG process: it takes a query, retrieves relevant documents, formats them as context, and passes everything to the LLM.

Chain Types

  • stuff: Passes all retrieved documents to LLM in one go (simple, works for small contexts)
  • map_reduce: Processes each document separately, then combines results (good for large contexts)
  • refine: Iteratively refines answer by processing documents sequentially

Example Code

from langchain.chains import RetrievalQA
from langchain_openai import OpenAI

# Initialize LLM
llm = OpenAI(temperature=0)

# Create RAG chain
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True  # Return source docs for citation
)

# Ask a question
result = qa.invoke({"query": "Explain RAG"})  # .run() returns a bare string and can't surface sources

print(result["result"])  # The answer
print(f"\nSources: {len(result['source_documents'])} documents")

Building RAG with LangChain: Complete Example

Here's a complete example that brings all components together using LangChain:

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, OpenAI
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Step 1: Load documents
loader = PyPDFLoader("document.pdf")
documents = loader.load()

# Step 2: Split into chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)
chunks = splitter.split_documents(documents)

# Step 3: Create embeddings
embeddings = OpenAIEmbeddings()

# Step 4: Create vector store
vectorstore = FAISS.from_documents(chunks, embeddings)

# Step 5: Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Step 6: Create RAG chain
llm = OpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever
)

# Query the system
answer = qa.invoke({"query": "What is the main topic of this document?"})
print(answer["result"])

Building Advanced RAG with LangGraph

While LangChain is perfect for simple RAG pipelines, LangGraph excels when you need complex workflows with conditional logic, state management, and iterative processes. Let's explore how to build advanced RAG systems with LangGraph.

Why Use LangGraph for RAG?

LangGraph enables RAG workflows that go beyond simple retrieval → generation:

  • Iterative Retrieval: Refine queries and retrieve multiple times based on initial results
  • Conditional Routing: Route to different retrieval strategies based on query type
  • State Persistence: Maintain conversation history and context across multiple queries
  • Multi-Step Reasoning: Break complex queries into sub-queries and combine results
  • Self-Correction: Evaluate answers and re-retrieve if quality is insufficient

LangGraph RAG Architecture

A LangGraph RAG workflow is defined as a state graph where nodes represent steps and edges define the flow. The state maintains context throughout the execution.

Basic LangGraph RAG Example

from langgraph.graph import StateGraph, END
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages

# Define the state
class GraphState(TypedDict):
    messages: Annotated[list, add_messages]
    documents: list
    answer: str

# Initialize components
llm = ChatOpenAI(model="gpt-4", temperature=0)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Define nodes
def retrieve(state: GraphState):
    """Retrieve relevant documents"""
    last_message = state["messages"][-1].content
    docs = retriever.invoke(last_message)
    return {"documents": docs}

def generate(state: GraphState):
    """Generate answer using retrieved context"""
    docs = state["documents"]
    context = "\n\n".join([doc.page_content for doc in docs])
    
    prompt = f"""Answer the question using the following context:
    
{context}

Question: {state["messages"][-1].content}
Answer:"""
    
    response = llm.invoke([HumanMessage(content=prompt)])
    return {"answer": response.content, "messages": state["messages"] + [response]}

# Build the graph
workflow = StateGraph(GraphState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("generate", generate)

# Define edges
workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "generate")
workflow.add_edge("generate", END)

# Compile and run
app = workflow.compile()
result = app.invoke({"messages": [HumanMessage(content="What is RAG?")]})
print(result["answer"])

Advanced LangGraph RAG: Iterative Retrieval

One powerful pattern is iterative retrieval—if the initial answer quality is low, refine the query and retrieve again. This requires conditional routing based on evaluation.

Iterative RAG with Quality Check

def evaluate_answer(state: GraphState):
    """Evaluate whether the answer quality is sufficient"""
    answer = state["answer"]
    
    # Simple heuristic: retry if the answer punts or is too short
    if "don't know" in answer.lower() or len(answer) < 50:
        return "retry"
    return "end"

def refine_query(state: GraphState):
    """Refine the query based on initial results"""
    original_query = state["messages"][-1].content
    docs = state["documents"]
    
    # Extract key terms from retrieved docs to refine query
    doc_terms = " ".join([doc.page_content[:100] for doc in docs])
    refined = f"{original_query} Context: {doc_terms[:200]}"
    
    return {"messages": state["messages"] + [HumanMessage(content=refined)]}

# Build advanced graph
workflow = StateGraph(GraphState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("generate", generate)
workflow.add_node("refine_query", refine_query)

workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "generate")

# Conditional edge based on evaluation
workflow.add_conditional_edges(
    "generate",
    evaluate_answer,
    {
        "retry": "refine_query",
        "end": END
    }
)

workflow.add_edge("refine_query", "retrieve")  # Loop back

app = workflow.compile()

# Cap the retrieve → generate → refine loop so a stubborn query
# can't cycle forever
result = app.invoke(
    {"messages": [HumanMessage(content="What is RAG?")]},
    config={"recursion_limit": 10}
)

Multi-Step RAG with Query Decomposition

For complex queries requiring multiple pieces of information, LangGraph can decompose the query, retrieve for each sub-query, and combine results.

Query Decomposition Pattern

def decompose_query(state: GraphState):
    """Break a complex query into sub-queries (assumes GraphState
    is extended with a `sub_queries: list` field)"""
    query = state["messages"][-1].content
    
    decomposition_prompt = f"""Break this query into 2-3 simpler sub-queries:
Query: {query}

Return only the sub-queries, one per line."""
    
    response = llm.invoke([HumanMessage(content=decomposition_prompt)])
    sub_queries = [q.strip() for q in response.content.split("\n") if q.strip()]
    
    return {"sub_queries": sub_queries}

def retrieve_for_subquery(state: GraphState):
    """Retrieve documents for each sub-query"""
    sub_queries = state.get("sub_queries", [])
    all_docs = []
    
    for query in sub_queries:
        docs = retriever.invoke(query)
        all_docs.extend(docs)
    
    # Deduplicate
    seen = set()
    unique_docs = []
    for doc in all_docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique_docs.append(doc)
    
    return {"documents": unique_docs}

def synthesize_answer(state: GraphState):
    """Combine information from multiple retrievals"""
    docs = state["documents"]
    original_query = state["messages"][-1].content
    
    context = "\n\n".join([doc.page_content for doc in docs])
    prompt = f"""Answer this complex question by synthesizing information from multiple sources:

{context}

Question: {original_query}
Comprehensive Answer:"""
    
    response = llm.invoke([HumanMessage(content=prompt)])
    return {"answer": response.content, "messages": state["messages"] + [response]}

# Multi-step workflow
workflow = StateGraph(GraphState)
workflow.add_node("decompose", decompose_query)
workflow.add_node("retrieve_multi", retrieve_for_subquery)
workflow.add_node("synthesize", synthesize_answer)

workflow.set_entry_point("decompose")
workflow.add_edge("decompose", "retrieve_multi")
workflow.add_edge("retrieve_multi", "synthesize")
workflow.add_edge("synthesize", END)

Conditional Routing: Different Strategies for Different Queries

LangGraph allows routing to different retrieval strategies based on query characteristics. For example, factual questions might use dense retrieval, while complex reasoning might use hybrid search.

Query-Type Based Routing

def route_query(state: GraphState):
    """Determine query type and route accordingly"""
    query = state["messages"][-1].content
    
    classification_prompt = f"""Classify this query as one of:
- "factual": Simple factual question
- "complex": Requires reasoning or multiple steps
- "comparison": Comparing multiple items

Query: {query}
Type:"""
    
    response = llm.invoke([HumanMessage(content=classification_prompt)])
    query_type = response.content.strip().lower()
    
    if "factual" in query_type:
        return "dense_retrieval"
    elif "complex" in query_type or "comparison" in query_type:
        return "hybrid_retrieval"
    else:
        return "dense_retrieval"  # Default

def dense_retrieve(state: GraphState):
    """Standard semantic search"""
    query = state["messages"][-1].content
    docs = retriever.invoke(query)
    return {"documents": docs}

def hybrid_retrieve(state: GraphState):
    """Diversified retrieval via MMR (standing in for true hybrid search).
    Search options are set on the retriever, not passed to invoke()."""
    query = state["messages"][-1].content
    mmr_retriever = vectorstore.as_retriever(
        search_type="mmr", search_kwargs={"k": 5, "fetch_k": 10}
    )
    docs = mmr_retriever.invoke(query)
    return {"documents": docs}

# Routing workflow: route_query serves as a conditional entry point,
# not a node (a node must return a state update, while a routing
# function returns the name of the next node)
workflow = StateGraph(GraphState)
workflow.add_node("dense_retrieve", dense_retrieve)
workflow.add_node("hybrid_retrieve", hybrid_retrieve)
workflow.add_node("generate", generate)

workflow.set_conditional_entry_point(
    route_query,
    {
        "dense_retrieval": "dense_retrieve",
        "hybrid_retrieval": "hybrid_retrieve"
    }
)
workflow.add_edge("dense_retrieve", "generate")
workflow.add_edge("hybrid_retrieve", "generate")
workflow.add_edge("generate", END)

Key Differences: LangChain vs LangGraph for RAG

LangChain RAG

  • Linear workflow: Load → Chunk → Embed → Store → Retrieve → Generate
  • Single retrieval pass
  • No state management between queries
  • Simple conditional logic via chain types
  • Best for: Standard Q&A, document search, simple chatbots

LangGraph RAG

  • Graph-based workflow with nodes and edges
  • Multiple retrieval passes with refinement
  • State persists across steps
  • Complex conditional routing and loops
  • Best for: Multi-step reasoning, agentic systems, complex queries, iterative refinement

Best Practices for RAG Systems

1. Optimize Chunk Size

Chunk size affects retrieval quality. Too small = fragmented context. Too large = irrelevant information. Start with 500-1000 characters and adjust based on your documents.

2. Use Appropriate Overlap

Overlap between chunks (10-20%) helps maintain context across boundaries. This is especially important for documents where concepts span multiple chunks.

3. Choose the Right Embedding Model

OpenAI embeddings are high quality but cost money. HuggingFace embeddings are free and often sufficient. Test both to see what works for your use case.

4. Filter Retrieved Documents

Use similarity score thresholds to filter out irrelevant documents. Only pass highly relevant context to the LLM to improve answer quality.
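The filtering logic can be sketched without a vector store (plain Python, made-up scores; in LangChain you would instead pass search_type="similarity_score_threshold" and a score_threshold in search_kwargs to as_retriever):

```python
# Score-threshold filtering sketch: drop anything below the cutoff
# so only strongly relevant context reaches the LLM
scored_docs = [
    ("LangChain provides RAG components.", 0.82),
    ("Unrelated paragraph about cooking.", 0.31),
    ("LangGraph adds stateful workflows.", 0.74),
]

def filter_by_score(scored, threshold=0.5):
    # Keep only documents whose similarity score clears the threshold
    return [doc for doc, score in scored if score >= threshold]

relevant = filter_by_score(scored_docs)
print(len(relevant))  # 2
```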

5. Add Metadata Filtering

Store metadata (source, date, category) with your documents. Use metadata filters to retrieve documents from specific sources or time periods.
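A minimal sketch of the idea (plain Python, made-up documents): each chunk carries a metadata dict, and retrieval is restricted to chunks matching the filter. Many vector stores, such as Chroma, accept a similar filter via as_retriever, e.g. search_kwargs={"filter": {"source": "report.pdf"}}.

```python
# Metadata filtering sketch: restrict retrieval by document attributes
chunks = [
    {"content": "Q3 revenue grew 12%.", "metadata": {"source": "report.pdf", "year": 2024}},
    {"content": "Installation steps.", "metadata": {"source": "manual.txt", "year": 2023}},
    {"content": "Q3 outlook section.", "metadata": {"source": "report.pdf", "year": 2024}},
]

def filter_by_metadata(docs, **criteria):
    # Keep documents whose metadata matches every criterion
    return [d for d in docs
            if all(d["metadata"].get(k) == v for k, v in criteria.items())]

print(len(filter_by_metadata(chunks, source="report.pdf")))             # 2
print(len(filter_by_metadata(chunks, source="report.pdf", year=2023)))  # 0
```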

6. Monitor and Evaluate

Track retrieval quality, answer accuracy, and user feedback. Use evaluation metrics to continuously improve your RAG system.

Common Challenges and Solutions

Challenge: Irrelevant Retrievals

Solution: Adjust chunk size, use better embeddings, implement re-ranking, or add metadata filters.

Challenge: Context Window Limits

Solution: Use map_reduce or refine chain types, or implement a two-stage retrieval (coarse then fine).
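The two-stage idea can be sketched in plain Python (made-up document IDs and scores; in practice the fine scorer would be a cross-encoder re-ranker): a cheap coarse pass over-fetches candidates, then a finer scorer keeps only the best few for the LLM's limited context window.

```python
# Two-stage retrieval sketch: over-fetch 20 candidates cheaply,
# then re-rank and keep the top 3 for the context window
candidates = [(f"doc_{i}", (i * 37) % 100) for i in range(20)]  # (id, fine_score)

def rerank(cands, top_n=3):
    # Sort by the fine score, descending, and truncate
    return sorted(cands, key=lambda c: c[1], reverse=True)[:top_n]

top = rerank(candidates)
print([doc for doc, _ in top])  # ['doc_8', 'doc_16', 'doc_5']
```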

Challenge: Stale Information

Solution: Implement periodic re-indexing, use versioned vector stores, or add timestamp-based filtering.

Challenge: Multi-hop Reasoning

Solution: Use multi-step retrieval, implement query decomposition, or use graph-based retrieval for complex relationships.

Choosing the Right Framework

The choice between LangChain and LangGraph depends on your RAG requirements:

Start with LangChain if:

  • You're building your first RAG system
  • Your queries are straightforward
  • You need a simple, linear workflow
  • You want quick prototyping
  • Your use case is standard Q&A

Upgrade to LangGraph if:

  • You need iterative retrieval
  • Queries require multi-step reasoning
  • You want conditional routing
  • You need state management
  • You're building agentic systems

The Bottom Line

RAG is a powerful technique for building AI applications that can answer questions using your own data. LangChain provides the essential components and simple chains for straightforward RAG pipelines, while LangGraph enables complex, stateful workflows with conditional logic and iterative processes.

The key to successful RAG systems is understanding each component and how they work together. Start with LangChain to master the fundamentals—load your documents, chunk them appropriately, create embeddings, store them in a vector database, and retrieve relevant chunks for your queries. As your requirements grow more complex, leverage LangGraph's graph-based architecture for advanced patterns like iterative retrieval, query decomposition, and conditional routing.

Whether you're building a document Q&A system, a knowledge base assistant, or a domain-specific chatbot, RAG provides a practical way to leverage LLMs with your own data. Master both frameworks, and you'll be able to build production-ready RAG systems that deliver accurate, contextually relevant answers—from simple queries to complex, multi-step reasoning tasks.

Ready to Build RAG Systems?

If you're looking to implement RAG systems for your organization or need guidance on building production-ready retrieval-augmented generation applications, I can help you design and deploy effective RAG solutions.

Get in Touch