LangChain, LangGraph, and LlamaIndex — Which Tool Do You Need, and When

    Three dominant frameworks for building LLM-powered applications, each with a different abstraction model and production sweet spot — here is how to read the map.

    Reverse Polarity · June 24, 2026 · 10 min read

    The three most commonly referenced Python frameworks for building LLM applications — LangChain, LangGraph, and LlamaIndex — are not interchangeable, nor are they competing to solve the same problem. Each sits at a different layer of the stack: LangChain handles component wiring and model interoperability, LangGraph adds stateful workflow orchestration over that, and LlamaIndex is purpose-built for data ingestion, indexing, and retrieval. Picking the wrong one for a given problem creates friction from the first sprint; picking the right combination removes it.

    A Brief Timeline

    All three emerged from the same AI engineering moment: late 2022, as GPT-3.5 made LLM integration practical for working developers. LangChain launched in October 2022 as a generic LLM orchestration library. LlamaIndex (originally named GPT Index) appeared around the same time with a tighter focus on document indexing. LangGraph arrived in 2024, created by the LangChain team, specifically to address the stateful, cyclical, multi-agent patterns that LangChain's linear chain model struggled to express cleanly.

    By mid-2025, LangChain on GitHub had crossed 140,000 stars; LangGraph reached 35,700 stars. That gap reflects their relationship: LangGraph is the current recommended agent runtime within the LangChain ecosystem, and LangChain increasingly serves as the integration and component layer underneath it.


    LangChain — The Component Integration Layer

    LangChain's original value proposition was simple: a unified interface to swap between LLM providers, vector stores, document loaders, and output parsers without rewriting application logic. That abstraction still holds, though the framework has accumulated significant surface area in the process.

    The core programming model today is LCEL (LangChain Expression Language), a declarative chain composition syntax using the pipe operator:

    from langchain_openai import ChatOpenAI
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import StrOutputParser
    
    model = ChatOpenAI(model="gpt-4o")
    prompt = ChatPromptTemplate.from_template("Summarise this in one sentence: {text}")
    
    chain = prompt | model | StrOutputParser()
    
    result = chain.invoke({"text": "Long document content here..."})
    

    Each component in the pipe is a Runnable, meaning it exposes .invoke(), .batch(), .stream(), and async variants consistently. The composition model is easy to reason about for linear pipelines but breaks down once you need loops, conditional branches, or shared state across multiple agent steps — which is exactly why LangGraph exists.
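
    To make the pipe model concrete, here is a library-free sketch of how a pipe-composable component could work. The Step class and the three placeholder stages are hypothetical and are not LangChain's implementation; they only illustrate why a left-to-right pipeline has no natural place for loops or shared state.

```python
from typing import Any, Callable, Iterable


class Step:
    """Hypothetical stand-in for a pipe-composable component (not LangChain's class)."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def batch(self, values: Iterable[Any]) -> list:
        # The same pipeline is applied to each input independently.
        return [self.invoke(v) for v in values]


# Placeholder stages standing in for a prompt, a model, and an output parser.
fill_prompt = Step(lambda d: f"Summarise this in one sentence: {d['text']}")
fake_model = Step(lambda prompt: {"content": prompt.upper()})
parse_text = Step(lambda message: message["content"])

chain = fill_prompt | fake_model | parse_text
print(chain.invoke({"text": "hello"}))
```

    Data only ever flows forward through the stages, so anything like "go back to the model if the parser fails" has to live outside the chain.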

    What LangChain is good at

    • Wrapping LLM providers (OpenAI, Anthropic, Mistral, local models via Ollama) behind a common interface
    • Building simple RAG chains with built-in retriever integrations
    • Prototyping prompt pipelines quickly
    • Accessing LangSmith for tracing, evaluation, and observability
    • Providing component primitives that LangGraph nodes can consume

    LangChain's known friction points

    The community's frustration with LangChain has been well-documented. The framework has a history of breaking changes across versions (0.1 → 0.2 → 0.3), abstractions that conceal what is actually happening in the model call, and inconsistent internal conventions across its 700+ integrations. As of mid-2025, the LangChain team has increasingly decoupled the package and positioned langchain-core as the stable foundation. The pragmatic consensus in production environments: use LangChain for its integrations and LCEL chains, but move complex control flow to LangGraph.


    LangGraph — Stateful Agentic Orchestration

    LangGraph reframes an LLM workflow as a directed graph where nodes contain business logic and edges define control flow, including loops. Unlike DAG-based workflow systems, LangGraph explicitly supports cycles — which is precisely what enables agent behaviour like self-correction, reflection, and multi-step tool use.

    The core abstraction is a StateGraph that carries typed state between nodes:

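    from typing import Annotated, TypedDict
    from langchain_core.messages import BaseMessage
    from langchain_core.tools import tool
    from langchain_openai import ChatOpenAI
    from langgraph.graph import StateGraph, END
    from langgraph.graph.message import add_messages
    from langgraph.prebuilt import ToolNode
    
    @tool
    def word_count(text: str) -> int:
        """Count the words in a piece of text (placeholder tool for illustration)."""
        return len(text.split())
    
    tools = [word_count]
    model = ChatOpenAI(model="gpt-4o").bind_tools(tools)
    
    class AgentState(TypedDict):
        # add_messages is a reducer: new messages are appended, not replaced
        messages: Annotated[list[BaseMessage], add_messages]
    
    def call_model(state: AgentState):
        response = model.invoke(state["messages"])
        return {"messages": [response]}
    
    def should_continue(state: AgentState):
        # Route to the tool node if the model asked for a tool, otherwise stop
        last = state["messages"][-1]
        if getattr(last, "tool_calls", None):
            return "tools"
        return END
    
    workflow = StateGraph(AgentState)
    workflow.add_node("agent", call_model)
    workflow.add_node("tools", ToolNode(tools))
    workflow.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
    workflow.add_edge("tools", "agent")  # the cycle: tool results go back to the model
    workflow.set_entry_point("agent")
    
    app = workflow.compile()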
    

    State is persistent across invocations via checkpointers (SQLite, PostgreSQL, Redis), which enables long-running workflows to be paused, inspected, and resumed. This is the mechanism that makes human-in-the-loop patterns — where a workflow stops and waits for approval before taking an action — technically straightforward rather than an engineering project.
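
    The pause-and-resume idea can be sketched without any framework. The snippet below is an illustration under stated assumptions, not LangGraph's checkpointer API: the table layout, the save, load, and run helpers, and the two-step "draft then send" workflow are all hypothetical. It persists state per thread in SQLite and stops before the side-effecting step until approval is given.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")  # a file path would survive process restarts
db.execute("CREATE TABLE checkpoints (thread_id TEXT PRIMARY KEY, state TEXT)")


def save(thread_id: str, state: dict) -> None:
    db.execute(
        "INSERT OR REPLACE INTO checkpoints VALUES (?, ?)",
        (thread_id, json.dumps(state)),
    )
    db.commit()


def load(thread_id: str) -> dict:
    row = db.execute(
        "SELECT state FROM checkpoints WHERE thread_id = ?", (thread_id,)
    ).fetchone()
    # A thread with no checkpoint starts from the initial state.
    return json.loads(row[0]) if row else {"step": "draft", "log": []}


def run(thread_id: str, approved: bool = False) -> dict:
    """Advance the workflow until it finishes or needs human approval."""
    state = load(thread_id)
    while state["step"] != "done":
        if state["step"] == "draft":
            state["log"].append("drafted email")
            state["step"] = "send"
        elif state["step"] == "send":
            if not approved:
                save(thread_id, state)  # pause before the side effect
                return state
            state["log"].append("sent email")
            state["step"] = "done"
    save(thread_id, state)
    return state


paused = run("thread-1")                  # stops before "send"
resumed = run("thread-1", approved=True)  # picks up from the checkpoint
```

    The second call does not redo the draft step: it reloads the stored state and continues from where the first call stopped, which is the property a checkpointer provides.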

    Key LangGraph capabilities

    • Stateful execution: a shared TypedDict or Pydantic model flows through every node
    • Cyclic graphs: agents can loop until a termination condition is met
    • Checkpointing: thread-level and cross-session persistence built in
    • Human-in-the-loop: interrupt_before and interrupt_after options, set when the graph is compiled, pause execution around any named node
    • Multi-agent coordination: supervisor patterns and hierarchical subgraph composition
    • Streaming: token-level and node-level output streaming without extra plumbing

    As of 2025, LangGraph also ships a managed deployment target (LangGraph Platform) with a free tier of 10,000 node executions per month, allowing teams to deploy graphs as production microservices without managing their own persistence infrastructure.


    LlamaIndex — Data-Centric RAG and Agent Workflows

    LlamaIndex (formerly GPT Index) solves a different problem entirely. Where LangChain and LangGraph are orchestration frameworks, LlamaIndex is a data framework: its primary concern is how you get unstructured and semi-structured data into a form that an LLM can usefully query.

    The canonical LlamaIndex pipeline has four stages:

    from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
    from llama_index.core.node_parser import SentenceSplitter
    
    # 1. Load documents
    documents = SimpleDirectoryReader("./docs").load_data()
    
    # 2. Parse into nodes (chunks) with metadata
    parser = SentenceSplitter(chunk_size=512, chunk_overlap=64)
    nodes = parser.get_nodes_from_documents(documents)
    
    # 3. Build vector index
    index = VectorStoreIndex(nodes)
    
    # 4. Query
    query_engine = index.as_query_engine(similarity_top_k=5)
    response = query_engine.query("What are the data retention policies?")
    print(response)
    

    Where LlamaIndex earns its production reputation is in the depth of its retrieval tooling: hybrid search combining dense vectors with BM25 keyword search, metadata filtering, rerankers, sub-question decomposition, and recursive retrieval over hierarchical document structures. These are not toy capabilities — they address the failure modes that bite every team that moves a RAG prototype into production at scale.
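
    As an illustration of what "hybrid search" means mechanically, the sketch below merges a dense-vector ranking and a keyword ranking with reciprocal rank fusion (RRF). RRF is one common fusion method chosen here for brevity; it is not claimed to be what LlamaIndex does internally, and the function name, the constant k=60, and the document IDs are assumptions for the example.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Merge several ranked lists of document IDs into one ranking."""
    scores: dict = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); documents found by
            # several retrievers accumulate a higher total.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["doc_a", "doc_b", "doc_c"]    # hypothetical vector-search order
keyword_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical BM25 order

fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
print(fused)
```

    Documents that appear in both lists (doc_a and doc_c here) rise above those found by only one retriever, which is the behaviour that helps with queries mixing exact terms and paraphrased intent.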

    LlamaIndex also ships LlamaParse, a managed parsing service specifically designed for complex enterprise documents — PDFs with multi-column layouts, embedded tables, scanned pages. The 2025 updates added skew detection and improved table extraction fidelity, which matters when your knowledge base consists of financial filings or regulatory documents rather than clean Markdown files.

    The agent layer in LlamaIndex is built around Workflows, an event-driven model using @step-decorated functions and a Context API rather than LangGraph's graph metaphor. LlamaIndex Workflows provides first-class support for nested agent structures and shared state, with pause-resume semantics. For teams whose primary concern is document-heavy retrieval, and who want their agents to treat retrieval as the first-class operation rather than one tool call among many, this model is a better fit.


    Side-by-Side Comparison

    Dimension | LangChain | LangGraph | LlamaIndex
    --- | --- | --- | ---
    Primary focus | Component wiring & LLM integration | Stateful agent orchestration | Data ingestion, indexing & retrieval
    Mental model | Linear chains (pipes) | Directed cyclic graph (state machine) | Data pipeline + query engine
    Agent model | Basic ReAct, Plan-and-Execute | Nodes + edges + state + cycles | Event-driven Workflows (@step)
    RAG support | Basic (relies on integrations) | Minimal (wraps LangChain) | First-class, deep tooling
    Stateful persistence | Via LangGraph | Built-in checkpointers | Workflow context API
    Human-in-the-loop | Not native | Built-in interrupt hooks | Supported in Workflows
    Multi-agent | Manual wiring or community libs | Supervisor/subgraph patterns | First-class routing and collaboration
    Observability | LangSmith (native) | LangSmith (native) | LlamaCloud / third-party
    Performance overhead | ~10 ms framework overhead | ~14 ms framework overhead | ~6 ms framework overhead
    Pricing model | MIT open-source | Free tier + paid plans | Credit-based managed platform
    GitHub stars (mid-2025) | 140,000+ | 35,700+ | (not stated)

    On raw framework overhead, independent benchmarks show LlamaIndex at ~6 ms and LangGraph at ~14 ms for equivalent RAG tasks. For most applications this is noise compared to model latency, but it matters in latency-sensitive workflows where you chain many calls in sequence.
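
    A quick back-of-the-envelope calculation shows when the difference starts to matter. The per-call overheads are the figures quoted above; the 800 ms model latency and the 20 sequential calls are assumptions picked purely for illustration, not measurements.

```python
def chained_overhead_ms(per_call_overhead_ms: float, calls: int) -> float:
    """Total framework overhead for `calls` sequential steps."""
    return per_call_overhead_ms * calls


MODEL_LATENCY_MS = 800.0  # assumed per-call model latency, not a measurement
CALLS = 20                # assumed number of sequential steps

for name, overhead in [("LlamaIndex", 6.0), ("LangGraph", 14.0)]:
    total = chained_overhead_ms(overhead, CALLS)
    share = overhead / (overhead + MODEL_LATENCY_MS)
    print(f"{name}: {total:.0f} ms overhead over {CALLS} calls ({share:.1%} of each call)")
```

    Under these assumptions the gap is 160 ms across the whole chain (280 ms versus 120 ms), while the model itself accounts for 16 seconds. The overhead only becomes a deciding factor when the model calls are fast or the chain is very long.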


    When to Use Which

    The question most teams ask is not "which is best" but "which fits my current problem." The following table maps problem patterns to primary recommendations:

    Problem pattern | Recommended tool
    --- | ---
    Swap LLM providers without rewriting logic | LangChain
    Simple linear RAG pipeline (prototype speed) | LangChain or LlamaIndex
    Production RAG over large document corpora | LlamaIndex
    Document parsing (PDFs, tables, scanned docs) | LlamaIndex + LlamaParse
    Complex agentic loop with tool use | LangGraph
    Multi-agent system with supervisor logic | LangGraph
    Human approval before destructive actions | LangGraph
    Long-running workflows with pause/resume | LangGraph
    Document-heavy multi-agent system | LlamaIndex Workflows
    Full observability + tracing | LangGraph + LangSmith
    You already use Azure/AWS managed infra | LlamaIndex (AWS Bedrock integration)

    The most common production pattern is a combination: LlamaIndex handles document ingestion and retrieval, and LangGraph orchestrates the agent workflow that calls the LlamaIndex query engine as one of its tools. This is not a theoretical architecture — the LlamaIndex query engine exposes a standard tool interface that LangGraph agents can invoke directly via LangChain's tool abstraction.
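
    The shape of that boundary can be sketched without either library. Everything below is a hypothetical stand-in: StubQueryEngine plays the role of a retrieval query engine, Tool is a minimal tool record, and dispatch_tool_calls is the part an orchestrator would own. The real integration would go through each framework's own tool classes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """Minimal tool record: a name the model can request and a callable."""
    name: str
    description: str
    run: Callable[[str], str]


class StubQueryEngine:
    """Placeholder for a retrieval query engine; returns canned text."""

    def query(self, question: str) -> str:
        return f"[retrieved context for: {question}]"


def make_retrieval_tool(engine: StubQueryEngine) -> Tool:
    # Retrieval side: the query engine is hidden behind a plain string-in,
    # string-out function, which is all the orchestrator needs to know.
    return Tool(
        name="search_docs",
        description="Answer questions from the indexed document corpus.",
        run=lambda question: str(engine.query(question)),
    )


def dispatch_tool_calls(tool_calls: list, tools: list) -> list:
    # Orchestration side: look up each requested tool by name and run it.
    registry = {t.name: t for t in tools}
    return [registry[call["name"]].run(call["args"]["query"]) for call in tool_calls]


tools = [make_retrieval_tool(StubQueryEngine())]
requested = [{"name": "search_docs", "args": {"query": "data retention"}}]
results = dispatch_tool_calls(requested, tools)
print(results)
```

    Keeping the interface this narrow means the retrieval stack can change (chunking, reranking, a different index) without touching the agent graph.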


    Reference Repositories

    The following publicly available repositories contain working implementations worth reading before committing to an architecture:

    • langchain-ai/langchain — Core framework. The cookbook/ directory contains runnable examples covering RAG, agents, and multi-modal pipelines.
    • langchain-ai/langgraph — Core LangGraph repo. The examples/ directory includes supervisor multi-agent patterns, human-in-the-loop setups, and subgraph compositions.
    • langchain-samples — Official sample organisation with production-oriented patterns: A2A (agent-to-agent) conversations, deep agents, and travel planner architectures.
    • run-llama/llama_index — Core LlamaIndex repo. The docs/examples/ directory covers advanced RAG techniques, metadata extraction, and agentic Workflows.
    • kyrolabs/awesome-langchain — Curated list of 700+ tools and projects built on LangChain and LangGraph, useful for finding production reference implementations.
    • caramaschiHG/awesome-ai-agents-2026 — CC0-licensed list covering 260+ resources across the full agentic AI ecosystem, including LangChain, LangGraph, LlamaIndex, CrewAI, AutoGen, and observability tooling.

    Possible Applications by Framework

    The following list maps real application types to the framework best suited to drive them. Several naturally involve more than one:

    LangChain-led:

    • Internal chatbots backed by a knowledge base (simple RAG + LangSmith tracing)
    • Code generation assistants with provider-agnostic model switching
    • Prompt versioning and A/B testing pipelines
    • Data extraction pipelines from structured text

    LangGraph-led:

    • Autonomous coding agents (generate → test → fix loops)
    • Multi-stage compliance review workflows with human approval gates
    • Customer support agents with escalation logic
    • Research automation (plan → search → synthesise → revise)
    • Security incident triage with tool calling (a use case that itself needs the guardrails against agentic misuse described in the OWASP Top 10 for LLM Applications)
    • Deep research systems (akin to OpenAI Deep Research)
    • Financial analysis agents with multi-source data synthesis

    LlamaIndex-led:

    • Enterprise document Q&A over 10,000+ PDFs, contracts, or filings
    • Legal research assistants with citation tracking
    • Healthcare knowledge bases over clinical guidelines
    • Technical support bots backed by product documentation corpora
    • Multi-tenant knowledge platforms with document-level access control
    • Ingestion pipelines for compliance-auditable RAG (with LlamaParse + metadata)

    Hybrid (LlamaIndex ingestion + LangGraph orchestration):

    • Autonomous report generation systems that retrieve, synthesise, and format
    • Security policy analysis platforms comparing documentation against standards
    • AI-assisted due diligence tools for M&A document review
    • Regulatory compliance monitoring systems ingesting regulation updates

    Security Considerations

    None of these frameworks are security-neutral. If you are building agentic systems with LangGraph, the human-in-the-loop hooks are not optional decoration — they are the primary mitigation for the OWASP LLM06 Excessive Agency risk. An agent with tool access to a database, an email service, or an API endpoint can do real damage under a prompt injection attack if there is no approval gate before write actions.

    For LlamaIndex-based RAG systems, every document ingested into the vector store is a potential attack surface if external parties can influence what gets indexed — a variant of OWASP LLM08 (Vector and Embedding Weaknesses). Retrieval metadata should carry access control tags that are enforced at query time, not just at ingestion.
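
    A minimal sketch of query-time enforcement follows. The Chunk record, the allowed_groups tag, and the authorised_top_k helper are hypothetical names for illustration and do not correspond to a LlamaIndex API; the point is the ordering of the two steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    score: float               # similarity score from the retriever
    allowed_groups: frozenset  # access tag attached at ingestion


def authorised_top_k(candidates: list, user_groups: frozenset, k: int) -> list:
    # Filter on the caller's groups first, then rank. Ranking first and
    # filtering afterwards would let restricted chunks crowd out results
    # the user is actually allowed to see.
    visible = [c for c in candidates if c.allowed_groups & user_groups]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:k]


chunks = [
    Chunk("Public handbook", 0.71, frozenset({"everyone"})),
    Chunk("Board minutes", 0.93, frozenset({"executives"})),
    Chunk("Engineering runbook", 0.64, frozenset({"engineering"})),
]

results = authorised_top_k(chunks, frozenset({"everyone", "engineering"}), k=2)
print([c.text for c in results])
```

    The highest-scoring chunk never reaches the prompt because the caller is not in its group, regardless of how relevant the retriever thought it was.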

    Agent identity is a separate concern. If your LangGraph agent calls a downstream service, the credential it uses should be scoped to the minimum permissions needed for that specific task — the same principle governing OAuth token scopes applies directly here. Short-lived tokens issued per-workflow-run are a better default than a long-lived service account credential embedded in configuration.
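
    One way to picture per-run credentials is a signed token that carries its own scopes and expiry. The sketch below uses only the standard library; issue_token, is_allowed, the scope strings, and the hard-coded signing key are assumptions for the example. A production system would use its identity provider's token service rather than hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"demo-only-key"  # assumption: a real key comes from a secrets manager


def issue_token(run_id: str, scopes: set, ttl_seconds: int = 300, now: float = None) -> str:
    """Mint a token bound to one workflow run, a scope list, and an expiry."""
    now = time.time() if now is None else now
    payload = json.dumps(
        {"run": run_id, "scopes": sorted(scopes), "exp": now + ttl_seconds}
    ).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + signature


def is_allowed(token: str, scope: str, now: float = None) -> bool:
    """Check signature, expiry, and scope before a downstream call."""
    now = time.time() if now is None else now
    body, _, signature = token.rpartition(".")
    payload = base64.urlsafe_b64decode(body.encode())
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False
    claims = json.loads(payload)
    return now < claims["exp"] and scope in claims["scopes"]


token = issue_token("run-42", {"tickets:read"}, ttl_seconds=300, now=1000.0)
print(is_allowed(token, "tickets:read", now=1100.0))
print(is_allowed(token, "tickets:write", now=1100.0))
```

    A token issued this way is useless for any scope it was not minted with and stops working once the run's time window closes, which limits what a hijacked agent can do with it.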


    If your team is integrating any of these frameworks into production and wants a structured review of the resulting AI architecture's security posture — including prompt injection surfaces, agent privilege scope, RAG access control, and tool call boundaries — Reverse Polarity's AI Security Scan maps your implementation against the OWASP Top 10 for LLM Applications and delivers actionable findings your engineering team can prioritise and ship.
