LangChain, LangGraph, and LlamaIndex — Which Tool Do You Need, and When

The three most commonly referenced Python frameworks for building LLM applications (LangChain, LangGraph, and LlamaIndex) are not interchangeable, nor are they competing to solve the same problem. Each sits at a different layer of the stack: LangChain handles component wiring and model interoperability, LangGraph adds stateful workflow orchestration on top of it, and LlamaIndex is purpose-built for data ingestion, indexing, and retrieval. Picking the wrong one for a given problem creates friction from the first sprint; picking the right combination removes it.

A Brief Timeline

All three emerged from the same AI engineering moment: late 2022, as GPT-3.5 made LLM integration practical for working developers. LangChain launched in October 2022 as a generic LLM orchestration library. LlamaIndex (originally named GPT Index) appeared around the same time with a tighter focus on document indexing. LangGraph arrived in 2024, created by the LangChain team, specifically to address the stateful, cyclical, multi-agent patterns that LangChain's linear chain model struggled to express cleanly.

By mid-2025, the LangChain repository on GitHub had crossed 140,000 stars, while LangGraph had reached 35,700. The gap partly reflects LangChain's head start of more than a year, but it also reflects their relationship: LangGraph is the current recommended agent runtime within the LangChain ecosystem, and LangChain increasingly serves as the integration and component layer underneath it.

LangChain — The Component Integration Layer

LangChain's original value proposition was simple: a unified interface to swap between LLM providers, vector stores, document loaders, and output parsers without rewriting application logic. That abstraction still holds, though the framework has accumulated significant surface area in the process.

The core programming model today is LCEL (LangChain Expression Language), a declarative chain composition syntax using the pipe operator:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

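# Requires the langchain-openai package and an OPENAI_API_KEY environment variable
model = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("Summarise this in one sentence: {text}")

# Each | feeds one Runnable's output into the next: prompt -> chat model -> plain string
chain = prompt | model | StrOutputParser()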

result = chain.invoke({"text": "Long document content here..."})

Each component in the pipe is a Runnable, meaning it exposes .invoke(), .batch(), .stream(), and async variants consistently. The composition model is easy to reason about for linear pipelines but breaks down once you need loops, conditional branches, or shared state across multiple agent steps — which is exactly why LangGraph exists.
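To make the Runnable idea concrete without installing anything, here is a deliberately simplified, standard-library sketch of how a pipe operator can compose steps that all share invoke, batch, and stream. The MiniRunnable class and the fake model are hypothetical and are not LangChain's implementation; they only illustrate why any stage in an LCEL chain can be swapped as long as it honours the same interface.

```python
from __future__ import annotations

from typing import Any, Callable, Iterator


class MiniRunnable:
    """Hypothetical toy stand-in for a Runnable; not LangChain's implementation."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def batch(self, values: list[Any]) -> list[Any]:
        # The real library can run batch inputs concurrently; this sketch runs them in order.
        return [self.invoke(v) for v in values]

    def stream(self, value: Any) -> Iterator[Any]:
        # A step with no incremental output yields its whole result as a single chunk.
        yield self.invoke(value)

    def __or__(self, other: MiniRunnable) -> MiniRunnable:
        # a | b returns a new runnable that feeds a's output into b.
        return MiniRunnable(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins for prompt | model | StrOutputParser().
prompt = MiniRunnable(lambda d: f"Summarise this in one sentence: {d['text']}")
fake_model = MiniRunnable(lambda p: {"content": p.upper()})
parser = MiniRunnable(lambda message: message["content"])

chain = prompt | fake_model | parser
```

Because every stage exposes the same methods, replacing fake_model with a real chat model would leave the rest of the chain untouched; that interchangeability is what the Runnable protocol buys you.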

What LangChain is good at

Wrapping LLM providers (OpenAI, Anthropic, Mistral, local models via Ollama) behind a common interface
Building simple RAG chains with built-in retriever integrations
Prototyping prompt pipelines quickly
Accessing LangSmith for tracing, evaluation, and observability
Providing component primitives that LangGraph nodes can consume

LangChain's known friction points

The community's frustration with LangChain is well documented. The framework has a history of breaking changes across versions (0.1 → 0.2 → 0.3), abstractions that conceal what is actually happening in the model call, and inconsistent internal conventions across its 700+ integrations. By mid-2025, the LangChain team had split most integrations into separate partner packages (such as langchain-openai and langchain-anthropic) and positioned langchain-core as the stable foundation. The pragmatic consensus in production environments: use LangChain for its integrations and LCEL chains, but move complex control flow to LangGraph.

LangGraph — Stateful Agentic Orchestration

LangGraph reframes an LLM workflow as a directed graph where nodes contain business logic and edges define control flow, including loops. Unlike DAG-based workflow systems, LangGraph explicitly supports cycles — which is precisely what enables agent behaviour like self-correction, reflection, and multi-step tool use.
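The value of cycles is easiest to see in a generate, review, revise loop. The sketch below is a hypothetical, dependency-free miniature of a state graph, not LangGraph's API: nodes return partial updates, a per-key reducer decides how each update merges into state (appending to drafts, in the same spirit as the add_messages reducer, and overwriting everything else), and edge functions either loop back or end. The max_steps cap mirrors the idea behind LangGraph's recursion limit.

```python
from typing import Any, Callable

END = "__end__"

# Hypothetical per-key reducers: "drafts" accumulates, other keys are overwritten.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {"drafts": lambda old, new: old + new}


def merge(state: dict, update: dict) -> dict:
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        merged[key] = reducer(merged[key], value) if reducer and key in merged else value
    return merged


def generate(state: dict) -> dict:
    # Stand-in for an LLM call that produces a new draft.
    return {"drafts": [f"draft-{len(state['drafts']) + 1}"]}


def review(state: dict) -> dict:
    # Stand-in for a critique step: pretend the third draft passes review.
    return {"approved": len(state["drafts"]) >= 3}


NODES: dict[str, Callable[[dict], dict]] = {"generate": generate, "review": review}
# Edges are functions of state: a fixed edge always returns the same target,
# a conditional edge picks one, and review -> generate forms the cycle.
EDGES: dict[str, Callable[[dict], str]] = {
    "generate": lambda state: "review",
    "review": lambda state: END if state["approved"] else "generate",
}


def run(state: dict, entry: str = "generate", max_steps: int = 10) -> dict:
    node = entry
    for _ in range(max_steps):
        state = merge(state, NODES[node](state))
        node = EDGES[node](state)
        if node == END:
            return state
    raise RuntimeError("step limit reached before END; the loop may not terminate")
```

Starting from {"drafts": [], "approved": False}, the run loops twice through review before the third draft is approved; with a cap of four steps it raises instead of looping forever.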

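The core abstraction is a StateGraph that carries typed state between nodes. The example below binds a hypothetical get_weather tool to the model, then adds the tool node and the return edge that close the agent loop:

from typing import Annotated, TypedDict

from langchain_core.messages import BaseMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode

@tool
def get_weather(city: str) -> str:
    """Return the weather for a city (hypothetical stub for illustration)."""
    return f"It is sunny in {city}."

tools = [get_weather]
# bind_tools lets the model emit tool_calls for the tools it has been given
model = ChatOpenAI(model="gpt-4o").bind_tools(tools)

class AgentState(TypedDict):
    # add_messages is a reducer: new messages are appended, not replaced
    messages: Annotated[list[BaseMessage], add_messages]

def call_model(state: AgentState):
    response = model.invoke(state["messages"])
    return {"messages": [response]}

def should_continue(state: AgentState):
    last = state["messages"][-1]
    # Route to the tool node if the model requested a tool call; otherwise stop
    if last.tool_calls:
        return "tools"
    return END

workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.add_node("tools", ToolNode(tools))
workflow.add_edge(START, "agent")
workflow.add_conditional_edges("agent", should_continue, ["tools", END])
# The edge back to the agent is the cycle: tool results feed the next model call
workflow.add_edge("tools", "agent")

app = workflow.compile()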

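State persists across invocations when the graph is compiled with a checkpointer (workflow.compile(checkpointer=...), with backends such as SQLite, PostgreSQL, and Redis), and each invocation identifies the conversation it belongs to with a thread_id in its config. That is what lets long-running workflows be paused, inspected, and resumed, and it is the mechanism that makes human-in-the-loop patterns (where a workflow stops and waits for approval before taking an action) technically straightforward rather than an engineering project.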
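The pause-and-resume mechanism can be modelled in a few lines. In this hypothetical sketch, an in-memory dict plays the role of a checkpointer keyed by thread ID, the runner stops before any node listed in interrupt_before, and a second call resumes from the saved position once a human has approved. It mirrors the shape of LangGraph's behaviour (named interrupt points plus a per-thread checkpoint) but uses none of its APIs; the node names and refund scenario are invented for illustration.

```python
from typing import Callable, Optional

END = "__end__"


def draft_refund(state: dict) -> dict:
    return {"email": f"Refund approved for order {state['order_id']}"}


def send_email(state: dict) -> dict:
    # Stand-in for a side-effecting tool call that should require approval.
    return {"sent": True}


NODES: dict[str, Callable[[dict], dict]] = {"draft": draft_refund, "send": send_email}
NEXT = {"draft": "send", "send": END}


class MiniCheckpointer:
    """Hypothetical in-memory checkpoint store keyed by thread ID."""

    def __init__(self) -> None:
        self._saved: dict[str, tuple[str, dict]] = {}

    def put(self, thread_id: str, next_node: str, state: dict) -> None:
        self._saved[thread_id] = (next_node, dict(state))

    def get(self, thread_id: str) -> Optional[tuple[str, dict]]:
        return self._saved.get(thread_id)


def run(thread_id: str, checkpointer: MiniCheckpointer, state: Optional[dict] = None,
        interrupt_before: tuple[str, ...] = (), resume: bool = False) -> dict:
    if resume:
        saved = checkpointer.get(thread_id)
        if saved is None:
            raise ValueError(f"no checkpoint for thread {thread_id!r}")
        node, state = saved
    else:
        node, state = "draft", dict(state or {})
    while node != END:
        if node in interrupt_before and not resume:
            # Persist where we stopped so a later call can pick up from here.
            checkpointer.put(thread_id, node, state)
            return {"status": "interrupted", "next": node, "state": state}
        resume = False  # only the node we resumed into skips its interrupt
        state = {**state, **NODES[node](state)}
        node = NEXT[node]
        checkpointer.put(thread_id, node, state)
    return {"status": "done", "state": state}
```

Between the first call (which returns "interrupted" with the drafted email) and the resuming call, a reviewer can inspect the saved state and decide whether to continue at all; if nobody resumes the thread, the side effect never runs.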

Key LangGraph capabilities

Stateful execution: a shared TypedDict or Pydantic model flows through every node
Cyclic graphs: agents can loop until a termination condition is met
Checkpointing: thread-level and cross-session persistence built in
Human-in-the-loop: interrupt_before and interrupt_after compile options for any node, plus an interrupt() function for pausing from inside a node
Multi-agent coordination: supervisor patterns and hierarchical subgraph composition
Streaming: token-level and node-level output streaming without extra plumbing

As of 2025, LangGraph also ships a managed deployment target (LangGraph Platform) with a free tier of 10,000 node executions per month, allowing teams to deploy graphs as production microservices without managing their own persistence infrastructure.

LlamaIndex — Data-Centric RAG and Agent Workflows

LlamaIndex (formerly GPT Index) solves a different problem entirely. Where LangChain and LangGraph are orchestration frameworks, LlamaIndex is a data framework: its primary concern is how you get unstructured and semi-structured data into a form that an LLM can usefully query.

The canonical LlamaIndex pipeline has four stages:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

# 1. Load documents
documents = SimpleDirectoryReader("./docs").load_data()

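# 2. Parse into nodes (chunks) with metadata; chunk_size and chunk_overlap are counted in tokens
parser = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = parser.get_nodes_from_documents(documents)

# 3. Build vector index (embeds every node; the default embedding model is OpenAI's, so OPENAI_API_KEY must be set)
index = VectorStoreIndex(nodes)

# 4. Query (the default LLM that synthesises the answer is also OpenAI's)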
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What are the data retention policies?")
print(response)
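The chunk_size and chunk_overlap parameters are easiest to understand with a simplified splitter. The sketch below is hypothetical: it counts whitespace-separated words, whereas SentenceSplitter counts tokens and prefers to break on sentence boundaries. It only shows how overlap makes consecutive chunks share their boundary text, so a passage near a cut appears in both neighbouring chunks.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical word-based splitter illustrating size/overlap (not LlamaIndex's algorithm)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

With ten words, a chunk size of 4, and an overlap of 1, the window advances three words at a time, so each chunk repeats the last word of the previous one.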

Where LlamaIndex earns its production reputation is in the depth of its retrieval tooling: hybrid search combining dense vectors with BM25 keyword search, metadata filtering, rerankers, sub-question decomposition, and recursive retrieval over hierarchical document structures. These are not toy capabilities: they address the failure modes that commonly surface when a team moves a RAG prototype into production at scale.
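To show what hybrid search does under the hood, here is a dependency-free sketch that scores documents with BM25, ranks them separately by cosine similarity over toy embedding vectors, and merges the two rankings with reciprocal rank fusion (RRF). RRF is one common fusion method (LlamaIndex's QueryFusionRetriever offers a reciprocal-rank mode), but the corpus, the three-dimensional "embeddings", and the conventional k=60 constant here are illustrative assumptions, not LlamaIndex defaults or internals.

```python
import math
from collections import Counter

# Hypothetical three-document corpus and toy 3-d "embeddings";
# a real pipeline would get vectors from an embedding model.
DOCS = {
    "policy": "data retention policy keeps customer records for seven years",
    "backup": "nightly backup jobs copy customer records to cold storage",
    "onboarding": "new staff onboarding checklist and laptop setup",
}
EMBEDDINGS = {
    "policy": [0.9, 0.1, 0.0],
    "backup": [0.7, 0.6, 0.1],
    "onboarding": [0.0, 0.2, 0.9],
}


def bm25_scores(query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75) -> dict[str, float]:
    tokenized = {doc_id: text.split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for other in tokenized.values() if term in other)
            if df == 0:
                continue
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            freq = tf[term]
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(tokens) / avg_len))
        scores[doc_id] = score
    return scores


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each list contributes 1 / (k + rank); documents ranked well in either list rise.
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


def hybrid_search(query: str, query_vector: list[float]) -> list[str]:
    keyword = bm25_scores(query, DOCS)
    keyword_rank = sorted(keyword, key=keyword.get, reverse=True)
    dense_rank = sorted(EMBEDDINGS, key=lambda d: cosine(query_vector, EMBEDDINGS[d]), reverse=True)
    return reciprocal_rank_fusion([keyword_rank, dense_rank])
```

Fusing ranks rather than raw scores sidesteps the fact that BM25 scores and cosine similarities live on different scales.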

LlamaIndex also ships LlamaParse, a managed parsing service specifically designed for complex enterprise documents — PDFs with multi-column layouts, embedded tables, scanned pages. The 2025 updates added skew detection and improved table extraction fidelity, which matters when your knowledge base consists of financial filings or regulatory documents rather than clean Markdown files.

The agent layer in LlamaIndex is built around Workflows, an event-driven model using @step-decorated functions and a Context API rather than LangGraph's graph metaphor. LlamaIndex Workflows provides first-class support for nested agent structures and shared state, with pause-resume semantics. For teams whose primary concern is document-heavy retrieval, and who want their agents to treat retrieval as the first-class operation rather than one tool call among many, this model is a better fit.
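For readers more familiar with graphs than events, the dispatch model can be sketched with the standard library alone. In this hypothetical miniature, each step is registered for one event type, receives a shared context dict, and returns the next event; the run ends when a StopEvent is emitted. The names echo LlamaIndex's vocabulary (StartEvent, StopEvent, step, context), but this is not its API: in the real framework, steps are typically methods on a Workflow subclass and routing is inferred from the event types in their signatures.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StartEvent:
    query: str


@dataclass
class RetrievedEvent:
    passages: list[str]


@dataclass
class StopEvent:
    result: Any


HANDLERS: dict[type, Callable[[Any, dict], Any]] = {}


def step(event_type: type):
    """Hypothetical decorator: register a function as the handler for one event type."""
    def register(func):
        HANDLERS[event_type] = func
        return func
    return register


@step(StartEvent)
def retrieve(event: StartEvent, ctx: dict) -> RetrievedEvent:
    # Retrieval is the first operation, not one tool among many.
    corpus = ["Records are kept for seven years.", "Backups run nightly."]
    terms = event.query.lower().split()
    hits = [p for p in corpus if any(t in p.lower() for t in terms)]
    ctx["retrieved_count"] = len(hits)  # shared state visible to later steps
    return RetrievedEvent(passages=hits)


@step(RetrievedEvent)
def synthesise(event: RetrievedEvent, ctx: dict) -> StopEvent:
    # Stand-in for an LLM call that answers from the retrieved passages.
    return StopEvent(result=" ".join(event.passages) or "No relevant passages found.")


def run_workflow(start: StartEvent, max_steps: int = 20) -> tuple[Any, dict]:
    ctx: dict = {}
    event: Any = start
    for _ in range(max_steps):
        if isinstance(event, StopEvent):
            return event.result, ctx
        event = HANDLERS[type(event)](event, ctx)  # dispatch on event type
    raise RuntimeError("workflow did not emit a StopEvent")
```

Nothing in the sketch names the next step explicitly; the type of the returned event decides which handler runs, which is the core difference from wiring edges in a graph.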

Side-by-Side Comparison

Dimension	LangChain	LangGraph	LlamaIndex
Primary focus	Component wiring & LLM integration	Stateful agent orchestration	Data ingestion, indexing & retrieval
Mental model	Linear chains (pipes)	Directed cyclic graph (state machine)	Data pipeline + query engine
Agent model	Basic ReAct, Plan-and-Execute	Nodes + edges + state + cycles	Event-driven Workflows (`@step`)
RAG support	Basic (relies on integrations)	Minimal (wraps LangChain)	First-class, deep tooling
Stateful persistence	Via LangGraph	Built-in checkpointers	Workflow context API
Human-in-the-loop	Not native	Built-in interrupt hooks	Supported in Workflows
Multi-agent	Manual wiring or community libs	Supervisor/subgraph patterns	First-class routing and collaboration
Observability	LangSmith (native)	LangSmith (native)	LlamaCloud / third-party
Performance overhead	~10 ms framework overhead	~14 ms framework overhead	~6 ms framework overhead
Licence / pricing	MIT open-source	MIT open-source; LangGraph Platform: free tier + paid plans	MIT open-source; LlamaCloud: credit-based managed platform
GitHub stars (mid-2025)	140,000+	35,700+	—

On raw framework overhead, independent benchmarks show LlamaIndex at ~6 ms and LangGraph at ~14 ms for equivalent RAG tasks. For most applications this is noise compared to model latency, but it matters in latency-sensitive workflows where you chain many calls in sequence.
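A quick back-of-the-envelope calculation shows when the difference starts to matter. The sketch uses the ~6 ms and ~14 ms overhead figures above and an assumed, purely illustrative model latency of 800 ms per call; substitute your own measured numbers.

```python
def added_latency_ms(framework_ms: float, calls: int) -> float:
    """Total framework overhead across a chain of sequential calls."""
    return framework_ms * calls


def overhead_fraction(framework_ms: float, model_ms: float) -> float:
    """Share of end-to-end latency spent in the framework (same per call and per chain)."""
    return framework_ms / (framework_ms + model_ms)


ASSUMED_MODEL_MS = 800.0  # hypothetical per-call model latency; measure your own

gap_over_ten_calls = added_latency_ms(14.0, 10) - added_latency_ms(6.0, 10)
langgraph_share = overhead_fraction(14.0, ASSUMED_MODEL_MS)
```

Under these assumptions the absolute gap grows linearly with the number of chained calls (80 ms over ten calls), while the framework's share of total latency stays fixed at under 2%; the share only becomes significant when individual model calls are fast, rising to roughly 12% at an assumed 100 ms per call.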

When to Use Which

The question most teams ask is not "which is best" but "which fits my current problem." The following table maps problem patterns to primary recommendations:

Problem pattern	Recommended tool
Swap LLM providers without rewriting logic	LangChain
Simple linear RAG pipeline (prototype speed)	LangChain or LlamaIndex
Production RAG over large document corpora	LlamaIndex
Document parsing (PDFs, tables, scanned docs)	LlamaIndex + LlamaParse
Complex agentic loop with tool use	LangGraph
Multi-agent system with supervisor logic	LangGraph
Human approval before destructive actions	LangGraph
Long-running workflows with pause/resume	LangGraph
Document-heavy multi-agent system	LlamaIndex Workflows
Full observability + tracing	LangGraph + LangSmith
You already use Azure/AWS managed infra	LangChain or LlamaIndex (both integrate with Azure OpenAI and AWS Bedrock)

The most common production pattern is a combination: LlamaIndex handles document ingestion and retrieval, and LangGraph orchestrates the agent workflow that calls the LlamaIndex query engine as one of its tools. This is not a theoretical architecture: wrapping the query engine in a LangChain tool takes a few lines (a function that calls query_engine.query and returns the response as text), after which a LangGraph agent can invoke it like any other tool.
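The wiring is thin. The sketch below stays dependency-free by using a hypothetical StubQueryEngine in place of index.as_query_engine() and a plain dict as the tool registry; in a real project the search_policy_docs function would be decorated with LangChain's @tool (from langchain_core.tools) so a LangGraph ToolNode can call it. The point it illustrates is that the agent only ever sees a named function with a docstring, while chunking, retrieval, and synthesis stay inside LlamaIndex.

```python
from typing import Callable, Protocol


class QueryEngine(Protocol):
    def query(self, question: str) -> object: ...


class StubQueryEngine:
    """Hypothetical stand-in for a LlamaIndex query engine."""

    def query(self, question: str) -> str:
        return f"[retrieved answer to: {question}]"


def make_retrieval_tool(engine: QueryEngine) -> Callable[[str], str]:
    def search_policy_docs(question: str) -> str:
        """Answer questions about internal policy documents."""
        # str() turns a LlamaIndex Response (or the stub's string) into plain text.
        return str(engine.query(question))
    return search_policy_docs


# Hypothetical registry that an agent loop would dispatch tool calls against.
TOOLS: dict[str, Callable[..., str]] = {}
retrieval_tool = make_retrieval_tool(StubQueryEngine())
TOOLS[retrieval_tool.__name__] = retrieval_tool


def dispatch(tool_call: dict) -> str:
    # Shape loosely modelled on a model-emitted tool call: a name plus arguments.
    return TOOLS[tool_call["name"]](**tool_call["args"])
```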

Reference Repositories

The following publicly available repositories contain working implementations worth reading before committing to an architecture:

langchain-ai/langchain — Core framework. The cookbook/ directory contains runnable examples covering RAG, agents, and multi-modal pipelines.
langchain-ai/langgraph — Core LangGraph repo. The examples/ directory includes supervisor multi-agent patterns, human-in-the-loop setups, and subgraph compositions.
langchain-samples — Official sample organisation with production-oriented patterns: A2A (agent-to-agent) conversations, deep agents, and travel planner architectures.
run-llama/llama_index — Core LlamaIndex repo. The docs/examples/ directory covers advanced RAG techniques, metadata extraction, and agentic Workflows.
kyrolabs/awesome-langchain — Curated list of 700+ tools and projects built on LangChain and LangGraph, useful for finding production reference implementations.
caramaschiHG/awesome-ai-agents-2026 — CC0-licensed list covering 260+ resources across the full agentic AI ecosystem, including LangChain, LangGraph, LlamaIndex, CrewAI, AutoGen, and observability tooling.

Possible Applications by Framework

The following list maps real application types to the framework best suited to drive them. Several naturally involve more than one:

LangChain-led:

Internal chatbots backed by a knowledge base (simple RAG + LangSmith tracing)
Code generation assistants with provider-agnostic model switching
Prompt versioning and A/B testing pipelines
Data extraction pipelines from structured text

LangGraph-led:

Autonomous coding agents (generate → test → fix loops)
Multi-stage compliance review workflows with human approval gates
Customer support agents with escalation logic
Research automation (plan → search → synthesise → revise)
Security incident triage with tool calling (a use case where the excessive-agency risks described in the OWASP Top 10 for LLM Applications apply directly)
Deep research systems (akin to OpenAI Deep Research)
Financial analysis agents with multi-source data synthesis

LlamaIndex-led:

Enterprise document Q&A over 10,000+ PDFs, contracts, or filings
Legal research assistants with citation tracking
Healthcare knowledge bases over clinical guidelines
Technical support bots backed by product documentation corpora
Multi-tenant knowledge platforms with document-level access control
Ingestion pipelines for compliance-auditable RAG (with LlamaParse + metadata)

Hybrid (LlamaIndex ingestion + LangGraph orchestration):

Autonomous report generation systems that retrieve, synthesise, and format
Security policy analysis platforms comparing documentation against standards
AI-assisted due diligence tools for M&A document review
Regulatory compliance monitoring systems ingesting regulation updates

Security Considerations

None of these frameworks are security-neutral. If you are building agentic systems with LangGraph, the human-in-the-loop hooks are not optional decoration; they are a primary mitigation for Excessive Agency (LLM06 in the 2025 OWASP Top 10 for LLM Applications). An agent with tool access to a database, an email service, or an API endpoint can do real damage under a prompt injection attack if there is no approval gate before write actions.
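A minimal approval gate can be expressed independently of any framework. This hypothetical sketch treats every tool as a write unless it is explicitly listed as read-only, executes reads directly, and routes everything else through an approver callback; in LangGraph the same check would typically sit at an interrupt before the tool node. The tool names and the approve callback are assumptions for illustration.

```python
from typing import Callable

# Hypothetical tools: anything not listed as read-only is treated as a write.
READ_ONLY_TOOLS = {"search_docs"}
TOOLS: dict[str, Callable[..., str]] = {
    "search_docs": lambda query: f"results for {query}",
    "delete_rows": lambda table: f"deleted rows from {table}",
}


def guarded_call(name: str, args: dict, approve: Callable[[str, dict], bool]) -> str:
    if name not in TOOLS:
        # Deny by default: the model cannot invoke capabilities it was never given.
        return f"refused: unknown tool {name!r}"
    if name not in READ_ONLY_TOOLS and not approve(name, args):
        # Writes fail closed unless a human (or policy) explicitly approves them.
        return f"refused: {name} needs approval"
    return TOOLS[name](**args)
```

Defaulting unknown and unclassified tools to "needs approval" means a newly added tool is gated until someone deliberately marks it read-only.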

For LlamaIndex-based RAG systems, every document ingested into the vector store is a potential attack surface if external parties can influence what gets indexed — a variant of OWASP LLM08 (Vector and Embedding Weaknesses). Retrieval metadata should carry access control tags that are enforced at query time, not just at ingestion.
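Query-time enforcement means the permission check runs on every retrieval, against the caller's identity, before anything reaches the prompt. The sketch below is hypothetical and framework-agnostic: each chunk carries an allowed_groups tag written at ingestion, and the retriever drops any chunk whose groups do not intersect the caller's before ranking. In a production vector store you would push this condition into the store's own metadata filter rather than filtering in Python, so unauthorised chunks never leave the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_groups: frozenset  # access-control tag assigned at ingestion


def retrieve_for_user(query: str, chunks: list[Chunk], user_groups: set[str], top_k: int = 3) -> list[str]:
    terms = query.lower().split()
    # Enforce access control first, so ranking never sees unauthorised chunks.
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    scored = [(sum(t in c.text.lower() for t in terms), c) for c in visible]
    ranked = sorted((pair for pair in scored if pair[0] > 0), key=lambda pair: pair[0], reverse=True)
    return [c.text for _, c in ranked[:top_k]]


# Hypothetical corpus: one HR-only chunk and one chunk shared with engineering.
CHUNKS = [
    Chunk("Salary bands for engineering roles", frozenset({"hr"})),
    Chunk("Engineering onboarding guide", frozenset({"hr", "engineering"})),
]
```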

Agent identity is a separate concern. If your LangGraph agent calls a downstream service, the credential it uses should be scoped to the minimum permissions needed for that specific task — the same principle governing OAuth token scopes applies directly here. Short-lived tokens issued per-workflow-run are a better default than a long-lived service account credential embedded in configuration.
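The per-run credential pattern can be sketched with the standard library. The hypothetical issue_run_token below mints a random token bound to one workflow run, a minimal scope set, and a short expiry, and authorize checks all three on every downstream call. A real deployment would delegate issuance to an identity provider (for example an OAuth 2.0 client-credentials grant requesting narrow scopes) rather than an in-process dict; the function names, scope strings, and 15-minute default are assumptions.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunToken:
    value: str
    run_id: str
    scopes: frozenset
    expires_at: float


_ISSUED: dict[str, RunToken] = {}  # hypothetical in-process store; use your IdP in practice


def issue_run_token(run_id: str, scopes: set[str], ttl_seconds: float = 900.0,
                    now: Optional[float] = None) -> RunToken:
    issued_at = time.time() if now is None else now
    token = RunToken(secrets.token_urlsafe(32), run_id, frozenset(scopes), issued_at + ttl_seconds)
    _ISSUED[token.value] = token
    return token


def authorize(token_value: str, required_scope: str, run_id: str,
              now: Optional[float] = None) -> bool:
    token = _ISSUED.get(token_value)
    current = time.time() if now is None else now
    # Valid only for this run, this scope, and until expiry.
    return (
        token is not None
        and token.run_id == run_id
        and required_scope in token.scopes
        and current < token.expires_at
    )
```

Because each token is tied to a single run, a credential leaked through a prompt-injected tool call is useless to other runs and expires on its own within minutes.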

If your team is integrating any of these frameworks into production and wants a structured review of the resulting AI architecture's security posture — including prompt injection surfaces, agent privilege scope, RAG access control, and tool call boundaries — Reverse Polarity's AI Security Scan maps your implementation against the OWASP Top 10 for LLM Applications and delivers actionable findings your engineering team can prioritise and ship.