The three most commonly referenced Python frameworks for building LLM applications — LangChain, LangGraph, and LlamaIndex — are not interchangeable, nor are they competing to solve the same problem. Each sits at a different layer of the stack: LangChain handles component wiring and model interoperability, LangGraph adds stateful workflow orchestration over that, and LlamaIndex is purpose-built for data ingestion, indexing, and retrieval. Picking the wrong one for a given problem creates friction from the first sprint; picking the right combination removes it.
A Brief Timeline
LangChain and LlamaIndex emerged from the same AI engineering moment: late 2022, as GPT-3.5 made LLM integration practical for working developers. LangChain launched in October 2022 as a generic LLM orchestration library. LlamaIndex (originally named GPT Index) appeared around the same time with a tighter focus on document indexing. LangGraph followed in 2024, created by the LangChain team specifically to address the stateful, cyclical, multi-agent patterns that LangChain's linear chain model struggled to express cleanly.
By mid-2025, LangChain on GitHub had crossed 140,000 stars; LangGraph reached 35,700 stars. That gap reflects their relationship: LangGraph is the current recommended agent runtime within the LangChain ecosystem, and LangChain increasingly serves as the integration and component layer underneath it.
LangChain — The Component Integration Layer
LangChain's original value proposition was simple: a unified interface to swap between LLM providers, vector stores, document loaders, and output parsers without rewriting application logic. That abstraction still holds, though the framework has accumulated significant surface area in the process.
The core programming model today is LCEL (LangChain Expression Language), a declarative chain composition syntax using the pipe operator:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
model = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("Summarise this in one sentence: {text}")
chain = prompt | model | StrOutputParser()
result = chain.invoke({"text": "Long document content here..."})
Each component in the pipe is a Runnable, meaning it exposes .invoke(), .batch(), .stream(), and async variants consistently. The composition model is easy to reason about for linear pipelines but breaks down once you need loops, conditional branches, or shared state across multiple agent steps — which is exactly why LangGraph exists.
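To make the pipe syntax less magical, here is a minimal sketch of how this kind of composition can be built on Python's `__or__` operator. It is a toy re-implementation using only the standard library, not LangChain's actual Runnable code; the `Step` class and the three stand-in stages are hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Iterable


class Step:
    """Toy stand-in for a Runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def batch(self, values: Iterable[Any]) -> list:
        # Apply the same step to each input independently.
        return [self.invoke(v) for v in values]

    def __or__(self, other: "Step") -> "Step":
        # `a | b` returns a new Step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stages mirroring prompt | model | parser.
prompt = Step(lambda d: f"Summarise this in one sentence: {d['text']}")
fake_model = Step(lambda p: {"content": p.upper()})  # stands in for an LLM call
parser = Step(lambda msg: msg["content"])

chain = prompt | fake_model | parser
result = chain.invoke({"text": "hello"})
```

Because every stage has the same interface, the composed chain is itself a `Step`. That uniformity is what lets a linear pipeline be invoked, batched, or nested without special cases, and it is also why loops and branches do not fit: `|` can only express "then".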
What LangChain is good at
- Wrapping LLM providers (OpenAI, Anthropic, Mistral, local models via Ollama) behind a common interface
- Building simple RAG chains with built-in retriever integrations
- Prototyping prompt pipelines quickly
- Accessing LangSmith for tracing, evaluation, and observability
- Providing component primitives that LangGraph nodes can consume
LangChain's known friction points
The community's frustration with LangChain has been well-documented. The framework has a history of breaking changes across versions (0.1 → 0.2 → 0.3), abstractions that conceal what is actually happening in the model call, and inconsistent internal conventions across its 700+ integrations. As of mid-2025, the LangChain team has increasingly decoupled the package and positioned langchain-core as the stable foundation. The pragmatic consensus in production environments: use LangChain for its integrations and LCEL chains, but move complex control flow to LangGraph.
LangGraph — Stateful Agentic Orchestration
LangGraph reframes an LLM workflow as a directed graph where nodes contain business logic and edges define control flow, including loops. Unlike DAG-based workflow systems, LangGraph explicitly supports cycles — which is precisely what enables agent behaviour like self-correction, reflection, and multi-step tool use.
The core abstraction is a StateGraph that carries typed state between nodes:
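from typing import Annotated, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode

class AgentState(TypedDict):
    # add_messages is a reducer: new messages are appended, not replaced
    messages: Annotated[list[BaseMessage], add_messages]

def call_model(state: AgentState):
    # `model` is assumed to be a chat model with tools bound, e.g. model.bind_tools(tools)
    response = model.invoke(state["messages"])
    return {"messages": [response]}

def should_continue(state: AgentState):
    # Route to the tool node if the model requested a tool call, otherwise stop
    last = state["messages"][-1]
    if last.tool_calls:
        return "tools"
    return END

workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
# `tools` is assumed to be a list of tools defined elsewhere
workflow.add_node("tools", ToolNode(tools))
workflow.add_conditional_edges("agent", should_continue)
# After tools run, loop back to the agent: this edge is the cycle
workflow.add_edge("tools", "agent")
workflow.set_entry_point("agent")
app = workflow.compile()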
State is persistent across invocations via checkpointers (SQLite, PostgreSQL, Redis), which enables long-running workflows to be paused, inspected, and resumed. This is the mechanism that makes human-in-the-loop patterns — where a workflow stops and waits for approval before taking an action — technically straightforward rather than an engineering project.
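The pause-and-resume mechanics can be sketched without any framework. The following is a simplified illustration under stated assumptions, not LangGraph's checkpointer implementation: the in-memory `checkpoints` dict stands in for a real backend, and the two-step workflow, the `run` function, and its `approved` flag are hypothetical.

```python
from typing import Callable, Optional

# In-memory stand-in for a checkpointer backend (SQLite, PostgreSQL, Redis).
checkpoints: dict = {}

# Hypothetical two-step workflow; each step takes and returns the state.
steps: list = [
    ("draft", lambda s: {**s, "draft": f"Refund {s['order']}"}),
    ("send", lambda s: {**s, "sent": True}),
]


def run(thread_id: str, state: Optional[dict] = None,
        interrupt_before: tuple = (), approved: bool = False) -> dict:
    """Run from the last checkpoint; stop before any step in interrupt_before."""
    saved = checkpoints.get(thread_id, {"state": state or {}, "next": 0})
    current, index = saved["state"], saved["next"]
    while index < len(steps):
        name, fn = steps[index]
        if name in interrupt_before and not approved:
            # Persist progress so a later call can pick up exactly here.
            checkpoints[thread_id] = {"state": current, "next": index}
            return {**current, "status": f"paused before {name}"}
        current = fn(current)
        index += 1
        approved = False  # approval covers only the step it was given for
        checkpoints[thread_id] = {"state": current, "next": index}
    return {**current, "status": "done"}


first = run("thread-1", {"order": "A17"}, interrupt_before=("send",))
second = run("thread-1", interrupt_before=("send",), approved=True)
```

The first call stops before `send` and stores its position under the thread ID; the second call, made after approval, reloads that position and finishes. Keying the checkpoint by thread is what lets many conversations pause independently.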
Key LangGraph capabilities
- Stateful execution: a shared TypedDict or Pydantic model flows through every node
- Cyclic graphs: agents can loop until a termination condition is met
- Checkpointing: thread-level and cross-session persistence built in
- Human-in-the-loop: interrupt_before and interrupt_after hooks on any node
- Multi-agent coordination: supervisor patterns and hierarchical subgraph composition
- Streaming: token-level and node-level output streaming without extra plumbing
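The first two capabilities in that list, reducer-merged state and cycles, are easier to see in a framework-free toy. This sketch assumes a single `messages` key and a hypothetical `run_graph` loop; it mirrors the idea behind a reducer such as add_messages but is not LangGraph's code.

```python
from typing import Callable

END = "__end__"  # sentinel used by this sketch, playing the role of LangGraph's END


def append_reducer(old: list, new: list) -> list:
    """Merge rule for a state key: append updates instead of replacing."""
    return old + new


def agent(state: dict) -> dict:
    # Hypothetical node: records an attempt and returns only its update.
    attempt = len(state["messages"]) + 1
    return {"messages": [f"attempt {attempt}"]}


def route(state: dict) -> str:
    # Conditional edge: loop back to the agent until three attempts exist.
    return END if len(state["messages"]) >= 3 else "agent"


def run_graph(nodes: dict, entry: str,
              router: Callable[[dict], str], max_steps: int = 10) -> dict:
    state = {"messages": []}
    current = entry
    for _ in range(max_steps):  # guard against unbounded loops
        update = nodes[current](state)
        state = {"messages": append_reducer(state["messages"], update["messages"])}
        current = router(state)
        if current == END:
            break
    return state


final_state = run_graph({"agent": agent}, "agent", route)
```

Nodes return partial updates and the reducer decides how they merge, so a node never needs to know the full history it is extending. The `max_steps` guard is a design choice worth copying in real graphs: any cycle needs a hard ceiling in addition to its termination condition.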
As of 2025, LangGraph also ships a managed deployment target (LangGraph Platform) with a free tier of 10,000 node executions per month, allowing teams to deploy graphs as production microservices without managing their own persistence infrastructure.
LlamaIndex — Data-Centric RAG and Agent Workflows
LlamaIndex (formerly GPT Index) solves a different problem entirely. Where LangChain and LangGraph are orchestration frameworks, LlamaIndex is a data framework: its primary concern is how you get unstructured and semi-structured data into a form that an LLM can usefully query.
The canonical LlamaIndex pipeline has four stages:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
# 1. Load documents
documents = SimpleDirectoryReader("./docs").load_data()
# 2. Parse into nodes (chunks) with metadata
parser = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = parser.get_nodes_from_documents(documents)
# 3. Build vector index
index = VectorStoreIndex(nodes)
# 4. Query
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What are the data retention policies?")
print(response)
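The chunk_size and chunk_overlap parameters in stage 2 describe a sliding window. As a rough illustration, the sketch below applies that window to a plain token list. It is a simplification: SentenceSplitter also tries to respect sentence boundaries and counts tokens with a tokenizer, neither of which this hypothetical `chunk_tokens` helper does.

```python
def chunk_tokens(tokens: list, chunk_size: int, chunk_overlap: int) -> list:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks


tokens = [f"t{i}" for i in range(10)]
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
```

With a window of 4 and an overlap of 1, each chunk repeats the final token of the previous one, so a fact that straddles a boundary still appears intact in at least one chunk.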
Where LlamaIndex earns its production reputation is in the depth of its retrieval tooling: hybrid search combining dense vectors with BM25 keyword search, metadata filtering, rerankers, sub-question decomposition, and recursive retrieval over hierarchical document structures. These are not toy capabilities — they address the failure modes that bite every team that moves a RAG prototype into production at scale.
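Hybrid search needs a way to merge two differently scored result lists. One widely used option is reciprocal rank fusion, sketched below with the standard library only. This illustrates the general technique and makes no claim about how LlamaIndex implements it internally; the document IDs and the constant `k = 60` (a conventional default) are assumptions.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to a document's score."""
    scores: dict = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["doc_a", "doc_b", "doc_c"]    # hypothetical vector-search order
keyword_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical BM25 order
fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
```

Rank fusion uses positions rather than raw scores, which sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales. Documents that both retrievers agree on rise to the top.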
LlamaIndex also ships LlamaParse, a managed parsing service specifically designed for complex enterprise documents — PDFs with multi-column layouts, embedded tables, scanned pages. The 2025 updates added skew detection and improved table extraction fidelity, which matters when your knowledge base consists of financial filings or regulatory documents rather than clean Markdown files.
The agent layer in LlamaIndex is built around Workflows, an event-driven model using @step-decorated functions and a Context API rather than LangGraph's graph metaphor. LlamaIndex Workflows provides first-class support for nested agent structures and shared state, with pause-resume semantics. For teams whose primary concern is document-heavy retrieval, and who want their agents to treat retrieval as a first-class operation rather than one tool call among many, this model is a better fit.
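The event-driven idea can be shown with a toy dispatcher. Everything here is hypothetical and standard-library only: the `step` decorator, the event classes, and the `ctx` dict imitate the shape of the pattern, not the LlamaIndex Workflows API (which, among other differences, infers event types from annotations and runs steps asynchronously).

```python
from dataclasses import dataclass
from typing import Callable

handlers: dict = {}


def step(event_type: type):
    """Register a function as the handler for one event type (toy decorator)."""
    def register(fn: Callable) -> Callable:
        handlers[event_type] = fn
        return fn
    return register


@dataclass
class StartEvent:
    question: str


@dataclass
class RetrievedEvent:
    question: str
    passages: list


@dataclass
class StopEvent:
    result: str


@step(StartEvent)
def retrieve(ev: StartEvent, ctx: dict) -> RetrievedEvent:
    ctx["retrievals"] = ctx.get("retrievals", 0) + 1  # shared state across steps
    return RetrievedEvent(ev.question, ["passage about retention"])


@step(RetrievedEvent)
def answer(ev: RetrievedEvent, ctx: dict) -> StopEvent:
    return StopEvent(f"{ev.question} -> {len(ev.passages)} passage(s)")


def run(event, ctx: dict) -> str:
    # Dispatch on the event's type until a step emits StopEvent.
    while not isinstance(event, StopEvent):
        event = handlers[type(event)](event, ctx)
    return event.result


context: dict = {}
outcome = run(StartEvent("What is the retention policy?"), context)
```

The contrast with a graph is that no edges are declared: control flow is implied by which event each step emits and which step accepts it.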
Side-by-Side Comparison
| Dimension | LangChain | LangGraph | LlamaIndex |
|---|---|---|---|
| Primary focus | Component wiring & LLM integration | Stateful agent orchestration | Data ingestion, indexing & retrieval |
| Mental model | Linear chains (pipes) | Directed cyclic graph (state machine) | Data pipeline + query engine |
| Agent model | Basic ReAct, Plan-and-Execute | Nodes + edges + state + cycles | Event-driven Workflows (@step) |
| RAG support | Basic (relies on integrations) | Minimal (wraps LangChain) | First-class, deep tooling |
| Stateful persistence | Via LangGraph | Built-in checkpointers | Workflow context API |
| Human-in-the-loop | Not native | Built-in interrupt hooks | Supported in Workflows |
| Multi-agent | Manual wiring or community libs | Supervisor/subgraph patterns | First-class routing and collaboration |
| Observability | LangSmith (native) | LangSmith (native) | LlamaCloud / third-party |
| Performance overhead | ~10 ms framework overhead | ~14 ms framework overhead | ~6 ms framework overhead |
| Pricing model | MIT open-source | Free tier + paid plans | Credit-based managed platform |
| GitHub stars (mid-2025) | 140,000+ | 35,700+ | — |
On raw framework overhead, independent benchmarks show LlamaIndex at ~6 ms and LangGraph at ~14 ms for equivalent RAG tasks. For most applications this is noise compared to model latency, but it matters in latency-sensitive workflows where you chain many calls in sequence.
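A quick back-of-the-envelope calculation shows when those milliseconds start to matter. The per-call figures come from the comparison table; the workload (20 sequential calls at 800 ms of model latency each) is an assumption chosen for illustration, and the sketch assumes overhead scales linearly with call count.

```python
# Per-call framework overhead in milliseconds, taken from the table above.
OVERHEAD_MS = {"LlamaIndex": 6, "LangChain": 10, "LangGraph": 14}


def total_overhead_ms(framework: str, sequential_calls: int) -> int:
    """Framework overhead accumulated over a chain of sequential calls."""
    return OVERHEAD_MS[framework] * sequential_calls


# Assumed workload: 20 sequential calls, each with 800 ms of model latency.
calls, model_latency_ms = 20, 800
for name in OVERHEAD_MS:
    overhead = total_overhead_ms(name, calls)
    share = overhead / (overhead + calls * model_latency_ms)
    print(f"{name}: {overhead} ms overhead, {share:.1%} of total")
```

Under these assumed numbers LangGraph accumulates 280 ms against LlamaIndex's 120 ms, a 160 ms difference, while both stay below 2% of end-to-end time. The gap only becomes significant when model latency is small, for example with cached or local models.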
When to Use Which
The question most teams ask is not "which is best" but "which fits my current problem." The following table maps problem patterns to primary recommendations:
| Problem pattern | Recommended tool |
|---|---|
| Swap LLM providers without rewriting logic | LangChain |
| Simple linear RAG pipeline (prototype speed) | LangChain or LlamaIndex |
| Production RAG over large document corpora | LlamaIndex |
| Document parsing (PDFs, tables, scanned docs) | LlamaIndex + LlamaParse |
| Complex agentic loop with tool use | LangGraph |
| Multi-agent system with supervisor logic | LangGraph |
| Human approval before destructive actions | LangGraph |
| Long-running workflows with pause/resume | LangGraph |
| Document-heavy multi-agent system | LlamaIndex Workflows |
| Full observability + tracing | LangGraph + LangSmith |
| You already use Azure/AWS managed infra | LlamaIndex (AWS Bedrock integration) |
The most common production pattern is a combination: LlamaIndex handles document ingestion and retrieval, and LangGraph orchestrates the agent workflow that calls the LlamaIndex query engine as one of its tools. This is not a theoretical architecture — the LlamaIndex query engine exposes a standard tool interface that LangGraph agents can invoke directly via LangChain's tool abstraction.
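The shape of that combination is a thin adapter: the retrieval engine is wrapped in a named function, and the orchestrator calls it like any other tool. The sketch below shows only that shape, using a stub in place of a real query engine. `StubQueryEngine`, `make_retrieval_tool`, and `execute_tool_call` are hypothetical names, not APIs from either library.

```python
from typing import Callable, Protocol


class QueryEngine(Protocol):
    """Anything with a query() method whose result can be turned into text."""
    def query(self, question: str) -> object: ...


class StubQueryEngine:
    """Stand-in for a LlamaIndex query engine; returns a canned answer."""
    def query(self, question: str) -> str:
        return f"[retrieved answer for: {question}]"


def make_retrieval_tool(engine: QueryEngine) -> Callable[[str], str]:
    def search_docs(question: str) -> str:
        """Look up an answer in the indexed document corpus."""
        # str() mirrors how a response object is printed in the pipeline above.
        return str(engine.query(question))
    return search_docs


# The orchestrator side: a registry of tools keyed by name.
tools = {"search_docs": make_retrieval_tool(StubQueryEngine())}


def execute_tool_call(call: dict) -> str:
    # Dispatch a model-issued tool call of the form {"name": ..., "args": {...}}.
    return tools[call["name"]](**call["args"])


observation = execute_tool_call(
    {"name": "search_docs", "args": {"question": "retention policy"}}
)
```

Keeping the adapter this thin means the retrieval stack can be tuned or replaced without touching the agent graph, and vice versa.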
Reference Repositories
The following publicly available repositories contain working implementations worth reading before committing to an architecture:
- langchain-ai/langchain: Core framework. The cookbook/ directory contains runnable examples covering RAG, agents, and multi-modal pipelines.
- langchain-ai/langgraph: Core LangGraph repo. The examples/ directory includes supervisor multi-agent patterns, human-in-the-loop setups, and subgraph compositions.
- langchain-samples: Official sample organisation with production-oriented patterns: A2A (agent-to-agent) conversations, deep agents, and travel planner architectures.
- run-llama/llama_index: Core LlamaIndex repo. The docs/examples/ directory covers advanced RAG techniques, metadata extraction, and agentic Workflows.
- kyrolabs/awesome-langchain: Curated list of 700+ tools and projects built on LangChain and LangGraph, useful for finding production reference implementations.
- caramaschiHG/awesome-ai-agents-2026: CC0-licensed list covering 260+ resources across the full agentic AI ecosystem, including LangChain, LangGraph, LlamaIndex, CrewAI, AutoGen, and observability tooling.
Possible Applications by Framework
The following list maps real application types to the framework best suited to drive them. Several naturally involve more than one:
LangChain-led:
- Internal chatbots backed by a knowledge base (simple RAG + LangSmith tracing)
- Code generation assistants with provider-agnostic model switching
- Prompt versioning and A/B testing pipelines
- Data extraction pipelines from structured text
LangGraph-led:
- Autonomous coding agents (generate → test → fix loops)
- Multi-stage compliance review workflows with human approval gates
- Customer support agents with escalation logic
- Research automation (plan → search → synthesise → revise)
- Security incident triage with tool-calling (similar to agentic misuse patterns discussed in the OWASP Top 10 for LLM Applications)
- Deep research systems (akin to OpenAI Deep Research)
- Financial analysis agents with multi-source data synthesis
LlamaIndex-led:
- Enterprise document Q&A over 10,000+ PDFs, contracts, or filings
- Legal research assistants with citation tracking
- Healthcare knowledge bases over clinical guidelines
- Technical support bots backed by product documentation corpora
- Multi-tenant knowledge platforms with document-level access control
- Ingestion pipelines for compliance-auditable RAG (with LlamaParse + metadata)
Hybrid (LlamaIndex ingestion + LangGraph orchestration):
- Autonomous report generation systems that retrieve, synthesise, and format
- Security policy analysis platforms comparing documentation against standards
- AI-assisted due diligence tools for M&A document review
- Regulatory compliance monitoring systems ingesting regulation updates
Security Considerations
None of these frameworks are security-neutral. If you are building agentic systems with LangGraph, the human-in-the-loop hooks are not optional decoration — they are the primary mitigation for the OWASP LLM06 Excessive Agency risk. An agent with tool access to a database, an email service, or an API endpoint can do real damage under a prompt injection attack if there is no approval gate before write actions.
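An approval gate does not have to be elaborate. The following framework-free sketch classifies tools by whether they have side effects and routes only the risky ones through an approver. The registry, tool names, and `reject_all` reviewer are hypothetical; in LangGraph the same decision would typically sit behind an interrupt hook.

```python
from typing import Callable

# Hypothetical tool registry; the second field marks tools with side effects.
TOOLS: dict = {
    "lookup_order": (lambda order_id: f"order {order_id}: shipped", False),
    "delete_order": (lambda order_id: f"order {order_id}: deleted", True),
}


def execute(tool_name: str, args: dict, approve: Callable[[str, dict], bool]) -> str:
    """Run read-only tools directly; route write tools through an approver."""
    if tool_name not in TOOLS:
        return "denied: unknown tool"  # default-deny anything unregistered
    fn, writes = TOOLS[tool_name]
    if writes and not approve(tool_name, args):
        return f"denied: {tool_name} was not approved"
    return fn(**args)


def reject_all(tool_name: str, args: dict) -> bool:
    # Stand-in for a human reviewer; a real gate would block on a UI or ticket.
    return False


read_result = execute("lookup_order", {"order_id": "A17"}, reject_all)
write_result = execute("delete_order", {"order_id": "A17"}, reject_all)
```

The important property is that the gate sits in the executor, outside the model's reach: an injected prompt can make the model request `delete_order`, but it cannot make the executor skip the approval check.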
For LlamaIndex-based RAG systems, every document ingested into the vector store is a potential attack surface if external parties can influence what gets indexed — a variant of OWASP LLM08 (Vector and Embedding Weaknesses). Retrieval metadata should carry access control tags that are enforced at query time, not just at ingestion.
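Query-time enforcement can be as simple as a filter applied to retrieved chunks before they reach the prompt. This is a minimal sketch with a hypothetical `Chunk` type and `allowed_groups` tag; a production system would more likely push the same predicate into the vector store's metadata filter so that unauthorised chunks are never retrieved at all.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    score: float
    allowed_groups: frozenset = field(default_factory=frozenset)


def filter_by_access(hits: list, user_groups: set) -> list:
    """Keep only chunks whose ACL tags intersect the caller's groups."""
    # Chunks with no tags are dropped: missing metadata fails closed.
    return [c for c in hits if c.allowed_groups & user_groups]


hits = [
    Chunk("Salary bands 2025", 0.91, frozenset({"hr"})),
    Chunk("VPN setup guide", 0.88, frozenset({"hr", "engineering"})),
    Chunk("Untagged upload", 0.80),
]
visible = filter_by_access(hits, {"engineering"})
```

Failing closed on untagged chunks matters: if an ingestion bug drops the metadata, the document becomes invisible rather than visible to everyone.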
Agent identity is a separate concern. If your LangGraph agent calls a downstream service, the credential it uses should be scoped to the minimum permissions needed for that specific task — the same principle governing OAuth token scopes applies directly here. Short-lived tokens issued per-workflow-run are a better default than a long-lived service account credential embedded in configuration.
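To make the per-run token idea concrete, here is a small standard-library sketch that signs a payload with an HMAC and checks scope and expiry on use. It is an illustration under stated assumptions, not a replacement for a real token service: the hard-coded `SECRET`, the scope names, and the token layout are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-only-secret"  # assumption: a real system would use a KMS or vault


def issue_token(run_id: str, scopes: list, ttl_seconds: int, now: float) -> str:
    """Sign a payload carrying the run ID, allowed scopes, and expiry time."""
    payload = json.dumps({"run": run_id, "scopes": scopes, "exp": now + ttl_seconds})
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def authorize(token: str, required_scope: str, now: float) -> bool:
    """Accept only untampered, unexpired tokens that carry the required scope."""
    body, _, sig = token.partition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    return now < claims["exp"] and required_scope in claims["scopes"]


issued_at = time.time()
token = issue_token("run-42", ["tickets:read"], ttl_seconds=300, now=issued_at)
```

A token issued for one run with only `tickets:read` cannot be replayed for a write, and it stops working after five minutes even if it leaks into logs or a prompt.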
If your team is integrating any of these frameworks into production and wants a structured review of the resulting AI architecture's security posture — including prompt injection surfaces, agent privilege scope, RAG access control, and tool call boundaries — Reverse Polarity's AI Security Scan maps your implementation against the OWASP Top 10 for LLM Applications and delivers actionable findings your engineering team can prioritise and ship.
Sources
- GitHub — langchain-ai/langchain (140k+ stars)
- GitHub — langchain-ai/langgraph (35.7k+ stars)
- LangGraph product page — langchain.com
- LlamaIndex developer documentation
- LlamaIndex production RAG guide
- Xenoss: LangChain vs LangGraph vs LlamaIndex comparison (August 2025)
- ZenML: LlamaIndex vs LangChain for agentic AI workflows
- AIMultiple: RAG Frameworks performance benchmark
- Sider: LlamaIndex review 2025
- Scalable Path: Building AI Workflows with LangGraph
- IBM Think: What is LangGraph?
- AWS Blog: Build powerful RAG pipelines with LlamaIndex and Amazon Bedrock
- Galileo AI: LlamaIndex complete guide
- kyrolabs/awesome-langchain (GitHub)
- LangChain Expression Language — official blog
- Medium: LangGraph vs LlamaIndex Workflows (no-BS guide 2025)
