LangChain — LangGraph & Production
LangGraph Overview
LangGraph models agent logic as a directed graph with cycles. Unlike linear LCEL chains, LangGraph supports loops, conditional branching, human-in-the-loop, and persistent state — essential for production agents.
uv add langgraph
Core Concepts
| Concept | Description |
|---|---|
| State | TypedDict holding all data flowing through the graph |
| Nodes | Python functions that read state and return updates |
| Edges | Connections between nodes (static or conditional) |
| Checkpointer | Persists state at each step for recovery and debugging |
Basic StateGraph
from typing import Annotated, TypedDict
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")
class AgentState(TypedDict):
messages: Annotated[list, add_messages]
next_step: str
def chatbot(state: AgentState) -> dict:
response = model.invoke(state["messages"])
return {"messages": [response]}
graph = StateGraph(AgentState)
graph.add_node("chatbot", chatbot)
graph.add_edge(START, "chatbot")
graph.add_edge("chatbot", END)
app = graph.compile()
result = app.invoke({"messages": [("human", "Hello!")]})
Annotated[list, add_messages] merges new messages into the list instead of overwriting.
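The difference can be sketched in plain Python: the reducer in Annotated[list, reducer] tells LangGraph how to combine the old value with a node's update. This is a conceptual illustration only — the real add_messages also deduplicates by message id.

```python
# Conceptual sketch of state reducers. Fields without a reducer are
# overwritten; add_messages-style fields are merged by appending.
def overwrite(old, new):
    return new                # default behavior for plain fields

def append_messages(old, new):
    return old + new          # simplified stand-in for add_messages

state = {"messages": [], "next_step": ""}
state["messages"] = append_messages(state["messages"], [("human", "Hello!")])
state["messages"] = append_messages(state["messages"], [("ai", "Hi!")])
state["next_step"] = overwrite(state["next_step"], "chatbot")
# both messages survive the second update; next_step was simply replaced
```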
Conditional Edges
from langgraph.graph import END
def should_use_tool(state: AgentState) -> str:
last_message = state["messages"][-1]
if last_message.tool_calls:
return "tools"
return END
graph.add_conditional_edges("agent", should_use_tool, {"tools": "tools", END: END})
Conditional edges route to different nodes based on state — enables decision loops and tool retry patterns.
Agent with Tools
from typing import Annotated, TypedDict
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode
class State(TypedDict):
messages: Annotated[list, add_messages]
tools = [search_database, get_weather]  # user-defined @tool functions
model = ChatOpenAI(model="gpt-4o-mini").bind_tools(tools)
def agent(state: State) -> dict:
return {"messages": [model.invoke(state["messages"])]}
def should_continue(state: State) -> str:
if state["messages"][-1].tool_calls:
return "tools"
return END
graph = StateGraph(State)
graph.add_node("agent", agent)
graph.add_node("tools", ToolNode(tools))
graph.add_edge(START, "agent")
graph.add_conditional_edges("agent", should_continue)
graph.add_edge("tools", "agent")
app = graph.compile()
The agent calls the LLM, which decides on tool calls. ToolNode executes them. The loop continues until the LLM responds without tool calls.
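The stop condition can be exercised without calling an LLM by feeding the routing function a stand-in message. FakeMessage and the local END are stand-ins here (in LangGraph, END is the string constant "__end__").

```python
# Exercising the agent loop's routing logic with stand-in messages.
END = "__end__"  # stand-in for langgraph.graph.END

class FakeMessage:
    def __init__(self, tool_calls=None):
        self.tool_calls = tool_calls or []

def should_continue(state):
    # Route to the tool node while the LLM keeps requesting tools.
    if state["messages"][-1].tool_calls:
        return "tools"
    return END

wants_tool = {"messages": [FakeMessage(tool_calls=[{"name": "get_weather"}])]}
final_answer = {"messages": [FakeMessage()]}
```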
Checkpointing (State Persistence)
from langgraph.checkpoint.memory import MemorySaver
memory = MemorySaver()
app = graph.compile(checkpointer=memory)
result = app.invoke(
{"messages": [("human", "What's the weather?")]},
config={"configurable": {"thread_id": "session_001"}},
)
| Checkpointer | Use case |
|---|---|
| MemorySaver | Development, testing |
| SqliteSaver | Local persistence |
| PostgresSaver | Production multi-instance |
| RedisSaver | Fast, shared state |
Each thread_id maintains independent conversation state.
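The thread-scoping idea can be sketched without a real model: each thread_id keys its own history, so sessions never mix. This is a conceptual illustration, not the checkpointer API.

```python
# Conceptual sketch: checkpoints keyed by thread_id keep sessions isolated.
checkpoints = {}

def invoke(thread_id, text):
    history = checkpoints.setdefault(thread_id, [])
    history.append(("human", text))
    history.append(("ai", f"reply to: {text}"))
    return history

invoke("session_001", "What's the weather?")
invoke("session_001", "And tomorrow?")   # continues the same session
invoke("session_002", "Hello")           # a separate, independent session
```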
Human-in-the-Loop
app = graph.compile(
checkpointer=memory,
interrupt_before=["tools"],
)
result = app.invoke(
{"messages": [("human", "Delete all users")]},
config={"configurable": {"thread_id": "admin_001"}},
)
# Review pending tool calls, then resume or modify
app.invoke(None, config={"configurable": {"thread_id": "admin_001"}})
interrupt_before pauses execution before a node runs — useful for approval gates on dangerous operations.
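The gate logic amounts to: persist the pending tool call, stop, and only execute after explicit approval. A minimal sketch of that flow (function and field names are illustrative):

```python
# Conceptual sketch of an approval gate before a dangerous tool call.
def run_gated(pending_call, approved=False):
    if not approved:
        # Like interrupt_before=["tools"]: state is saved, nothing executes.
        return {"status": "paused", "pending": pending_call}
    return {"status": "executed", "pending": pending_call}

first = run_gated("delete_all_users")                    # pauses for review
resumed = run_gated("delete_all_users", approved=True)   # human approved
```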
LangSmith Observability
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=your-key
export LANGCHAIN_PROJECT=my-project
from langsmith import Client
client = Client()
| Feature | Purpose |
|---|---|
| Tracing | Visualize full chain/agent execution flow |
| Evaluation | Run test datasets against chains |
| Monitoring | Track latency, cost, error rates |
| Datasets | Manage test cases for regression testing |
| Annotation | Human feedback on outputs |
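What tracing records can be sketched with a plain decorator. LangSmith's traceable decorator does this (and much more) against its backend; this stand-in only captures the call name, inputs, and latency.

```python
import functools
import time

trace_log = []  # stand-in for the LangSmith backend

def traceable(fn):
    # Simplified sketch of tracing: record name, inputs, and latency per call.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        trace_log.append({
            "name": fn.__name__,
            "inputs": args,
            "latency_s": time.perf_counter() - start,
        })
        return result
    return wrapper

@traceable
def answer(question):
    return f"stub answer to: {question}"

answer("What is LangGraph?")
```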
Error Handling Patterns
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o").with_retry(
stop_after_attempt=3,
wait_exponential_jitter=True,
)
resilient = ChatOpenAI(model="gpt-4o").with_fallbacks([
ChatOpenAI(model="gpt-4o-mini"),
])
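The retry policy behind with_retry can be illustrated by computing the waits it implies: exponential growth plus random jitter, so concurrent clients don't retry in lockstep. This is a sketch of the policy, not LangChain internals.

```python
import random

# Sketch: exponential backoff with jitter for up to three attempts.
def backoff_delays(attempts=3, base=1.0, seed=42):
    rng = random.Random(seed)
    # attempt n waits base * 2**n seconds, plus up to 1s of random jitter
    return [base * (2 ** n) + rng.uniform(0, 1) for n in range(attempts)]

delays = backoff_delays()
```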
Agent Iteration Limits
app = graph.compile(checkpointer=memory)
result = app.invoke(
{"messages": messages},
config={
"configurable": {"thread_id": "t1"},
"recursion_limit": 25,
},
)
Production Checklist
| Area | Practice |
|---|---|
| Retry | .with_retry() on all LLM and tool calls |
| Fallbacks | .with_fallbacks() with alternate models |
| Timeouts | Set timeout on model init |
| Limits | recursion_limit and max_iterations |
| Persistence | PostgresSaver / RedisSaver for checkpointing |
| Tracing | LangSmith enabled in all environments |
| Streaming | Use .astream() for responsive UIs |
| Testing | LangSmith datasets for regression tests |
| Secrets | API keys via env vars, never hardcoded |
| Cost | Track token usage via LangSmith monitoring |
When to Use What
| Scenario | Solution |
|---|---|
| Simple prompt → response | LCEL chain |
| RAG Q&A | LCEL retrieval chain |
| Multi-step tool use | LangGraph agent |
| Approval workflows | LangGraph + interrupt_before |
| Long-running tasks | LangGraph + checkpointing |
| Parallel tool execution | LangGraph with parallel nodes |
| Monitoring & debugging | LangSmith tracing |
Checkpointer Packages by Backend
uv add langgraph-checkpoint-sqlite
uv add langgraph-checkpoint-postgres
uv add langgraph-checkpoint-redis
Choose the backend based on operational needs and failover requirements.
Resume and Time-Travel
Use thread_id for normal resume, and checkpoint_id for replay/debug from a specific point.
debug_config = {
"configurable": {
"thread_id": "session_001",
"checkpoint_id": "checkpoint-abc123",
}
}
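The mechanics can be sketched as an append-only log: resuming with only a thread_id continues from the latest checkpoint, while a checkpoint_id rewinds to a specific saved state. A conceptual illustration, not the checkpointer API; the ids below are made up.

```python
# Conceptual sketch: a thread's checkpoints as an append-only log.
log = [
    {"checkpoint_id": "ckpt-001", "state": {"step": "agent", "x": 1}},
    {"checkpoint_id": "ckpt-002", "state": {"step": "tools", "x": 2}},
    {"checkpoint_id": "ckpt-003", "state": {"step": "agent", "x": 3}},
]

def resume(log, checkpoint_id=None):
    if checkpoint_id is None:
        return log[-1]["state"]          # normal resume: latest checkpoint
    for entry in log:                    # time travel: replay from a point
        if entry["checkpoint_id"] == checkpoint_id:
            return entry["state"]
    raise KeyError(checkpoint_id)

latest = resume(log)
rewound = resume(log, "ckpt-002")
```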
Recursion Limit Notes
Set recursion_limit intentionally for deep graphs and monitor for accidental loops:
result = app.invoke(input_data, config={"recursion_limit": 100})
If limits are reached frequently, inspect conditional edges for non-terminating branches.
Multi-Agent Supervisor Pattern
Use a supervisor when one agent should delegate tasks to specialized subagents.
Typical flow
- Supervisor receives user goal.
- Supervisor routes tasks to specialist subagents.
- Subagents return structured outputs.
- Supervisor merges results and produces final answer.
Practical rules
- Keep the supervisor deterministic (temperature=0).
- Force subagent outputs into strict schemas.
- Limit delegation depth to avoid recursive loops.
- Tag traces per role (agent=supervisor, agent=researcher, etc.).
This pattern fits complex tasks where one prompt cannot reliably decide all steps.
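The delegation loop can be sketched in plain Python: the supervisor routes each task, enforces a depth limit, and merges structured results. Role names and routing rules here are illustrative, not a LangGraph API.

```python
# Conceptual sketch of the supervisor loop with a delegation-depth limit.
def route(task):
    # Deterministic routing, as a temperature=0 supervisor would do.
    if "research" in task:
        return "researcher"
    if "summarize" in task:
        return "writer"
    return "FINISH"

def supervise(tasks, max_steps=5):
    results = []
    for step, task in enumerate(tasks):
        if step >= max_steps:          # hard cap on delegation depth
            break
        role = route(task)
        if role == "FINISH":
            break
        # Subagents return structured output the supervisor can merge.
        results.append({"role": role, "task": task, "output": f"{role} done"})
    return results

report = supervise(["research pricing", "summarize findings"])
```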
Plain-Language Summary
LangGraph is a workflow engine for agent logic:
- state stores current data;
- nodes perform work;
- edges decide where to go next;
- checkpointer lets you pause/resume safely.
Use it when business flow is not strictly linear.
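The summary can be made concrete with a toy engine: state flows through nodes, and edges (static or conditional) pick the next step until END. This is a teaching sketch, not LangGraph's implementation.

```python
# Toy graph engine: nodes update state, edges choose the next node.
def run_graph(nodes, edges, state):
    current = edges["START"]                         # static entry edge
    while current != "END":
        state = {**state, **nodes[current](state)}   # node returns an update
        step = edges[current]
        current = step(state) if callable(step) else step  # conditional edge
    return state

nodes = {
    "double": lambda s: {"x": s["x"] * 2},
    "inc": lambda s: {"x": s["x"] + 1},
}
edges = {
    "START": "double",
    "double": lambda s: "inc" if s["x"] < 10 else "END",  # loop until big enough
    "inc": "double",
}
final = run_graph(nodes, edges, {"x": 1})
```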
Minimum Production Baseline
Before real users, ensure all of the following:
- Persistent checkpointer (PostgresSaver or RedisSaver).
- Stable thread_id strategy per conversation/session.
- recursion_limit and iteration limits configured.
- Human approval gates for risky tool actions.
- Full tracing in LangSmith with tags and metadata.
Common Failure Modes
| Problem | Typical cause | Fix |
|---|---|---|
| Graph loops forever | Missing/incorrect stop edge | Add explicit terminal condition |
| Resume does not work | New thread_id each call | Reuse same thread_id for session |
| State is lost after restart | In-memory checkpointer in prod | Use persistent backend |
| Expensive runs | Unlimited loops/tool calls | Set limits and tighter routing logic |