langgraph-architecture

Guides architectural decisions for LangGraph applications. Use when deciding between LangGraph vs alternatives, choosing state management strategies, designing multi-agent systems, or selecting persistence and streaming approaches.

3,891 stars

Best use case

langgraph-architecture is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using langgraph-architecture should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/langgraph-architecture/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/anderskev/langgraph-architecture/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/langgraph-architecture/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How langgraph-architecture Compares

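| Feature / Agent | langgraph-architecture | Standard Approach |
|-----------------|------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |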

SKILL.md Source

# LangGraph Architecture Decisions

## When to Use LangGraph

### Use LangGraph When You Need:

- **Stateful conversations** - Multi-turn interactions with memory
- **Human-in-the-loop** - Approval gates, corrections, interventions
- **Complex control flow** - Loops, branches, conditional routing
- **Multi-agent coordination** - Multiple LLMs working together
- **Persistence** - Resume from checkpoints, time travel debugging
- **Streaming** - Real-time token streaming, progress updates
- **Reliability** - Retries, error recovery, durability guarantees

### Consider Alternatives When:

| Scenario | Alternative | Why |
|----------|-------------|-----|
| Single LLM call | Direct API call | Overhead not justified |
| Linear pipeline | LangChain LCEL | Simpler abstraction |
| Stateless tool use | Function calling | No persistence needed |
| Simple RAG | LangChain retrievers | Built-in patterns |
| Batch processing | Async tasks | Different execution model |

## State Schema Decisions

### TypedDict vs Pydantic

| TypedDict | Pydantic |
|-----------|----------|
| Lightweight, faster | Runtime validation |
| Dict-like access | Attribute access |
| No validation overhead | Type coercion |
| Simpler serialization | Complex nested models |

**Recommendation**: Use TypedDict for most cases. Use Pydantic when you need validation or complex nested structures.
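
To make the trade-off concrete, here is a minimal standard-library sketch showing that a TypedDict is only a static type hint: a wrongly typed value is accepted at runtime unless you check it yourself. The `validate_state` helper and the `step` and `topic` fields are hypothetical, used only to illustrate the kind of check Pydantic would perform for you.

```python
from typing import TypedDict


class State(TypedDict):
    step: int
    topic: str


def validate_state(state: dict) -> list[str]:
    """Hypothetical helper: list the type problems found in a state dict."""
    errors = []
    for key, expected in State.__annotations__.items():
        if key not in state:
            errors.append(f"missing key: {key}")
        elif not isinstance(state[key], expected):
            errors.append(f"{key} should be {expected.__name__}")
    return errors


# TypedDict does not validate: this wrong value is accepted at runtime.
bad_state = State(step="three", topic="reducers")

# A manual check is needed to notice the problem.
errors = validate_state(bad_state)
```

If checks like this start to accumulate in your nodes, that is usually a sign the state would be better served by a Pydantic model.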

### Reducer Selection

| Use Case | Reducer | Example |
|----------|---------|---------|
| Chat messages | `add_messages` | Handles IDs, RemoveMessage |
| Simple append | `operator.add` | `Annotated[list, operator.add]` |
| Keep latest | None (LastValue) | `field: str` |
| Custom merge | Lambda | `Annotated[list, lambda a, b: ...]` |
| Overwrite list | `Overwrite` | Bypass reducer |
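
The table above can be read as "how is an update combined with the existing value?". The sketch below illustrates that idea with plain Python, reading the reducer out of the `Annotated` metadata. The `apply_update` function is an illustrative stand-in, not LangGraph's implementation, and `merge_unique` and the field names are hypothetical.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


def merge_unique(existing: list, new: list) -> list:
    """Custom reducer: append only items not already present, keeping order."""
    return existing + [item for item in new if item not in existing]


class State(TypedDict):
    log: Annotated[list, operator.add]   # simple append
    tags: Annotated[list, merge_unique]  # custom merge
    status: str                          # no reducer: latest value wins


def apply_update(state: dict, update: dict) -> dict:
    """Illustrative stand-in for combining an update with existing state."""
    hints = get_type_hints(State, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] types carry their extra arguments in __metadata__.
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata and key in state:
            merged[key] = metadata[0](state[key], value)
        else:
            merged[key] = value
    return merged


state = {"log": ["start"], "tags": ["a"], "status": "new"}
state = apply_update(
    state, {"log": ["step 1"], "tags": ["a", "b"], "status": "running"}
)
```

A named function such as `merge_unique` is often easier to test and debug than an inline lambda, even though both work as reducers.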

### State Size Considerations

```python
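from typing import Annotated, TypedDict

from langgraph.graph.message import add_messages
from langgraph.store.base import BaseStore

# SMALL STATE (< 1MB) - Put in state
class SmallState(TypedDict):
    messages: Annotated[list, add_messages]
    context: str

# LARGE DATA - Use Store
class State(TypedDict):
    messages: Annotated[list, add_messages]
    document_ref: str  # Key of the document in the store

NAMESPACE = ("documents",)  # Example namespace tuple; choose your own

def node(state: State, *, store: BaseStore):
    item = store.get(NAMESPACE, state["document_ref"])  # Item, or None if missing
    doc = item.value if item is not None else None
    # Process doc here without bloating checkpoints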
```

## Graph Structure Decisions

### Single Graph vs Subgraphs

**Single Graph** when:
- All nodes share the same state schema
- Simple linear or branching flow
- < 10 nodes

**Subgraphs** when:
- Different state schemas needed
- Reusable components across graphs
- Team separation of concerns
- Complex hierarchical workflows

### Conditional Edges vs Command

| Conditional Edges | Command |
|------------------|---------|
| Routing based on state | Routing + state update |
| Separate router function | Decision in node |
| Clearer visualization | More flexible |
| Standard patterns | Dynamic destinations |

```python
from typing import Literal

from langgraph.types import Command

# Conditional Edge - when routing is the focus
def router(state) -> Literal["a", "b"]:
    return "a" if state["condition"] else "b"  # "condition" is an example state key
builder.add_conditional_edges("node", router)

# Command - when combining routing with updates
def node(state) -> Command[Literal["next"]]:
    return Command(goto="next", update={"step": state["step"] + 1})
```

### Static vs Dynamic Routing

**Static Edges** (`add_edge`):
- Fixed flow known at build time
- Clearer graph visualization
- Easier to reason about

**Dynamic Routing** (`add_conditional_edges`, `Command`, `Send`):
- Runtime decisions based on state
- Agent-driven navigation
- Fan-out patterns
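
A router used for dynamic routing is just a function of state that returns the name of the next node, so it can be unit tested without building a graph. The sketch below is one possible shape; the state fields and node names are hypothetical.

```python
from typing import Literal, TypedDict


class State(TypedDict):
    pending_tool_calls: list[str]  # hypothetical field
    needs_review: bool             # hypothetical field


def route_after_agent(state: State) -> Literal["tools", "review", "__end__"]:
    """Decide the next node at runtime from the current state."""
    if state["pending_tool_calls"]:
        return "tools"
    if state["needs_review"]:
        return "review"
    return "__end__"


# Wiring it into a graph would look like:
#   builder.add_conditional_edges("agent", route_after_agent)
next_node = route_after_agent({"pending_tool_calls": ["search"], "needs_review": False})
```

Keeping the `Literal` return annotation accurate lets the graph visualization show every possible destination.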

## Persistence Strategy

### Checkpointer Selection

| Checkpointer | Use Case | Characteristics |
|--------------|----------|-----------------|
| `InMemorySaver` | Testing only | Lost on restart |
| `SqliteSaver` | Development | Single file, local |
| `PostgresSaver` | Production | Scalable, concurrent |
| Custom | Special needs | Implement BaseCheckpointSaver |

### Checkpointing Scope

```python
# Full persistence (default)
graph = builder.compile(checkpointer=checkpointer)

# Subgraph options (pick one)
subgraph = sub_builder.compile(checkpointer=None)   # Inherit from parent
subgraph = sub_builder.compile(checkpointer=True)   # Independent checkpointing
subgraph = sub_builder.compile(checkpointer=False)  # No checkpointing (runs atomically)
```

### When to Disable Checkpointing

- Short-lived subgraphs that should be atomic
- Subgraphs with incompatible state schemas
- Performance-critical paths without need for resume

## Multi-Agent Architecture

### Supervisor Pattern

Best for:
- Clear hierarchy
- Centralized decision making
- Different agent specializations

```
          ┌─────────────┐
          │  Supervisor │
          └──────┬──────┘
    ┌────────┬───┴───┬────────┐
    ▼        ▼       ▼        ▼
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│Agent1│ │Agent2│ │Agent3│ │Agent4│
└──────┘ └──────┘ └──────┘ └──────┘
```
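
The control flow of the supervisor pattern can be sketched in plain Python before committing to a graph. In this illustration the supervisor is a simple rule; in a real system it would typically be an LLM call. All function and field names are hypothetical.

```python
from typing import Callable, TypedDict


class State(TypedDict):
    task: str
    results: dict[str, str]


def researcher(state: State) -> dict[str, str]:
    return {"research": f"facts about {state['task']}"}


def writer(state: State) -> dict[str, str]:
    return {"draft": f"draft using {state['results']['research']}"}


WORKERS: dict[str, Callable[[State], dict[str, str]]] = {
    "research": researcher,
    "draft": writer,
}


def supervisor(state: State) -> str:
    """Pick the first worker whose output is still missing, else finish."""
    for name in WORKERS:
        if name not in state["results"]:
            return name
    return "FINISH"


def run(task: str, max_turns: int = 10) -> State:
    state: State = {"task": task, "results": {}}
    for _ in range(max_turns):  # hard cap, similar in spirit to a recursion limit
        choice = supervisor(state)
        if choice == "FINISH":
            break
        state["results"].update(WORKERS[choice](state))
    return state


final_state = run("reducers")
```

Every worker reports back to the supervisor, which is what keeps decision making centralized.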

### Peer-to-Peer Pattern

Best for:
- Collaborative agents
- No clear hierarchy
- Flexible communication

```
┌──────┐     ┌──────┐
│Agent1│◄───►│Agent2│
└──┬───┘     └───┬──┘
   │             │
   ▼             ▼
┌──────┐     ┌──────┐
│Agent3│◄───►│Agent4│
└──────┘     └──────┘
```

### Handoff Pattern

Best for:
- Sequential specialization
- Clear stage transitions
- Different capabilities per stage

```
┌────────┐    ┌────────┐    ┌────────┐
│Research│───►│Planning│───►│Execute │
└────────┘    └────────┘    └────────┘
```
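
In a handoff, each stage decides who runs next and passes along its update, which is the same idea as returning `Command(goto=..., update=...)` from a node. The following plain-Python sketch mirrors the diagram above; the stage functions and state keys are hypothetical.

```python
from typing import Callable, Optional

# Each stage returns (next stage name or None, state update).
Handoff = tuple[Optional[str], dict]


def research(state: dict) -> Handoff:
    return "planning", {"findings": f"findings on {state['goal']}"}


def planning(state: dict) -> Handoff:
    return "execute", {"plan": f"plan from {state['findings']}"}


def execute(state: dict) -> Handoff:
    return None, {"result": f"done: {state['plan']}"}


STAGES: dict[str, Callable[[dict], Handoff]] = {
    "research": research,
    "planning": planning,
    "execute": execute,
}


def run_pipeline(goal: str) -> tuple[dict, list[str]]:
    state: dict = {"goal": goal}
    trail: list[str] = []
    stage: Optional[str] = "research"
    while stage is not None:
        trail.append(stage)
        stage, update = STAGES[stage](state)
        state.update(update)
    return state, trail


final_state, trail = run_pipeline("ship v2")
```

Unlike the supervisor pattern, no central node is consulted between stages; control moves directly from one specialist to the next.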

## Streaming Strategy

### Stream Mode Selection

| Mode | Use Case | Data |
|------|----------|------|
| `updates` | UI updates | Node outputs only |
| `values` | State inspection | Full state each step |
| `messages` | Chat UX | LLM tokens |
| `custom` | Progress/logs | Your data via StreamWriter |
| `debug` | Debugging | Tasks + checkpoints |

### Subgraph Streaming

```python
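# Stream from subgraphs
async def stream_with_subgraphs(graph, inputs):
    async for namespace, data in graph.astream(
        inputs,
        stream_mode="updates",
        subgraphs=True,  # Include subgraph events
    ):
        # namespace is a tuple: empty for the parent graph, longer for nested subgraphs
        print(namespace, data)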
```

## Human-in-the-Loop Design

### Interrupt Placement

| Strategy | Use Case |
|----------|----------|
| `interrupt_before` | Approval before action |
| `interrupt_after` | Review after completion |
| `interrupt()` in node | Dynamic, contextual pauses |

### Resume Patterns

```python
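from langgraph.types import Command

# Simple resume (same thread_id in config), e.g. after interrupt_before / interrupt_after
graph.invoke(None, config)

# Resume with value (becomes the return value of interrupt() in the node)
graph.invoke(Command(resume="approved"), config)

# Resume specific interrupt (interrupt_id comes from the pending interrupt)
graph.invoke(Command(resume={interrupt_id: value}), config)

# Modify state and resume
graph.update_state(config, {"field": "new_value"})
graph.invoke(None, config)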
```

## Error Handling Strategy

### Retry Configuration

```python
from langgraph.types import RetryPolicy

# APIError and RateLimitError stand in for your LLM client's exception classes

# Per-node retry
RetryPolicy(
    initial_interval=0.5,
    backoff_factor=2.0,
    max_interval=60.0,
    max_attempts=3,
    retry_on=lambda e: isinstance(e, (APIError, TimeoutError))
)

# Multiple policies (first match wins)
builder.add_node("node", fn, retry_policy=[
    RetryPolicy(retry_on=RateLimitError, max_attempts=5),
    RetryPolicy(retry_on=Exception, max_attempts=2),
])
```
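
To get a feel for what these parameters imply, the sketch below computes the wait before each retry under plain exponential backoff. It assumes `max_attempts` counts the first try and ignores jitter, so treat it as an approximation of the behaviour rather than LangGraph's exact implementation. `backoff_schedule` is a hypothetical helper.

```python
def backoff_schedule(
    initial_interval: float,
    backoff_factor: float,
    max_interval: float,
    max_attempts: int,
) -> list[float]:
    """Seconds to wait before each retry (no jitter applied)."""
    delays = []
    for retry_index in range(max_attempts - 1):
        delay = initial_interval * backoff_factor ** retry_index
        delays.append(min(delay, max_interval))  # cap at max_interval
    return delays


# Same settings as the per-node policy above: 3 attempts means at most 2 waits.
short = backoff_schedule(0.5, 2.0, 60.0, 3)

# With more attempts the cap starts to matter.
long = backoff_schedule(0.5, 2.0, 60.0, 10)
worst_case_wait = sum(long)
```

Summing the schedule gives a worst-case added latency per node, which is worth comparing against any request timeout upstream.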

### Fallback Patterns

```python
from typing import Literal

from langgraph.graph import END

# primary_operation, fallback_operation, and PrimaryError are placeholders
def node_with_fallback(state):
    try:
        return primary_operation(state)
    except PrimaryError:
        return fallback_operation(state)

# Or use conditional edges for complex fallback routing
def route_on_error(state) -> Literal["retry", "fallback", "__end__"]:
    if state.get("error") and state["attempts"] < 3:
        return "retry"
    elif state.get("error"):
        return "fallback"
    return END  # END is the string "__end__"
```

## Scaling Considerations

### Horizontal Scaling

- Use PostgresSaver for shared state
- Consider LangGraph Platform for managed infrastructure
- Use stores for large data outside checkpoints

### Performance Optimization

1. **Minimize state size** - Use references for large data
2. **Parallel nodes** - Fan out when possible
3. **Cache expensive operations** - Use CachePolicy
4. **Async everywhere** - Use ainvoke, astream
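
As an illustration of the last point, an async node can overlap independent I/O with `asyncio.gather` instead of awaiting each call in turn. The sketch uses only the standard library; `fetch` simulates a network call with `asyncio.sleep`, and all names are hypothetical.

```python
import asyncio


async def fetch(source: str, delay: float) -> str:
    """Stand-in for an I/O-bound call such as a search or database lookup."""
    await asyncio.sleep(delay)
    return f"{source} result"


async def gather_context(state: dict) -> dict:
    # Both calls run concurrently, so the node waits for the slower one only.
    docs, profile = await asyncio.gather(
        fetch("search", 0.05),
        fetch("profile", 0.05),
    )
    return {"context": [docs, profile]}


update = asyncio.run(gather_context({}))
```

The same function could be registered as a graph node and driven with `ainvoke` or `astream`; here it is run directly for simplicity.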

### Resource Limits

```python
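from typing import TypedDict

from langgraph.managed import RemainingSteps

# Set recursion limit
config = {"recursion_limit": 50}
graph.invoke(input, config)

# Track remaining steps in state (value is managed by LangGraph, not by your nodes)
class State(TypedDict):
    remaining_steps: RemainingSteps

# Router: wrap up before the recursion limit is hit
def check_budget(state):
    if state["remaining_steps"] < 5:
        return "wrap_up"
    return "continue"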
```

## Decision Checklist

Before implementing:

1. [ ] Is LangGraph the right tool? (vs simpler alternatives)
2. [ ] State schema defined with appropriate reducers?
3. [ ] Persistence strategy chosen? (dev vs prod checkpointer)
4. [ ] Streaming needs identified?
5. [ ] Human-in-the-loop points defined?
6. [ ] Error handling and retry strategy?
7. [ ] Multi-agent coordination pattern? (if applicable)
8. [ ] Resource limits configured?
