llm-router

Selects the optimal LLM model and provider for each task based on complexity, cost budget, and capability requirements. Routes cheap tasks to Haiku/GPT-4o-mini and complex tasks to Sonnet/Opus/o1. Use when deciding which model to call, optimizing LLM costs, or building multi-model agent systems. Activate on "which model", "model selection", "route to model", "LLM cost", "model routing", "cheap vs expensive model". NOT for prompt engineering (use prompt-engineer), model fine-tuning, or training custom models.

85 stars

Best use case

llm-router is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using llm-router should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/llm-router/SKILL.md --create-dirs "https://raw.githubusercontent.com/curiositech/some_claude_skills/main/.claude/skills/llm-router/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/llm-router/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How llm-router Compares

| Feature / Agent | llm-router | Standard Approach |
|-----------------|------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# LLM Router

Selects the optimal LLM model for each task. Routing is the single biggest cost lever in multi-agent systems: intelligent routing saves 45-85% of cost while maintaining 95%+ of top-model quality.

---

## When to Use

✅ **Use for**:
- Deciding which model to call for a specific task
- Assigning models to DAG nodes in agent workflows
- Optimizing LLM API costs across a system
- Building cascading try-cheap-first patterns

❌ **NOT for**:
- Prompt engineering (use `prompt-engineer`)
- Model fine-tuning or training
- Comparing model architectures (academic research)

---

## Routing Decision Tree

```mermaid
flowchart TD
  A{Task type?} -->|Classify / validate / format / extract| T1["Tier 1: Haiku, GPT-4o-mini (~$0.001)"]
  A -->|Write / implement / review / synthesize| T2["Tier 2: Sonnet, GPT-4o (~$0.01)"]
  A -->|Reason / architect / judge / decompose| T3["Tier 3: Opus, o1 (~$0.10)"]
  
  T1 --> Q1{Quality sufficient?}
  Q1 -->|Yes| Done1[Use cheap model]
  Q1 -->|No| T2
  
  T2 --> Q2{Quality sufficient?}
  Q2 -->|Yes| Done2[Use balanced model]
  Q2 -->|No| T3
```
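The flowchart above can be sketched as a lookup plus a one-step escalation. This is a minimal illustration, not part of the skill itself: the `Tier` enum, `TASK_TIERS`, `initial_tier`, and `escalate` are hypothetical names, and defaulting unknown task types to Tier 2 is an assumption.

```python
from enum import IntEnum


class Tier(IntEnum):
    CHEAP = 1     # Haiku, GPT-4o-mini
    BALANCED = 2  # Sonnet, GPT-4o
    STRONG = 3    # Opus, o1


# Task verbs copied from the edge labels of the flowchart.
TASK_TIERS = {
    "classify": Tier.CHEAP, "validate": Tier.CHEAP,
    "format": Tier.CHEAP, "extract": Tier.CHEAP,
    "write": Tier.BALANCED, "implement": Tier.BALANCED,
    "review": Tier.BALANCED, "synthesize": Tier.BALANCED,
    "reason": Tier.STRONG, "architect": Tier.STRONG,
    "judge": Tier.STRONG, "decompose": Tier.STRONG,
}


def initial_tier(task_type: str) -> Tier:
    """Return the starting tier; unknown types fall back to Tier 2 (assumption)."""
    return TASK_TIERS.get(task_type.lower(), Tier.BALANCED)


def escalate(tier: Tier) -> Tier:
    """Move one tier up when quality is insufficient; Tier 3 has nowhere to go."""
    return Tier(min(tier + 1, Tier.STRONG))
```

For example, `initial_tier("extract")` starts at Tier 1, and `escalate` moves it to Tier 2 if the quality check fails.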

---

## Tier Assignment Table

| Task Type | Tier | Models | Cost/Call | Why This Tier |
|-----------|------|--------|-----------|---------------|
| Classify input type | 1 | Haiku, GPT-4o-mini | ~$0.001 | Deterministic categorization |
| Validate schema/format | 1 | Haiku, GPT-4o-mini | ~$0.001 | Mechanical checking |
| Format output / template | 1 | Haiku, GPT-4o-mini | ~$0.001 | Structured transformation |
| Extract structured data | 1 | Haiku, GPT-4o-mini | ~$0.001 | Pattern matching |
| Summarize text | 1-2 | Haiku → Sonnet | ~$0.001-0.01 | Short summaries: Haiku; nuanced: Sonnet |
| Write content/docs | 2 | Sonnet, GPT-4o | ~$0.01 | Creative quality matters |
| Implement code | 2 | Sonnet, GPT-4o | ~$0.01 | Correctness + style |
| Review code/diffs | 2 | Sonnet, GPT-4o | ~$0.01 | Needs judgment, not just pattern matching |
| Research synthesis | 2 | Sonnet, GPT-4o | ~$0.01 | Multi-source reasoning |
| Decompose ambiguous problem | 3 | Opus, o1 | ~$0.10 | Requires deep understanding |
| Design architecture | 3 | Opus, o1 | ~$0.10 | Complex system reasoning |
| Judge output quality | 3 | Opus, o1 | ~$0.10 | Meta-reasoning about quality |
| Plan multi-step strategy | 3 | Opus, o1 | ~$0.10 | Long-horizon planning |

---

## Three Routing Strategies

### Strategy 1: Static Tier Assignment (Start Here)

Assign a model to each node by task type at DAG design time. There is no runtime logic, and this alone captures 60-70% of the possible savings.

```yaml
nodes:
  - id: classify
    model: claude-haiku-4-5     # Tier 1: $0.001
  - id: implement
    model: claude-sonnet-4-5    # Tier 2: $0.01  
  - id: evaluate
    model: claude-opus-4-5      # Tier 3: $0.10
```

### Strategy 2: Cascading (Try Cheap First)

Try the cheap model first; if quality is below the threshold, escalate. This adds ~1s of latency but saves 50-80% on nodes where the cheap model succeeds.

```
1. Execute with Tier 1 model
2. Quick quality check (also Tier 1 — costs ~$0.001)
3. If quality ≥ threshold → done
4. If quality < threshold → re-execute with Tier 2
```

Best for nodes where you're genuinely unsure which tier is needed.
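The cascading steps can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not the skill's implementation: `cascade`, `call_model`, and `score` are hypothetical names, and both callables are supplied by the caller (in the steps above, the quality check would itself be a Tier 1 call). The strongest model's output is accepted without a check because there is nothing left to escalate to.

```python
from typing import Callable, Sequence, Tuple


def cascade(
    prompt: str,
    models: Sequence[str],                  # ordered cheapest first
    call_model: Callable[[str, str], str],  # (model, prompt) -> output
    score: Callable[[str, str], float],     # (prompt, output) -> quality in [0, 1]
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Return (model, output) from the first model whose output passes the check."""
    if not models:
        raise ValueError("models must not be empty")
    for index, model in enumerate(models):
        output = call_model(model, prompt)
        is_last = index == len(models) - 1
        if is_last or score(prompt, output) >= threshold:
            return model, output
    raise AssertionError("unreachable: the last model always returns")


# Stub callables so the sketch runs without any API access.
def fake_call(model: str, prompt: str) -> str:
    return f"[{model}] answer to: {prompt}"


def fake_score(prompt: str, output: str) -> float:
    return 0.9 if output.startswith("[tier-2]") else 0.4


model, output = cascade("Review this diff", ["tier-1", "tier-2", "tier-3"], fake_call, fake_score)
print(model)  # tier-2
```

Passing the model list as a parameter keeps the function independent of any provider SDK; swapping the stubs for real API calls is left to the caller.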

### Strategy 3: Adaptive (Learn from History)

Record success/failure per task type per model. Over time, the router learns:
- "Classification nodes always succeed on Haiku" → stay cheap
- "Code review nodes fail on Haiku 40% of the time" → upgrade to Sonnet
- "Architecture nodes succeed on Sonnet 90% of the time" → don't need Opus

Reaches 75-85% savings once roughly 100 executions have been recorded as training data.
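One way to record outcomes and pick the cheapest model that has proven reliable is sketched below. The `AdaptiveRouter` class, its method names, and the `min_success_rate` and `min_samples` defaults are all assumptions for illustration; the skill does not prescribe a data structure or thresholds.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class AdaptiveRouter:
    """Tracks success counts per (task type, model) and routes to the cheapest reliable model."""

    def __init__(self, models_cheapest_first: Sequence[str],
                 min_success_rate: float = 0.9, min_samples: int = 20) -> None:
        self.models = list(models_cheapest_first)
        self.min_success_rate = min_success_rate
        self.min_samples = min_samples
        # (task_type, model) -> [successes, attempts]
        self.stats: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])

    def record(self, task_type: str, model: str, success: bool) -> None:
        entry = self.stats[(task_type, model)]
        entry[0] += int(success)
        entry[1] += 1

    def choose(self, task_type: str, default: str) -> str:
        """Return the cheapest model with enough samples and a high enough success rate."""
        for model in self.models:
            successes, attempts = self.stats.get((task_type, model), (0, 0))
            if attempts >= self.min_samples and successes / attempts >= self.min_success_rate:
                return model
        return default  # not enough evidence yet: keep the statically assigned model
```

Note that this sketch only exploits what it has seen; outcomes for cheaper models have to come from somewhere, for instance from cascading runs that try the cheap model first.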

---

## Provider Selection

Once model tier is chosen, select the provider:

| Model Class | Provider Options | Selection Criteria |
|------------|-----------------|-------------------|
| Haiku-class | Anthropic, AWS Bedrock | Latency, regional availability |
| Sonnet-class | Anthropic, AWS Bedrock, GCP Vertex | Cost, rate limits |
| Opus-class | Anthropic, AWS Bedrock, GCP Vertex | Regional availability, rate limits |
| GPT-4o-class | OpenAI, Azure OpenAI | Rate limits, compliance |
| Open-source | Ollama (local), Together.ai, Fireworks | Cost ($0 in API fees when run locally), latency, GPU availability |
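The selection criteria in the table can be applied mechanically once the caller has measured them. The sketch below is hypothetical: `ProviderOption`, its fields, and `pick_provider` are illustrative names, and preferring the lowest latency among usable providers is one possible policy, not the skill's stated rule.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ProviderOption:
    name: str
    available_in_region: bool      # regional availability, checked by the caller
    has_rate_limit_headroom: bool  # rate limits, checked by the caller
    latency_ms: float              # latency, measured by the caller


def pick_provider(options: Iterable[ProviderOption]) -> ProviderOption:
    """Return the lowest-latency provider that is available and not rate limited."""
    usable = [o for o in options if o.available_in_region and o.has_rate_limit_headroom]
    if not usable:
        raise LookupError("no usable provider for this model class")
    return min(usable, key=lambda o: o.latency_ms)
```

A cost-first policy would only change the `key` function, so the filtering and ranking steps stay separate.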

---

## Cost Impact Example

10-node DAG, "refactor a codebase":

| Strategy | Mix | Cost | Savings |
|----------|-----|------|---------|
| All Opus | 10× $0.10 | $1.00 | — |
| All Sonnet | 10× $0.01 | $0.10 | 90% |
| Static tiers | 4× Haiku + 4× Sonnet + 2× Opus | $0.24 | 76% |
| Cascading | 6× Haiku + 3× Sonnet + 1× Opus | $0.14 | 86% |
| Adaptive (trained) | Dynamic | ~$0.08 | 92% |
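The arithmetic behind the table can be reproduced with the per-call estimates from the tier table. The helper names `dag_cost` and `savings_pct` are hypothetical, and the mixes count one call per node, as the table does; quality-check calls and failed cheap attempts in a cascade are not included.

```python
# Per-call cost estimates (USD) from the tier table.
COST = {"haiku": 0.001, "sonnet": 0.01, "opus": 0.10}


def dag_cost(mix: dict) -> float:
    """Total cost of a DAG given {model: number_of_nodes}."""
    return sum(COST[model] * count for model, count in mix.items())


def savings_pct(cost: float, baseline: float) -> int:
    """Savings relative to the baseline, rounded to a whole percent."""
    return round(100 * (1 - cost / baseline))


baseline = dag_cost({"opus": 10})                               # 10 x $0.10 = $1.00
all_sonnet = dag_cost({"sonnet": 10})                           # 10 x $0.01 = $0.10
static = dag_cost({"haiku": 4, "sonnet": 4, "opus": 2})         # $0.244
cascading = dag_cost({"haiku": 6, "sonnet": 3, "opus": 1})      # $0.136

for name, cost in [("All Sonnet", all_sonnet), ("Static tiers", static), ("Cascading", cascading)]:
    print(f"{name}: ${cost:.3f}, {savings_pct(cost, baseline)}% savings")
```

The unrounded totals are $0.244 and $0.136, which the table rounds to $0.24 and $0.14.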

---

## Anti-Patterns

### Always Use the Best Model
**Wrong**: Route everything to Opus/o1 "for quality."
**Reality**: 60%+ of typical DAG nodes are classification, validation, or formatting, tasks where Haiku performs comparably to Opus. Routing them to Opus burns money for no quality gain.

### Always Use the Cheapest Model
**Wrong**: Route everything to Haiku "for cost."
**Reality**: Complex reasoning, architecture design, and quality judgment genuinely need stronger models. Haiku will produce plausible-looking but subtly wrong output on hard tasks.

### Ignoring Latency
**Wrong**: Only optimizing for cost, ignoring that Opus takes 5-10x longer than Haiku.
**Reality**: In a 10-node DAG, model choice affects total execution time as much as cost. Route time-critical paths to faster models.

### No Feedback Loop
**Wrong**: Setting model tiers once and never adjusting.
**Reality**: As models improve (Haiku gets smarter every generation), tasks that needed Sonnet last month may work on Haiku today. Record outcomes and adapt.
