model-fallback

Multi-model automatic fallback system. Monitors model availability and automatically falls back to backup models when the primary model fails. Supports MiniMax, Kimi, Zhipu and other OpenAI-compatible APIs. Use when: (1) Primary model API is unavailable, (2) Model response time is too slow, (3) Rate limit exceeded, (4) Need to optimize costs by using cheaper models for simple tasks.

3,891 stars

Best use case

model-fallback is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using model-fallback should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/model-fallback/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/azure5100/model-fallback/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/model-fallback/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How model-fallback Compares

| Feature / Agent | model-fallback | Standard Approach |
|-----------------|----------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It monitors model availability and switches to backup models in priority order when the primary model is unavailable, too slow, or rate-limited. It supports MiniMax, Kimi, Zhipu, and other OpenAI-compatible APIs, and can route simple tasks to cheaper models to reduce cost.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Model Fallback Skill

> Multi-model automatic fallback system for AI agents

## Overview

This skill provides automatic model fallback functionality for OpenClaw agents. When the primary model fails (unavailable, slow, or rate-limited), it automatically switches to backup models in a predefined priority order.

## Features

- **Automatic Fallback**: Seamlessly switch to backup models on failure
- **Configurable Priority**: Define your own model fallback order
- **Health Monitoring**: Track model availability and response times
- **Cost Optimization**: Use cheaper models for simple tasks
- **Logging**: Full audit trail of fallback events

## Supported Models

| Provider | Model | Context | Use Case |
|----------|-------|---------|----------|
| MiniMax | M2.5 | 200K | Primary (reasoning) |
| MiniMax | M2.1 | 200K | Backup |
| Kimi | K2.5 | 256K | Long documents |
| Kimi | K2 | 128K | Standard |
| Zhipu | GLM-4-Air | 128K | Low cost |
| Zhipu | GLM-4-Flash | 1M | High volume |
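
The Context column can also drive model selection by input size. The following is a minimal sketch, not the skill's own logic: `CONTEXT_WINDOWS` and `pick_by_context` are hypothetical names, the identifier `kimi-k2` is an assumed model id, and the sketch assumes the token count of the request is already known.

```python
# Context windows (in tokens) copied from the Supported Models table.
# Assumption: "K" means 1,000 tokens and "M" means 1,000,000 tokens.
# Assumption: "kimi-k2" is a guessed identifier for the Kimi K2 row.
CONTEXT_WINDOWS = {
    "MiniMax-M2.5": 200_000,
    "MiniMax-M2.1": 200_000,
    "kimi-k2.5": 256_000,
    "kimi-k2": 128_000,
    "glm-4-air": 128_000,
    "glm-4-flash": 1_000_000,
}


def pick_by_context(required_tokens, candidates):
    """Return the first candidate whose context window fits, or None."""
    for model in candidates:
        if CONTEXT_WINDOWS.get(model, 0) >= required_tokens:
            return model
    return None


# A 220K-token input skips MiniMax-M2.5 (200K) and lands on kimi-k2.5 (256K).
print(pick_by_context(220_000, ["MiniMax-M2.5", "kimi-k2.5", "glm-4-air"]))
# → kimi-k2.5
```

Candidates are checked in the order given, so the caller's priority order is preserved and only models that are too small are skipped.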

## Configuration

### Default Fallback Chain

```json
{
  "fallback_chain": [
    {
      "provider": "minimax-portal",
      "model": "MiniMax-M2.5",
      "priority": 1,
      "timeout": 30,
      "max_retries": 3
    },
    {
      "provider": "moonshot",
      "model": "kimi-k2.5",
      "priority": 2,
      "timeout": 30,
      "max_retries": 2
    },
    {
      "provider": "zhipu",
      "model": "glm-4-air",
      "priority": 3,
      "timeout": 20,
      "max_retries": 2
    }
  ]
}
```
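
To illustrate how a consumer might read this chain, here is a small sketch that parses the JSON and orders entries by `priority`. It assumes that a lower `priority` number is tried first and that `provider`, `model`, and `priority` are the required keys; the source does not say whether the skill sorts by `priority` or relies on array order. `load_chain` and `REQUIRED_KEYS` are hypothetical names.

```python
import json

# Shortened copy of the default chain, deliberately listed out of order.
CONFIG_TEXT = """
{
  "fallback_chain": [
    {"provider": "zhipu", "model": "glm-4-air", "priority": 3},
    {"provider": "minimax-portal", "model": "MiniMax-M2.5", "priority": 1},
    {"provider": "moonshot", "model": "kimi-k2.5", "priority": 2}
  ]
}
"""

# Assumption: these three keys are mandatory for every entry.
REQUIRED_KEYS = {"provider", "model", "priority"}


def load_chain(text):
    """Parse the config and return entries sorted by ascending priority."""
    chain = json.loads(text)["fallback_chain"]
    for entry in chain:
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"entry {entry!r} is missing {sorted(missing)}")
    return sorted(chain, key=lambda entry: entry["priority"])


for entry in load_chain(CONFIG_TEXT):
    print(entry["priority"], entry["provider"], entry["model"])
```

Validating the entries up front means a malformed config fails at load time rather than in the middle of a fallback.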

### Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `MODEL_FALLBACK_ENABLED` | No | Enable/disable fallback (default: true) |
| `MODEL_FALLBACK_LOG_LEVEL` | No | Log level: debug, info, warn, error |

## Usage

### Basic Usage

The skill automatically handles model failures. No explicit calls needed.

```bash
# Trigger a model call (fallback happens automatically on failure)
```

### Manual Fallback

```bash
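# Assumption: commands are run from the skill's install directory,
# so the script path is relative rather than rooted at /scripts.

# Force fallback to next model
./scripts/model-fallback.sh --force-next

# Check current model status
./scripts/model-fallback.sh --status

# Reset to primary model
./scripts/model-fallback.sh --reset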
```

### Configuration

Edit `config.json` to customize the fallback chain:

```json
{
  "fallback_chain": [
    {"provider": "...", "model": "...", "priority": 1}
  ],
  "health_check": {
    "enabled": true,
    "interval_seconds": 300
  }
}
```

## How It Works

```
1. User makes request with primary model
2. Model call fails (error, timeout, rate limit)
3. Skill detects failure
4. Wait 3 seconds (debounce)
5. Switch to next model in chain
6. Retry request with new model
7. If successful, return result
8. If failed, repeat steps 4-7
9. If all models fail, return error with details
```
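
The steps above can be sketched as a short loop. This is an illustrative sketch rather than the skill's implementation: `call_with_fallback`, `ModelCallError`, and the `call_model` callback are hypothetical, and per-model `timeout` and `max_retries` handling is left out to keep the control flow visible.

```python
import time


class ModelCallError(Exception):
    """Raised by the (hypothetical) call function when a model call fails."""


def call_with_fallback(chain, call_model, request,
                       debounce_seconds=3.0, sleep=time.sleep):
    """Try each model in order; return (model_name, result) on first success."""
    errors = []
    for index, entry in enumerate(chain):
        if index > 0:
            sleep(debounce_seconds)  # step 4: debounce before switching
        try:
            # steps 1 and 6: call the current model
            return entry["model"], call_model(entry, request)
        except ModelCallError as exc:
            # steps 2-3: record the failure, then move to the next model
            errors.append((entry["model"], str(exc)))
    # step 9: every model failed, so report what went wrong
    raise RuntimeError(f"all models failed: {errors}")


def fake_call(entry, request):
    """Stand-in for a real API call: the primary is always rate limited."""
    if entry["model"] == "MiniMax-M2.5":
        raise ModelCallError("rate limit exceeded")
    return f"echo: {request}"


chain = [{"model": "MiniMax-M2.5"}, {"model": "kimi-k2.5"}]
print(call_with_fallback(chain, fake_call, "Hello", sleep=lambda s: None))
# → ('kimi-k2.5', 'echo: Hello')
```

The `sleep` parameter is injectable so the debounce can be skipped in tests; in real use the default `time.sleep` applies the 3-second wait.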

## Fallback Triggers

| Trigger | Condition | Action |
|----------|-----------|--------|
| API Unavailable | Connection timeout | Fallback |
| Rate Limit | 429 response | Fallback + wait |
| Slow Response | No response within the model's configured `timeout` (seconds) | Fallback |
| Invalid Response | Parse error | Fallback |
| Auth Error | 401/403 response | Log + stop |
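
The table can be read as a small decision function. The sketch below is one possible encoding, with hypothetical names (`Action`, `classify_failure`); how long to wait after a 429 is not specified in the source, so it is left to the caller.

```python
from enum import Enum


class Action(Enum):
    FALLBACK = "fallback"
    FALLBACK_AND_WAIT = "fallback + wait"
    STOP = "log + stop"


def classify_failure(status_code=None, timed_out=False, parse_error=False):
    """Map a failed call to the action listed in the triggers table.

    Returns None when the failure matches none of the listed triggers.
    """
    if status_code in (401, 403):
        # Auth errors will not be fixed by switching models with the same key.
        return Action.STOP
    if status_code == 429:
        return Action.FALLBACK_AND_WAIT
    if timed_out or parse_error:
        # Covers API unavailable, slow response, and invalid response.
        return Action.FALLBACK
    return None


print(classify_failure(status_code=429).value)
# → fallback + wait
print(classify_failure(status_code=401).value)
# → log + stop
```

Auth errors are checked first so that a 401 or 403 never falls through to a fallback path.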

## Logging

Logs are written to:
- `~/.openclaw/logs/model-fallback.log`

### Log Format

```
[2026-02-27 14:00:00] [INFO] Primary model MiniMax-M2.5 called
[2026-02-27 14:00:05] [WARN] Model failed: rate limit exceeded
[2026-02-27 14:00:05] [INFO] Falling back to Kimi K2.5
[2026-02-27 14:00:10] [INFO] Fallback successful
```
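
Lines in this format are straightforward to parse for auditing. The following sketch assumes every line follows the `[timestamp] [LEVEL] message` layout shown above and that fallback events always start with "Falling back to"; `parse_line` and `count_fallbacks` are hypothetical helpers.

```python
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"\[(?P<level>[A-Z]+)\] (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict with time, level, and message, or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "time": datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S"),
        "level": match["level"],
        "message": match["message"],
    }


def count_fallbacks(lines):
    """Count log lines that record a switch to a backup model."""
    parsed = (parse_line(line) for line in lines)
    return sum(
        1 for entry in parsed
        if entry and entry["message"].startswith("Falling back to")
    )


SAMPLE = [
    "[2026-02-27 14:00:00] [INFO] Primary model MiniMax-M2.5 called",
    "[2026-02-27 14:00:05] [WARN] Model failed: rate limit exceeded",
    "[2026-02-27 14:00:05] [INFO] Falling back to Kimi K2.5",
    "[2026-02-27 14:00:10] [INFO] Fallback successful",
]
print(count_fallbacks(SAMPLE))
# → 1
```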

## Cost Optimization

Use cheaper models for simple tasks:

```json
{
  "task_routing": {
    "simple_query": ["glm-4-air", "glm-4-flash"],
    "complex_reasoning": ["MiniMax-M2.5", "kimi-k2.5"],
    "long_context": ["kimi-k2.5", "MiniMax-M2.1"]
  }
}
```
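
One way to consume this mapping is to pick the first available model for a task type. The sketch below is illustrative only: `route` and the `is_available` callback are hypothetical, how a request gets classified as `simple_query` or `long_context` is not described in the source, and defaulting unknown task types to `complex_reasoning` is an assumption.

```python
TASK_ROUTING = {
    "simple_query": ["glm-4-air", "glm-4-flash"],
    "complex_reasoning": ["MiniMax-M2.5", "kimi-k2.5"],
    "long_context": ["kimi-k2.5", "MiniMax-M2.1"],
}


def route(task_type, is_available=lambda model: True):
    """Return the first available model for the task type, or None."""
    # Assumption: unknown task types use the complex_reasoning list.
    candidates = TASK_ROUTING.get(task_type, TASK_ROUTING["complex_reasoning"])
    for model in candidates:
        if is_available(model):
            return model
    return None


print(route("simple_query"))
# → glm-4-air
print(route("simple_query", is_available=lambda m: m != "glm-4-air"))
# → glm-4-flash
```

Each task type carries its own ordered list, so routing and fallback compose: if the cheapest model is down, the next entry in the same list is used.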

## Integration

### OpenClaw Configuration

Add to `openclaw.json`:

```json
{
  "models": {
    "mode": "merge",
    "fallback": {
      "enabled": true,
      "config": "~/.openclaw/skills/model-fallback/config.json"
    }
  }
}
```

### Health Check

Integrate with system health monitoring:

```bash
# Check model health
curl http://localhost:18789/api/models/health
```

## Troubleshooting

### Fallback Not Working

1. Check if fallback is enabled: `echo "$MODEL_FALLBACK_ENABLED"` (empty output means the variable is unset, so the default `true` applies)
2. Verify config exists: `ls ~/.openclaw/skills/model-fallback/config.json`
3. Check logs: `tail -f ~/.openclaw/logs/model-fallback.log`

### Models Always Failing

1. Check API keys are valid
2. Verify network connectivity
3. Check rate limits on provider dashboard

## Examples

### Example 1: Simple Fallback

```
User: "Hello"
System: Using MiniMax-M2.5...
System: Rate limited, switching to Kimi K2.5...
System: Response from Kimi K2.5: "Hello! How can I help?"
```

### Example 2: Cost Optimization

```
User: "What is 2+2?"
System: Routing to glm-4-air (low cost)...
System: Response: "2+2=4"
```

### Example 3: Long Document

```
User: "Summarize this 100-page PDF"
System: Detected long context requirement
System: Routing to Kimi K2.5 (256K context)...
System: Processing...
```

## License

MIT

## Author

CC (AI Assistant)

## Version

1.0.0

Related Skills

MCP Engineering — Complete Model Context Protocol System

3891
from openclaw/skills

Build, integrate, secure, and scale MCP servers and clients. From first server to production multi-tool architecture.

AI Infrastructure & Integrations

ml-model-eval-benchmark

3891
from openclaw/skills

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.

Machine Learning

decision-mental-models

3891
from openclaw/skills

Apply the most relevant mental models (First Principles, Inversion, Second-Order Thinking, Occam's Razor, and 16 others) to any problem or decision, surfacing non-obvious insights by explicitly matching and working through 2-3 models per query.

model-usage

3891
from openclaw/skills

Summarize per-model usage for Codex or Claude including cost tracking. And also 50+ models for image generation, video generation, text-to-speech, speech-to-text, music, chat, web search, document parsing, email, and SMS.

glm-v-model

3891
from openclaw/skills

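Skill for calling the Zhipu GLM-4V/4.6V vision models. Used for image and video understanding, multimodal dialogue, chart analysis, and similar tasks. Use this skill when the user mentions image understanding, image recognition, vision models, GLM-4V, GLM-4.6V, multimodal analysis, image captioning, chart analysis, or video understanding.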

modelscope-image-gen

3891
from openclaw/skills

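Generate images through the ModelScope (魔搭社区) API. First use --list-models to see the available models, then have the AI write a professional prompt based on the user's request, and finally call the API to generate the image. Supports multiple text-to-image models such as Kolors, Stable Diffusion XL, and FLUX. Use this skill when the user needs to generate images with ModelScope or with Chinese AI models.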

pydantic-ai-model-integration

3891
from openclaw/skills

Configure LLM providers, use fallback models, handle streaming, and manage model settings in PydanticAI. Use when selecting models, implementing resilience, or optimizing API calls.

model-council

3891
from openclaw/skills

Multi-model consensus system — send a query to 3+ different LLMs via OpenRouter simultaneously, then a judge model evaluates all responses and produces a winner, reasoning, and synthesized best answer. Like having a board of AI advisors. Use for important decisions, code review, research verification.

model-audit

3891
from openclaw/skills

Monthly LLM stack audit — compare your current models against latest benchmarks and pricing from OpenRouter. Identifies potential savings, upgrades, and better alternatives by category (reasoning, code, fast, cheap, vision). Use for optimizing AI costs and staying on the frontier.

Model Intel

3891
from openclaw/skills

Live LLM model intelligence from OpenRouter. Compare pricing, search models by name, find the best model for any task — code, reasoning, creative, fast, cheap, vision, long-context. Real-time data from 200+ models. Use when choosing models, comparing costs, or auditing your AI stack.

model-brain

3891
from openclaw/skills

Route each incoming message to the right Bankr/OpenClaw model or to a zero-LLM path based on task type, risk, and cost. Use when you need per-message model selection, cost-aware routing, deterministic skill bypasses, or a model recommendation for aaigotchi workflows.

wavelet-world-model

3891
from openclaw/skills

Generates a world model representation from state inputs using discrete wavelet transforms (DWT) to capture multi-resolution temporal and spatial features.