model-fallback
Multi-model automatic fallback system. Monitors model availability and automatically falls back to backup models when the primary model fails. Supports MiniMax, Kimi, Zhipu and other OpenAI-compatible APIs. Use when: (1) Primary model API is unavailable, (2) Model response time is too slow, (3) Rate limit exceeded, (4) Need to optimize costs by using cheaper models for simple tasks.
Best use case
model-fallback is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using model-fallback should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/model-fallback/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How model-fallback Compares
| Feature / Agent | model-fallback | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It monitors model availability and automatically falls back to backup models when the primary model is unavailable, too slow, or rate-limited. It supports MiniMax, Kimi, Zhipu, and other OpenAI-compatible APIs, and it can route simple tasks to cheaper models to reduce cost.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Model Fallback Skill
> Multi-model automatic fallback system for AI agents
## Overview
This skill provides automatic model fallback functionality for OpenClaw agents. When the primary model fails (unavailable, slow, or rate-limited), it automatically switches to backup models in a predefined priority order.
## Features
- **Automatic Fallback**: Seamlessly switch to backup models on failure
- **Configurable Priority**: Define your own model fallback order
- **Health Monitoring**: Track model availability and response times
- **Cost Optimization**: Use cheaper models for simple tasks
- **Logging**: Full audit trail of fallback events
## Supported Models
| Provider | Model | Context | Use Case |
|----------|-------|---------|----------|
| MiniMax | M2.5 | 200K | Primary (reasoning) |
| MiniMax | M2.1 | 200K | Backup |
| Kimi | K2.5 | 256K | Long documents |
| Kimi | K2 | 128K | Standard |
| Zhipu | GLM-4-Air | 128K | Low cost |
| Zhipu | GLM-4-Flash | 1M | High volume |
## Configuration
### Default Fallback Chain
```json
{
  "fallback_chain": [
    {
      "provider": "minimax-portal",
      "model": "MiniMax-M2.5",
      "priority": 1,
      "timeout": 30,
      "max_retries": 3
    },
    {
      "provider": "moonshot",
      "model": "kimi-k2.5",
      "priority": 2,
      "timeout": 30,
      "max_retries": 2
    },
    {
      "provider": "zhipu",
      "model": "glm-4-air",
      "priority": 3,
      "timeout": 20,
      "max_retries": 2
    }
  ]
}
```
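A minimal sketch of how a client might read this chain and order it by `priority`. The function name `load_chain` and the set of required keys are assumptions made for illustration; the skill's own loader may differ.

```python
import json

# Assumed minimum set of keys per chain entry (not confirmed by the skill).
REQUIRED_KEYS = {"provider", "model", "priority"}

def load_chain(config_text: str) -> list[dict]:
    """Parse a config document and return chain entries sorted by priority."""
    config = json.loads(config_text)
    chain = config.get("fallback_chain", [])
    for entry in chain:
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"chain entry {entry!r} is missing {sorted(missing)}")
    # A lower priority number is tried first (1 = primary).
    return sorted(chain, key=lambda e: e["priority"])

example = """
{"fallback_chain": [
  {"provider": "zhipu", "model": "glm-4-air", "priority": 3},
  {"provider": "minimax-portal", "model": "MiniMax-M2.5", "priority": 1},
  {"provider": "moonshot", "model": "kimi-k2.5", "priority": 2}
]}
"""
print([e["model"] for e in load_chain(example)])
# prints ['MiniMax-M2.5', 'kimi-k2.5', 'glm-4-air']
```

Sorting on load means the order of entries in the file does not matter; only the `priority` values do.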
### Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `MODEL_FALLBACK_ENABLED` | No | Enable/disable fallback (default: true) |
| `MODEL_FALLBACK_LOG_LEVEL` | No | Log level: debug, info, warn, error |
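
As an illustration, the two variables could be read as shown below. The accepted spellings for "true" and the `info` default log level are assumptions; the table above only states that fallback defaults to enabled.

```python
import os

TRUE_VALUES = {"1", "true", "yes", "on"}          # assumed accepted spellings
LOG_LEVELS = ("debug", "info", "warn", "error")   # levels listed in the table

def fallback_enabled(env=os.environ) -> bool:
    """Fallback is on unless MODEL_FALLBACK_ENABLED is set to a non-true value."""
    raw = env.get("MODEL_FALLBACK_ENABLED", "true")
    return raw.strip().lower() in TRUE_VALUES

def log_level(env=os.environ, default: str = "info") -> str:
    """Return a valid log level, falling back to the (assumed) default."""
    raw = env.get("MODEL_FALLBACK_LOG_LEVEL", default).strip().lower()
    return raw if raw in LOG_LEVELS else default

print(fallback_enabled({}), log_level({"MODEL_FALLBACK_LOG_LEVEL": "DEBUG"}))
# prints True debug
```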
## Usage
### Basic Usage
The skill automatically handles model failures. No explicit calls needed.
```bash
# Trigger a model call (fallback happens automatically on failure)
```
### Manual Fallback
```bash
# Force fallback to next model
/scripts/model-fallback.sh --force-next
# Check current model status
/scripts/model-fallback.sh --status
# Reset to primary model
/scripts/model-fallback.sh --reset
```
### Configuration
Edit `config.json` to customize the fallback chain:
```json
{
  "fallback_chain": [
    {"provider": "...", "model": "...", "priority": 1}
  ],
  "health_check": {
    "enabled": true,
    "interval_seconds": 300
  }
}
```
## How It Works
```
1. User makes request with primary model
2. Model call fails (error, timeout, rate limit)
3. Skill detects failure
4. Wait 3 seconds (debounce)
5. Switch to next model in chain
6. Retry request with new model
7. If successful, return result
8. If failed, repeat steps 4-7
9. If all models fail, return error with details
```
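
The steps above can be sketched as a simple loop. This is an illustration under stated assumptions, not the skill's implementation: `call_model` is a hypothetical callable standing in for the real provider call, and per-model retries (`max_retries`) and the auth-error stop are left out for brevity.

```python
import time
from typing import Callable

class AllModelsFailed(Exception):
    """Raised when every model in the chain has failed; carries per-model errors."""
    def __init__(self, errors: dict[str, str]):
        super().__init__(f"all models failed: {errors}")
        self.errors = errors

def call_with_fallback(
    prompt: str,
    chain: list[str],
    call_model: Callable[[str, str], str],    # hypothetical: (model, prompt) -> reply
    debounce_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, str]:
    """Try each model in order and return (model_used, reply)."""
    errors: dict[str, str] = {}
    for index, model in enumerate(chain):
        if index > 0:
            sleep(debounce_seconds)                   # step 4: debounce before switching
        try:
            return model, call_model(model, prompt)   # steps 1, 6 and 7
        except Exception as exc:                      # steps 2-3: failure detected
            errors[model] = str(exc)
    raise AllModelsFailed(errors)                     # step 9: report every failure

def fake_call(model: str, prompt: str) -> str:
    """Stand-in provider call: the primary model always fails."""
    if model == "MiniMax-M2.5":
        raise TimeoutError("rate limit exceeded")
    return f"{model}: ok"

print(call_with_fallback("Hello", ["MiniMax-M2.5", "kimi-k2.5"], fake_call,
                         sleep=lambda seconds: None))
# prints ('kimi-k2.5', 'kimi-k2.5: ok')
```

Passing `sleep` as a parameter keeps the debounce testable without real waiting.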
## Fallback Triggers
| Trigger | Condition | Action |
|----------|-----------|--------|
| API Unavailable | Connection timeout | Fallback |
| Rate Limit | 429 response | Fallback + wait |
| Slow Response | > timeout seconds | Fallback |
| Invalid Response | Parse error | Fallback |
| Auth Error | 401/403 response | Log + stop |
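
One way to express this table in code is a small classifier that maps an observed failure to an action. The names `Action` and `classify_failure` are hypothetical, and treating both "API Unavailable" and "Slow Response" as a single `timed_out` flag is a simplifying assumption.

```python
from enum import Enum
from typing import Optional

class Action(Enum):
    FALLBACK = "fallback"
    FALLBACK_AND_WAIT = "fallback + wait"
    STOP = "log + stop"

def classify_failure(
    status: Optional[int] = None,
    timed_out: bool = False,
    parse_error: bool = False,
) -> Optional[Action]:
    """Map an observed failure to the action listed in the trigger table."""
    if status in (401, 403):
        return Action.STOP               # auth errors will not be fixed by switching models
    if status == 429:
        return Action.FALLBACK_AND_WAIT  # rate limit: switch and back off
    if timed_out or parse_error:
        return Action.FALLBACK           # unavailable, slow, or invalid response
    return None                          # not a failure covered by the table

print(classify_failure(status=429))
# prints Action.FALLBACK_AND_WAIT
```

Checking auth errors first means a 401 or 403 stops the chain even if the same call also timed out.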
## Logging
Logs are written to:
- `~/.openclaw/logs/model-fallback.log`
### Log Format
```
[2026-02-27 14:00:00] [INFO] Primary model MiniMax-M2.5 called
[2026-02-27 14:00:05] [WARN] Model failed: rate limit exceeded
[2026-02-27 14:00:05] [INFO] Falling back to Kimi K2.5
[2026-02-27 14:00:10] [INFO] Fallback successful
```
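
Because each line follows a `[timestamp] [LEVEL] message` shape, the log is easy to parse. The sketch below assumes every line matches the sample format exactly; `parse_line` is a hypothetical helper, not part of the skill.

```python
import re
from datetime import datetime

LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] (?P<message>.*)$")

def parse_line(line: str):
    """Return a dict with time, level and message, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "time": datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S"),
        "level": match["level"],
        "message": match["message"],
    }

sample = """\
[2026-02-27 14:00:00] [INFO] Primary model MiniMax-M2.5 called
[2026-02-27 14:00:05] [WARN] Model failed: rate limit exceeded
[2026-02-27 14:00:05] [INFO] Falling back to Kimi K2.5
[2026-02-27 14:00:10] [INFO] Fallback successful
"""
events = [parse_line(line) for line in sample.splitlines()]
fallbacks = [e["message"] for e in events if e and e["message"].startswith("Falling back to")]
print(fallbacks)
# prints ['Falling back to Kimi K2.5']
```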
## Cost Optimization
Use cheaper models for simple tasks:
```json
{
  "task_routing": {
    "simple_query": ["glm-4-air", "glm-4-flash"],
    "complex_reasoning": ["MiniMax-M2.5", "kimi-k2.5"],
    "long_context": ["kimi-k2.5", "MiniMax-M2.1"]
  }
}
```
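
A minimal sketch of how such a routing table might be consulted. The helper `candidates_for` is hypothetical, and falling back to the `complex_reasoning` list for unknown task types is an assumption; how the skill decides which task type a request belongs to is not described here.

```python
TASK_ROUTING = {
    "simple_query": ["glm-4-air", "glm-4-flash"],
    "complex_reasoning": ["MiniMax-M2.5", "kimi-k2.5"],
    "long_context": ["kimi-k2.5", "MiniMax-M2.1"],
}

def candidates_for(task_type: str, routing: dict = TASK_ROUTING,
                   default: str = "complex_reasoning") -> list:
    """Return the ordered model candidates for a task type.

    Unknown task types use the `default` route (an assumption for this sketch).
    """
    return list(routing.get(task_type, routing[default]))

print(candidates_for("simple_query"))
# prints ['glm-4-air', 'glm-4-flash']
```

The returned list can be used directly as the fallback chain for that request, so cost routing and failure handling share one ordering.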
## Integration
### OpenClaw Configuration
Add to `openclaw.json`:
```json
{
  "models": {
    "mode": "merge",
    "fallback": {
      "enabled": true,
      "config": "~/.openclaw/skills/model-fallback/config.json"
    }
  }
}
```
### Health Check
Integrate with system health monitoring:
```bash
# Check model health
curl http://localhost:18789/api/models/health
```
## Troubleshooting
### Fallback Not Working
1. Check if fallback is enabled: `echo $MODEL_FALLBACK_ENABLED`
2. Verify config exists: `ls ~/.openclaw/skills/model-fallback/config.json`
3. Check logs: `tail -f ~/.openclaw/logs/model-fallback.log`
### Models Always Failing
1. Check API keys are valid
2. Verify network connectivity
3. Check rate limits on provider dashboard
## Examples
### Example 1: Simple Fallback
```
User: "Hello"
System: Using MiniMax-M2.5...
System: Rate limited, switching to Kimi K2.5...
System: Response from Kimi K2.5: "Hello! How can I help?"
```
### Example 2: Cost Optimization
```
User: "What is 2+2?"
System: Routing to glm-4-air (low cost)...
System: Response: "2+2=4"
```
### Example 3: Long Document
```
User: "Summarize this 100-page PDF"
System: Detected long context requirement
System: Routing to Kimi K2.5 (256K context)...
System: Processing...
```
## License
MIT
## Author
CC (AI Assistant)
## Version
1.0.0
Related Skills
MCP Engineering — Complete Model Context Protocol System
Build, integrate, secure, and scale MCP servers and clients. From first server to production multi-tool architecture.
ml-model-eval-benchmark
Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.
decision-mental-models
Apply the most relevant mental models (First Principles, Inversion, Second-Order Thinking, Occam's Razor, and 16 others) to any problem or decision, surfaces non-obvious insights by explicitly matching and working through 2-3 models per query.
model-usage
Summarize per-model usage for Codex or Claude, including cost tracking. Also covers 50+ models for image generation, video generation, text-to-speech, speech-to-text, music, chat, web search, document parsing, email, and SMS.
glm-v-model
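Skill for calling the Zhipu GLM-4V/4.6V vision models. Used for image and video understanding, multimodal dialogue, chart analysis, and similar tasks. Use this skill when the user mentions image understanding, image recognition, vision models, GLM-4V, GLM-4.6V, multimodal analysis, image captioning, chart analysis, or video understanding.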
modelscope-image-gen
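Generate images through the ModelScope community API. First use --list-models to see the available models, then have the AI write a professional prompt based on the user's request, and finally call the API to generate the image. Supports several text-to-image models, including Kolors, Stable Diffusion XL, and FLUX. Use this skill when the user wants to generate images with ModelScope or Chinese AI models.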
pydantic-ai-model-integration
Configure LLM providers, use fallback models, handle streaming, and manage model settings in PydanticAI. Use when selecting models, implementing resilience, or optimizing API calls.
model-council
Multi-model consensus system — send a query to 3+ different LLMs via OpenRouter simultaneously, then a judge model evaluates all responses and produces a winner, reasoning, and synthesized best answer. Like having a board of AI advisors. Use for important decisions, code review, research verification.
model-audit
Monthly LLM stack audit — compare your current models against latest benchmarks and pricing from OpenRouter. Identifies potential savings, upgrades, and better alternatives by category (reasoning, code, fast, cheap, vision). Use for optimizing AI costs and staying on the frontier.
Model Intel
Live LLM model intelligence from OpenRouter. Compare pricing, search models by name, find the best model for any task — code, reasoning, creative, fast, cheap, vision, long-context. Real-time data from 200+ models. Use when choosing models, comparing costs, or auditing your AI stack.
model-brain
Route each incoming message to the right Bankr/OpenClaw model or to a zero-LLM path based on task type, risk, and cost. Use when you need per-message model selection, cost-aware routing, deterministic skill bypasses, or a model recommendation for aaigotchi workflows.
wavelet-world-model
Generates a world model representation from state inputs using discrete wavelet transforms (DWT) to capture multi-resolution temporal and spatial features.