token-saver-75plus
Always-on token optimization + model routing protocol. Auto-classifies requests (T1-T4), routes execution to the cheapest capable model via sessions_spawn, and applies maximum output compression. Target: 75%+ token savings.
Best use case
token-saver-75plus is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using token-saver-75plus should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/token-saver-75plus/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How token-saver-75plus Compares
| Feature / Agent | token-saver-75plus | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Always-on token optimization + model routing protocol. Auto-classifies requests (T1-T4), routes execution to the cheapest capable model via sessions_spawn, and applies maximum output compression. Target: 75%+ token savings.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Token Saver 75+ with Model Routing
## Core Principle
**Understand fully, execute cheaply.** The orchestrator must fully understand the task before routing. Never sacrifice comprehension for speed.
## Request Classifier (silent, every message)
| Tier | Pattern | Orchestrator | Executor |
|---|---|---|---|
| T1 | yes/no, status, trivial facts, quick lookups | Handle alone | — |
| T2 | summaries, how-to, lists, bulk processing, formatting | Handle alone OR spawn Groq | Groq (FREE) |
| T3 | debugging, multi-step, code generation, structured analysis | Orchestrate + spawn | Codex for code, Groq for bulk |
| T4 | strategy, complex decisions, multi-agent coordination, creative | **Spawn Opus** | Opus orchestrates, spawns Codex/Groq from within |
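The skill leaves classification to the orchestrating model, so there is no reference implementation. As a rough illustration only, the tier table could be approximated with a keyword heuristic. The keyword lists and the `classify` function below are hypothetical, loosely derived from the "Pattern" column, and not part of the skill.

```python
import re

# Hypothetical keyword lists, loosely based on the "Pattern" column above.
# Tiers are checked from highest to lowest so a complex request is not
# under-classified just because it also contains a simpler keyword.
TIER_KEYWORDS = [
    ("T4", ("strategy", "decide", "trade-off", "coordinate", "brainstorm")),
    ("T3", ("debug", "fix", "implement", "write code", "refactor", "analyze")),
    ("T2", ("summarize", "summary", "how to", "list", "format", "rewrite")),
]


def classify(request: str) -> str:
    """Return 'T1'..'T4'; anything without a keyword match falls back to T1."""
    text = request.lower()
    for tier, keywords in TIER_KEYWORDS:
        # Match each keyword at the start of a word (so "fix" matches "fixed").
        if any(re.search(r"\b" + re.escape(k), text) for k in keywords):
            return tier
    return "T1"
```

A real orchestrator would weigh the whole request rather than single words; the sketch only shows the highest-tier-wins ordering.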
## Model Routing Table
| Model | Use For | Cost | Spawn with |
|---|---|---|---|
| `groq/llama-3.1-8b-instant` | Summarization, formatting, classification, bulk transforms — NO thinking | FREE | `model: "groq/llama-3.1-8b-instant"` |
| `openai/gpt-5.3-codex` | ALL code generation, code review, refactoring | $$$ | `model: "openai/gpt-5.3-codex"` |
| `openai/gpt-5.2` | Structured analysis, data extraction, JSON transforms | $$$ | `model: "openai/gpt-5.2"` |
| `anthropic/claude-opus-4-6` | Strategy, complex orchestration, failure recovery (T4 only) | $$$$ | `model: "anthropic/claude-opus-4-6"` |
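One way to read the table is as a lookup for the cheapest model that covers a given capability. The sketch below encodes it that way. The capability labels, the numeric `cost_rank` values, and the `cheapest_capable` helper are assumptions made for illustration; only the model identifiers and the relative cost ordering come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str
    cost_rank: int           # assumed ranking: 0 = FREE, 3 = $$$, 4 = $$$$
    capabilities: frozenset  # hypothetical labels for the "Use For" column


ROUTES = [
    Route("groq/llama-3.1-8b-instant", 0,
          frozenset({"summarize", "format", "classify", "bulk"})),
    Route("openai/gpt-5.3-codex", 3,
          frozenset({"codegen", "review", "refactor"})),
    Route("openai/gpt-5.2", 3,
          frozenset({"analysis", "extract", "json"})),
    Route("anthropic/claude-opus-4-6", 4,
          frozenset({"strategy", "orchestrate", "recover"})),
]


def cheapest_capable(capability: str) -> str:
    """Return the lowest-cost model id registered for the capability."""
    candidates = [r for r in ROUTES if capability in r.capabilities]
    if not candidates:
        raise ValueError(f"no model registered for capability {capability!r}")
    return min(candidates, key=lambda r: r.cost_rank).model
```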
## Routing via sessions_spawn
### When to spawn (MANDATORY)
- **Code generation of any kind** → spawn Codex
- **Bulk text processing (>3 items)** → spawn Groq
- **Complex multi-step tasks** → spawn Opus (T4)
- **Simple formatting/rewriting** → spawn Groq
### When NOT to spawn
- T1 questions (yes/no, time, status) — handle directly
- Single tool calls (calendar, web search) — handle directly
- Short responses that need no processing — handle directly
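The spawn and no-spawn rules above can be condensed into a single decision function. This is a minimal sketch, not the skill's own logic: the `pick_executor` name and its parameters are hypothetical, and checking T4 before code generation is an assumption based on the classifier table, where Opus spawns Codex from within.

```python
from typing import Optional

GROQ = "groq/llama-3.1-8b-instant"
CODEX = "openai/gpt-5.3-codex"
OPUS = "anthropic/claude-opus-4-6"


def pick_executor(tier: str, *, needs_code: bool = False,
                  item_count: int = 1,
                  is_formatting: bool = False) -> Optional[str]:
    """Return the model to spawn, or None to handle the request directly."""
    if tier == "T1":
        return None    # yes/no, time, status: handle directly
    if tier == "T4":
        return OPUS    # Opus orchestrates and spawns Codex/Groq itself
    if needs_code:
        return CODEX   # code generation of any kind
    if item_count > 3 or is_formatting:
        return GROQ    # bulk text (>3 items) or simple formatting
    return None        # single tool calls, short responses
```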
### Spawn patterns
**Groq (free bulk work):**
```
sessions_spawn(
  task: "<clear instruction with all context included>",
  model: "groq/llama-3.1-8b-instant"
)
```
**Codex (all code):**
```
sessions_spawn(
  task: "Write <language> code that <detailed spec>. Include comments. Output the complete file.",
  model: "openai/gpt-5.3-codex"
)
```
**Opus (T4 strategy):**
```
sessions_spawn(
  task: "<full context + goal>. You have full tool access. Use sessions_spawn with Codex for code and Groq for bulk subtasks.",
  model: "anthropic/claude-opus-4-6"
)
```
### Critical spawn rules
1. **Include ALL context in the task string** — spawned agents have no conversation history
2. **Be specific** — vague tasks waste tokens on clarification
3. **One task per spawn** — don't bundle unrelated work
4. **For code: always use Codex** — never write code yourself
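Because spawned agents see no conversation history, the task string has to carry everything. The sketch below shows one way to assemble it. `build_task` and `spawn_request` are hypothetical helpers, and the returned dict only mirrors the `task` and `model` fields from the spawn patterns; how the request is actually submitted depends on the host agent.

```python
def build_task(instruction: str, context: dict) -> str:
    """Combine one specific instruction with all the context it needs."""
    if not instruction.strip():
        raise ValueError("instruction must not be empty")
    lines = [instruction.strip(), "", "Context:"]
    # Every fact the spawned agent needs is written out explicitly.
    lines += [f"- {key}: {value}" for key, value in context.items()]
    return "\n".join(lines)


def spawn_request(instruction: str, context: dict, model: str) -> dict:
    """Build one request per task; unrelated work gets its own request."""
    return {"task": build_task(instruction, context), "model": model}
```

For example, `spawn_request("Summarize the three notes below in five bullets.", {"notes": "..."}, "groq/llama-3.1-8b-instant")` yields a single self-contained request for Groq.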
## Output Compression (applies to ALL tiers, ALL models)
### Templates
- **STATUS:** OK/WARN/FAIL one-liner
- **CHOICE:** A vs B → Recommend: X (1 line why)
- **CAUSE→FIX→VERIFY:** 3 bullets max
- **RESULT:** data/output directly, no wrap-up
### Rules
- No filler. No restating the question. Lead with the answer.
- Bullets/tables/code > prose.
- Do not narrate routine tool calls.
- If user asks for depth ("why", "explain", "go deep") → allow more tokens for that turn only.
### Budget by tier
| Tier | Max output |
|---|---|
| T1 | 1-3 lines |
| T2 | 5-15 bullets |
| T3 | Structured sections, <400 words |
| T4 | Longer allowed, still dense |
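During testing, the budgets can be checked mechanically. The sketch below treats each range as an upper bound (3 lines, 15 bullets, under 400 words) and places no hard limit on T4. That reading, the `within_budget` name, and the bullet detection by a leading `-` or `*` are all assumptions.

```python
def within_budget(tier: str, output: str) -> bool:
    """Check an output against the per-tier limits, read as upper bounds."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    if tier == "T1":
        return len(lines) <= 3
    if tier == "T2":
        bullets = [ln for ln in lines if ln.lstrip().startswith(("-", "*"))]
        return len(bullets) <= 15
    if tier == "T3":
        return len(output.split()) < 400
    if tier == "T4":
        return True  # "longer allowed": no numeric limit is given
    raise ValueError(f"unknown tier: {tier!r}")
```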
## Tool Gating (before ANY tool call)
1. Already known? → No tool.
2. Batchable? → Parallelize.
3. Can a spawned Groq handle it? → Spawn instead of doing it yourself.
4. Cheapest path? → memory_search > partial read > full read > web.
5. Needed? → Do not fetch "just in case."
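Steps 1, 4, and 5 of the gate reduce to a small selection function. In this sketch, `memory_search` is the name used in the list above, while `partial_read`, `full_read`, and `web` are hypothetical labels for the other three options, and `choose_source` is an illustrative helper.

```python
from typing import Iterable, Optional

# Cheapest first, following: memory_search > partial read > full read > web.
SOURCE_ORDER = ["memory_search", "partial_read", "full_read", "web"]


def choose_source(available: Iterable[str], *, already_known: bool = False,
                  needed: bool = True) -> Optional[str]:
    """Return the cheapest usable source, or None when no tool call is due."""
    if already_known or not needed:
        return None  # steps 1 and 5: skip the tool call entirely
    options = set(available)
    for source in SOURCE_ORDER:
        if source in options:
            return source
    return None
```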
## Failure Protocol
- If Groq spawn fails → retry with GPT-5.2
- If Codex spawn fails → retry with GPT-5.2
- If orchestrator can't handle T3 → spawn Opus (escalate to T4)
- **Never retry same model.** Escalate.
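The escalation rule can be sketched as a fallback map that never revisits a model. The `spawn` argument is a hypothetical callable standing in for however the host issues `sessions_spawn`. The source does not say what follows a GPT-5.2 failure, so this sketch simply stops and raises at that point.

```python
# Fallbacks taken from the first two bullets above.
FALLBACK = {
    "groq/llama-3.1-8b-instant": "openai/gpt-5.2",
    "openai/gpt-5.3-codex": "openai/gpt-5.2",
}


def run_with_escalation(task: str, model: str, spawn):
    """Try the model, then its fallback; never call the same model twice."""
    tried = []
    current = model
    while current is not None and current not in tried:
        tried.append(current)
        try:
            return spawn(task=task, model=current)
        except Exception:
            # Escalate rather than retry; a broad catch keeps the sketch short.
            current = FALLBACK.get(current)
    raise RuntimeError(f"all routes failed: {tried}")
```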
## Measurement (when asked or during testing)
Append: `[~X tokens | Tier: Tn | Route: model(s) used]`
Related Skills
anthropic-token-refresh
Automatically refresh Anthropic Claude setup-token before expiration using browser automation. Use when: (1) Setting up auto token refresh for Claude Max/Pro subscription, (2) Token keeps expiring and causing OpenClaw to stop responding, (3) Want to maintain continuous Claude API access without manual intervention.
bgo
Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.
moai-lang-r
R 4.4+ best practices with testthat 3.2, lintr 3.2, and data analysis patterns.
moai-lang-python
Python 3.13+ development specialist covering FastAPI, Django, async patterns, data science, testing with pytest, and modern Python features. Use when developing Python APIs, web applications, data pipelines, or writing tests.
moai-icons-vector
Vector icon libraries ecosystem guide covering 10+ major libraries with 200K+ icons, including React Icons (35K+), Lucide (1000+), Tabler Icons (5900+), Iconify (200K+), Heroicons, Phosphor, and Radix Icons with implementation patterns, decision trees, and best practices.
moai-foundation-trust
Complete TRUST 4 principles guide covering Test First, Readable, Unified, Secured. Validation methods, enterprise quality gates, metrics, and November 2025 standards. Enterprise v4.0 with 50+ software quality standards references.
moai-foundation-memory
Persistent memory across sessions using MCP Memory Server for user preferences, project context, and learned patterns
moai-foundation-core
MoAI-ADK's foundational principles - TRUST 5, SPEC-First TDD, delegation patterns, token optimization, progressive disclosure, modular architecture, agent catalog, command reference, and execution rules for building AI-powered development workflows
moai-cc-claude-md
Authoring CLAUDE.md Project Instructions. Design project-specific AI guidance, document workflows, define architecture patterns. Use when creating CLAUDE.md files for projects, documenting team standards, or establishing AI collaboration guidelines.
moai-alfred-language-detection
Auto-detects project language and framework from package.json, pyproject.toml, etc.
mnemonic
Unified memory system - aggregates communications and AI sessions across all channels into searchable, analyzable memory
mlops
MLflow, model versioning, experiment tracking, model registry, and production ML systems