metrics-tokens
Analyze token usage efficiency against the MetaGPT baseline and surface per-step optimization opportunities
Best use case
metrics-tokens is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
It is a strong fit for teams already working in Codex.
Teams using metrics-tokens can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/metrics-tokens/SKILL.md inside your project
- Restart your AI agent so it auto-discovers the skill
How metrics-tokens Compares
| Feature / Agent | metrics-tokens | Standard Approach |
|---|---|---|
| Platform Support | Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Analyze token usage efficiency against the MetaGPT baseline and surface per-step optimization opportunities
Which AI agents support this skill?
This skill is designed for Codex.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# metrics-tokens
You perform deep analysis of token usage efficiency. You compare AIWG workflow token consumption against the MetaGPT 124 tokens/line benchmark (REF-013), identify high-cost operations, and surface optimization opportunities.
## Triggers
Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description):
- "how efficient are my tokens" → efficiency ratio vs MetaGPT baseline
- "am I above the baseline" → threshold status check
- "where are tokens being wasted" → per-step breakdown with recommendations
- "token ratio" → tokens/line ratio calculation
## Trigger Patterns Reference
| Pattern | Example | Action |
|---------|---------|--------|
| Efficiency report | "token efficiency" | `aiwg metrics-tokens` |
| Session analysis | "analyze tokens for this session" | `aiwg metrics-tokens --session current` |
| Threshold check | "are we at green" | `aiwg metrics-tokens --threshold` |
| Per-step breakdown | "which step used the most tokens" | `aiwg metrics-tokens --by-step` |
| Optimization hints | "suggest token optimizations" | `aiwg metrics-tokens --optimize` |
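The agent matches these patterns from natural language, so no code is involved in practice. As a purely hypothetical illustration of the mapping, the table can be read as a first-match keyword router. The `routeTrigger` function, the keyword lists, and their ordering are assumptions, not part of AIWG.

```typescript
// Hypothetical first-match router from user phrases to the commands in the
// Trigger Patterns Reference table. Keywords and ordering are illustrative only.
interface Route {
  keywords: string[];
  command: string;
}

// Specific intents are checked before the generic efficiency report.
const ROUTES: Route[] = [
  { keywords: ["which step", "per-step", "by step"], command: "aiwg metrics-tokens --by-step" },
  { keywords: ["optimiz", "reduce token"], command: "aiwg metrics-tokens --optimize" },
  { keywords: ["at green", "threshold", "above the baseline"], command: "aiwg metrics-tokens --threshold" },
  { keywords: ["analyze tokens"], command: "aiwg metrics-tokens --session current" },
];

const DEFAULT_COMMAND = "aiwg metrics-tokens";

function routeTrigger(phrase: string): string {
  const text = phrase.toLowerCase();
  for (const route of ROUTES) {
    // The first route with any matching keyword wins.
    if (route.keywords.some((keyword) => text.includes(keyword))) {
      return route.command;
    }
  }
  // Anything unmatched falls back to the default efficiency report.
  return DEFAULT_COMMAND;
}
```

With this ordering, "Suggest ways to reduce token usage" resolves to `--optimize`, and "Token efficiency for this session" falls through to the default report, which already covers the current session.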
## Behavior
When triggered:
1. **Determine scope**:
- Default: current or most recent session
- `--session <name>`: named session
- `--all`: aggregate across all sessions
2. **Load token data**:
- Read `.aiwg/ralph/sessions/*/metrics.json` for raw token counts
- Apply the estimation heuristic of 4 characters per token (aligned with `src/metrics/token-counter.ts`)
3. **Compute efficiency metrics**:
- Tokens/line ratio for session output
- `vsBenchmark`: percentage vs MetaGPT 124 tokens/line (negative = better)
- `vsBaseline`: percentage vs typical LLM 200 tokens/line (negative = better)
- Threshold status: green (≤124), yellow (125–150), red (>150)
4. **Run the command**:
```bash
# Default efficiency report
aiwg metrics-tokens
# Current session
aiwg metrics-tokens --session current
# Per-step breakdown
aiwg metrics-tokens --by-step
# With optimization suggestions
aiwg metrics-tokens --optimize
# JSON output
aiwg metrics-tokens --json
```
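Steps 1 and 2 above amount to choosing which sessions are in scope and summing their raw counts. The sketch below assumes a hypothetical `SessionMetrics` record; the real `metrics.json` schema is not documented here, and `selectSessions` and `aggregate` are illustrative names rather than AIWG functions.

```typescript
// Hypothetical shape of one session's metrics; the real metrics.json schema
// under .aiwg/ralph/sessions/ is not documented here.
interface SessionMetrics {
  name: string;       // e.g. "sdlc-review-20260401-143022"
  startedAt: string;  // same-format ISO-8601 UTC timestamp, so string order is time order
  running: boolean;   // true while the session is still in progress
  inputTokens: number;
  outputTokens: number;
  characters: number;
  nonBlankLines: number;
}

type Scope =
  | { kind: "default" }             // current or most recent session
  | { kind: "named"; name: string } // --session <name>
  | { kind: "all" };                // --all

function selectSessions(sessions: SessionMetrics[], scope: Scope): SessionMetrics[] {
  if (scope.kind === "all") return sessions;
  if (scope.kind === "named") {
    const wanted = scope.name;
    return sessions.filter((s) => s.name === wanted);
  }
  // Default scope: prefer the running session, else the most recently started one.
  const running = sessions.find((s) => s.running);
  if (running) return [running];
  const newestFirst = [...sessions].sort((a, b) => b.startedAt.localeCompare(a.startedAt));
  return newestFirst.slice(0, 1);
}

// Sum raw counts so that --all can be reported as a single aggregate.
function aggregate(sessions: SessionMetrics[]) {
  return sessions.reduce(
    (acc, s) => ({
      inputTokens: acc.inputTokens + s.inputTokens,
      outputTokens: acc.outputTokens + s.outputTokens,
      characters: acc.characters + s.characters,
      nonBlankLines: acc.nonBlankLines + s.nonBlankLines,
    }),
    { inputTokens: 0, outputTokens: 0, characters: 0, nonBlankLines: 0 },
  );
}
```

Aggregating characters and non-blank lines before computing the ratio (rather than averaging per-session ratios) weights each session by its actual output size.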
## Benchmark Reference
The MetaGPT 124 tokens/line benchmark comes from REF-013 (research corpus). It represents a validated efficiency target for AI-assisted software workflows. AIWG tracks against this benchmark to make token costs legible and comparable across sessions.
| Threshold | Tokens/Line | Status | Action |
|-----------|-------------|--------|--------|
| At or below benchmark | ≤ 124 | green | No action needed |
| Above benchmark | 125–150 | yellow | Flag for review |
| Well above benchmark | > 150 | red | Generate optimization recommendations |
Comparison points:
| Baseline | Tokens/Line |
|----------|-------------|
| MetaGPT benchmark (REF-013) | 124 |
| Typical LLM baseline | ~200 |
| AIWG target | ≤ 124 |
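The threshold table can be expressed as a small classifier. This is a sketch with hypothetical constant and function names, not necessarily what `token-counter.ts` uses. It assumes that any ratio above 124, including fractional values such as 124.3, counts as yellow, which closes the gap between the table's integer bands.

```typescript
type Status = "green" | "yellow" | "red";

// Values from the tables above; the names are hypothetical.
const METAGPT_BENCHMARK = 124; // tokens/line (REF-013)
const YELLOW_CEILING = 150;    // upper bound of the yellow band

// Assumption: anything strictly above 124 is yellow until it passes 150.
function thresholdStatus(tokensPerLine: number): Status {
  if (tokensPerLine <= METAGPT_BENCHMARK) return "green";
  if (tokensPerLine <= YELLOW_CEILING) return "yellow";
  return "red";
}

// Action column of the threshold table, keyed by status.
const ACTIONS: Record<Status, string> = {
  green: "No action needed",
  yellow: "Flag for review",
  red: "Generate optimization recommendations",
};
```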
## Report Format
### Standard Efficiency Report
```
Token Efficiency — Session: sdlc-review-20260401-143022
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Token Counts
  Input:            42,310 tokens
  Output:           18,940 tokens
  Total:            61,250 tokens
Content Metrics
  Characters:       245,000
  Non-blank lines:  548
  Total lines:      621
Efficiency
  Tokens/line:      112
  vs MetaGPT:       -9.7% (better than 124 tokens/line benchmark)
  vs LLM baseline:  -44% (well below 200 tokens/line typical)
Status: green
Threshold: green — at or below MetaGPT benchmark
```
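The sample figures can be reproduced with the formulas in the Efficiency Calculation section. The ratio divides by the 548 non-blank lines, not the 621 total lines.

```latex
\begin{aligned}
\text{tokens} &= \left\lceil 245{,}000 / 4 \right\rceil = 61{,}250 = 42{,}310 + 18{,}940 \\
\text{tokensPerLine} &= 61{,}250 / 548 \approx 111.77 \rightarrow 112 \\
\text{vsBenchmark} &= (112 - 124) / 124 \times 100 \approx -9.7\% \\
\text{vsBaseline} &= (112 - 200) / 200 \times 100 = -44\%
\end{aligned}
```

The printed -9.7% matches the rounded ratio of 112. Using the unrounded 111.77 would give about -9.9%, so the report appears to round before comparing. That ordering is inferred from this sample only, not confirmed by the source.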
### Per-Step Breakdown (`--by-step`)
```
Token Efficiency by Step
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step                     Tokens Lines Tokens/Line Status
────────────────────── ──────── ───── ─────────── ──────
architecture-designer    18,200   168         108 green
security-architect       14,600   132         111 green
test-architect           13,100   119         110 green
technical-writer         15,350   129         119 green  ← highest tokens/line
────────────────────────────────────────────────────────
Total                    61,250   548         112 green
```
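The per-step table distinguishes two kinds of "most expensive" step: highest tokens/line (technical-writer) and highest absolute volume (architecture-designer). A minimal sketch of building the rows follows. It assumes hypothetical `StepUsage` input records and rounds each ratio before classifying, which matches the sample but is not confirmed by the source.

```typescript
interface StepUsage {
  step: string;
  tokens: number;
  lines: number; // non-blank output lines attributed to the step
}

interface StepRow extends StepUsage {
  tokensPerLine: number;
  status: "green" | "yellow" | "red";
}

// Same bands as the threshold table: <=124 green, <=150 yellow, else red.
function classify(tokensPerLine: number): StepRow["status"] {
  return tokensPerLine <= 124 ? "green" : tokensPerLine <= 150 ? "yellow" : "red";
}

// Assumes a non-empty list of steps.
function breakdown(steps: StepUsage[]) {
  const rows: StepRow[] = steps.map((s) => {
    const tokensPerLine = Math.round(s.tokens / s.lines);
    return { ...s, tokensPerLine, status: classify(tokensPerLine) };
  });
  const totalTokens = rows.reduce((n, r) => n + r.tokens, 0);
  const totalLines = rows.reduce((n, r) => n + r.lines, 0);
  // Highest ratio and highest absolute volume can be different steps.
  const highestRatio = rows.reduce((a, b) => (b.tokensPerLine > a.tokensPerLine ? b : a));
  const highestVolume = rows.reduce((a, b) => (b.tokens > a.tokens ? b : a));
  return {
    rows,
    totalTokens,
    totalLines,
    highestRatio: highestRatio.step,
    highestVolume: highestVolume.step,
  };
}
```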
### Optimization Report (`--optimize`)
```
Optimization Suggestions
━━━━━━━━━━━━━━━━━━━━━━━━
Status: green — no critical optimizations needed.
Opportunities (optional):
  1. technical-writer (119 tok/line) — near benchmark ceiling.
     Consider: scope the synthesis prompt to the final merge only
     and avoid re-reading full drafts.
  2. architecture-designer (18,200 tokens) — highest absolute cost.
     Consider: pass only the relevant SAD section, not the full doc.
```
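The source does not state how `--optimize` chooses its opportunities. The sketch below shows one hypothetical rule that reproduces the sample above. Any step above 124 tokens/line gets a critical suggestion. A green step within a hypothetical 10 tokens/line of the benchmark gets an optional one, and so does the step with the highest absolute token count. `suggest`, `NEAR_CEILING_MARGIN`, and the reason labels are all illustrative.

```typescript
interface StepStat {
  step: string;
  tokens: number;
  tokensPerLine: number;
}

interface Suggestion {
  step: string;
  reason: "above-benchmark" | "near-ceiling" | "highest-absolute-cost";
  critical: boolean;
}

const BENCHMARK = 124;
// Hypothetical margin: how close to 124 a green step must be to get flagged.
const NEAR_CEILING_MARGIN = 10;

function suggest(steps: StepStat[]): Suggestion[] {
  const out: Suggestion[] = [];
  for (const s of steps) {
    if (s.tokensPerLine > BENCHMARK) {
      // Above the green threshold: worth acting on.
      out.push({ step: s.step, reason: "above-benchmark", critical: true });
    } else if (s.tokensPerLine >= BENCHMARK - NEAR_CEILING_MARGIN) {
      // Still green, but close to the ceiling: optional.
      out.push({ step: s.step, reason: "near-ceiling", critical: false });
    }
  }
  if (steps.length > 0) {
    // Also surface the largest absolute consumer, unless already listed.
    const top = steps.reduce((a, b) => (b.tokens > a.tokens ? b : a));
    if (!out.some((x) => x.step === top.step)) {
      out.push({ step: top.step, reason: "highest-absolute-cost", critical: false });
    }
  }
  return out;
}
```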
## Efficiency Calculation
Token efficiency uses the estimation and comparison logic from `src/metrics/token-counter.ts`:
```
tokens = ceil(characters / 4)
tokensPerLine = tokens / nonBlankLines
vsBenchmark = (tokensPerLine - 124) / 124 * 100 (negative = better)
vsBaseline = (tokensPerLine - 200) / 200 * 100 (negative = better)
```
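The formulas above translate directly into TypeScript. This is a sketch under the stated 4-characters-per-token heuristic. The function and constant names are hypothetical and not taken from `token-counter.ts`, and "non-blank" is assumed to mean a line containing at least one non-whitespace character. The ratio is kept unrounded here, and rounding is left to display code.

```typescript
const METAGPT_TOKENS_PER_LINE = 124;     // REF-013 benchmark
const TYPICAL_LLM_TOKENS_PER_LINE = 200; // typical LLM baseline

interface Efficiency {
  tokens: number;
  tokensPerLine: number;
  vsBenchmark: number; // percent; negative = better than 124
  vsBaseline: number;  // percent; negative = better than 200
}

// 4-characters-per-token estimation heuristic.
function estimateTokens(characters: number): number {
  return Math.ceil(characters / 4);
}

// Assumption: a line is non-blank if it contains any non-whitespace character.
function countNonBlankLines(text: string): number {
  return text.split(/\r?\n/).filter((line) => line.trim().length > 0).length;
}

function efficiency(characters: number, nonBlankLines: number): Efficiency {
  if (nonBlankLines <= 0) throw new Error("nonBlankLines must be positive");
  const tokens = estimateTokens(characters);
  const tokensPerLine = tokens / nonBlankLines;
  return {
    tokens,
    tokensPerLine,
    vsBenchmark: ((tokensPerLine - METAGPT_TOKENS_PER_LINE) / METAGPT_TOKENS_PER_LINE) * 100,
    vsBaseline: ((tokensPerLine - TYPICAL_LLM_TOKENS_PER_LINE) / TYPICAL_LLM_TOKENS_PER_LINE) * 100,
  };
}
```

For the sample session, `efficiency(245000, 548)` gives 61,250 tokens and a ratio of about 111.77 tokens/line, which the report displays as 112.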
## Examples
### Example 1: Quick efficiency check
**User**: "Token efficiency for this session"
**Action**:
```bash
aiwg metrics-tokens
```
**Response**: Efficiency report with tokens/line ratio, benchmark comparison, and green/yellow/red status.
### Example 2: Identify expensive steps
**User**: "Which step used the most tokens?"
**Action**:
```bash
aiwg metrics-tokens --by-step
```
**Response**: Per-step table showing token counts, line counts, tokens/line ratio, and threshold status for each workflow step.
### Example 3: Optimization pass
**User**: "Suggest ways to reduce token usage"
**Action**:
```bash
aiwg metrics-tokens --optimize
```
**Response**: Optimization suggestions targeted at steps above the green threshold, with specific prompt-scoping recommendations.
### Example 4: Are we at green?
**User**: "Are we at green on token efficiency?"
**Extraction**: Threshold check
**Action**:
```bash
aiwg metrics-tokens --threshold
```
**Response**: "Threshold status: **green** — 112 tokens/line, 9.7% below the MetaGPT 124 tokens/line benchmark (REF-013)."
## Clarification Prompts
If the session scope is unclear:
- "Should I analyze the current running session or the most recent completed session?"
## References
- @$AIWG_ROOT/src/cli/handlers/subcommands.ts — Metrics tokens handler
- @$AIWG_ROOT/src/metrics/token-counter.ts — Token counting, MetaGPT baseline constants (REF-013)
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/flows/token-efficiency.yaml — Token efficiency schema
- @$AIWG_ROOT/docs/cli-reference.md — CLI reference