sensei

**WORKFLOW SKILL** — Iteratively improve skill frontmatter compliance using the Ralph loop pattern. WHEN: "run sensei", "sensei help", "improve skill", "fix frontmatter", "skill compliance", "frontmatter audit", "score skill", "check skill tokens". INVOKES: token counting tools, test runners, git commands. FOR SINGLE OPERATIONS: use token CLI directly for counts/checks.

SKILL.md Source

# Sensei

> "A true master teaches not by telling, but by refining."

Automates skill frontmatter improvement using the [Ralph loop pattern](https://github.com/soderlund/ralph) - iteratively improving skills until they reach Medium-High compliance with passing tests.

## Help

When user says "sensei help" or asks how to use sensei:

```
╔══════════════════════════════════════════════════════════════════╗
║  SENSEI - Skill Frontmatter Compliance Improver                  ║
╠══════════════════════════════════════════════════════════════════╣
║                                                                  ║
║  USAGE:                                                          ║
║    Run sensei on <skill-name>              # Single skill        ║
║    Run sensei on <skill-name> --fast       # Skip tests          ║
║    Run sensei on <skill1>, <skill2>        # Multiple skills     ║
║    Run sensei on all Low-adherence skills  # Batch by score      ║
║    Run sensei on all skills                # All skills          ║
║                                                                  ║
║  WHAT IT DOES:                                                   ║
║    1. READ    - Load skill's SKILL.md and count tokens           ║
║    2. SCORE   - Check compliance (Low/Medium/Medium-High/High)   ║
║    3. SCAFFOLD- Create tests from template if missing            ║
║    4. IMPROVE - Add WHEN: triggers (cross-model optimized)       ║
║    5. TEST    - Run tests, fix if needed                         ║
║    6. TOKENS  - Check token budget                               ║
║    7. SUMMARY - Show before/after comparison                     ║
║    8. PROMPT  - Ask: Commit, Create Issue, or Skip?              ║
║    9. REPEAT  - Until Medium-High score achieved                 ║
║                                                                  ║
║  TARGET SCORE: Medium-High                                       ║
║    ✓ Description ≥ 150 chars, ≤ 60 words                         ║
║    ✓ Has "WHEN:" trigger phrases (preferred)                     ║
║    ✓ No "DO NOT USE FOR:" (risky in multi-skill envs)            ║
║    ✓ Has "INVOKES:" for tool relationships (optional)            ║
║    ✓ SKILL.md < 500 tokens (soft limit)                          ║
║                                                                  ║
║  MCP INTEGRATION (when INVOKES present):                         ║
║    ✓ Has "MCP Tools Used" table                                  ║
║    ✓ Has Prerequisites section                                   ║
║    ✓ Has CLI fallback pattern                                    ║
║    ✓ No skill-tool name collision                                ║
║                                                                  ║
╚══════════════════════════════════════════════════════════════════╝
```

## Configuration

Sensei uses these defaults (override by specifying in your prompt):

| Setting | Default | Description |
|---------|---------|-------------|
| Skills directory | `skills/` or `.github/skills/` | Where SKILL.md files live |
| Tests directory | `tests/` | Where test files live |
| Token soft limit | 500 | Target for SKILL.md |
| Token hard limit | 5000 | Maximum for SKILL.md |
| Target score | Medium-High | Minimum compliance level |
| Max iterations | 5 | Per-skill loop limit |

Auto-detect skills directory by checking (in order):
1. `skills/` in project root
2. `.github/skills/` 
3. User-specified path
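
The lookup order above can be sketched in Python. This is only an illustration, not Sensei's actual implementation: `find_skills_dir` and its `user_path` parameter are hypothetical names, and the sketch assumes the user-specified path is consulted last, as the list suggests.

```python
from pathlib import Path
from typing import Optional

# Candidate locations, in the order listed above.
DEFAULT_SKILL_DIRS = ("skills", ".github/skills")


def find_skills_dir(project_root: Path, user_path: Optional[str] = None) -> Optional[Path]:
    """Return the first existing skills directory, or None if nothing matches."""
    candidates = [project_root / name for name in DEFAULT_SKILL_DIRS]
    if user_path is not None:
        # Assumption: a relative user path is resolved against the project root.
        candidates.append(project_root / user_path)
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None
```

Returning `None` rather than raising leaves it to the caller to decide whether to ask the user for a path.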

## Invocation Modes

### Single Skill
```
Run sensei on my-skill-name
```

### Multiple Skills
```
Run sensei on skill-a, skill-b, skill-c
```

### By Adherence Level
```
Run sensei on all Low-adherence skills
Run sensei on all Medium-adherence skills
```

### All Skills
```
Run sensei on all skills
```

### Fast Mode (Skip Tests)
```
Run sensei on my-skill --fast
```

### GEPA Mode (Deep Optimization)
```
Run sensei on my-skill --gepa
Run sensei on my-skill --gepa --fast
Run sensei on all skills --gepa
```

When `--gepa` is used, Step 5 (IMPROVE) is replaced with GEPA evolutionary optimization.
Instead of template-based improvements, GEPA uses the existing test harness as a fitness
function and an LLM to propose and evaluate many candidate improvements automatically.

**GEPA score-only mode** (no LLM calls, just evaluate current quality):
```
Run sensei score my-skill
Run sensei score all skills
```

## The Ralph Loop

For each skill, execute this loop until score >= Medium-High:

### Step 1: READ
Load the skill's current state:
```
{skills-dir}/{skill-name}/SKILL.md
{tests-dir}/{skill-name}/ (if exists)
```

Run token count:
```bash
npm run tokens -- count {skills-dir}/{skill-name}/SKILL.md
```

### Step 2: SCORE
Assess compliance by checking the frontmatter for:
- Description length (>= 150 chars, ≤ 60 words)
- "WHEN:" trigger phrases (preferred) or "USE FOR:"
- Routing clarity ("INVOKES:", "FOR SINGLE OPERATIONS:")
- No "DO NOT USE FOR:" anti-triggers (risky in multi-skill environments)

See [references/scoring.md](references/scoring.md) for detailed criteria.

### Step 3: CHECK
If score >= Medium-High AND tests pass → go to Step 9 (SUMMARY).

### Step 4: SCAFFOLD (if needed)
If `{tests-dir}/{skill-name}/` doesn't exist, create test scaffolding using templates from [references/test-templates/](references/test-templates/).

### Step 5: IMPROVE FRONTMATTER
Enhance the SKILL.md description to include:

1. **Lead with action verb** - First sentence: unique action verb + domain
2. **Trigger phrases** - "WHEN:" (preferred) or "USE FOR:" with 3-5 distinctive quoted phrases
3. **Length limits** - Keep the description to at most 60 words and 1024 characters

> ⚠️ **"DO NOT USE FOR:" carries context-dependent risk.** In multi-skill environments (10+ skills with overlapping domains), anti-trigger clauses introduce the very keywords that cause wrong-skill activation on Claude Sonnet and fast-pattern-matching models ([evidence](https://gist.github.com/kvenkatrajan/52e6e77f5560ca30640490b4cc65d109)). For small, isolated skill sets (1-5 skills), the risk is low. When in doubt, use positive routing with `WHEN:` and distinctive quoted phrases.

Template (cross-model optimized):
```yaml
---
name: skill-name
description: "[ACTION VERB] [UNIQUE_DOMAIN]. [One clarifying sentence]. WHEN: \"[phrase1]\", \"[phrase2]\", \"[phrase3]\", \"[phrase4]\", \"[phrase5]\"."
---
```

Template (with routing clarity for High score):
```yaml
---
name: skill-name
description: "**WORKFLOW SKILL** — [ACTION VERB] [UNIQUE_DOMAIN]. [Clarifying sentence]. WHEN: \"[phrase1]\", \"[phrase2]\", \"[phrase3]\". INVOKES: [tools/MCP servers used]. FOR SINGLE OPERATIONS: [when to bypass this skill]."
---
```
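
As a rough illustration of how the templates above fit together, the following Python sketch assembles a description from its parts and rejects results that break the length limits. `build_description` and its parameters are hypothetical names chosen for this example, and the classification prefix is left out for brevity.

```python
from typing import Optional, Sequence

MAX_WORDS = 60     # word limit from Step 5
MAX_CHARS = 1024   # character limit from Step 5


def build_description(summary: str, triggers: Sequence[str],
                      invokes: Optional[str] = None,
                      single_ops: Optional[str] = None) -> str:
    """Assemble a description string in the template's order of clauses."""
    quoted = ", ".join(f'"{t}"' for t in triggers)
    parts = [summary.rstrip(".") + ".", f"WHEN: {quoted}."]
    if invokes:
        parts.append(f"INVOKES: {invokes.rstrip('.')}.")
    if single_ops:
        parts.append(f"FOR SINGLE OPERATIONS: {single_ops.rstrip('.')}.")
    text = " ".join(parts)
    if len(text.split()) > MAX_WORDS or len(text) > MAX_CHARS:
        raise ValueError("description exceeds the 60-word / 1024-character limits")
    return text
```

Calling it with a summary and two trigger phrases yields a string of the form `Summary. WHEN: "phrase1", "phrase2".`, which can then be pasted into the frontmatter.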

### Step 5-GEPA: IMPROVE WITH GEPA (when --gepa flag is set)
**Replaces Step 5** with automated evolutionary optimization. Step 6 (IMPROVE TESTS) still runs normally.

1. **Auto-discover test harness**: Read `{tests-dir}/{skill-name}/triggers.test.ts` and extract
   `shouldTriggerPrompts` and `shouldNotTriggerPrompts` arrays automatically.

2. **Build evaluator**: Construct a GEPA evaluator that scores candidates on:
   - Content quality (has ## Triggers, ## Rules, ## Steps, USE FOR, WHEN)
   - Frontmatter description compliance (length, trigger phrases)
   - Trigger accuracy (keywords extracted from description match test prompts correctly)

3. **Run optimization**: Call the GEPA auto-evaluator script:
   ```bash
   python scripts/src/gepa/auto_evaluator.py optimize \
     --skill {skill-name} \
     --skills-dir {skills-dir} \
     --tests-dir {tests-dir} \
     --iterations 80
   ```

4. **Review output**: GEPA produces an optimized SKILL.md body. Show the diff to the user.
   The evaluator is generated automatically from the existing tests, so no manual configuration is needed.

**Key**: GEPA wraps existing tests as its fitness function. It does NOT replace or modify tests.
The LLM proposes improved SKILL.md text, and the evaluator scores each candidate against the
same test prompts the CI already uses. Only improvements that score higher are kept.
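
To make the "trigger accuracy" part of the fitness function concrete, here is a minimal Python sketch. It is not the logic of `auto_evaluator.py`; the function names, the stopword list, and the two-keyword match threshold are all assumptions made for illustration.

```python
import re
from typing import Sequence, Set

# Assumption: a tiny stopword list; a real evaluator would likely use a larger one.
STOPWORDS = {"a", "an", "the", "to", "for", "of", "and", "in", "on", "my"}


def extract_keywords(description: str) -> Set[str]:
    """Collect lowercase words from the double-quoted phrases in a description."""
    keywords: Set[str] = set()
    for phrase in re.findall(r'"([^"]+)"', description):
        for word in re.findall(r"[a-z0-9]+", phrase.lower()):
            if word not in STOPWORDS:
                keywords.add(word)
    return keywords


def would_trigger(prompt: str, keywords: Set[str], min_hits: int = 2) -> bool:
    """Treat a prompt as a match when it shares at least min_hits keywords."""
    words = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    return len(words & keywords) >= min_hits


def trigger_accuracy(description: str, should_trigger: Sequence[str],
                     should_not_trigger: Sequence[str]) -> float:
    """Fraction of test prompts classified the way the test harness expects."""
    keywords = extract_keywords(description)
    correct = sum(would_trigger(p, keywords) for p in should_trigger)
    correct += sum(not would_trigger(p, keywords) for p in should_not_trigger)
    total = len(should_trigger) + len(should_not_trigger)
    return correct / total if total else 0.0
```

A candidate description would be kept only if its score exceeds that of the current one, which mirrors the "only improvements that score higher are kept" rule.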

### Step 6: IMPROVE TESTS
Update test prompts to match new frontmatter:
- `shouldTriggerPrompts` - 5+ prompts matching "WHEN:" or "USE FOR:" phrases
- `shouldNotTriggerPrompts` - 5+ prompts for unrelated topics and different-skill scenarios

### Step 7: VERIFY
Run tests (skip if `--fast` flag):
```bash
# Framework-specific command based on project
npm test -- --testPathPattern={skill-name}            # Jest
pytest {tests-dir}/{skill-name}/                      # pytest
waza run {tests-dir}/{skill-name}/trigger_tests.yaml  # Waza
```

### Step 8: TOKENS
Check token budget:
```bash
npm run tokens -- check {skills-dir}/{skill-name}/SKILL.md
```

Budget guidelines:
- SKILL.md: < 500 tokens (soft), < 5000 (hard)
- references/*.md: < 1000 tokens each
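
The soft and hard limits can be read as a three-way classification. The Python sketch below shows one way to express that; the four-characters-per-token estimate is an assumption used only to keep the example self-contained, and the real token CLI may count differently. The function names and status labels are hypothetical.

```python
import math

SOFT_LIMIT = 500    # SKILL.md soft limit from the guidelines above
HARD_LIMIT = 5000   # SKILL.md hard limit from the guidelines above


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (assumed, not a real tokenizer)."""
    return math.ceil(len(text) / 4)


def budget_status(token_count: int, soft: int = SOFT_LIMIT, hard: int = HARD_LIMIT) -> str:
    """Classify a token count; counts must stay strictly below each limit."""
    if token_count >= hard:
        return "over-hard-limit"
    if token_count >= soft:
        return "over-soft-limit"
    return "ok"
```

A file over the soft limit is a candidate for moving detail into `references/`, while a file over the hard limit should be treated as a failure.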

### Step 8b: MCP INTEGRATION (if INVOKES present)
When description contains `INVOKES:`, check:

1. **MCP Tools Used table** - Does skill body have the table?
2. **Prerequisites section** - Are requirements documented?
3. **CLI fallback** - Is there a fallback when MCP unavailable?
4. **Name collision** - Does skill name match an MCP tool?

If checks fail, add missing sections using patterns from [mcp-integration.md](references/mcp-integration.md).
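
The four checks could be approximated with simple text heuristics, as in this Python sketch. It assumes the section headings contain the literal words shown in the list above and that the word "fallback" signals a CLI fallback; `mcp_checks` and the result keys are hypothetical, and the patterns in `references/mcp-integration.md` remain the authority.

```python
import re
from typing import Dict, Iterable


def mcp_checks(skill_name: str, body: str, mcp_tool_names: Iterable[str]) -> Dict[str, bool]:
    """Return a pass/fail flag for each MCP integration check."""
    lowered = body.lower()
    has_prereq_heading = re.search(
        r"^#+\s*prerequisites\b", body, re.IGNORECASE | re.MULTILINE
    ) is not None
    return {
        # Heuristic: the table is introduced by the phrase "MCP Tools Used".
        "mcp_tools_table": "mcp tools used" in lowered,
        "prerequisites": has_prereq_heading,
        # Heuristic: any mention of "fallback" counts as a fallback pattern.
        "cli_fallback": "fallback" in lowered,
        # Passes when the skill name differs from every known MCP tool name.
        "no_name_collision": skill_name not in set(mcp_tool_names),
    }
```

Each `False` entry corresponds to a section that Step 8b would add or a name that would need to change.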

### Step 9: SUMMARY
Display before/after comparison:

```
╔══════════════════════════════════════════════════════════════════╗
║  SENSEI SUMMARY: {skill-name}                                    ║
╠══════════════════════════════════════════════════════════════════╣
║  BEFORE                          AFTER                           ║
║  ──────                          ─────                           ║
║  Score: Low                      Score: Medium-High              ║
║  Tokens: 142                     Tokens: 385                     ║
║  Triggers: 0                     Triggers: 5                     ║
║  Anti-triggers: 0                Anti-triggers: 0                ║
╚══════════════════════════════════════════════════════════════════╝
```

### Step 10: PROMPT USER
Ask how to proceed:
- **[C] Commit** - Save with message `sensei: improve {skill-name} frontmatter`
- **[I] Create Issue** - Open issue with summary and suggestions
- **[S] Skip** - Discard changes, move to next skill

### Step 11: REPEAT or EXIT
- If score < Medium-High AND iterations < 5 → go to Step 2
- If iterations >= 5 → timeout, show summary, move to next skill
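
The control flow of the loop, stripped of the individual steps, might look like the following Python sketch. The callbacks `score`, `improve`, and `tests_pass` are hypothetical stand-ins for Steps 2, 5 to 6, and 7; the user prompt in Step 10 is omitted.

```python
from typing import Callable, Tuple

LEVELS = ["Invalid", "Low", "Medium", "Medium-High", "High"]
TARGET = "Medium-High"
MAX_ITERATIONS = 5


def meets_target(level: str) -> bool:
    """True when level is at or above the target compliance level."""
    return LEVELS.index(level) >= LEVELS.index(TARGET)


def run_loop(score: Callable[[], str], improve: Callable[[], None],
             tests_pass: Callable[[], bool]) -> Tuple[bool, int]:
    """Improve until the target is met or the iteration limit is hit.

    Returns (reached_target, iterations_used).
    """
    for iteration in range(MAX_ITERATIONS):
        if meets_target(score()) and tests_pass():
            return True, iteration
        improve()
    # Timeout: report the state after the final improvement attempt.
    return meets_target(score()) and tests_pass(), MAX_ITERATIONS
```

Capping the loop at five iterations keeps a skill that cannot be improved automatically from blocking the rest of a batch run.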

## Scoring Quick Reference

| Score | Requirements |
|-------|--------------|
| **Invalid** | Description > 1024 chars (exceeds spec hard limit) |
| **Low** | Description < 150 chars OR no triggers |
| **Medium** | Description >= 150 chars AND has triggers but >60 words |
| **Medium-High** | Description >= 150 chars, ≤60 words, AND has "WHEN:" (preferred) or "USE FOR:" |
| **High** | Medium-High + routing clarity (INVOKES/FOR SINGLE OPERATIONS) |

> ⚠️ "DO NOT USE FOR:" is **risky in multi-skill environments** (10+ overlapping skills) — causes keyword contamination on fast-pattern-matching models. Safe for small, isolated skill sets. Use positive routing with `WHEN:` for cross-model safety.
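
Read top to bottom, the table amounts to a small decision procedure. The Python sketch below is one interpretation of it, not the algorithm in `references/scoring.md`; `score_description` is a hypothetical name, and treating "DO NOT USE FOR:" as not counting toward triggers is an assumption.

```python
TRIGGER_MARKERS = ("WHEN:", "USE FOR:")
ROUTING_MARKERS = ("INVOKES:", "FOR SINGLE OPERATIONS:")


def score_description(description: str) -> str:
    """Map a frontmatter description to a compliance level from the table above."""
    if len(description) > 1024:
        return "Invalid"
    # "DO NOT USE FOR:" contains "USE FOR:", so strip it before looking for triggers.
    positive = description.replace("DO NOT USE FOR:", "")
    has_triggers = any(marker in positive for marker in TRIGGER_MARKERS)
    if len(description) < 150 or not has_triggers:
        return "Low"
    if len(description.split()) > 60:
        return "Medium"
    if any(marker in description for marker in ROUTING_MARKERS):
        return "High"
    return "Medium-High"
```

For instance, the one-line description `Process PDF files` is far below 150 characters and has no triggers, so it scores Low.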

### MCP Integration Score (when INVOKES present)

| Check | Status |
|-------|--------|
| MCP Tools Used table | ✓/✗ |
| Prerequisites section | ✓/✗ |
| CLI fallback pattern | ✓/✗ |
| No name collision | ✓/✗ |

See [references/scoring.md](references/scoring.md) for full criteria.
See [references/mcp-integration.md](references/mcp-integration.md) for MCP patterns.

## Frontmatter Patterns

### Skill Classification Prefix

Add a prefix to clarify the skill type:
- `**WORKFLOW SKILL**` - Multi-step orchestration
- `**UTILITY SKILL**` - Single-purpose helper
- `**ANALYSIS SKILL**` - Read-only analysis/reporting

### Routing Clarity (for High score)

When skills interact with MCP tools or other skills, add:
- `INVOKES:` - What tools/skills this skill calls
- `FOR SINGLE OPERATIONS:` - When to bypass this skill

### Quick Example

**Before (Low):**
```yaml
description: 'Process PDF files'
```

**After (High with routing, cross-model optimized):**
```yaml
description: "**WORKFLOW SKILL** — Extract, rotate, merge, and split PDF files. WHEN: \"extract PDF text\", \"rotate PDF pages\", \"merge PDFs\", \"split PDF\". INVOKES: pdf-tools MCP for extraction, file-system for I/O. FOR SINGLE OPERATIONS: Use pdf-tools MCP directly for simple extractions."
```

See [references/examples.md](references/examples.md) for more before/after transformations.

## Commit Messages

```
sensei: improve {skill-name} frontmatter
```

## Reference Documentation

- [scoring.md](references/scoring.md) - Detailed scoring criteria and algorithm
- [mcp-integration.md](references/mcp-integration.md) - MCP tool integration patterns
- [loop.md](references/loop.md) - Ralph loop workflow details
- [examples.md](references/examples.md) - Before/after transformation examples
- [configuration.md](references/configuration.md) - Project setup patterns
- [test-templates/](references/test-templates/) - Test scaffolding templates
- [test-templates/waza.md](references/test-templates/waza.md) - Waza trigger test format


## Built-in Scripts

Run `npm run tokens help` for full usage.

### Token Commands

```bash
npm run tokens count              # Count all markdown files
npm run tokens check              # Check against token limits
npm run tokens suggest            # Get optimization suggestions
npm run tokens compare            # Compare with git history
```

### GEPA Commands

Requires: `pip install gepa` (or `uv pip install gepa`). See [requirements](scripts/src/gepa/requirements.txt).

```bash
# Score a single skill (no LLM calls, instant)
python scripts/src/gepa/auto_evaluator.py score --skill azure-deploy --skills-dir skills --tests-dir tests

# Score all skills
python scripts/src/gepa/auto_evaluator.py score-all --skills-dir skills --tests-dir tests

# Optimize a skill (requires LLM API — uses GitHub Models via gh auth token)
python scripts/src/gepa/auto_evaluator.py optimize --skill azure-deploy --skills-dir skills --tests-dir tests

# JSON output (for CI pipelines)
python scripts/src/gepa/auto_evaluator.py score-all --skills-dir skills --tests-dir tests --json
```

### Token Limit Configuration

Create `.token-limits.json` to customize limits:
```json
{
  "defaults": { "SKILL.md": 500, "references/**/*.md": 1000 },
  "overrides": { "README.md": 3000 }
}
```