sensei
Sensei is an AI agent skill that iteratively improves the frontmatter compliance of other AI agent skills, ensuring they meet quality standards and pass their tests.
About this skill
Sensei automates compliance improvement for the `SKILL.md` frontmatter of other AI agent skills. It applies the "Ralph loop pattern" for iterative refinement, driving each skill toward a target of Medium-High compliance with passing tests. Each iteration reads the skill's data, scores its compliance, scaffolds missing tests, optimizes `WHEN:` triggers for cross-model discoverability, runs the tests, and checks token budgets. As a meta-skill, Sensei standardizes the format and content of other skills, making them more robust and more easily invoked by a variety of AI coding agents. By automating this compliance loop, it sharply reduces the manual effort of skill maintenance and quality assurance: metadata stays precise, triggers stay effective, functionality is validated through testing, and token budgets are respected for efficient agent operation.
Best use case
The primary use case for Sensei is to standardize and enhance the quality of AI agent skills. AI skill developers, maintainers, and teams managing a portfolio of skills will benefit most by leveraging Sensei to ensure consistency, improve trigger discoverability across various AI models, and enforce best practices in skill definition and testing. It helps in maintaining a robust and reliable skill ecosystem.
The outcome is improved `SKILL.md` frontmatter compliance, optimized `WHEN:` triggers, passing tests, and adherence to token budgets, resulting in higher-quality, more discoverable skills.
Practical example
Example input
Run sensei on my-new-skill
Example output
Sensei: Improving 'my-new-skill'... Current compliance: Low. Scaffolding tests. Optimizing WHEN triggers. Running tests... Passed. Final compliance: Medium-High. Token count: 120.
When to use this skill
- When developing a new AI agent skill to ensure frontmatter compliance from the start.
- To audit and improve the `SKILL.md` compliance of existing AI agent skills.
- To optimize `WHEN:` triggers for better AI agent discoverability across different models.
- To ensure skills pass their tests and adhere to defined token budgets automatically.
When not to use this skill
- For single, direct token counting or simple checks (use a dedicated token CLI instead).
- When you do not develop or maintain AI agent skills.
- If you prefer to manually manage all aspects of skill compliance and testing.
How sensei compares
| Feature / Agent | sensei | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Medium | N/A |
Frequently Asked Questions
What does this skill do?
Sensei is an AI agent skill that iteratively improves the frontmatter compliance of other AI agent skills, ensuring they meet quality standards and pass their tests.
How difficult is it to install?
The installation complexity is rated as Medium. Installation instructions can be found above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Sensei
> "A true master teaches not by telling, but by refining."
Automates skill frontmatter improvement using the [Ralph loop pattern](https://github.com/soderlund/ralph) - iteratively improving skills until they reach Medium-High compliance with passing tests.
## Help
When user says "sensei help" or asks how to use sensei:
```
╔══════════════════════════════════════════════════════════════════╗
║ SENSEI - Skill Frontmatter Compliance Improver ║
╠══════════════════════════════════════════════════════════════════╣
║ ║
║ USAGE: ║
║ Run sensei on <skill-name> # Single skill ║
║ Run sensei on <skill-name> --fast # Skip tests ║
║ Run sensei on <skill1>, <skill2> # Multiple skills ║
║ Run sensei on all Low-adherence skills # Batch by score ║
║ Run sensei on all skills # All skills ║
║ ║
║ WHAT IT DOES: ║
║ 1. READ - Load skill's SKILL.md and count tokens ║
║ 2. SCORE - Check compliance (Low/Medium/Medium-High/High) ║
║ 3. SCAFFOLD- Create tests from template if missing ║
║ 4. IMPROVE - Add WHEN: triggers (cross-model optimized) ║
║ 5. TEST - Run tests, fix if needed ║
║ 6. TOKENS - Check token budget ║
║ 7. SUMMARY - Show before/after comparison ║
║ 8. PROMPT - Ask: Commit, Create Issue, or Skip? ║
║ 9. REPEAT - Until Medium-High score achieved ║
║ ║
║ TARGET SCORE: Medium-High ║
║ ✓ Description > 150 chars, ≤ 60 words ║
║ ✓ Has "WHEN:" trigger phrases (preferred) ║
║ ✓ No "DO NOT USE FOR:" (risky in multi-skill envs) ║
║ ✓ Has "INVOKES:" for tool relationships (optional) ║
║ ✓ SKILL.md < 500 tokens (soft limit) ║
║ ║
║ MCP INTEGRATION (when INVOKES present): ║
║ ✓ Has "MCP Tools Used" table ║
║ ✓ Has Prerequisites section ║
║ ✓ Has CLI fallback pattern ║
║ ✓ No skill-tool name collision ║
║ ║
╚══════════════════════════════════════════════════════════════════╝
```
## Configuration
Sensei uses these defaults (override by specifying in your prompt):
| Setting | Default | Description |
|---------|---------|-------------|
| Skills directory | `skills/` or `.github/skills/` | Where SKILL.md files live |
| Tests directory | `tests/` | Where test files live |
| Token soft limit | 500 | Target for SKILL.md |
| Token hard limit | 5000 | Maximum for SKILL.md |
| Target score | Medium-High | Minimum compliance level |
| Max iterations | 5 | Per-skill loop limit |
Auto-detect skills directory by checking (in order):
1. `skills/` in project root
2. `.github/skills/`
3. User-specified path
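This detection order can be sketched as follows (an illustrative sketch only; `resolveSkillsDir` is a hypothetical helper, not part of sensei's actual tooling):

```typescript
import * as fs from "fs";
import * as path from "path";

// Return the first existing default directory, falling back to a
// user-specified path when neither `skills/` nor `.github/skills/` exists.
function resolveSkillsDir(root: string, userPath?: string): string {
  const candidates = [
    path.join(root, "skills"),
    path.join(root, ".github", "skills"),
  ];
  for (const dir of candidates) {
    if (fs.existsSync(dir)) return dir;
  }
  if (userPath) return userPath;
  throw new Error("No skills directory found; specify a path");
}
```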
## Invocation Modes
### Single Skill
```
Run sensei on my-skill-name
```
### Multiple Skills
```
Run sensei on skill-a, skill-b, skill-c
```
### By Adherence Level
```
Run sensei on all Low-adherence skills
Run sensei on all Medium-adherence skills
```
### All Skills
```
Run sensei on all skills
```
### Fast Mode (Skip Tests)
```
Run sensei on my-skill --fast
```
### GEPA Mode (Deep Optimization)
```
Run sensei on my-skill --gepa
Run sensei on my-skill --gepa --fast
Run sensei on all skills --gepa
```
When `--gepa` is used, Step 5 (IMPROVE) is replaced with GEPA evolutionary optimization.
Instead of template-based improvements, GEPA uses the existing test harness as a fitness
function and an LLM to propose and evaluate many candidate improvements automatically.
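The propose-evaluate-keep idea can be sketched as a simple hill climb (illustrative only; the real optimizer is `scripts/src/gepa/auto_evaluator.py`, and `propose` here stands in for the LLM proposal step while `fitness` stands in for the test-derived evaluator):

```typescript
// Keep a candidate SKILL.md body only when the test-derived fitness
// function scores it strictly higher than the current best.
type Fitness = (skillBody: string) => number;

function optimize(
  initial: string,
  propose: (current: string) => string, // stand-in for the LLM proposal step
  fitness: Fitness,
  iterations: number
): { body: string; score: number } {
  let best = initial;
  let bestScore = fitness(initial);
  for (let i = 0; i < iterations; i++) {
    const candidate = propose(best);
    const score = fitness(candidate);
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return { body: best, score: bestScore };
}
```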
**GEPA score-only mode** (no LLM calls, just evaluate current quality):
```
Run sensei score my-skill
Run sensei score all skills
```
## The Ralph Loop
For each skill, execute this loop until score >= Medium-High:
### Step 1: READ
Load the skill's current state:
```
{skills-dir}/{skill-name}/SKILL.md
{tests-dir}/{skill-name}/ (if exists)
```
Run token count:
```bash
npm run tokens -- count {skills-dir}/{skill-name}/SKILL.md
```
### Step 2: SCORE
Assess compliance by checking the frontmatter for:
- Description length (>= 150 chars, ≤ 60 words)
- "WHEN:" trigger phrases (preferred) or "USE FOR:"
- Routing clarity ("INVOKES:", "FOR SINGLE OPERATIONS:")
- No "DO NOT USE FOR:" anti-triggers (risky in multi-skill environments)
See [references/scoring.md](references/scoring.md) for detailed criteria.
### Step 3: CHECK
If score >= Medium-High AND tests pass → go to SUMMARY step.
### Step 4: SCAFFOLD (if needed)
If `{tests-dir}/{skill-name}/` doesn't exist, create test scaffolding using templates from [references/test-templates/](references/test-templates/).
### Step 5: IMPROVE FRONTMATTER
Enhance the SKILL.md description to include:
1. **Lead with action verb** - First sentence: unique action verb + domain
2. **Trigger phrases** - "WHEN:" (preferred) or "USE FOR:" with 3-5 distinctive quoted phrases
3. Keep description under 60 words and 1024 characters
> ⚠️ **"DO NOT USE FOR:" carries context-dependent risk.** In multi-skill environments (10+ skills with overlapping domains), anti-trigger clauses introduce the very keywords that cause wrong-skill activation on Claude Sonnet and fast-pattern-matching models ([evidence](https://gist.github.com/kvenkatrajan/52e6e77f5560ca30640490b4cc65d109)). For small, isolated skill sets (1-5 skills), the risk is low. When in doubt, use positive routing with `WHEN:` and distinctive quoted phrases.
Template (cross-model optimized):
```yaml
---
name: skill-name
description: "[ACTION VERB] [UNIQUE_DOMAIN]. [One clarifying sentence]. WHEN: \"[phrase1]\", \"[phrase2]\", \"[phrase3]\", \"[phrase4]\", \"[phrase5]\"."
---
```
Template (with routing clarity for High score):
```yaml
---
name: skill-name
description: "**WORKFLOW SKILL** — [ACTION VERB] [UNIQUE_DOMAIN]. [Clarifying sentence]. WHEN: \"[phrase1]\", \"[phrase2]\", \"[phrase3]\". INVOKES: [tools/MCP servers used]. FOR SINGLE OPERATIONS: [when to bypass this skill]."
---
```
### Step 5-GEPA: IMPROVE WITH GEPA (when --gepa flag is set)
**Replaces Step 5** with automated evolutionary optimization. Step 6 (IMPROVE TESTS) still runs normally.
1. **Auto-discover test harness**: Read `{tests-dir}/{skill-name}/triggers.test.ts` and extract
`shouldTriggerPrompts` and `shouldNotTriggerPrompts` arrays automatically.
2. **Build evaluator**: Construct a GEPA evaluator that scores candidates on:
- Content quality (has ## Triggers, ## Rules, ## Steps, USE FOR, WHEN)
- Frontmatter description compliance (length, trigger phrases)
- Trigger accuracy (keywords extracted from description match test prompts correctly)
3. **Run optimization**: Call the GEPA auto-evaluator script:
```bash
python scripts/src/gepa/auto_evaluator.py optimize \
--skill {skill-name} \
--skills-dir {skills-dir} \
--tests-dir {tests-dir} \
--iterations 80
```
4. **Review output**: GEPA produces an optimized SKILL.md body. Show the diff to the user.
The GEPA evaluator auto-generates from existing tests — no manual configuration needed.
**Key**: GEPA wraps existing tests as its fitness function. It does NOT replace or modify tests.
The LLM proposes improved SKILL.md text, and the evaluator scores each candidate against the
same test prompts the CI already uses. Only improvements that score higher are kept.
### Step 6: IMPROVE TESTS
Update test prompts to match new frontmatter:
- `shouldTriggerPrompts` - 5+ prompts matching "WHEN:" or "USE FOR:" phrases
- `shouldNotTriggerPrompts` - 5+ prompts for unrelated topics and different-skill scenarios
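A scaffolded trigger test might export arrays like these (the shape is a sketch and the prompts are invented for illustration; the actual templates live in [references/test-templates/](references/test-templates/)):

```typescript
// Sketch of a triggers test file: prompts that should (and should not)
// activate the skill, matched against its WHEN: phrases.
export const shouldTriggerPrompts: string[] = [
  "extract PDF text from this report",
  "rotate PDF pages in scan.pdf",
  "merge PDFs into one file",
  "split PDF into chapters",
  "extract text from a PDF invoice",
];

export const shouldNotTriggerPrompts: string[] = [
  "resize this PNG image",
  "summarize this web page",
  "convert markdown to HTML",
  "compress a video file",
  "translate this document to French",
];
```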
### Step 7: VERIFY
Run tests (skip if `--fast` flag):
```bash
# Framework-specific command based on project
npm test -- --testPathPattern={skill-name} # Jest
pytest tests/{skill-name}/ # pytest
waza run tests/{skill-name}/trigger_tests.yaml # Waza
```
### Step 8: TOKENS
Check token budget:
```bash
npm run tokens -- check {skills-dir}/{skill-name}/SKILL.md
```
Budget guidelines:
- SKILL.md: < 500 tokens (soft), < 5000 (hard)
- references/*.md: < 1000 tokens each
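For a rough pre-check without running the script, the common ~4 characters-per-token heuristic can be sketched (an approximation only; `npm run tokens` gives real counts):

```typescript
// Rough token estimate: ~4 characters per token is a common heuristic
// for English markdown. Real counts come from the tokens script.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function withinBudget(text: string, softLimit = 500, hardLimit = 5000) {
  const tokens = estimateTokens(text);
  return { tokens, soft: tokens <= softLimit, hard: tokens <= hardLimit };
}
```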
### Step 8b: MCP INTEGRATION (if INVOKES present)
When description contains `INVOKES:`, check:
1. **MCP Tools Used table** - Does skill body have the table?
2. **Prerequisites section** - Are requirements documented?
3. **CLI fallback** - Is there a fallback when MCP unavailable?
4. **Name collision** - Does skill name match an MCP tool?
If checks fail, add missing sections using patterns from [mcp-integration.md](references/mcp-integration.md).
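The added sections might look like this (a sketch with illustrative tool names; the canonical patterns are in [references/mcp-integration.md](references/mcp-integration.md)):

```markdown
## MCP Tools Used

| Tool | Server | Purpose |
|------|--------|---------|
| extract_text | pdf-tools | Pull text from PDF pages |

## Prerequisites

- pdf-tools MCP server configured and running

## CLI Fallback

If the pdf-tools MCP server is unavailable, fall back to `pdftotext` on the CLI.
```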
### Step 9: SUMMARY
Display before/after comparison:
```
╔══════════════════════════════════════════════════════════════════╗
║ SENSEI SUMMARY: {skill-name} ║
╠══════════════════════════════════════════════════════════════════╣
║ BEFORE AFTER ║
║ ────── ───── ║
║ Score: Low Score: Medium-High ║
║ Tokens: 142 Tokens: 385 ║
║ Triggers: 0 Triggers: 5 ║
║ Anti-triggers: 0 Anti-triggers: 3 ║
╚══════════════════════════════════════════════════════════════════╝
```
### Step 10: PROMPT USER
Ask how to proceed:
- **[C] Commit** - Save with message `sensei: improve {skill-name} frontmatter`
- **[I] Create Issue** - Open issue with summary and suggestions
- **[S] Skip** - Discard changes, move to next skill
### Step 11: REPEAT or EXIT
- If score < Medium-High AND iterations < 5 → go to Step 2
- If iterations >= 5 → timeout, show summary, move to next skill
## Scoring Quick Reference
| Score | Requirements |
|-------|--------------|
| **Invalid** | Description > 1024 chars (exceeds spec hard limit) |
| **Low** | Description < 150 chars OR no triggers |
| **Medium** | Description >= 150 chars AND has triggers but >60 words |
| **Medium-High** | Has "WHEN:" (preferred) or "USE FOR:" with ≤60 words |
| **High** | Medium-High + routing clarity (INVOKES/FOR SINGLE OPERATIONS) |
> ⚠️ "DO NOT USE FOR:" is **risky in multi-skill environments** (10+ overlapping skills) — causes keyword contamination on fast-pattern-matching models. Safe for small, isolated skill sets. Use positive routing with `WHEN:` for cross-model safety.
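The quick-reference table can be approximated in code (a simplified sketch; the authoritative algorithm is in [references/scoring.md](references/scoring.md)):

```typescript
// Simplified compliance scorer following the quick-reference table.
function scoreDescription(desc: string): string {
  const words = desc.trim().split(/\s+/).length;
  const hasWhen = desc.includes("WHEN:");
  const hasTriggers = hasWhen || desc.includes("USE FOR:");
  const hasRouting =
    desc.includes("INVOKES:") || desc.includes("FOR SINGLE OPERATIONS:");
  if (desc.length > 1024) return "Invalid"; // exceeds spec hard limit
  if (desc.length < 150 || !hasTriggers) return "Low";
  if (words > 60) return "Medium";
  return hasRouting ? "High" : "Medium-High";
}
```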
### MCP Integration Score (when INVOKES present)
| Check | Status |
|-------|--------|
| MCP Tools Used table | ✓/✗ |
| Prerequisites section | ✓/✗ |
| CLI fallback pattern | ✓/✗ |
| No name collision | ✓/✗ |
See [references/scoring.md](references/scoring.md) for full criteria.
See [references/mcp-integration.md](references/mcp-integration.md) for MCP patterns.
## Frontmatter Patterns
### Skill Classification Prefix
Add a prefix to clarify the skill type:
- `**WORKFLOW SKILL**` - Multi-step orchestration
- `**UTILITY SKILL**` - Single-purpose helper
- `**ANALYSIS SKILL**` - Read-only analysis/reporting
### Routing Clarity (for High score)
When skills interact with MCP tools or other skills, add:
- `INVOKES:` - What tools/skills this skill calls
- `FOR SINGLE OPERATIONS:` - When to bypass this skill
### Quick Example
**Before (Low):**
```yaml
description: 'Process PDF files'
```
**After (High with routing, cross-model optimized):**
```yaml
description: "**WORKFLOW SKILL** — Extract, rotate, merge, and split PDF files. WHEN: \"extract PDF text\", \"rotate PDF pages\", \"merge PDFs\", \"split PDF\". INVOKES: pdf-tools MCP for extraction, file-system for I/O. FOR SINGLE OPERATIONS: Use pdf-tools MCP directly for simple extractions."
```
See [references/examples.md](references/examples.md) for more before/after transformations.
## Commit Messages
```
sensei: improve {skill-name} frontmatter
```
## Reference Documentation
- [scoring.md](references/scoring.md) - Detailed scoring criteria and algorithm
- [mcp-integration.md](references/mcp-integration.md) - MCP tool integration patterns
- [loop.md](references/loop.md) - Ralph loop workflow details
- [examples.md](references/examples.md) - Before/after transformation examples
- [configuration.md](references/configuration.md) - Project setup patterns
- [test-templates/](references/test-templates/) - Test scaffolding templates
- [test-templates/waza.md](references/test-templates/waza.md) - Waza trigger test format
## Built-in Scripts
Run `npm run tokens help` for full usage.
### Token Commands
```bash
npm run tokens count # Count all markdown files
npm run tokens check # Check against token limits
npm run tokens suggest # Get optimization suggestions
npm run tokens compare # Compare with git history
```
### GEPA Commands
Requires: `pip install gepa` (or `uv pip install gepa`). See [requirements](scripts/src/gepa/requirements.txt).
```bash
# Score a single skill (no LLM calls, instant)
python scripts/src/gepa/auto_evaluator.py score --skill azure-deploy --skills-dir skills --tests-dir tests
# Score all skills
python scripts/src/gepa/auto_evaluator.py score-all --skills-dir skills --tests-dir tests
# Optimize a skill (requires LLM API — uses GitHub Models via gh auth token)
python scripts/src/gepa/auto_evaluator.py optimize --skill azure-deploy --skills-dir skills --tests-dir tests
# JSON output (for CI pipelines)
python scripts/src/gepa/auto_evaluator.py score-all --skills-dir skills --tests-dir tests --json
```
### Configuration
Create `.token-limits.json` to customize limits:
```json
{
"defaults": { "SKILL.md": 500, "references/**/*.md": 1000 },
"overrides": { "README.md": 3000 }
}
```