skill-forge-review

Audit and validate existing Claude Code skills for quality, triggering accuracy, structure compliance, and best practices. Scores skills on a 0-100 scale and provides prioritized improvement recommendations. Use when user says "review skill", "audit skill", "check skill", "validate skill", or "skill quality".

39 stars

Best use case

skill-forge-review is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using skill-forge-review should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/skill-forge-review/SKILL.md --create-dirs "https://raw.githubusercontent.com/AgriciDaniel/skill-forge/main/skills/skill-forge-review/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/skill-forge-review/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How skill-forge-review Compares

| Feature / Agent | skill-forge-review | Standard Approach |
|-----------------|--------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Audit and validate existing Claude Code skills for quality, triggering accuracy, structure compliance, and best practices. Scores skills on a 0-100 scale and provides prioritized improvement recommendations. Use when user says "review skill", "audit skill", "check skill", "validate skill", or "skill quality".

Where can I find the source code?

The source code is in the AgriciDaniel/skill-forge repository on GitHub, under skills/skill-forge-review/.

SKILL.md Source

# Skill Review & Validation

## Process

### Step 1: Locate Skill Files

Accept input as:
- Path to a skill directory
- Skill name (search in `~/.claude/skills/`)
- URL to a GitHub repository

Read all `.md` files, scripts, and asset files.

### Step 2: Structure Validation

Run `python scripts/validate_skill.py <path>` for programmatic checks.

Manual verification:
- [ ] SKILL.md exists (exact case)
- [ ] No README.md inside skill folder
- [ ] Folder name matches `name` field
- [ ] Valid kebab-case naming (1-64 chars)
- [ ] No "claude" or "anthropic" in name
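
The manual checklist above can also be expressed in code. The following is a minimal sketch, not the contents of `scripts/validate_skill.py`; the function names and the exact kebab-case regex (lowercase letters and digits joined by single hyphens) are assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed definition of kebab-case: lowercase alphanumeric segments
# joined by single hyphens, so no leading or trailing hyphen.
KEBAB_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
RESERVED = ("claude", "anthropic")


def check_name(name: str) -> list[str]:
    """Return the problems found in a skill name (empty list means it passes)."""
    problems = []
    if not 1 <= len(name) <= 64:
        problems.append("name must be 1-64 characters")
    if not KEBAB_RE.match(name):
        problems.append("name must be kebab-case")
    for word in RESERVED:
        if word in name.lower():
            problems.append(f'name must not contain "{word}"')
    return problems


def check_structure(skill_dir: Path, declared_name: str) -> list[str]:
    """Apply the checklist to a skill folder and its declared `name` field."""
    problems = check_name(declared_name)
    # Compare against the directory listing so the case must match exactly,
    # even on case-insensitive filesystems.
    entries = {p.name for p in skill_dir.iterdir()}
    if "SKILL.md" not in entries:
        problems.append("SKILL.md is missing (exact case required)")
    if "README.md" in entries:
        problems.append("README.md should not be inside the skill folder")
    if skill_dir.name != declared_name:
        problems.append("folder name does not match the name field")
    return problems
```

Call `check_structure(Path("path/to/skill"), name)` with the `name` value read from the frontmatter; an empty list means every item in the checklist passed.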

### Step 3: Frontmatter Audit

| Check | Pass Criteria |
|-------|--------------|
| Name format | kebab-case, 1-64 chars, no leading/trailing hyphens |
| Description present | Non-empty, 1-1024 characters |
| Description has WHAT | Explains capabilities |
| Description has WHEN | Includes trigger phrases |
| Description has keywords | Domain-specific terms included |
| No XML tags | No < or > characters |
| Optional fields valid | license, compatibility (<500 chars), metadata |
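
The mechanical rows of the table above (length limits, no angle brackets) lend themselves to a quick automated pass. This is a sketch under assumptions: `audit_frontmatter` is a hypothetical helper, and it expects the frontmatter to be already parsed into a plain dict. The WHAT, WHEN, and keyword checks need human or model judgment and are left out.

```python
def audit_frontmatter(fields: dict) -> dict[str, bool]:
    """Map each mechanical frontmatter check to True (pass) or False (fail).

    `fields` is assumed to be the parsed frontmatter as a plain dict.
    """
    description = str(fields.get("description") or "")
    compatibility = fields.get("compatibility")
    # Look for angle brackets in every value, not only the description.
    all_values = " ".join(str(value) for value in fields.values())
    return {
        "description_present": bool(description.strip()) and len(description) <= 1024,
        "no_xml_tags": "<" not in all_values and ">" not in all_values,
        "compatibility_length": compatibility is None or len(str(compatibility)) < 500,
    }
```

A failing key points at the table row to cite in the report, for example `no_xml_tags` for a description that contains `<b>`.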

### Step 4: Triggering Analysis

Assess the description for activation quality:

**Under-triggering risks:**
- Too generic ("Helps with projects")
- Missing common paraphrases
- No domain keywords
- Missing file type mentions (if relevant)

**Over-triggering risks:**
- Too broad ("Processes documents")
- Overlaps with built-in Claude capabilities
- Missing negative triggers for disambiguation

**Generate test queries:**
- 5 queries that SHOULD trigger the skill
- 5 queries that SHOULD NOT trigger
- 3 edge cases (ambiguous queries)

### Step 5: Instruction Quality

| Criterion | What to assess (score each 0-10) |
|-----------|----------------------------------|
| Specificity | Are instructions actionable? (not "validate properly") |
| Completeness | All workflows covered? |
| Error handling | Common failures addressed? |
| Examples | Concrete examples provided? |
| Progressive disclosure | Detailed docs in references/ not SKILL.md? |
| Length | Under 500 lines / 5000 tokens? |
| Cross-references | Clear links to references/scripts? |

### Step 6: Architecture Review (Multi-skill)

For skills with sub-skills:
- [ ] Main skill has clear routing table
- [ ] Sub-skills have focused responsibilities
- [ ] Cross-references are valid (files exist)
- [ ] Naming follows `parent-child` convention
- [ ] Shared references in parent, not duplicated
- [ ] Agents have clear roles (if Tier 4)

### Step 7: Script Quality (if present)

- [ ] Docstrings with purpose, input, output
- [ ] CLI interface (argparse or similar)
- [ ] Structured output (JSON)
- [ ] Error handling (try/except with clear messages)
- [ ] No hardcoded paths or secrets
- [ ] Minimal dependencies
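
For reference, a script that would pass the checklist above might look like the sketch below. The task (counting lines in a file) and all names are hypothetical; only the shape matters: a docstring stating purpose, input, and output, an `argparse` CLI, JSON output, error handling with a clear message, no hardcoded paths, and standard-library dependencies only.

```python
"""Count the lines in a text file.

Purpose: hypothetical example of a skill helper script.
Input:   path to a text file (positional CLI argument).
Output:  one JSON object on stdout, e.g. {"ok": true, "path": "...", "lines": 12}.
"""
import argparse
import json


def count_lines(path: str) -> dict:
    """Return a JSON-serializable result; failures are reported, not raised."""
    try:
        with open(path, encoding="utf-8") as handle:
            return {"ok": True, "path": path, "lines": sum(1 for _ in handle)}
    except (OSError, UnicodeDecodeError) as exc:
        return {"ok": False, "path": path, "error": f"cannot read file: {exc}"}


def main(argv=None) -> int:
    """Parse arguments, print the JSON result, and return an exit code."""
    parser = argparse.ArgumentParser(description="Count the lines in a text file.")
    parser.add_argument("path", help="file to inspect")
    args = parser.parse_args(argv)
    result = count_lines(args.path)
    print(json.dumps(result))
    return 0 if result["ok"] else 1

# When saved as a standalone script, finish with the usual guard:
#   if __name__ == "__main__":
#       raise SystemExit(main())
```

Returning the error inside the JSON payload keeps the output machine-readable on failure, while the non-zero exit code still signals the problem to the caller.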

### Step 8: Generate Skill Health Score

**Scoring methodology (0-100):**

| Category | Weight | Checks |
|----------|--------|--------|
| Frontmatter Quality | 25% | Name, description, format |
| Trigger Accuracy | 20% | WHAT + WHEN + keywords |
| Instruction Quality | 25% | Specificity, completeness, examples |
| Structure Compliance | 15% | File naming, organization, references |
| Script Quality | 10% | If applicable (full marks if no scripts needed) |
| Progressive Disclosure | 5% | Proper use of 3-level system |
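
The weights above sum to 100, so the health score is a weighted sum of per-category ratings. A minimal sketch follows, assuming each category is first rated as a fraction between 0 and 1; how that fraction is derived is not specified here, and the `health_score` helper and its key names are hypothetical.

```python
# Weights copied from the scoring table; they sum to 100.
WEIGHTS = {
    "frontmatter": 25,
    "triggering": 20,
    "instructions": 25,
    "structure": 15,
    "scripts": 10,
    "disclosure": 5,
}


def health_score(ratings: dict[str, float], has_scripts: bool = True) -> float:
    """Combine per-category ratings in [0, 1] into a 0-100 health score."""
    total = 0.0
    for category, weight in WEIGHTS.items():
        if category == "scripts" and not has_scripts:
            rating = 1.0  # full marks when the skill needs no scripts
        else:
            rating = ratings[category]
        if not 0.0 <= rating <= 1.0:
            raise ValueError(f"rating for {category} must be between 0 and 1")
        total += weight * rating
    return round(total, 1)
```

For example, ratings of 0.8 (frontmatter), 0.5 (triggering), 0.6 (instructions), 1.0 (structure), and 1.0 (disclosure) for a skill without scripts give 20 + 10 + 15 + 15 + 10 + 5 = 75.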

### Step 9: Generate Trigger Eval Set

After reviewing, generate a structured trigger eval set for ongoing testing:

1. Run `python scripts/generate_eval_set.py <path>` to auto-generate a starter set
2. Review and refine the generated queries:
   - Ensure 8-10 should-trigger queries cover different phrasings and edge cases
   - Ensure 8-10 should-not-trigger queries are near-misses (not obviously irrelevant)
   - Include casual speech, typos, and uncommon domain uses in should-trigger set
3. Save the eval set to `evals/evals.json` in the skill directory

4. Run `python scripts/optimize_description.py <path> --eval-set evals/evals.json`
   to score the current description and get improvement suggestions
5. Recommend running `/skill-forge eval <path>` for full functional evaluation

**Good queries** are realistic and specific (include file paths, context, domain details).
**Bad queries** are overly generic ("format this data") or obviously irrelevant.
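
Before saving, the size targets from step 2 can be checked mechanically. The sketch below assumes a hypothetical schema, a list of objects with `query` and `should_trigger` keys; the format actually produced by `scripts/generate_eval_set.py` may differ, so adapt the key names to the real output.

```python
import json
from pathlib import Path


def check_eval_set(entries: list[dict]) -> list[str]:
    """Check the 8-10 query targets; `entries` uses an assumed schema."""
    problems = []
    positives = [e["query"] for e in entries if e["should_trigger"]]
    negatives = [e["query"] for e in entries if not e["should_trigger"]]
    for label, queries in (("should-trigger", positives),
                           ("should-not-trigger", negatives)):
        if not 8 <= len(queries) <= 10:
            problems.append(f"expected 8-10 {label} queries, found {len(queries)}")
    return problems


def save_eval_set(entries: list[dict], skill_dir: Path) -> Path:
    """Write the eval set to evals/evals.json inside the skill directory."""
    target = skill_dir / "evals" / "evals.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return target
```

The count check says nothing about query quality; whether the should-not-trigger queries are real near-misses still needs a manual read.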

### Step 10: Generate Report

```markdown
# Skill Review: [name]

## Health Score: [X]/100

## Summary
[2-3 sentence assessment]

## Scores by Category
| Category | Score | Notes |
|----------|-------|-------|
| Frontmatter | X/25 | [issues] |
| Triggering | X/20 | [issues] |
| Instructions | X/25 | [issues] |
| Structure | X/15 | [issues] |
| Scripts | X/10 | [issues] |
| Disclosure | X/5 | [issues] |

## Critical Issues (fix immediately)
- [issue 1]
- [issue 2]

## High Priority (fix within 1 week)
- [issue 1]

## Recommendations
- [suggestion 1]
- [suggestion 2]

## Suggested Test Queries
### Should Trigger
1. [query]
2. [query]
3. [query]

### Should NOT Trigger
1. [query]
2. [query]
3. [query]
```
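
The top of the report template can be filled in programmatically once per-category points are known. A sketch under assumptions: `render_report_header` is a hypothetical helper, and points are whole numbers already scaled to each category's maximum.

```python
# Category maxima as they appear in the report template.
MAX_POINTS = {
    "Frontmatter": 25,
    "Triggering": 20,
    "Instructions": 25,
    "Structure": 15,
    "Scripts": 10,
    "Disclosure": 5,
}


def render_report_header(name: str, points: dict[str, int],
                         notes: dict[str, str]) -> str:
    """Render the title, health score, and category table as markdown."""
    total = sum(points[category] for category in MAX_POINTS)
    lines = [
        f"# Skill Review: {name}",
        "",
        f"## Health Score: {total}/100",
        "",
        "## Scores by Category",
        "| Category | Score | Notes |",
        "|----------|-------|-------|",
    ]
    for category, maximum in MAX_POINTS.items():
        note = notes.get(category, "none")
        lines.append(f"| {category} | {points[category]}/{maximum} | {note} |")
    return "\n".join(lines)
```

The summary, issue lists, and test queries are written by the reviewer and appended after this header.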

Related Skills

skill-forge-publish

39
from AgriciDaniel/skill-forge

Package and distribute Claude Code skills for sharing via GitHub, Claude.ai uploads, or team deployment. Creates install scripts, documentation, and .skill packages. Use when user says "publish skill", "share skill", "package skill", "distribute skill", or "release skill".

skill-forge-plan

39
from AgriciDaniel/skill-forge

Architecture and design planning for new Claude Code skills. Guides through use case definition, complexity tier selection, sub-skill decomposition, and file structure planning. Use when user says "plan skill", "design skill", "skill architecture", or "skill planning".

skill-forge-evolve

39
from AgriciDaniel/skill-forge

Improve and iterate on existing Claude Code skills based on usage feedback, test results, or changing requirements. Handles under/over-triggering fixes, instruction refinement, new sub-skill addition, and architecture evolution. Use when user says "improve skill", "fix skill", "skill not triggering", "skill triggers too much", "update skill", or "evolve skill".

skill-forge-eval

39
from AgriciDaniel/skill-forge

Run evaluation pipelines on Claude Code skills to test triggering accuracy, workflow correctness, and output quality. Spawns executor, grader, comparator, and analyzer sub-agents for parallel evaluation. Generates eval_metadata.json, grading.json, and feedback reports. Use when user says "eval skill", "test skill", "run evals", "evaluate skill", "skill evals", "test skill quality", "run skill tests", or "skill evaluation".

skill-forge-convert

39
from AgriciDaniel/skill-forge

Convert Claude Code skills to work on OpenAI Codex, Google Gemini CLI, Google Antigravity, and Cursor. Analyzes platform-specific features, generates target files (openai.yaml, AGENTS.md, GEMINI.md, .mdc rules), adapts frontmatter, converts MCP config, and produces compatibility reports. Use when user says "convert skill", "port skill", "multi-platform", "skill for codex", "skill for gemini", "skill for antigravity", "skill for cursor", "cross-platform skill", "convert to codex", "convert to gemini", "convert to antigravity", or "convert to cursor".

skill-forge-build

39
from AgriciDaniel/skill-forge

Scaffold and build Claude Code skills from plans or descriptions. Generates SKILL.md files, sub-skills, scripts, references, agents, and templates following the Agent Skills standard. Use when user says "build skill", "scaffold skill", "generate skill", "create SKILL.md", or "implement skill".

skill-forge-benchmark

39
from AgriciDaniel/skill-forge

Benchmark Claude Code skill performance with variance analysis, tracking pass rate, execution time, and token usage across iterations. Runs multiple trials per eval for statistical reliability, aggregates results into benchmark.json, and generates comparison reports between skill versions. Use when user says "benchmark skill", "measure skill performance", "skill metrics", "compare skill versions", "skill performance", "track skill improvement", "skill regression test", or "skill A/B test".

skill-forge

39
from AgriciDaniel/skill-forge

Ultimate Claude Code skill creator and architect. Designs, scaffolds, builds, reviews, evolves, and publishes production-grade Claude Code skills following the Agent Skills open standard and 3-layer architecture (directive, orchestration, execution). Handles single-file skills, multi-skill orchestrators with sub-skills and subagents, MCP-enhanced workflows, and full skill ecosystems. Industry detection for skill domain. Triggers on: "create skill", "build skill", "new skill", "skill creator", "skill builder", "skill-forge", "design skill", "scaffold skill", "review skill", "improve skill", "publish skill", "skill architecture", "convert skill", "port skill", "multi-platform", "cross-platform", "eval skill", "test skill", "benchmark skill", "skill evals", "measure skill", "skill performance", "skill A/B test".

comprehensive-review-full-review

31392
from sickn33/antigravity-awesome-skills

Use when working with comprehensive review full review

code-review-ai-ai-review

31392
from sickn33/antigravity-awesome-skills

You are an expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices. Leverage AI tools (GitHub Copilot, Qodo, GPT-5, C

cc-skill-security-review

31355
from sickn33/antigravity-awesome-skills

This skill ensures all code follows security best practices and identifies potential vulnerabilities. Use when implementing authentication or authorization, handling user input or file uploads, or creating new API endpoints.

Post-Mortem & Incident Review Framework

3891
from openclaw/skills

Run structured post-mortems that actually prevent repeat failures. Blameless analysis, root cause identification, and action tracking.

DevOps & Infrastructure