rules-eval
Evaluate and validate Claude Code rules in .claude/rules/ directories. Use for frontmatter, glob patterns, and quality audits
Best use case
rules-eval is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using rules-eval can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/nm-abstract-rules-eval/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How rules-eval Compares
| Feature / Agent | rules-eval | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Evaluate and validate Claude Code rules in .claude/rules/ directories. Use for frontmatter, glob patterns, and quality audits
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
ChatGPT vs Claude for Agent Skills
Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.
SKILL.md Source
> **Night Market Skill**: ported from [claude-night-market/abstract](https://github.com/athola/claude-night-market/tree/master/plugins/abstract). For the full experience with agents, hooks, and commands, install the Claude Code plugin.

# Rules Evaluation Framework

## Table of Contents

1. [Overview](#overview)
2. [Quick Start](#quick-start)
3. [Evaluation Workflow](#evaluation-workflow)
4. [Scoring](#scoring)
5. [Resources](#resources)

## Overview

This skill evaluates Claude Code rules in `.claude/rules/` directories against quality standards. It validates YAML frontmatter, glob pattern syntax, content quality, and directory organization.

Rules files support path-scoped conditional loading via `paths` frontmatter and unconditional rules (no `paths` field).

Key validations: YAML syntax errors, unquoted glob patterns, Cursor-specific fields (`alwaysApply`, `globs`), overly broad patterns, content verbosity, and naming conventions.

## Quick Start

```bash
# Evaluate rules in current project
/rules-eval

# Evaluate specific directory
/rules-eval .claude/rules/

# Detailed analysis with recommendations
/rules-eval --detailed
```

## Evaluation Workflow

1. Scan `.claude/rules/` for all `.md` files (including subdirectories)
2. Validate YAML frontmatter syntax and fields
3. Analyze glob patterns for correctness and specificity
4. Assess content quality (actionable, concise, non-conflicting)
5. Check organization (naming, structure, symlinks)
6. Measure token efficiency and redundancy

## Scoring

| Category | Points | Focus |
|----------|--------|-------|
| Frontmatter Validity | 25 | YAML syntax, required fields, correct field names |
| Glob Pattern Quality | 20 | Syntax, specificity, quoting |
| Content Quality | 25 | Actionable, concise, non-conflicting |
| Organization | 15 | Naming, structure, symlink usage |
| Token Efficiency | 15 | Rule size, redundancy detection |

| Score | Level |
|-------|-------|
| 91-100 | Excellent - Production-ready |
| 76-90 | Good - Minor improvements possible |
| 51-75 | Basic - Needs optimization |
| 26-50 | Below Standards - Significant issues |
| 0-25 | Critical - Invalid or broken rules |

## Resources

### Skill-Specific Modules

- **Frontmatter Validation**: See `modules/frontmatter-validation.md`
- **Glob Pattern Analysis**: See `modules/glob-pattern-analysis.md`
- **Content Quality Metrics**: See `modules/content-quality-metrics.md`
- **Organization Patterns**: See `modules/organization-patterns.md`

### Tools

- **Rules Validator**: `scripts/rules_validator.py`

### Related Skills

- `abstract:skills-eval` - Skill evaluation framework
- `abstract:hooks-eval` - Hook evaluation framework
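
To make the scoring tables and two of the key validations above concrete, here is a minimal Python sketch. It is not the code in `scripts/rules_validator.py`; the function names (`score_level`, `lint_frontmatter`), the dictionary keys, and the issue strings are all hypothetical. It assumes the total is a plain sum of the per-category points, which the SKILL.md does not state explicitly (the category maxima do add up to 100). The frontmatter check is line-based rather than a real YAML parse, and it only flags glob values that begin with a bare `*`, which YAML would otherwise read as an alias.

```python
import re

# Category maxima copied from the Scoring table; the key names are illustrative.
CATEGORY_MAX = {
    "frontmatter": 25,
    "glob_patterns": 20,
    "content": 25,
    "organization": 15,
    "token_efficiency": 15,
}

# Score bands copied from the level table: (lower bound, label).
LEVELS = [
    (91, "Excellent"),
    (76, "Good"),
    (51, "Basic"),
    (26, "Below Standards"),
    (0, "Critical"),
]

# Fields the Overview calls Cursor-specific.
CURSOR_ONLY_FIELDS = {"alwaysApply", "globs"}


def score_level(scores):
    """Sum per-category scores (assumed additive) and return (total, level)."""
    total = 0
    for name, maximum in CATEGORY_MAX.items():
        # Clamp each category to its range so one bad input cannot skew the total.
        total += max(0, min(scores.get(name, 0), maximum))
    for lower, label in LEVELS:
        if total >= lower:
            return total, label


def lint_frontmatter(text):
    """Return issue strings for a rule file's frontmatter (simplified, no YAML parser)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter block: treated as an unconditional rule
    issues = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        key = re.match(r"^([A-Za-z_][\w-]*):", line)
        if key and key.group(1) in CURSOR_ONLY_FIELDS:
            issues.append(f"cursor-specific field: {key.group(1)}")
        # Strip a leading "key: " or list marker "- " to inspect the value itself.
        value = re.sub(r"^\s*(-\s+|[\w-]+:\s*)", "", line)
        if value.startswith("*"):
            issues.append(f"unquoted glob: {value.strip()}")
    return issues
```

Under these assumptions, a rule set scoring 25, 18, 20, 10, and 9 across the five categories totals 82 and falls in the "Good" band, while a frontmatter line such as `paths: **/*.ts` is reported as an unquoted glob.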
Related Skills
ml-model-eval-benchmark
Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.
skills-eval
Evaluate and improve Claude skill quality through auditing
hooks-eval
Evaluate hook security, performance, and SDK compliance. Use for audits
openclaw-cc-rules
OpenClaw coding workflow skill: Plan Mode + task tracking + Git safety protocol + read-only exploration
rules-of-the-claw
A strong, field-tested Guardian baseline for OpenClaw Guardian — 56 deterministic rules protecting against credential theft, data exfiltration, network scanning, and infrastructure destruction. No LLM voting overhead. Pure regex enforcement at the tool layer.
project-evaluator
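Describe a project idea and the AI systematically evaluates it across four dimensions (market, technology, business, and risk), producing an evaluation report, a quick competitor overview, and MVP recommendations to help you decide whether it is worth building.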
tech-stack-evaluator
Technology stack evaluation and comparison with TCO analysis, security assessment, and ecosystem health scoring. Use when comparing frameworks, evaluating technology stacks, calculating total cost of ownership, assessing migration paths, or analyzing ecosystem viability.
llm-evaluator
LLM-as-a-Judge evaluation system using Langfuse. Score AI outputs on relevance, accuracy, hallucination, and helpfulness. Backfill scoring on historical traces. Uses GPT-5-nano for cost-efficient judging. Use when evaluating AI quality, building evals, or monitoring output accuracy.
agent-architecture-evaluator
Use when evaluating, testing, and optimizing an agent architecture or multi-agent system. Best for reviewing planning, routing, memory, tool use, reliability, observability, cost, and system-level failure modes.
ios-rules
38 battle-tested iOS development rules covering accessibility, navigation, architecture, dark mode, localization, App Review guidelines, and more. Targets the mistakes LLMs actually make when generating Swift/SwiftUI code.
interview-evaluation-report
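Interview evaluation report. Trigger scenario: the user provides an interview transcript or interview notes and asks for a structured evaluation report.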
Vendor Evaluation & Due Diligence
Structured framework for evaluating software vendors, service providers, and technology partners before signing contracts.