rules-eval

Evaluate and validate Claude Code rules in .claude/rules/ directories. Use it for frontmatter validation, glob pattern checks, and quality audits.

3,891 stars

by openclaw

View on GitHub Installation ↓

Best use case

rules-eval is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using rules-eval can expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/nm-abstract-rules-eval/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/athola/nm-abstract-rules-eval/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/nm-abstract-rules-eval/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How rules-eval Compares

Feature / Agent	rules-eval	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

It evaluates Claude Code rules in .claude/rules/ directories. It checks YAML frontmatter, glob pattern syntax, content quality, organization, and token efficiency, then reports a score out of 100.

Where can I find the source code?

The source is in the openclaw/skills repository on GitHub, at skills/athola/nm-abstract-rules-eval/SKILL.md. It is ported from the abstract plugin in athola/claude-night-market.

SKILL.md Source

> **Night Market Skill** — ported from [claude-night-market/abstract](https://github.com/athola/claude-night-market/tree/master/plugins/abstract). For the full experience with agents, hooks, and commands, install the Claude Code plugin.


# Rules Evaluation Framework

## Table of Contents

1. [Overview](#overview)
2. [Quick Start](#quick-start)
3. [Evaluation Workflow](#evaluation-workflow)
4. [Scoring](#scoring)
5. [Resources](#resources)

## Overview

This skill evaluates Claude Code rules in `.claude/rules/` directories against quality standards. It validates YAML frontmatter, glob pattern syntax, content quality, and directory organization. Rule files come in two kinds. Path-scoped rules load only for files that match their `paths` frontmatter. Unconditional rules have no `paths` field and are always loaded.
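The following minimal Python sketch makes the conditional-loading distinction concrete. It assumes `paths` patterns follow common gitignore-style semantics: `*` stays within one directory and `**/` spans any number of directories. It does not handle brace expansion or any other syntax Claude Code may accept. `glob_to_regex` and `rule_applies` are hypothetical helpers and are not part of this skill.

```python
from __future__ import annotations

import re


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a path glob into a regex (simplified: *, **, ? only; no braces)."""
    i, out = 0, []
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything inside a single path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # exactly one non-separator character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def rule_applies(paths: list[str] | None, file_path: str) -> bool:
    """Decide whether a rule loads for file_path (assumed gitignore-like matching)."""
    if not paths:  # no `paths` field (an empty list is treated the same): unconditional
        return True
    return any(glob_to_regex(p).match(file_path) for p in paths)
```

Under these assumptions, a rule scoped to `src/**/*.ts` applies to `src/app.ts` and `src/a/b/app.ts` but not to `lib/app.ts`. A rule with no `paths` applies everywhere.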

Key validations: YAML syntax errors, unquoted glob patterns, Cursor-specific fields (`alwaysApply`, `globs`), overly broad patterns, content verbosity, and naming conventions.
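The frontmatter checks listed above could be approximated as follows. This is a hypothetical, stdlib-only sketch and not the logic of `scripts/rules_validator.py`. It reads the frontmatter line by line instead of using a YAML parser, and it only inspects inline or block-style `paths` values. The `BROAD_PATTERNS` set is an assumed example of what counts as overly broad.

```python
from __future__ import annotations

import re

CURSOR_FIELDS = {"alwaysApply", "globs"}        # Cursor-only frontmatter keys
BROAD_PATTERNS = {"*", "**", "**/*", "**/*.*"}  # assumed examples of "too broad"


def frontmatter_lines(text: str) -> list[str]:
    """Return the lines between the opening and closing '---' markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter at all: an unconditional rule
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            return lines[1:end]
    raise ValueError("frontmatter has no closing '---'")


def frontmatter_issues(text: str) -> list[str]:
    issues: list[str] = []
    for line in frontmatter_lines(text):
        key = re.match(r"([A-Za-z_]\w*)\s*:", line)
        if key and key.group(1) in CURSOR_FIELDS:
            issues.append(f"Cursor-specific field {key.group(1)!r}")
        # Glob values: inline after "paths:" or as "- item" entries of a block list.
        value = re.match(r"\s*(?:paths\s*:|-)\s+(\S.*)", line)
        if not value:
            continue
        raw = value.group(1).strip()
        if raw.startswith("["):
            continue  # flow-style lists are out of scope for this sketch
        quoted = len(raw) >= 2 and raw[0] in "\"'" and raw[-1] == raw[0]
        pattern = raw[1:-1] if quoted else raw
        if not quoted and any(ch in pattern for ch in "*?{"):
            # Unquoted, a leading '*' reads as a YAML alias and a leading '{'
            # as a flow mapping, so such globs may not parse as strings.
            issues.append(f"unquoted glob {pattern!r}")
        if pattern in BROAD_PATTERNS:
            issues.append(f"overly broad pattern {pattern!r}")
    return issues
```

For example, suppose a file's frontmatter contains `alwaysApply: true` and an unquoted `- **/*` list entry. The sketch reports three issues: one for the Cursor field and two for the glob.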

## Quick Start

```bash
# Evaluate rules in current project
/rules-eval

# Evaluate specific directory
/rules-eval .claude/rules/

# Detailed analysis with recommendations
/rules-eval --detailed
```

## Evaluation Workflow

1. Scan `.claude/rules/` for all `.md` files (including subdirectories)
2. Validate YAML frontmatter syntax and fields
3. Analyze glob patterns for correctness and specificity
4. Assess content quality (actionable, concise, non-conflicting)
5. Check organization (naming, structure, symlinks)
6. Measure token efficiency and redundancy
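Steps 1 and 6 might look like this in Python. The helper names are hypothetical. The four-characters-per-token estimate is a rough heuristic rather than a real tokenizer. Redundancy detection is reduced here to exact duplicate lines across files, which is only one possible reading of that step.

```python
from __future__ import annotations

from pathlib import Path


def collect_rule_files(root: str = ".claude/rules") -> list[Path]:
    """Step 1: every .md file under root, subdirectories included, in stable order."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(p for p in base.rglob("*.md") if p.is_file())


def estimate_tokens(text: str) -> int:
    """Step 6: rough size estimate assuming ~4 characters per token."""
    return (len(text) + 3) // 4


def duplicate_lines(files: list[Path]) -> dict[str, list[Path]]:
    """Step 6: non-blank lines that appear verbatim in more than one rule file."""
    seen: dict[str, set[Path]] = {}
    for path in files:
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line and line != "---":  # skip blank lines and frontmatter markers
                seen.setdefault(line, set()).add(path)
    return {line: sorted(paths) for line, paths in seen.items() if len(paths) > 1}
```

Run from a project root, `collect_rule_files()` lists the rules to evaluate. Passing each file's text to `estimate_tokens` gives a per-rule size to compare against whatever budget you set.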

## Scoring

| Category | Points | Focus |
|----------|--------|-------|
| Frontmatter Validity | 25 | YAML syntax, required fields, correct field names |
| Glob Pattern Quality | 20 | Syntax, specificity, quoting |
| Content Quality | 25 | Actionable, concise, non-conflicting |
| Organization | 15 | Naming, structure, symlink usage |
| Token Efficiency | 15 | Rule size, redundancy detection |

| Score | Level |
|-------|-------|
| 91-100 | Excellent - Production-ready |
| 76-90 | Good - Minor improvements possible |
| 51-75 | Basic - Needs optimization |
| 26-50 | Below Standards - Significant issues |
| 0-25 | Critical - Invalid or broken rules |
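The two tables combine into a simple scoring step. This sketch assumes the per-category points are already computed. It checks each value against the category maximum from the first table, sums the values, and maps the total to a band from the second table. The category keys are hypothetical labels for the table rows.

```python
# Category maxima from the scoring table (they sum to 100).
MAX_POINTS = {
    "frontmatter": 25,
    "globs": 20,
    "content": 25,
    "organization": 15,
    "token_efficiency": 15,
}

# (minimum score, level), checked from the highest band down.
LEVELS = [
    (91, "Excellent"),
    (76, "Good"),
    (51, "Basic"),
    (26, "Below Standards"),
    (0, "Critical"),
]


def total_score(points: dict[str, int]) -> int:
    """Sum category points; a missing category simply contributes 0."""
    for category, value in points.items():
        if not 0 <= value <= MAX_POINTS[category]:
            raise ValueError(f"{category}: {value} outside 0..{MAX_POINTS[category]}")
    return sum(points.values())


def level(score: int) -> str:
    """Map a 0-100 total to its quality level."""
    for minimum, name in LEVELS:
        if score >= minimum:
            return name
    raise ValueError("score must be non-negative")
```

For example, 25 + 18 + 20 + 12 + 10 = 85, which falls in the 76-90 band ("Good").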

## Resources

### Skill-Specific Modules
- **Frontmatter Validation**: See `modules/frontmatter-validation.md`
- **Glob Pattern Analysis**: See `modules/glob-pattern-analysis.md`
- **Content Quality Metrics**: See `modules/content-quality-metrics.md`
- **Organization Patterns**: See `modules/organization-patterns.md`

### Tools
- **Rules Validator**: `scripts/rules_validator.py`

### Related Skills
- `abstract:skills-eval` - Skill evaluation framework
- `abstract:hooks-eval` - Hook evaluation framework

Related Skills

ml-model-eval-benchmark

from openclaw/skills

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.

Machine Learning

skills-eval

from openclaw/skills

Evaluate and improve Claude skill quality through auditing

hooks-eval

from openclaw/skills

Evaluate hook security, performance, and SDK compliance. Use for audits

openclaw-cc-rules

from openclaw/skills

OpenClaw programming workflow skill: Plan Mode + task tracking + Git safety protocol + read-only exploration

rules-of-the-claw

from openclaw/skills

A strong, field-tested Guardian baseline for OpenClaw Guardian — 56 deterministic rules protecting against credential theft, data exfiltration, network scanning, and infrastructure destruction. No LLM voting overhead. Pure regex enforcement at the tool layer.

project-evaluator

from openclaw/skills

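Describe a project idea, and the AI systematically evaluates it across four dimensions (market, technology, business, and risk). It produces an evaluation report, a quick competitor overview, and MVP recommendations to help you decide whether the project is worth building.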

tech-stack-evaluator

from openclaw/skills

Technology stack evaluation and comparison with TCO analysis, security assessment, and ecosystem health scoring. Use when comparing frameworks, evaluating technology stacks, calculating total cost of ownership, assessing migration paths, or analyzing ecosystem viability.

llm-evaluator

from openclaw/skills

LLM-as-a-Judge evaluation system using Langfuse. Score AI outputs on relevance, accuracy, hallucination, and helpfulness. Backfill scoring on historical traces. Uses GPT-5-nano for cost-efficient judging. Use when evaluating AI quality, building evals, or monitoring output accuracy.

agent-architecture-evaluator

from openclaw/skills

Use when evaluating, testing, and optimizing an agent architecture or multi-agent system. Best for reviewing planning, routing, memory, tool use, reliability, observability, cost, and system-level failure modes.

ios-rules

from openclaw/skills

38 battle-tested iOS development rules covering accessibility, navigation, architecture, dark mode, localization, App Review guidelines, and more. Targets the mistakes LLMs actually make when generating Swift/SwiftUI code.

interview-evaluation-report

from openclaw/skills

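Interview evaluation report. Trigger: the user provides an interview transcript or interview notes and asks for a structured evaluation report.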

Vendor Evaluation & Due Diligence

from openclaw/skills

Structured framework for evaluating software vendors, service providers, and technology partners before signing contracts.