Best use case
skills-eval is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
It evaluates and improves Claude skill quality through auditing.
Teams using skills-eval should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/nm-abstract-skills-eval/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How skills-eval Compares
| Feature / Agent | skills-eval | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It evaluates and improves Claude skill quality through auditing.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
ChatGPT vs Claude for Agent Skills
Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.
SKILL.md Source
> **Night Market Skill**: ported from [claude-night-market/abstract](https://github.com/athola/claude-night-market/tree/master/plugins/abstract). For the full experience with agents, hooks, and commands, install the Claude Code plugin.

# Skills Evaluation and Improvement

## Table of Contents

1. [Overview](#overview)
2. [Quick Start](#quick-start)
3. [Evaluation Workflow](#evaluation-workflow)
4. [Evaluation and Optimization](#evaluation-and-optimization)
5. [Resources](#resources)

## Overview

This framework audits Claude skills against quality standards to improve performance and reduce token consumption. Automated tools analyze skill structure, measure context usage, and identify specific technical improvements. Run verification commands after each audit to confirm fixes work correctly.

The `skills-auditor` provides structural analysis, while the `improvement-suggester` ranks fixes by impact. Compliance is verified through the `compliance-checker`. Runtime efficiency is monitored by `tool-performance-analyzer` and `token-usage-tracker`.

## Quick Start

### Basic Audit

Run a full audit of all skills or target a specific file to identify structural issues.

```bash
# Audit all skills
make audit-all

# Audit specific skill
make audit-skill TARGET=path/to/skill/SKILL.md
```

### Analysis and Optimization

Use `skill_analyzer.py` for complexity checks and `token_estimator.py` to verify the context budget.

```bash
make analyze-skill TARGET=path/to/skill/SKILL.md
make estimate-tokens TARGET=path/to/skill/SKILL.md
```

### Improvements

Generate a prioritized plan and verify standards compliance using `improvement_suggester.py` and `compliance_checker.py`.

```bash
make improve-skill TARGET=path/to/skill/SKILL.md
make check-compliance TARGET=path/to/skill/SKILL.md
```

## Evaluation Workflow

Start with `make audit-all` to inventory skills and identify high-priority targets. For each skill requiring attention, run analysis with `analyze-skill` to map complexity.
Generate an improvement plan, apply fixes, and run `check-compliance` to verify the skill meets project standards. Finalize by checking the token budget for efficiency.

## Evaluation and Optimization

Quality assessments use the `skills-auditor` and `improvement-suggester` to generate detailed reports. Performance analysis focuses on token efficiency through the `token-usage-tracker` and tool performance via `tool-performance-analyzer`. For standards compliance, the `compliance-checker` automates common fixes for structural issues.

### Scoring and Prioritization

We evaluate skills across five dimensions: structure compliance, content quality, token efficiency, activation reliability, and tool integration. Scores above 90 represent production-ready skills, while scores below 50 indicate critical issues requiring immediate attention.

Improvements are prioritized by impact. Critical issues include security vulnerabilities or broken functionality. High-priority items cover structural flaws that hinder discoverability. Medium and low priorities focus on best practices and minor optimizations.

### Structural Patterns

**Deprecated**: `skills/shared/modules/` directories. Shared modules must be relocated into the consuming skill's own `modules/` directory. The evaluator flags any remaining `skills/shared/` as a structural warning.

**Current**: Each skill owns its modules at `skills/<skill-name>/modules/`. Cross-skill references use relative paths (e.g., `../skill-authoring/modules/anti-rationalization.md`).
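
The structural rule above (no remaining `skills/shared/` directory) can be illustrated with a small standalone check. This is a minimal sketch, not the evaluator's actual implementation: the function name `find_deprecated_shared_dirs` is hypothetical, and it assumes the deprecated layout is any directory named `shared` sitting directly under a `skills` directory.

```python
from pathlib import Path


def find_deprecated_shared_dirs(root: Path) -> list[Path]:
    """Return directories under root that match the deprecated skills/shared layout.

    Hypothetical helper for illustration; the real evaluator may apply
    different or additional rules.
    """
    hits = []
    for path in sorted(root.rglob("shared")):
        # Flag only directories named exactly "shared" whose parent is "skills".
        # A skill such as "shared-patterns" is not matched.
        if path.is_dir() and path.parent.name == "skills":
            hits.append(path)
    return hits
```

Calling `find_deprecated_shared_dirs(Path("."))` from a repository root would list each directory to relocate; an empty list means the layout already follows the per-skill `modules/` pattern under the assumptions stated above.
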
## Resources

### Shared Modules: Cross-Skill Patterns

- **Anti-Rationalization Patterns**: See [anti-rationalization.md](../skill-authoring/modules/anti-rationalization.md)
- **Enforcement Language**: See [enforcement-language.md](../shared-patterns/modules/workflow-patterns.md)
- **Trigger Patterns**: See [trigger-patterns.md](modules/evaluation-criteria.md)

### Skill-Specific Modules

- **Trigger Isolation Analysis**: See `modules/trigger-isolation-analysis.md`
- **Skill Authoring Best Practices**: See `modules/skill-authoring-best-practices.md`
- **Authoring Checklist**: See `modules/authoring-checklist.md`
- **Evaluation Workflows**: See `modules/evaluation-workflows.md`
- **Quality Metrics**: See `modules/quality-metrics.md`
- **Advanced Tool Use Analysis**: See `modules/advanced-tool-use-analysis.md`
- **Evaluation Framework**: See `modules/evaluation-framework.md`
- **Integration Patterns**: See `modules/integration.md`
- **Troubleshooting**: See `modules/troubleshooting.md`
- **Pressure Testing**: See `modules/pressure-testing.md`
- **Integration Testing**: See `modules/integration-testing.md`
- **Multi-Metric Evaluation**: See `modules/multi-metric-evaluation-methodology.md`
- **Performance Benchmarking**: See `modules/performance-benchmarking.md`

### Tools and Automation

- **Tools**: Executable analysis utilities in `scripts/` directory.
- **Automation**: Setup and validation scripts in `scripts/automation/`.
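
The make targets named in the Quick Start and Evaluation Workflow sections can be sequenced from a small driver script. The following is a rough sketch under stated assumptions: the helper names `workflow_commands` and `run_workflow` are hypothetical, only make targets that appear in the SKILL.md are used, and the manual "apply fixes" step between `improve-skill` and `check-compliance` is not automated.

```python
import subprocess


def workflow_commands(target: str) -> list[list[str]]:
    """Build the make invocations for one skill, in the order the workflow describes."""
    # Per-skill targets taken from the SKILL.md; the ordering is an assumption
    # based on the Evaluation Workflow paragraph.
    per_skill = ["analyze-skill", "improve-skill", "check-compliance", "estimate-tokens"]
    commands = [["make", "audit-all"]]
    commands += [["make", name, f"TARGET={target}"] for name in per_skill]
    return commands


def run_workflow(target: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for command in workflow_commands(target):
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run by default: prints the commands without invoking make.
    run_workflow("path/to/skill/SKILL.md")
```

In practice you would likely stop after `improve-skill`, apply the suggested fixes by hand, and then run the remaining steps, so the dry-run default is a deliberate choice here.
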
Related Skills
find-skills
Helps users discover and install agent skills when they ask questions like "how do I do X", "find a skill for X", "is there a skill that can...", or express interest in extending capabilities. This skill should be used when the user is looking for functionality that might exist as an installable skill.
ht-skills
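Manage collections and documents in 灏天文库: create collections, create documents, query collections and documents, update documents, change which collection a document belongs to, and manage the document hierarchy. Suited to scenarios such as OpenClaw autonomously writing and uploading articles, creating collections, importing documents, and moving documents.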
web-skills-protocol
Auto-discover and use Web Skills Protocol (WSP) skills when interacting with websites. Use this skill whenever the user asks you to interact with, use, or perform actions on a website or web service — such as searching a site, placing an order, deploying an app, or calling a web API. Before scraping HTML or guessing at interfaces, check if the site publishes a skills.txt or agents.txt file that teaches you how to use it properly. If a website has complex elements (e.g., heavy JavaScript, interactive UIs), activating this skill can also help you understand the site's purpose and capabilities. Do NOT use for local file operations or non-web tasks.
clawdtm-skills
Review and rate Claude Code skills. See what humans and AI agents recommend.
micropython-skills/sensor
MicroPython sensor reading — DHT11/22, BME280, MPU6050, ADC, ultrasonic HC-SR04, photoresistor, generic I2C sensors.
micropython-skills/network
MicroPython networking — WiFi STA/AP, HTTP requests, MQTT pub/sub, BLE, NTP time sync, WebSocket.
micropython-skills/diagnostic
MicroPython device diagnostics — system info, I2C/SPI bus scan, pin state, filesystem, memory, performance benchmarks.
micropython-skills/algorithm
MicroPython on-device algorithms — PID controller, moving average, Kalman filter, state machine, task scheduler, data logger.
micropython-skills/actuator
MicroPython actuator control — GPIO output, PWM (LED/servo/motor), stepper motor, WS2812 NeoPixel, buzzer.
micropython-skills
Program and interact with embedded development boards (ESP32, ESP32-S3, ESP32-C3, ESP8266, NodeMCU, Raspberry Pi Pico, RP2040, STM32) through real-time REPL. This skill turns microcontroller hardware into an AI-programmable co-processor — read sensors, control actuators, flash firmware, diagnose devices, and deploy algorithms. Trigger when the user mentions any dev board or hardware interaction: ESP32, ESP8266, NodeMCU, Pico, 开发板, 板子, 单片机, 嵌入式, microcontroller, development board, sensor reading, GPIO, LED, motor, relay, I2C, SPI, UART, ADC, PWM, servo, DHT, BME280, temperature sensor, 传感器, 读传感器, 控制电机, 继电器, flash firmware, 烧录, 刷固件, 刷机, mpremote, MicroPython, IoT, MQTT, WiFi on board, 设备没反应, device not responding, or any task involving programming or controlling a physical microcontroller board.
ml-model-eval-benchmark
Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.
rules-eval
Evaluate and validate Claude Code rules in .claude/rules/ directories. Use for frontmatter, glob patterns, and quality audits.