skill-evolve

Skill Evolve is a meta-skill designed to systematically improve the quality and effectiveness of other AI agent skills. It guides an AI agent through an iterative 'observe, summarize, improve, verify' workflow.

11 stars
Complexity: easy

About this skill

Skill Evolve operates on the core philosophy that 'good skills are summarized, not designed,' advocating continuous iteration based on real-world performance. Instead of striving for a perfect initial design, this skill empowers AI agents to learn from their executions, identify patterns of failure, and incrementally refine their capabilities. It triggers when users express dissatisfaction with a skill's performance or explicitly ask for optimization, iteration, or evolution.

The skill provides a structured, five-step evolutionary cycle: 'Cold Start' (initial intuitive assessment), 'Observe' (running the target skill with diverse prompts), 'Refine Patterns' (identifying root causes from observations), 'Propose Improvement' (generating new skill versions), and 'Verify' (testing the improved version). This process leverages OTF (On-The-Fly) observation, JIT (Just-In-Time) focused fixes, and Bootstrap (using prior notes as fuel) methodologies.

This meta-skill is invaluable for developers and advanced users of AI agents who want to enhance the robustness, reliability, and precision of their custom AI skills. It automates a significant portion of the debugging and optimization process, enabling AI agents to become more self-sufficient in improving their own operational quality.

Best use case

The primary use case for Skill Evolve is for AI developers, prompt engineers, and advanced users who create and deploy custom AI agent skills. It is especially beneficial when an existing skill is underperforming, producing inconsistent results, or failing to handle specific edge cases effectively. By providing a structured, data-driven approach to identify and rectify issues, it helps users achieve higher quality, more reliable AI behaviors, making the skill development and maintenance process more efficient and less reliant on constant manual oversight.

Expected output

A systematically improved version of the target AI skill, accompanied by detailed observations, identified patterns, and proposed changes, leading to enhanced performance and reliability.

Practical example

Example input

Improve the `code-generator` skill. It's often producing code with minor syntax errors, and the comments are sometimes unhelpful. I'm unsatisfied with its current output quality.

Example output

I've analyzed the `code-generator` skill. It seems the 'generate_comments' instruction was too vague, leading to inconsistent quality. I've refined it to require 'executable examples in comments when possible.' The updated skill is now available in `code-generator-evolved/SKILL.md`.

When to use this skill

  • When an existing AI skill is not producing satisfactory or consistent results.
  • When you need to optimize, refine, or iterate on an AI skill's performance.
  • When you want to systematically diagnose why an AI skill is failing or underperforming.
  • When you have concrete examples of a skill's output that are unsatisfactory or problematic.

When not to use this skill

  • When you are creating a skill from scratch and haven't run it with any prompts yet.
  • When the core problem lies outside the skill's logic (e.g., external API issues, user prompt clarity, lack of necessary context).
  • When a skill requires only a simple, obvious correction (though it could be used, it's overkill).
  • When you lack sufficient test prompts or examples to observe the skill's behavior meaningfully.

How skill-evolve Compares

| Feature / Agent | skill-evolve | Standard Approach |
|-----------------|--------------|-------------------|
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | easy | N/A |

Frequently Asked Questions

What does this skill do?

Skill Evolve is a meta-skill designed to systematically improve the quality and effectiveness of other AI agent skills. It guides an AI agent through an iterative 'observe, summarize, improve, verify' workflow.

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Skill Evolve: Evolutionary Skill Improvement

> **Core philosophy: good skills are summarized, not designed.**
>
> Don't try to write a perfect skill in one pass. Run it first, observe its real behavior, distill patterns from its failures, and iterate in small steps. Let the skill "grow" into shape on its own.

## Your Task

You are a skill-improvement expert. The user will give you an existing skill (or a path to one), and your job is to systematically raise its quality through an **observe → summarize → improve → verify** loop.

Your methodology rests on three core mechanisms:

- **OTF (On-The-Fly)**: summarize as you run; don't wait until everything has finished to look back
- **JIT (Just-In-Time)**: fix only one core problem per round, delivering a verifiable improved version quickly
- **Bootstrap**: the observation notes each round produces are the fuel for the next round

---

## Workflow: The Five-Step Evolution Cycle

### Step 1: Cold Start - Build Intuition (5 minutes)

Read the target skill in full (SKILL.md plus any referenced references/scripts/agents). Then answer:

1. What does this skill want Claude to do?
2. What are its trigger scenarios?
3. What are the core instructions it gives Claude?
4. Where do you **intuitively suspect problems**? (vague instructions, missing boundaries, over-engineering, low information density)

Write the intuitions down, but **don't rush to change anything**. Intuitions are only hypotheses; they need data to verify them.

### Step 2: Observe - Run the Skill with Real Prompts (10-20 minutes)

This step is the foundation of the whole process. Without observations there are no patterns, and without patterns you shouldn't start making changes.

**Choose 3-5 test prompts:**
- At least 1 covering the skill's core use case (the happy path)
- At least 1 edge case (vague user phrasing, malformed input, or a request that comes close to but doesn't quite match the skill's trigger conditions)
- At least 1 scenario where the skill **should not** trigger at all (a negative test)
- If the user arrived with concrete failure cases, **use those real cases first**

**How to run them:**

If you have subagent capability in Claude Code, launch a subagent per prompt and have it execute the task with the target skill. Save the output to a working directory:

```
<skill-name>-evolve/
├── round-1/
│   ├── prompt-1/
│   │   ├── prompt.md        # the original prompt
│   │   ├── output/          # files the skill produced
│   │   └── transcript.md    # execution transcript (if obtainable)
│   ├── prompt-2/
│   └── ...
```

If you don't have subagent capability (for example, on Claude.ai), read the skill yourself, execute each prompt by following the skill's instructions, and save the results.
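As a minimal sketch of the bookkeeping, assuming plain Python and no particular harness (the skill prescribes only the directory layout above, not this code; `scaffold_round` and its signature are hypothetical):

```python
from pathlib import Path

def scaffold_round(skill_name: str, round_num: int, prompts: list[str]) -> Path:
    """Create the per-round working directory shown above.

    Hypothetical helper: the skill prescribes only the layout, not this code.
    """
    round_dir = Path(f"{skill_name}-evolve") / f"round-{round_num}"
    for i, prompt_text in enumerate(prompts, start=1):
        prompt_dir = round_dir / f"prompt-{i}"
        (prompt_dir / "output").mkdir(parents=True, exist_ok=True)  # skill outputs land here
        (prompt_dir / "prompt.md").write_text(prompt_text, encoding="utf-8")
    return round_dir
```

For example, `scaffold_round("code-generator", 1, test_prompts)` would create `code-generator-evolve/round-1/prompt-1/` through `prompt-N/`, each seeded with its `prompt.md`.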

**OTF requirement: record observations immediately after each prompt finishes.** Don't wait until all runs are done to look back. Write them to `round-1/observations.md`:

```markdown
## Prompt 1: [brief summary]
- Result: [good / mediocre / bad]
- Specific problems: [description]
- Suspected cause: [which instruction in the skill it points to]

## Prompt 2: [brief summary]
...
```
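A tiny append helper makes the "record immediately" discipline mechanical. This is a hypothetical sketch; any means of writing the entry right after each run satisfies OTF:

```python
from pathlib import Path

def log_observation(round_dir: Path, prompt_num: int, summary: str,
                    result: str, problems: str, suspected_cause: str) -> None:
    """Append one entry to round-N/observations.md in the format above.

    Hypothetical sketch; the OTF requirement is about timing, not tooling.
    """
    entry = (
        f"\n## Prompt {prompt_num}: {summary}\n"
        f"- Result: {result}\n"
        f"- Specific problems: {problems}\n"
        f"- Suspected cause: {suspected_cause}\n"
    )
    with (round_dir / "observations.md").open("a", encoding="utf-8") as f:
        f.write(entry)
```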

### Step 3: Distill Patterns - From Cases to Regularities (10 minutes)

This is the most critical step. **Don't fix bugs one by one; look for what they share.**

Review the observation notes from every prompt and ask yourself:

1. **Which problems recur?** (Recurrence rule: once the same kind of problem appears 2 or more times, it is worth distilling into a pattern.)
2. **Which layer of the skill is the root cause in?**
   - Trigger layer: the description is inaccurate; the skill fails to trigger when it should, or triggers when it shouldn't
   - Instruction layer: key steps are missing, instructions are vague, priorities are muddled
   - Resource layer: a needed reference/script/template is missing
   - Architecture layer: the skill's scope is too broad or too narrow and it needs splitting or merging
3. **Which problems hurt the user experience most?** (Prioritize by impact, not by what looks easiest to fix.)

Output: an **error-pattern table**, written to `round-1/patterns.md`:

```markdown
# Error Pattern Table - Round 1

## P01: [pattern name]
- Occurrences: N times (prompts 1, 3, 5)
- Symptom: [what problem the user saw]
- Root cause: [which instruction in the skill caused it]
- Impact: [high / medium / low]

## P02: [pattern name]
...

## Improvement priorities this round
1. P0x - [rationale]
2. P0x - [rationale]
```

**Compression-ratio awareness**: if you ran 5 prompts and found 8 specific problems but distilled only 2 patterns, that is normal and good. Two patterns are worth more than eight patches, because patterns also cover future prompts.

### Step 4: Rewrite - The JIT Principle, One Change per Round (15-30 minutes)

**Never rewrite the whole skill in one pass.**

Take the 1-2 highest-priority patterns from the error-pattern table and rewrite specifically for them. While rewriting, follow:

**The three JIT principles:**

1. **Minimal change**: touch only the passages directly tied to the target pattern. Don't "optimize" other parts while you're there; that's next round's job.
2. **Explain the why**: for every change, understand why the old wording was a problem. If you can't articulate the why, your understanding isn't deep enough yet; go back to step 3 and look again.
3. **Verifiable**: after the change, you should be able to rerun the same prompts and clearly see the improvement. A change that can't be verified is probably unnecessary.

**Rewriting guidelines:**

- Explain the why to Claude rather than piling on MUST/NEVER. Claude is smart; understanding the reason works better than memorizing rules.
- Check whether repeated work can be extracted into a script. If every skill run requires Claude to write a similar chunk of processing code from scratch, consider freezing it as a script under `scripts/` (see the sketch after this list).
- Keep information density high. Delete filler ("please note the following important points" and similar). Every sentence should either advance Claude's understanding or direct Claude's action.
- For long skills, consider layering: core workflow in SKILL.md (<500 lines), details in references/.
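For instance, if every run ends with Claude hand-writing the same "does the generated code at least parse?" check, that check could be frozen as a script. A hypothetical example (the file name and the parse check are illustrative, not taken from any particular skill):

```python
#!/usr/bin/env python3
"""scripts/check_output.py: hypothetical example of frozen repeated work.

Verifies that every generated .py file in an output directory parses,
so Claude doesn't have to rewrite this check on every run.
"""
import ast
import sys
from pathlib import Path

def main(output_dir: str) -> int:
    failures = []
    for path in Path(output_dir).rglob("*.py"):
        try:
            ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError as exc:
            failures.append(f"{path}: {exc}")
    print("\n".join(failures) if failures else "all files parse")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "output"))
```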

**Immediately after rewriting, save your change notes** to `round-1/changes.md`:

```markdown
# Change Log - Round 1

## Target patterns
- P01: [pattern name]
- P02: [pattern name]

## Change list
1. [file:line] what changed, and why
2. ...

## Expected effects
- Prompt 1 should no longer show [specific problem]
- Prompt 3 should [specific improvement]
```

### Step 5: Verify - Rerun and Judge Convergence

Rerun the improved skill on **the same batch of prompts**. Compare results before and after.

**Convergence criteria:**

| Signal | Meaning |
|------|------|
| The target pattern's problems disappeared | The fix worked; move to the next round for the remaining patterns |
| The target pattern's problems lessened but persist | Right direction, not enough force; revise the same pattern for another round |
| New problems appeared | The change introduced side effects; roll back or adjust |
| New patterns = 0 for two consecutive rounds | **Converged**: the skill has reached its quality ceiling on the current test set |
| Fix counts decay exponentially (e.g., 8→3→1) | Normal convergence curve; keep iterating |
| Fix counts rise instead of falling | A systemic problem was missed; return to step 3 and reanalyze |

Once verification passes, advance the working directory to `round-2/` and repeat steps 2 through 5.
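A minimal sketch encoding the convergence signals above, assuming you track how many new patterns each round surfaces (the function and its thresholds are illustrative; the table is the actual rule):

```python
def convergence_status(new_patterns_per_round: list[int]) -> str:
    """Classify convergence from per-round counts of newly found patterns.

    Illustrative encoding of the criteria table; the table is the real rule.
    """
    counts = new_patterns_per_round
    if len(counts) >= 2 and counts[-2:] == [0, 0]:
        return "converged"   # no new patterns for two consecutive rounds
    if len(counts) >= 2 and counts[-1] > counts[-2]:
        return "diverging"   # counts rising: return to step 3 and reanalyze
    return "iterating"       # normal decay (e.g., 8 -> 3 -> 1): keep going

# Usage: convergence_status([8, 3, 1, 0, 0]) returns "converged"
```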

---

## Bootstrap: Make the Improvement Notes Compound

Each round's `observations.md`, `patterns.md`, and `changes.md` are not scratch paper to throw away after use. They are the context for the next round of improvement.

**When round 2 begins, read all of round 1's notes first.** That way you won't re-solve problems that are already fixed, and you can focus directly on newly exposed patterns.

**Once you have accumulated 3+ rounds of notes, perform a recursive compression:**

```
All observations from rounds 1-3 (15+ specific problems)
  ↓ compress
5-8 error patterns
  ↓ compress
2-3 improvement principles (this skill's most fundamental pitfalls)
  ↓ compress
1 insight (what this skill's essential problem is)
```

Write the compressed result to `evolution-log.md` (at the root of the working directory). This log is the "memory" of the whole improvement process; if this skill ever needs improving again, start there instead of from zero.

---

## Collaborating with the User

### Communication Cadence

- **Before each round**: tell the user which pattern this round will address and what you expect to change
- **After each round**: show a before/after comparison (argue with concrete output differences from specific prompts, not abstract descriptions)
- **Let the user judge**: skill output quality is subjective. Distilling patterns is your job; judging quality is the user's. Lay out the outputs from before and after the change and let the user say whether it's better

### When to Stop

The user says "that works": stop.
Two consecutive rounds surface no new patterns: stop.
The user starts saying "close enough" or "good enough": stop.

Don't chase perfection. The return on going from 60% to 90% is far higher than going from 90% to 95%. The JIT spirit is "ship when it's good enough."

### Deliverables

When the improvement is done, deliver to the user:

1. **The improved skill** (overwrite the original file directly, or write to a new path for the user to confirm)
2. **evolution-log.md** (the compressed memory of the improvement process, so the next session can pick up where this one left off)
3. **A one-sentence summary**: what was improved, why, and to what effect

---

## Fast Path: The User Has One Small Problem

Not every case needs the full five-step cycle. If the user arrives with one specific problem ("this skill produces the wrong format", "it always skips step X"), take the fast path:

1. Read the skill → locate the problem passage → understand the why
2. Make a targeted rewrite (and explain it to the user)
3. Suggest the user try it and see whether it's fixed

The fast path fits when the problem is clear, the change is local, and the skill's overall architecture isn't involved. If the quick fix still leaves the user unsatisfied, switch to the full cycle.

---

## Special Scenarios

### The user has no concrete skill, only a vague idea

That's not a skill-evolve scenario; it's a skill-creator one. Suggest the user first create an initial version with `/skill-creator`, then iterate on it with `/evolve`.

### Division of labor between skill-evolve and skill-creator

- **skill-creator**: from 0 to 1. The user wants a new skill; help them write the first version, run evaluations, and tune the trigger description.
- **skill-evolve**: from 1 to N. The user already has a working skill but is unhappy with its quality; it needs systematic observation, pattern distillation, and iterative improvement.

The two chain together: `/skill-creator` produces the initial version, then `/evolve` keeps polishing it.

### The user brings an output and says "this is wrong"

This is the best possible starting point. One real failure case equals one free test prompt. Start from that case, add 2-3 related prompts, and go straight to step 2.

### The problem is not in the instruction layer but in the trigger layer (inaccurate description)

If observation shows the skill failing to trigger when it should, or triggering when it shouldn't, the description is at fault. To improve the description, refer to the "Description Optimization" section in skill-creator (the detailed procedure is in `/skill-creator`'s SKILL.md). skill-evolve focuses on execution quality after the skill has been triggered.

### Three rounds of changes and still no convergence

Possible causes:
1. **The skill's scope is too broad**: one skill is trying to cover too many scenarios and should be split
2. **The test prompts contradict each other**: different prompts hold conflicting expectations of the skill; align with the user
3. **Underlying capability limits**: some tasks exceed what the current model can do; no amount of skill rewriting will get there, so tell the user honestly

---

## Mental Checklist

Before each round of changes, run through this list:

- [ ] Am I fixing a problem I actually observed, or "optimizing" on gut feeling?
- [ ] Can this round's change be verified by a specific prompt?
- [ ] Am I changing too many things at once? (One thing per round.)
- [ ] Am I explaining the why to Claude, or piling on MUST/NEVER?
- [ ] Is this change general (covering future new prompts), or is it just overfitting the current test set?
- [ ] Have I deleted old instructions that don't work? (A skill should get leaner with each revision, not longer.)

Related Skills

workspace-surface-audit

from affaan-m/everything-claude-code

Audit the active repo, MCP servers, plugins, connectors, env surfaces, and harness setup, then recommend the highest-value ECC-native skills, hooks, agents, and operator workflows. Use when the user wants help setting up Claude Code or understanding what capabilities are actually available in their environment.

Development · Claude

ui-demo

from affaan-m/everything-claude-code

Record polished UI demo videos using Playwright. Use when the user asks to create a demo, walkthrough, screen recording, or tutorial video of a web application. Produces WebM videos with visible cursor, natural pacing, and professional feel.

Developer Tools · Claude

token-budget-advisor

from affaan-m/everything-claude-code

Offers the user an informed choice about how much response depth to consume before answering. Use this skill when the user explicitly wants to control response length, depth, or token budget. TRIGGER when: "token budget", "token count", "token usage", "token limit", "response length", "answer depth", "short version", "brief answer", "detailed answer", "exhaustive answer", "respuesta corta vs larga", "cuántos tokens", "ahorrar tokens", "responde al 50%", "dame la versión corta", "quiero controlar cuánto usas", or clear variants where the user is explicitly asking to control answer size or depth. DO NOT TRIGGER when: user has already specified a level in the current session (maintain it), the request is clearly a one-word answer, or "token" refers to auth/session/payment tokens rather than response size.

Productivity & Content Creation · Claude

skill-comply

from affaan-m/everything-claude-code

Visualize whether skills, rules, and agent definitions are actually followed — auto-generates scenarios at 3 prompt strictness levels, runs agents, classifies behavioral sequences, and reports compliance rates with full tool call timelines

Development · Claude

santa-method

from affaan-m/everything-claude-code

Multi-agent adversarial verification with convergence loop. Two independent review agents must both pass before output ships.

Quality Assurance · Claude

safety-guard

from affaan-m/everything-claude-code

Use this skill to prevent destructive operations when working on production systems or running agents autonomously.

Development · Claude

repo-scan

from affaan-m/everything-claude-code

Cross-stack source code asset audit — classifies every file, detects embedded third-party libraries, and delivers actionable four-level verdicts per module with interactive HTML reports.

Development · Claude

project-flow-ops

from affaan-m/everything-claude-code

Operate execution flow across GitHub and Linear by triaging issues and pull requests, linking active work, and keeping GitHub public-facing while Linear remains the internal execution layer. Use when the user wants backlog control, PR triage, or GitHub-to-Linear coordination.

Development · Claude

product-lens

from affaan-m/everything-claude-code

Use this skill to validate the "why" before building, run product diagnostics, and pressure-test product direction before the request becomes an implementation contract.

Product Management · Claude

openclaw-persona-forge

from affaan-m/everything-claude-code

Forge a complete lobster-soul package for an OpenClaw AI Agent. Based on the user's preferences or a random gacha draw, it outputs an identity positioning, a soul description (SOUL.md), in-character bottom-line rules, a name, and avatar image-generation prompts. If the current environment provides an approved image-generation skill, it can automatically produce avatar images in a consistent style. Use when the user wants to create, design, or customize an OpenClaw lobster soul. Not for: fine-tuning an existing SOUL.md, character design for non-OpenClaw platforms, or purely tool-like agents with no personality. Trigger words: lobster soul, OpenClaw soul, lobster character, lobster positioning, lobster murder-mystery role, lobster game character, lobster NPC, lobster personality, lobster backstory, lobster SOUL, gacha, random lobster.

AI Tools & Utilities · Claude

manim-video

from affaan-m/everything-claude-code

Build reusable Manim explainers for technical concepts, graphs, system diagrams, and product walkthroughs, then hand off to the wider ECC video stack if needed. Use when the user wants a clean animated explainer rather than a generic talking-head script.

Development · Claude

laravel-plugin-discovery

from affaan-m/everything-claude-code

Discover and evaluate Laravel packages via LaraPlugins.io MCP. Use when the user wants to find plugins, check package health, or assess Laravel/PHP compatibility.

Development · Claude