agent-harness-construction

Design and optimize an AI agent's action space, tool definitions, and observation formats to improve completion rates.

144,923 stars
Complexity: easy

About this skill

This skill provides an AI agent with comprehensive guidelines and architectural patterns for constructing and refining its operational framework, known as an 'agent harness.' It focuses on four core models: action space quality, observation quality, recovery quality, and context budget quality. By leveraging this skill, an agent can systematically enhance its planning capabilities, tool invocation, error recovery mechanisms, and overall ability to converge successfully on complex tasks. It includes principles for designing stable tool names, narrow input schemas, deterministic outputs, and appropriate tool granularity (micro, medium, macro). Furthermore, it defines best practices for structuring tool observations (status, summary, next_actions, artifacts) and robust error recovery contracts (root cause, safe retry, stop conditions). The skill also covers efficient context budget management by advocating for minimal system prompts, offloading guidance to skills, file referencing, and compression at phase boundaries. Architectural patterns like ReAct, Function Calling, and a recommended Hybrid mode are also discussed for different task types.

Best use case

An AI agent would use this skill when tasked with self-improvement, designing a new AI agent, analyzing and refining existing agent architectures, or troubleshooting an agent's underperformance in complex multi-step tasks requiring planning and tool use.

The expected result is an AI agent with improved task completion rates, more robust and efficient tool use, error recovery that leads to fewer failures, context management that prevents token overflow, and a well-structured agent design.

Practical example

Example input

Analyze my current agent architecture and provide recommendations to improve its completion rate on multi-step coding tasks, focusing on action space, observation quality, and error recovery. My agent frequently gets stuck in loops when using the 'refactor_code' tool.

Example output

{
  "analysis": "Identified potential issues with the 'refactor_code' tool's action space and observation format, leading to recovery challenges and looping behavior.",
  "recommendations": [
    {
      "area": "Action Space Design",
      "details": "Ensure 'refactor_code' has a focused input schema. If it handles too many types of refactoring, consider splitting it into 'extract_function', 'rename_variable', etc., for better granularity. Use stable and explicit tool names.",
      "guideline_reference": "Action Space Design - keep inputs schema-first and narrow."
    },
    {
      "area": "Observation Design",
      "details": "Enhance the 'refactor_code' tool's output to include a clear `status` (success/error), a concise `summary` (e.g., 'Refactoring successful, 5 files modified'), and `next_actions` if applicable (e.g., 'Review changes', 'Run tests'). Crucially, include `artifacts` like file paths of modified files for better traceability.",
      "guideline_reference": "Observation Design - every tool response should include..."
    },
    {
      "area": "Error Recovery Contract",
      "details": "For any 'refactor_code' errors, ensure the observation includes a root cause hint (e.g., 'Syntax error introduced'), a safe retry instruction (e.g., 'Revert changes and retry with smaller scope'), and an explicit stop condition (e.g., 'Stop if 3 consecutive refactor attempts fail to resolve syntax'). This will prevent infinite loops.",
      "guideline_reference": "Error Recovery Contract - for every error path..."
    },
    {
      "area": "Architectural Pattern",
      "details": "Given the exploratory nature of coding tasks, Hybrid mode (ReAct planning + typed tool execution) is recommended. This allows for flexible planning with structured tool calls, improving adaptability and recovery.",
      "guideline_reference": "Architecture Pattern Guidance - Hybrid mode (recommended)"
    }
  ],
  "next_steps_for_agent": "Integrate these design principles into the 'refactor_code' tool and conduct thorough benchmark testing to validate improvements."
}

When to use this skill

  • When an agent needs to improve its reliability and success rate on complex, multi-step tasks.
  • When designing new tools or integrating external APIs into an agent's action space.
  • When an agent is encountering frequent errors or failures in tool execution or task completion.
  • When managing large context windows to avoid token limits or maintain performance and focus.

When not to use this skill

  • For simple, single-turn tasks that do not involve complex planning or tool utilization.
  • When the agent's environment or capabilities are fixed and cannot be modified or optimized.
  • When the primary goal is rapid prototyping without a focus on long-term robustness or architectural optimization.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/agent-harness-construction/SKILL.md --create-dirs "https://raw.githubusercontent.com/affaan-m/everything-claude-code/main/docs/zh-CN/skills/agent-harness-construction/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/agent-harness-construction/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How agent-harness-construction Compares

| Feature / Agent | agent-harness-construction | Standard Approach |
| --- | --- | --- |
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | easy | N/A |

Frequently Asked Questions

What does this skill do?

It helps you design and optimize an AI agent's action space, tool definitions, and observation formats to improve completion rates.

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Agent Harness Construction

Use this skill when you are improving how an agent plans, calls tools, recovers from errors, and converges on completion.

## Core Model

Agent output quality is constrained by:

1. Action space quality
2. Observation quality
3. Recovery quality
4. Context budget quality

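## Action Space Design

1. Use stable, explicit tool names.
2. Keep inputs schema-first and narrow.
3. Return deterministic output shapes.
4. Avoid catch-all tools unless isolation is impossible.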
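
The rules above can be sketched for a single narrow tool. This is a minimal illustration, not part of the original skill: the tool name `rename_symbol`, its fields, and the validation rules are all hypothetical. The point is that the input is one small typed schema and the output has the same keys on success and on failure.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class RenameSymbolInput:
    """Narrow, schema-first input for one hypothetical tool: rename_symbol."""
    file_path: str
    old_name: str
    new_name: str
    scope: Literal["file", "project"] = "file"

    def validate(self) -> list[str]:
        """Return a list of validation problems; an empty list means valid."""
        problems = []
        if not self.file_path:
            problems.append("file_path must be non-empty")
        if not self.old_name.isidentifier():
            problems.append("old_name must be a valid identifier")
        if not self.new_name.isidentifier():
            problems.append("new_name must be a valid identifier")
        if self.old_name == self.new_name:
            problems.append("new_name must differ from old_name")
        return problems


def rename_symbol(args: RenameSymbolInput) -> dict:
    """Return the same output shape whether the call succeeds or fails."""
    problems = args.validate()
    if problems:
        return {"status": "error", "summary": "; ".join(problems), "artifacts": []}
    # The actual rename is out of scope for this sketch; only the shape matters.
    return {
        "status": "success",
        "summary": f"renamed {args.old_name} to {args.new_name}",
        "artifacts": [args.file_path],
    }
```

Because both branches return the same keys, the agent can parse every response with one code path.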

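## Granularity Rules

* Use micro-tools for high-risk operations (deploy, migration, permissions).
* Use medium tools for common edit/read/search loops.
* Use macro-tools only when round-trip overhead is the dominant cost.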
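
One way to read these rules is as a small decision function. The sketch below is an assumption-laden illustration: the names `Granularity`, `HIGH_RISK_KINDS`, and `choose_granularity` are hypothetical, and letting the high-risk rule take precedence over the round-trip rule is an interpretation that the list does not state explicitly.

```python
from enum import Enum


class Granularity(Enum):
    MICRO = "micro"
    MEDIUM = "medium"
    MACRO = "macro"


# Operation kinds treated as high risk, taken from the examples in the list.
HIGH_RISK_KINDS = {"deploy", "migration", "permissions"}


def choose_granularity(operation_kind: str, round_trip_dominates: bool = False) -> Granularity:
    """Pick a tool granularity for an operation (assumed precedence: risk first)."""
    if operation_kind in HIGH_RISK_KINDS:
        return Granularity.MICRO
    if round_trip_dominates:
        return Granularity.MACRO
    return Granularity.MEDIUM
```

For example, `choose_granularity("deploy", round_trip_dominates=True)` still yields `Granularity.MICRO` under this assumed precedence.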

## Observation Design

Every tool response should include:

* `status`: success|warning|error
* `summary`: a one-line result
* `next_actions`: actionable follow-up steps
* `artifacts`: file paths / IDs
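
A minimal sketch of this response shape as a Python dataclass follows. The four field names come from the list above; the class name `Observation`, the `to_json` helper, and the sample values are illustrative assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Literal


@dataclass
class Observation:
    """Uniform tool response with the four fields named in the list above."""
    status: Literal["success", "warning", "error"]
    summary: str                                            # one-line result
    next_actions: list[str] = field(default_factory=list)   # actionable follow-ups
    artifacts: list[str] = field(default_factory=list)      # file paths or IDs

    def to_json(self) -> str:
        """Serialize to JSON so the agent always sees the same keys."""
        return json.dumps(asdict(self), ensure_ascii=False)


# Hypothetical sample observation for a refactoring tool.
obs = Observation(
    status="success",
    summary="5 files modified",
    next_actions=["run tests"],
    artifacts=["src/app.py"],
)
```

Defaulting `next_actions` and `artifacts` to empty lists keeps the keys present even when a tool has nothing to report.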

## Error Recovery Contract

For every error path, include:

* A root cause hint
* A safe retry instruction
* An explicit stop condition
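
The contract can be sketched as an error payload plus a bounded retry loop. Everything named here (`ErrorInfo`, `ToolResult`, `run_with_recovery`, `flaky_refactor`, and the default cap of three attempts) is hypothetical. A real harness would also apply the `safe_retry` instruction between attempts, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ErrorInfo:
    root_cause_hint: str   # why the call probably failed
    safe_retry: str        # how to retry without making things worse
    stop_condition: str    # when to give up


@dataclass
class ToolResult:
    ok: bool
    summary: str
    error: Optional[ErrorInfo] = None


def run_with_recovery(
    tool: Callable[[int], ToolResult], max_attempts: int = 3
) -> tuple[ToolResult, int]:
    """Call tool(attempt) until it succeeds or the attempt cap is reached."""
    attempts = 1
    result = tool(attempts)
    while not result.ok and attempts < max_attempts:
        attempts += 1
        result = tool(attempts)
    return result, attempts


def flaky_refactor(attempt: int) -> ToolResult:
    """Hypothetical tool that fails twice, then succeeds."""
    if attempt < 3:
        return ToolResult(
            ok=False,
            summary="syntax error introduced",
            error=ErrorInfo(
                root_cause_hint="edit left an unbalanced parenthesis",
                safe_retry="revert the change and retry with a smaller scope",
                stop_condition="stop after 3 consecutive failed attempts",
            ),
        )
    return ToolResult(ok=True, summary="refactor applied")
```

The attempt cap is what enforces the stop condition: a tool that never succeeds returns its last error after `max_attempts` calls instead of looping forever.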

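## Context Budget Management

1. Keep the system prompt minimal and invariant.
2. Move large guidance into skills that are loaded on demand.
3. Prefer references to files over inlining long documents.
4. Compact at phase boundaries, not at arbitrary token thresholds.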
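
Rule 4 can be illustrated with a small context container that compacts only when a phase ends. This is a sketch under assumptions: the `Context` class and its method names are hypothetical, and the summary text is passed in by the caller, whereas a real harness would likely generate it.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    system_prompt: str                                   # kept minimal and unchanged
    history: list[str] = field(default_factory=list)     # detailed messages of the current phase
    phase_summaries: list[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        """Record a message in the current phase."""
        self.history.append(message)

    def end_phase(self, phase_name: str, summary: str) -> None:
        """Replace the finished phase's detailed history with a one-line summary."""
        self.phase_summaries.append(f"[{phase_name}] {summary}")
        self.history.clear()

    def render(self) -> str:
        """Build the prompt: system prompt, past phase summaries, current phase detail."""
        return "\n".join([self.system_prompt, *self.phase_summaries, *self.history])
```

Compacting at `end_phase` means detail is dropped only once a unit of work is finished, so the agent never loses context in the middle of a step.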

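## Architecture Pattern Guidance

* ReAct: best for exploratory tasks where the path is uncertain.
* Function calling: best for structured, deterministic flows.
* Hybrid mode (recommended): ReAct planning + typed tool execution.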
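
A rough sketch of the hybrid pattern follows: a planner emits a free-form thought plus a tool choice (the ReAct part), and the harness executes that choice through a typed argument class. All names (`Step`, `AddArgs`, `TOOLS`, `execute`, `run`, `scripted_planner`) are hypothetical, and the planner is a scripted stand-in for a model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AddArgs:
    a: int
    b: int


@dataclass(frozen=True)
class Step:
    thought: str   # free-form reasoning, ReAct style
    tool: str      # name of the tool to call
    args: dict     # raw arguments proposed by the planner


def add(args: AddArgs) -> str:
    return str(args.a + args.b)


# Registry mapping a stable tool name to its argument type and implementation.
TOOLS = {"add": (AddArgs, add)}


def execute(step: Step) -> str:
    """Build typed arguments for the chosen tool, then run it."""
    if step.tool not in TOOLS:
        return f"error: unknown tool {step.tool!r}"
    args_type, fn = TOOLS[step.tool]
    try:
        # Catches missing or unexpected fields; dataclasses do not check
        # value types at runtime.
        typed_args = args_type(**step.args)
    except TypeError as exc:
        return f"error: bad arguments for {step.tool}: {exc}"
    return fn(typed_args)


def run(planner: Callable[[list[str]], Optional[Step]], max_steps: int = 5) -> list[str]:
    """Alternate planning and typed execution until the planner returns None."""
    observations: list[str] = []
    for _ in range(max_steps):
        step = planner(observations)
        if step is None:
            break
        observations.append(execute(step))
    return observations


def scripted_planner(observations: list[str]) -> Optional[Step]:
    """Stand-in for a model: plan one step, then finish."""
    if not observations:
        return Step("I need the sum first.", "add", {"a": 2, "b": 3})
    return None
```

Malformed arguments come back as an error observation instead of raising, so the planner can react to them on its next turn.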

## Benchmarking

Track:

* Completion rate
* Retries per task
* pass@1 and pass@3
* Cost per successful task
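
The metrics above could be computed from per-task attempt logs along these lines. This is a sketch under stated assumptions: `TaskRun` and `summarize` are hypothetical names, pass@k is taken as "any of the first k attempts succeeded" (a simple empirical reading, not a statistical estimator), and cost per successful task is taken as total cost divided by the number of solved tasks.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    task_id: str
    attempts: list[bool]   # outcome of each attempt, in order
    cost: float            # total cost across all attempts (unit is an assumption)


def pass_at_k(run: TaskRun, k: int) -> bool:
    """True if any of the first k attempts succeeded."""
    return any(run.attempts[:k])


def summarize(runs: list[TaskRun]) -> dict:
    """Aggregate the four tracked metrics over a non-empty list of task runs."""
    if not runs:
        raise ValueError("need at least one task run")
    n = len(runs)
    solved = [r for r in runs if any(r.attempts)]
    total_cost = sum(r.cost for r in runs)
    return {
        "completion_rate": len(solved) / n,
        "retries_per_task": sum(len(r.attempts) - 1 for r in runs) / n,
        "pass@1": sum(pass_at_k(r, 1) for r in runs) / n,
        "pass@3": sum(pass_at_k(r, 3) for r in runs) / n,
        # Failed tasks still cost money, so they are included in the numerator.
        "cost_per_successful_task": total_cost / len(solved) if solved else None,
    }
```

Counting the cost of failed tasks in the numerator makes the last metric reflect what each success actually cost.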

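## Anti-Patterns

* Too many tools with overlapping semantics.
* Opaque tool output with no recovery hints.
* Error-only output with no next steps.
* Context overloaded with irrelevant references.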

Related Skills

eval-harness

144923
from affaan-m/everything-claude-code

Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles

Development · Claude

workspace-surface-audit

144923
from affaan-m/everything-claude-code

Audit the active repo, MCP servers, plugins, connectors, env surfaces, and harness setup, then recommend the highest-value ECC-native skills, hooks, agents, and operator workflows. Use when the user wants help setting up Claude Code or understanding what capabilities are actually available in their environment.

Development · Claude

safety-guard

144923
from affaan-m/everything-claude-code

Use this skill to prevent destructive operations when working on production systems or running agents autonomously.

Development · Claude

repo-scan

144923
from affaan-m/everything-claude-code

Cross-stack source code asset audit — classifies every file, detects embedded third-party libraries, and delivers actionable four-level verdicts per module with interactive HTML reports.

Development · Claude

project-flow-ops

144923
from affaan-m/everything-claude-code

Operate execution flow across GitHub and Linear by triaging issues and pull requests, linking active work, and keeping GitHub public-facing while Linear remains the internal execution layer. Use when the user wants backlog control, PR triage, or GitHub-to-Linear coordination.

Development · Claude

manim-video

144923
from affaan-m/everything-claude-code

Build reusable Manim explainers for technical concepts, graphs, system diagrams, and product walkthroughs, then hand off to the wider ECC video stack if needed. Use when the user wants a clean animated explainer rather than a generic talking-head script.

Development · Claude

laravel-plugin-discovery

144923
from affaan-m/everything-claude-code

Discover and evaluate Laravel packages via LaraPlugins.io MCP. Use when the user wants to find plugins, check package health, or assess Laravel/PHP compatibility.

Development · Claude

design-system

144923
from affaan-m/everything-claude-code

Use this skill to generate or audit design systems, check visual consistency, and review PRs that touch styling.

Development · Claude

click-path-audit

144923
from affaan-m/everything-claude-code

Trace every user-facing button/touchpoint through its full state change sequence to find bugs where functions individually work but cancel each other out, produce wrong final state, or leave the UI in an inconsistent state. Use when: systematic debugging found no bugs but users report broken buttons, or after any major refactor touching shared state stores.

Development · Claude

ck

144923
from affaan-m/everything-claude-code

Persistent per-project memory for Claude Code. Auto-loads project context on session start, tracks sessions with git activity, and writes to native memory. Commands run deterministic Node.js scripts — behavior is consistent across model versions.

Development · Claude

canary-watch

144923
from affaan-m/everything-claude-code

Use this skill to monitor a deployed URL for regressions after deploys, merges, or dependency upgrades.

Development · Claude

benchmark

144923
from affaan-m/everything-claude-code

Use this skill to measure performance baselines, detect regressions before/after PRs, and compare stack alternatives.

Development · Claude