Build Script-Execution Skill
Create a skill that orchestrates the write-execute-analyze loop to autonomously process data. Learn to implement error recovery, iterate toward robust solutions, and test your skill across diverse input scenarios. This is where specification-driven development meets real problem-solving.
Best use case
Build Script-Execution Skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using Build Script-Execution Skill should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/build-script-execution-skill/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How Build Script-Execution Skill Compares
| Feature / Agent | Build Script-Execution Skill | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Build Script-Execution Skill
You've learned the pattern (Lesson 5): write code from specification → execute it → analyze errors → iterate. Now you're going to build a skill that orchestrates this loop autonomously.
But here's what makes this different from following a tutorial: You'll specify what problem you're solving FIRST, then let AI help you build the skill while you validate each decision. You're not just learning a pattern—you're learning to think about error recovery, convergence criteria, and edge cases the way production systems demand.
## Step 1: Write Your Specification
Before touching any skill code, write a specification for the problem you're solving. You'll use a CSV data processing task because it's concrete and has natural edge cases.
Choose one of these:
- **CSV Analysis**: Analyze customer or sales data for patterns
- **CSV Transformation**: Clean and restructure messy CSV data
- **CSV Aggregation**: Group data by dimensions and calculate metrics
Or define your own data processing task.
### Your Specification
Write this to a file (skill-spec.md) or document:
```markdown
# CSV Analysis Skill Specification
## Intent
[What does this skill do? Be specific about the business problem it solves]
## Input
- data_file: [type and format, e.g., "CSV with columns: customer_id, purchase_date, amount"]
- parameters: [what configuration does skill accept?]
## Output
- format: [JSON, CSV, report?]
- required_fields: [exact fields that must be in output]
- validation_rules: [how to verify output is correct]
## Success Criteria
- All data processed without loss
- Output format exactly matches specification
- Edge cases handled gracefully (malformed rows, missing values, etc.)
- Execution completes within 30 seconds
## Edge Cases to Handle
- [Case 1: e.g., "Empty CSV file"]
- [Case 2: e.g., "Missing column header"]
- [Case 3: e.g., "Non-numeric values in amount field"]
```
**Key principle**: Your specification must be complete enough that AI can generate correct code without additional context. If your spec is vague, the generated code will be equally vague.
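One way to catch a vague spec before handing it to AI is to scan it for template placeholders that were never filled in and for missing sections. The following is a minimal sketch, assuming the spec follows the template above with `## ` section headings; `find_spec_gaps` and `REQUIRED_SECTIONS` are hypothetical names introduced here for illustration only.

```python
import re

# Section headings taken from the template above (assumed to be "## " headings).
REQUIRED_SECTIONS = ["Intent", "Input", "Output", "Success Criteria", "Edge Cases to Handle"]


def find_spec_gaps(spec_text: str) -> list[str]:
    """Return a list of problems that make the spec incomplete."""
    problems = []
    headings = {h.strip() for h in re.findall(r"^## (.+)$", spec_text, flags=re.MULTILINE)}
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"Missing section: {section}")
    # Bracketed text such as "[type and format]" is treated as an unfilled placeholder.
    for placeholder in re.findall(r"\[[^\]\n]+\]", spec_text):
        problems.append(f"Unfilled placeholder: {placeholder}")
    return problems


draft = """# CSV Analysis Skill Specification
## Intent
Summarize total purchase amount per customer.
## Input
- data_file: [type and format]
## Output
- format: JSON
"""

for problem in find_spec_gaps(draft):
    print(problem)
```

An empty result does not prove the spec is good, only that the obvious gaps are closed; the content of each section still needs a human read.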
## Step 2: Design Your Skill's Persona and Questions
Before building, define how your skill thinks about this problem.
```yaml
# Skill Persona
persona: |
  You are a data orchestrator: your role is to write Python scripts that
  process data robustly. When you encounter errors, you read the error message
  carefully, understand why the code failed, and generate corrected code.
  You validate results against the specification. You handle edge cases explicitly
  rather than hoping they don't occur.
questions:
- "What does the data structure look like? (columns, data types, edge cases)"
- "What transformation or analysis does the specification require?"
- "What output format must the code produce?"
- "What validation proves the output is correct?"
- "What edge cases are most likely to occur in real data?"
principles:
- "Validate data before processing: Check columns exist, types are correct"
- "Fail explicitly: Raise errors with clear messages rather than silently producing wrong results"
- "Test assumptions: Don't assume column names; inspect actual data first"
- "Document the transformation: Add comments explaining the logic"
```
## Step 3: Build the Skill Core with AI Collaboration
Now you're going to build this skill with AI. You'll test code, discover error patterns you hadn't anticipated, and learn what your actual data requires. As you iterate, the skill improves—not because you're following a formula, but because specification-driven feedback drives real improvements.
### Part A: Generate Initial Implementation
**Your prompt to AI:**
```
I'm building a skill that processes CSV data. Here's my specification:
[PASTE YOUR SPEC]
Generate a Python skill implementation that:
1. Reads the CSV file
2. Validates the data structure (check columns, types)
3. Performs the required transformation/analysis
4. Returns results in the specified format
5. Includes error handling for common CSV issues
The code should be production-quality (defensive, not assuming data format).
```
AI will generate code. Study it. Does it match your specification?
**Critical evaluation**:
- Does the code check for expected columns before using them?
- Does it handle missing/null values?
- Does it validate the output format matches your spec?
**Document what you notice**:
```
Things I observe:
- [Good pattern in the approach]
- [Assumption that might not hold]
- [Edge case not addressed yet]
```
Use these observations to guide your feedback to AI when iterating.
### Part B: Test with Real Data
Get or create sample CSV data that matches your specification's expected format.
**Run the generated code**:
```bash
# Save AI-generated code to analysis.py
# Create test_data.csv with sample data
# Run it
python analysis.py test_data.csv
```
**What happens**?
- ✓ Success: Output matches specification → Great! Move to Part C
- ✗ Syntax Error: Code won't even parse
- ✗ Runtime Error: Code runs but crashes (KeyError, TypeError, etc.)
- ✗ Logic Error: Code runs, output is wrong or incomplete
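The first three outcomes can be told apart mechanically from the exit code and stderr of the run. The sketch below is one possible approach, assuming the script is run with the standard CPython interpreter, whose tracebacks end with a line naming the exception; `classify_failure` is a hypothetical helper. Logic errors are deliberately out of scope here because they exit cleanly.

```python
import re


def classify_failure(returncode: int, stderr: str) -> str:
    """Map a finished run to 'success', 'syntax_error', or 'runtime_error'.

    Logic errors cannot be detected here: they exit with code 0 and have to be
    caught by validating the output against the specification.
    """
    if returncode == 0:
        return "success"
    # The last non-empty stderr line of a Python traceback names the exception.
    lines = [line for line in stderr.strip().splitlines() if line.strip()]
    last_line = lines[-1] if lines else ""
    match = re.match(r"^(\w+(?:\.\w+)*)(?::|$)", last_line)
    exception_name = match.group(1) if match else ""
    if exception_name in {"SyntaxError", "IndentationError", "TabError"}:
        return "syntax_error"
    return "runtime_error"


stderr = (
    "Traceback (most recent call last):\n"
    '  File "analysis.py", line 12, in <module>\n'
    "    total += float(row['amount'])\n"
    "KeyError: 'amount'\n"
)
print(classify_failure(1, stderr))
```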
### Part C: Recover from Errors
This is where error recovery becomes visible.
**If you got a syntax error:**
```
Show AI the error:
"Here's my code:
[show problematic section]
Error: [paste error message]
What's wrong and how do I fix it?"
```
AI explains and provides corrected code.
**If you got a runtime error:**
```
Show AI the error:
"The code crashed with:
[error message and traceback]
What does this error mean?
What assumption did the code make that's wrong?
How should I fix it to handle real data?"
```
**If output is wrong/incomplete:**
```
"My spec requires [required output].
My code produces [what it actually produces].
What's missing? How should the code be changed to match the spec?"
```
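When you later automate this step, the three prompts above can be kept as templates and filled in per failure type. This is a small sketch under that assumption; `FIX_PROMPTS` and `build_fix_prompt` are hypothetical names, and the wording simply mirrors the manual prompts with the code appended where the original prompt did not include it.

```python
# Prompt templates mirroring the three manual prompts above.
FIX_PROMPTS = {
    "syntax_error": (
        "Here's my code:\n{code}\n\n"
        "Error: {detail}\n\n"
        "What's wrong and how do I fix it?"
    ),
    "runtime_error": (
        "The code crashed with:\n{detail}\n\n"
        "What does this error mean?\n"
        "What assumption did the code make that's wrong?\n"
        "How should I fix it to handle real data?\n\n"
        "Code:\n{code}"
    ),
    "logic_error": (
        "My spec requires {expected}.\n"
        "My code produces {detail}.\n"
        "What's missing? How should the code be changed to match the spec?\n\n"
        "Code:\n{code}"
    ),
}


def build_fix_prompt(kind: str, code: str, detail: str, expected: str = "") -> str:
    """Fill in the template for one failure type.

    detail is the error text (or the actual output, for logic errors);
    expected is only used by the logic_error template.
    """
    if kind not in FIX_PROMPTS:
        raise ValueError(f"Unknown failure type: {kind}")
    return FIX_PROMPTS[kind].format(code=code, detail=detail, expected=expected)


print(build_fix_prompt("runtime_error", "total = row['amount']", "KeyError: 'amount'"))
```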
### Part D: Iterate Until Convergence
Keep improving until:
- ✓ Code runs without errors
- ✓ Output matches your specification exactly
- ✓ Edge cases are handled (test with malformed data)
- ✓ Execution completes within time limit
**Test with multiple scenarios:**
```bash
# Test 1: Clean data (happy path)
python analysis.py clean_data.csv
# Test 2: Missing columns
python analysis.py missing_columns.csv
# Test 3: Non-numeric values where numeric expected
python analysis.py malformed_data.csv
# Test 4: Empty file
python analysis.py empty.csv
# Test 5: Large file (check performance)
python analysis.py large_data.csv
```
For each test, document:
- Did it run without error? (Yes/No)
- Does output match spec format? (Yes/No)
- Are edge cases handled gracefully? (Yes/No)
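If you do not have suitable files on hand, the five scenarios can be generated. The sketch below writes one file per scenario using the file names from the commands above; the column names and row contents are assumptions based on the spec template, and `make_fixtures` is a hypothetical helper.

```python
import csv
import tempfile
from pathlib import Path

HEADER = ["customer_id", "purchase_date", "amount"]  # assumed columns


def write_csv(path: Path, header: list[str], rows: list[list[str]]) -> None:
    """Write a CSV file; an empty header and no rows produces a zero-byte file."""
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        if header:
            writer.writerow(header)
        writer.writerows(rows)


def make_fixtures(directory: Path, large_rows: int = 10_000) -> dict[str, Path]:
    """Create one CSV per test scenario and return a mapping of name to path."""
    clean = [["c1", "2024-01-05", "19.99"], ["c2", "2024-01-06", "5.00"]]
    fixtures = {
        "clean_data.csv": (HEADER, clean),
        "missing_columns.csv": (HEADER[:2], [row[:2] for row in clean]),
        "malformed_data.csv": (HEADER, [["c1", "2024-01-05", "abc"], ["c2", "", ""]]),
        "empty.csv": ([], []),
        "large_data.csv": (HEADER, [[f"c{i}", "2024-01-05", "1.00"] for i in range(large_rows)]),
    }
    paths = {}
    for name, (header, rows) in fixtures.items():
        paths[name] = directory / name
        write_csv(paths[name], header, rows)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    for name, path in make_fixtures(Path(tmp), large_rows=1000).items():
        print(f"{name}: {path.stat().st_size} bytes")
```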
## Step 4: Build the Iteration Loop (The Skill Automating the Pattern)
Now that you've manually gone through the loop, you're going to build a skill that does this automatically.
**Your skill needs these components:**
```python
def build_analysis_skill():
    """
    The full script-execution skill that orchestrates:
    1. Generate code from spec
    2. Execute the code
    3. Check for errors
    4. Generate fixes if needed
    5. Iterate until convergence
    """

    # Component 1: Code Generation
    def generate_code(specification: str) -> str:
        """Generate Python code from specification using AI"""
        # Prompt AI with: "Given this spec: [spec],
        # write complete Python code that implements it"
        # Return the generated code
        pass

    # Component 2: Code Execution
    def execute_code(code: str, input_file: str, timeout: int = 30) -> tuple[bool, str, str]:
        """Execute code, return (success, output, error_message)"""
        # Run code with subprocess
        # Capture stdout, stderr
        # Return results with timeout protection
        pass

    # Component 3: Error Analysis
    def analyze_error(error_message: str, code: str) -> str:
        """Understand what went wrong"""
        # Parse error type (SyntaxError, RuntimeError, etc.)
        # Extract the problematic line
        # Return clear analysis of the issue
        pass

    # Component 4: Fix Generation
    def generate_fix(error_analysis: str, code: str, spec: str) -> str:
        """Generate corrected code"""
        # Prompt AI: "This code failed with: [error]
        # Here's the problem: [analysis]
        # The spec is: [spec]
        # Generate corrected code that fixes this"
        pass

    # Component 5: Convergence Check
    def check_convergence(output: str, spec: dict) -> bool:
        """Does output satisfy the specification?"""
        # Validate: all required fields present
        # Validate: output format correct
        # Validate: no error messages in output
        # Return True if spec is satisfied
        pass

    # Component 6: Main Iteration Loop
    def execute_skill(specification: str, spec: dict, input_file: str) -> str:
        """Main skill that orchestrates everything.

        specification is the spec text used in prompts; spec is the same
        spec parsed into a dict, used for convergence checks.
        """
        max_iterations = 5
        iteration = 0
        code = None

        while iteration < max_iterations:
            iteration += 1

            if iteration == 1:
                # First iteration: generate from spec
                code = generate_code(specification)

            # Execute the code
            success, output, error = execute_code(code, input_file)

            if success and check_convergence(output, spec):
                # ✓ Converged! Specification is satisfied
                return output

            if not success:
                # ✗ Error occurred
                analysis = analyze_error(error, code)
                code = generate_fix(analysis, code, specification)
                # Loop continues, retry with fixed code
            else:
                # ✗ Code ran, but output doesn't match spec
                fix_request = (
                    f"Output is incomplete: {output}. "
                    f"Required by spec: {spec}. Generate code that adds missing parts."
                )
                code = generate_fix(fix_request, code, specification)
                # Loop continues, retry with improved code

        # If we get here, max iterations reached without converging
        raise RuntimeError(f"Failed to converge after {max_iterations} iterations")

    # Hand the assembled loop back to the caller
    return execute_skill
```
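Because every component in the skeleton is still a stub, it can help to exercise the control flow on its own before wiring in AI calls or subprocesses. The following sketch passes the components in as plain functions so the loop can be tested offline; `run_loop` and the `fake_*` stand-ins are hypothetical and exist only to show the three paths (error, incomplete output, convergence).

```python
from typing import Callable

ExecResult = tuple[bool, str, str]  # (success, output, error)


def run_loop(
    generate: Callable[[], str],
    execute: Callable[[str], ExecResult],
    fix: Callable[[str, str], str],
    converged: Callable[[str], bool],
    max_iterations: int = 5,
) -> tuple[str, int]:
    """Return (output, iterations_used), or raise RuntimeError if the loop never converges."""
    code = generate()
    for iteration in range(1, max_iterations + 1):
        success, output, error = execute(code)
        if success and converged(output):
            return output, iteration
        # Describe the problem: the error text, or the output that failed validation.
        problem = error if not success else f"Output does not satisfy the spec: {output}"
        code = fix(code, problem)
    raise RuntimeError(f"Failed to converge after {max_iterations} iterations")


# Stand-ins for the AI and the subprocess: v1 crashes, v2 is incomplete, v3 is correct.
def fake_generate() -> str:
    return "v1"


def fake_execute(code: str) -> ExecResult:
    if code == "v1":
        return (False, "", "KeyError: 'amount'")
    if code == "v2":
        return (True, '{"total": 10}', "")
    return (True, '{"total": 10, "count": 2}', "")


def fake_fix(code: str, problem: str) -> str:
    return {"v1": "v2", "v2": "v3"}.get(code, code)


def fake_converged(output: str) -> bool:
    return '"count"' in output


output, iterations = run_loop(fake_generate, fake_execute, fake_fix, fake_converged)
print(iterations, output)
```

Injecting the components this way is a design choice, not a requirement: it keeps the iteration limit and the stop condition testable without network access.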
## Step 5: Implementation Guidance with AI
You'll build this skill with AI, testing and validating each component as you go.
### Get AI Help Building the Iteration Loop
```
I'm building a Python skill that generates code, executes it, and iterates
until a specification is satisfied.
Here's my specification:
[PASTE YOUR SPEC]
Here's my first attempt at code generation and execution:
[PASTE YOUR MANUAL CODE FROM STEP 3-4]
Now I need to build an automated loop that:
1. Generates code once (given spec)
2. Executes code (capture output/errors, 30-second timeout)
3. If error: analyze error, prompt you to generate fixed code
4. If output doesn't match spec: prompt you to improve code
5. Check convergence (spec is satisfied) → Stop
6. Repeat until convergence or 5 iterations max
Show me how to structure this as a Python class/functions.
Include error handling, timeout protection, and convergence checking.
```
### Build Convergence Validation
This is critical. Your skill must STOP when the specification is satisfied.
```python
import json


def convergence_check(output: str, specification: dict) -> dict:
    """
    Validate whether output satisfies specification.

    Returns: {
        'converged': bool,
        'missing': [list of unsatisfied requirements],
        'issues': [any problems found]
    }
    """
    results = {
        'converged': True,
        'missing': [],
        'issues': []
    }

    # Check all required fields are present
    # (substring match on the raw output text, not a structural check)
    for field in specification.get('output', {}).get('required_fields', []):
        if field not in output:
            results['missing'].append(f"Field missing: {field}")
            results['converged'] = False

    # Check output format (if JSON specified)
    if specification.get('output', {}).get('format') == 'JSON':
        try:
            json.loads(output)
        except json.JSONDecodeError:
            results['issues'].append("Output is not valid JSON")
            results['converged'] = False

    # Add domain-specific validation based on your spec
    # Example: if analyzing customers, verify segments exist
    if 'required_segments' in specification:
        for segment in specification['required_segments']:
            if segment not in output:
                results['missing'].append(f"Segment missing: {segment}")
                results['converged'] = False

    return results
```
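The `field not in output` test above is a substring match, so it passes whenever the field name appears anywhere in the text, even inside an error note. When the spec's output format is JSON, a stricter variant is to parse the output first and check keys. A minimal sketch, assuming the output is a single JSON object with the required fields at the top level; `check_json_fields` is a hypothetical helper.

```python
import json


def check_json_fields(output: str, required_fields: list[str]) -> dict:
    """Validate that output is a JSON object containing every required field as a key."""
    result = {"converged": False, "missing": [], "issues": []}
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError as exc:
        result["issues"].append(f"Output is not valid JSON: {exc.msg}")
        return result
    if not isinstance(parsed, dict):
        result["issues"].append("Top-level JSON value is not an object")
        return result
    result["missing"] = [field for field in required_fields if field not in parsed]
    result["converged"] = not result["missing"]
    return result


# A substring check would accept this output, because "total" appears in the note.
print(check_json_fields('{"note": "total not computed"}', ["total"]))
```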
### Add Timeout and Resource Protection
```python
import subprocess
import sys


def execute_code_safely(code: str, input_file: str, timeout: int = 30) -> tuple[bool, str, str]:
    """
    Execute Python code with timeout and error capture.

    Returns: (success: bool, output: str, error: str)
    """
    # Write code to temporary file
    with open('_temp_analysis.py', 'w') as f:
        f.write(code)

    try:
        # Run with timeout, using the same interpreter that runs the skill
        result = subprocess.run(
            [sys.executable, '_temp_analysis.py', input_file],
            capture_output=True,
            text=True,
            timeout=timeout
        )
        if result.returncode == 0:
            # Success
            return (True, result.stdout, '')
        else:
            # Execution failed
            return (False, result.stdout, result.stderr)
    except subprocess.TimeoutExpired:
        return (False, '', f'TimeoutError: Execution exceeded {timeout} seconds')
    except Exception as e:
        return (False, '', f'ExecutionError: {str(e)}')
```
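The function above writes the generated script to a fixed file name in the working directory, which leaves the file behind and can collide if two runs overlap. One possible variation, sketched here as an assumption rather than a required design, writes the script into a throwaway directory that is removed afterwards. `run_in_temp_dir` is a hypothetical name; note that this still offers no isolation beyond the timeout.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_temp_dir(code: str, args: list[str], timeout: float = 30) -> tuple[bool, str, str]:
    """Run code as a script in a temporary directory; return (success, stdout, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "generated.py"
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script), *args],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return (False, "", f"TimeoutError: execution exceeded {timeout} seconds")
    return (result.returncode == 0, result.stdout, result.stderr)


# A quick script finishes normally; a sleeping script is stopped by the timeout.
print(run_in_temp_dir("print('ok')", []))
print(run_in_temp_dir("import time; time.sleep(5)", [], timeout=1))
```

The working directory of the child process is left unchanged, so a relative input path passed in `args` resolves the same way it would for the original function.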
## Step 6: Test Your Skill Against Edge Cases
Your skill should handle:
### Test 1: Clean Data (Happy Path)
```python
skill = ScriptExecutionSkill(
specification=your_spec,
input_file='clean_data.csv'
)
result = skill.execute()
assert result is not None
assert 'error' not in result.lower()
```
**Expected**: Succeeds on first iteration
### Test 2: Malformed Data (Edge Case)
```python
# CSV with missing columns, non-numeric values, etc.
result = skill.execute(input_file='malformed_data.csv')
# Skill should detect error, fix code, retry
assert 'error' not in result.lower() # After recovery, still valid
```
**Expected**: Skill generates fix after detecting error
### Test 3: Empty File (Non-Recoverable)
```python
result = skill.execute(input_file='empty.csv')
# This SHOULD fail (non-recoverable)
assert result is None or 'error' in result.lower()
```
**Expected**: Skill recognizes this is non-recoverable, stops gracefully
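One way to stop gracefully is to check the input before entering the loop at all, since no amount of regenerated code can fix an input that is missing or empty. This is a sketch under that assumption; `NonRecoverableInputError` and `precheck_input` are hypothetical names, and what counts as non-recoverable should come from your own spec.

```python
import os
import tempfile


class NonRecoverableInputError(Exception):
    """The input itself is unusable, so regenerating code cannot help."""


def precheck_input(input_file: str) -> None:
    """Raise NonRecoverableInputError if the input can never be processed."""
    if not os.path.isfile(input_file):
        raise NonRecoverableInputError(f"Input file not found: {input_file}")
    if os.path.getsize(input_file) == 0:
        raise NonRecoverableInputError(f"Input file is empty: {input_file}")


# Demonstrate with a zero-byte CSV file.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as f:
    empty_path = f.name
try:
    precheck_input(empty_path)
except NonRecoverableInputError as exc:
    print(f"error: {exc}")
finally:
    os.remove(empty_path)
```

Reporting the failure as a message containing the word "error", as the demo does, is consistent with the assertion in the test above.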
### Test 4: Timeout Scenario
```python
# Spec with large data processing that might timeout
result = skill.execute(input_file='large_data.csv', timeout=5)
# Skill should timeout gracefully, not hang
assert result is None or 'timeout' in result.lower()  # check None first to avoid AttributeError
```
**Expected**: Skill times out, reports clearly
## Step 7: Document Your Skill
Real skills are documented for others to use.
````markdown
# CSV Analysis Skill
## Purpose
[What problem does this solve?]
## Usage
```python
from my_skill import ScriptExecutionSkill

skill = ScriptExecutionSkill(
    specification={
        'input': ['customers.csv'],
        'output': {'format': 'JSON', 'required_fields': [...]},
        'success_criteria': [...]
    },
    input_file='customers.csv'
)
result = skill.execute()
print(result)
```
## How It Works
1. Specification defines what code must do
2. Skill generates Python code from spec
3. Code executes against input file
4. Errors trigger automatic fix generation
5. Iteration continues until spec is satisfied or max retries reached
## Success Metrics
- Execution time: < 30 seconds
- Convergence rate: 95%+ (passes with clean data)
- Edge case handling: Gracefully recovers or fails clearly
## Known Limitations
- [What doesn't it handle?]
- [When should you use something else?]
````
---
## Try With AI
Now you'll refine your skill with AI collaboration, focused on error recovery and robustness.
### Prompt 1: Design Error Recovery Patterns
```
I've built a skill that generates Python code from specifications and
executes it. It encounters three types of errors:
1. Syntax errors (code won't parse)
2. Runtime errors (code crashes during execution)
3. Logic errors (code runs but output is wrong)
For each error type, help me design the recovery strategy:
**Syntax errors**:
- How should I prompt you to generate fixed code?
- What context should I provide?
**Runtime errors**:
- How should I parse the error message?
- What information helps you generate a better fix?
**Logic errors**:
- How do I detect these (they don't produce error messages)?
- How should I describe the problem to you?
Show me the exact prompts I should use for each type.
```
**What you're learning**: How to design prompts that help AI generate real fixes instead of re-suggesting the same broken code.
### Prompt 2: Implement Convergence Testing
```
My specification requires these success criteria:
[PASTE YOUR CRITERIA FROM YOUR SPEC]
I need a function that validates whether code output satisfies these criteria.
For each criterion, what should the validation check?
- How do I verify the output format is correct?
- How do I verify all required fields are present?
- How do I detect if the output is incomplete or wrong?
Show me a Python function that validates all criteria and returns
which ones passed, which ones failed, and what's missing.
```
**What you're learning**: How to translate specification requirements into automated validation that tells you exactly when to stop iterating.
### Prompt 3: Test Your Skill with Intentional Failures
```
I want to test my skill's error recovery. Help me design test cases:
**Test Case 1: Missing column**
- Create CSV data where a required column is missing
- Show me what error the generated code will produce
- What should my skill do to recover?
**Test Case 2: Wrong data type**
- Create data where a numeric column contains text
- Show the error this produces
- How should the skill fix this?
**Test Case 3: Timeout scenario**
- What operation would cause a timeout?
- How should my skill handle timeouts gracefully?
For each test case, show me:
1. The test data
2. The error produced
3. How my skill should recover
```
**What you're learning**: Testing is not about success cases—it's about understanding how your skill behaves when things break.
### Prompt 4: Validate Convergence Against Diverse Inputs
```
My skill has processed the following test scenarios:
**Test 1 - Clean data**: PASSED
**Test 2 - Missing column**: RECOVERED (3 iterations)
**Test 3 - Empty file**: FAILED (non-recoverable)
**Test 4 - Malformed values**: RECOVERED (2 iterations)
Based on these results:
- Is my skill ready for production?
- What patterns suggest robustness?
- What edge cases might still break it?
- What should I test next?
Help me evaluate the skill's readiness.
```
**What you're learning**: Testing isn't a binary pass/fail. It's about understanding your skill's behavior patterns and building confidence in its robustness.
---
## Success Criteria
Your skill is complete when:
- ✓ **Specification is clear and complete**: AI can generate code from it without asking questions
- ✓ **Code executes successfully on clean data**: Happy path works
- ✓ **Error recovery works**: Syntax and runtime errors trigger fixes
- ✓ **Convergence is detected**: Skill stops when spec is satisfied
- ✓ **Edge cases are handled**: Tested with malformed, empty, large data
- ✓ **Iteration limits work**: Skill stops after 5 attempts or timeout
- ✓ **Skill is documented**: Someone else could use it
Your skill will become a reusable component in Lesson 7 (orchestration) when you combine it with MCP-wrapping skills to create complete workflows.
---
**Takeaway**: You didn't just learn the write-execute-analyze loop; you built a skill that automates it. You discovered that error recovery isn't magic; it's specification clarity + intelligent prompting + convergence validation. In Lesson 7, you'll orchestrate this skill with MCP-wrapping skills to build complex workflows that combine code execution with external tools.