Error Recovery Patterns Skill

This skill provides comprehensive guidance on error handling patterns, recovery strategies, and debugging techniques in GitHub Agentic Workflows (gh-aw).

Best use case

Error Recovery Patterns Skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Error Recovery Patterns Skill should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/error-recovery-patterns/SKILL.md --create-dirs "https://raw.githubusercontent.com/github/gh-aw/main/skills/error-recovery-patterns/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/error-recovery-patterns/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How Error Recovery Patterns Skill Compares

| Feature / Agent | Error Recovery Patterns Skill | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

This skill provides comprehensive guidance on error handling patterns, recovery strategies, and debugging techniques in GitHub Agentic Workflows (gh-aw).

Where can I find the source code?

The source is in the github/gh-aw repository on GitHub, at skills/error-recovery-patterns/SKILL.md.

SKILL.md Source

# Error Recovery Patterns Skill

This skill provides comprehensive guidance on error handling patterns, recovery strategies, and debugging techniques in GitHub Agentic Workflows (gh-aw).

## Purpose

Guide developers in implementing robust error recovery patterns to:
- Reduce retry loops in agent sessions (target: under 10%, down from the current 23%)
- Implement circuit breakers to prevent infinite retry loops
- Add proactive recovery for installation, dependency, and API failures
- Improve debug logging for recovery attempts

## When to Use This Skill

Invoke this skill when:
- Implementing retry logic for network operations, installations, or API calls
- Debugging retry loop issues in workflows or agent sessions
- Adding error recovery patterns to new or existing code
- Understanding transient vs non-transient error classification
- Implementing circuit breakers or exponential backoff
- Adding debug logging for recovery attempts

## Key Concepts Covered

### 1. Circuit Breaker Pattern
- Maximum retry limits (standard: 3 attempts)
- Exponential backoff strategies
- Fail-fast on non-transient errors
- Implementation in JavaScript, Shell, and Go
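
The bullets above can be sketched as a small retry helper. This is a minimal sketch, not the skill's own `withRetry()`: the option names (`maxAttempts`, `baseDelayMs`, `isTransient`, `sleep`) and the 1-second base delay are assumptions chosen for illustration. Only the 3-attempt limit comes from the list above.

```javascript
// Sketch only: the signature and option names are assumptions,
// not the actual withRetry() shipped with the skill.
async function withRetry(operation, options = {}) {
  const {
    maxAttempts = 3,          // standard limit named in the skill
    baseDelayMs = 1000,       // assumed starting delay
    isTransient = () => true, // assumed classifier hook
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      // Fail fast: a non-transient error will not fix itself on retry.
      if (!isTransient(err)) throw err;
      // Hard stop: never loop past the configured limit.
      if (attempt === maxAttempts) break;
      // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }

  const wrapped = new Error(
    `gave up after ${maxAttempts} attempts: ${lastError.message}`
  );
  wrapped.cause = lastError;
  throw wrapped;
}
```

A caller might write `withRetry(() => fetchData(), { isTransient: (err) => err.status === 503 })`, where `fetchData` is a hypothetical function. Injecting `sleep` keeps the helper testable without real delays.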

### 2. Installation Failure Recovery
- NPM installation with cache clearing and registry fallbacks
- Python pip installation with mirror alternatives
- Docker image pull with retry and rate limit handling
- Copilot CLI installation with network retry
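
For the NPM case, one way to read "cache clearing and registry fallbacks" is as an ordered list of strategies tried in turn. The sketch below assumes that reading. The function names, the strategy order, and the fallback registry URL (`https://registry.example.com/`, a placeholder) are illustrative assumptions, not taken from the skill. The `npm` commands themselves (`npm install`, `npm cache clean --force`, `npm install --registry <url>`) are standard.

```javascript
const { spawnSync } = require("node:child_process");

// Default runner: executes a command and reports whether it exited with 0.
function runCommand(cmd, args) {
  const result = spawnSync(cmd, args, { stdio: "inherit" });
  return result.status === 0;
}

// Ordered recovery plan. The fallback registry URL is a placeholder assumption.
function npmRecoveryPlan(fallbackRegistry = "https://registry.example.com/") {
  return [
    { name: "plain install", steps: [["npm", ["install"]]] },
    {
      name: "clear cache, then install",
      steps: [
        ["npm", ["cache", "clean", "--force"]],
        ["npm", ["install"]],
      ],
    },
    {
      name: "install from fallback registry",
      steps: [["npm", ["install", "--registry", fallbackRegistry]]],
    },
  ];
}

// Tries each strategy once, logging every attempt, and stops at the first
// strategy whose steps all succeed. The plan length bounds the attempts.
function installWithRecovery(plan, run = runCommand, log = console.error) {
  for (const [index, strategy] of plan.entries()) {
    log(`install strategy ${index + 1}/${plan.length}: ${strategy.name}`);
    const ok = strategy.steps.every(([cmd, args]) => run(cmd, args));
    if (ok) return strategy.name;
  }
  throw new Error(`all ${plan.length} install strategies failed`);
}
```

Calling `installWithRecovery(npmRecoveryPlan())` would run the real commands. Passing a custom `run` function allows a dry run or a test double.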

### 3. API Timeout and Rate Limit Handling
- GitHub API rate limit detection and backoff
- Transient error detection patterns
- Custom retry configuration for different APIs
- Rate limit-specific retry strategies
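
Rate limit detection for the GitHub REST API can be sketched from its documented response headers: `retry-after` (seconds to wait), `x-ratelimit-remaining`, and `x-ratelimit-reset` (UTC epoch seconds), returned with a 403 or 429 status. The function name `rateLimitDelayMs` and the convention of passing headers as a plain object with lower-case keys are assumptions for this example, not the skill's API.

```javascript
// Reads a numeric header, returning null when it is absent or malformed.
function headerNumber(headers, name) {
  const raw = headers[name];
  if (raw === undefined || raw === null || raw === "") return null;
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}

// Returns how long to wait (in ms) before retrying a rate-limited request,
// or null when the response does not look like a rate limit.
// Assumes `headers` is a plain object with lower-case keys.
function rateLimitDelayMs(status, headers, nowMs = Date.now()) {
  if (status !== 403 && status !== 429) return null;

  // An explicit retry-after value takes priority.
  const retryAfter = headerNumber(headers, "retry-after");
  if (retryAfter !== null) return retryAfter * 1000;

  // Primary limit exhausted: wait until the reset time.
  const remaining = headerNumber(headers, "x-ratelimit-remaining");
  const reset = headerNumber(headers, "x-ratelimit-reset");
  if (remaining === 0 && reset !== null) {
    return Math.max(0, reset * 1000 - nowMs);
  }

  // A 403 for another reason (for example, missing permissions).
  return null;
}
```

Returning `null` for a plain 403 matters here: a permissions failure is non-transient and should not be retried as if it were a rate limit.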

### 4. Debug Logging for Recovery
- Logger package usage for retry attempts
- Category naming conventions (pkg:filename)
- DEBUG environment variable patterns
- Zero-overhead logging when disabled
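
The logging ideas above can be illustrated with a small category-based logger. This is a sketch under stated assumptions and does not reproduce the project's logger package: the pattern syntax (comma-separated, `*` as wildcard), the function names, and the example category `cli:retry` are all hypothetical.

```javascript
// Checks a category such as "cli:retry" against DEBUG-style patterns.
// Assumed syntax: comma-separated patterns, "*" matches any characters.
function isEnabled(category, debugEnv = process.env.DEBUG || "") {
  return debugEnv
    .split(",")
    .map((pattern) => pattern.trim())
    .filter(Boolean)
    .some((pattern) => {
      const escaped = pattern
        .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
        .replace(/\*/g, ".*");                 // then expand the wildcard
      return new RegExp(`^${escaped}$`).test(category);
    });
}

// Creates a logger whose enabled state is decided once, at construction.
function createLogger(category, debugEnv, write = console.error) {
  const enabled = isEnabled(category, debugEnv);
  return {
    enabled,
    // Accepts a string or a function; a function is only called when the
    // logger is enabled, so costly messages are never built otherwise.
    log(message) {
      if (!enabled) return;
      const text = typeof message === "function" ? message() : message;
      write(`${category} ${text}`);
    },
  };
}
```

With `DEBUG=cli:*` set, `createLogger("cli:retry").log(() => "attempt 2 of 3")` would write `cli:retry attempt 2 of 3` to stderr. With `DEBUG` unset, the callback is never invoked.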

### 5. Error Categorization
- Transient vs non-transient errors
- Network errors, timeout patterns
- HTTP error codes (502, 503, 504)
- GitHub-specific errors (rate limits, abuse detection)
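
A classifier along these lines could drive the retry decision. The sketch below treats the HTTP codes listed above as transient. The Node.js error codes, the message patterns, and the assumed error shape (`status`, `code`, `message` properties) are illustrative choices, not the skill's actual rules.

```javascript
// HTTP statuses named in the list above.
const TRANSIENT_HTTP_STATUS = new Set([502, 503, 504]);

// Assumed set of Node.js network error codes worth retrying.
const TRANSIENT_NODE_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "EAI_AGAIN"]);

// Assumed message patterns for timeouts and GitHub-specific throttling.
const TRANSIENT_MESSAGE = /timed? ?out|rate limit|abuse detection/i;

// Returns "transient" when a retry has a chance of succeeding,
// and "non-transient" otherwise (validation errors, 404s, and so on).
function classifyError(err) {
  if (TRANSIENT_HTTP_STATUS.has(err.status)) return "transient";
  if (TRANSIENT_NODE_CODES.has(err.code)) return "transient";
  if (TRANSIENT_MESSAGE.test(String(err.message || ""))) return "transient";
  return "non-transient";
}
```

Defaulting to `"non-transient"` is a deliberate choice in this sketch: an unrecognized error fails fast instead of entering a retry loop.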

## Anti-Patterns to Avoid

The skill calls out the following anti-patterns:
- ❌ Infinite retry loops without maximum limits
- ❌ Retrying validation errors that won't self-correct
- ❌ No backoff delay between attempts
- ❌ Silent retries without logging
- ❌ Retrying non-transient errors

## Code Examples Provided

The skill includes production-ready examples for:
- JavaScript retry with `withRetry()` function
- Shell script retry loops with exponential backoff
- Go retry patterns with context and timeouts
- NPM/pip/docker installation recovery
- GitHub API rate limit handling
- Debug logging for all recovery attempts

## Related Skills

- **error-messages** - Error message formatting and style guide
- **error-pattern-safety** - Safety guidelines for error pattern regex
- **developer** - General development guidelines and conventions

## Full Documentation

Complete documentation is available at: `../../scratchpad/error-recovery-patterns.md`

This skill references the comprehensive error recovery patterns document, which includes:
- Console formatting requirements
- Error wrapping patterns
- Common error scenarios with step-by-step resolution
- Error message templates
- Debugging runbook
- Error categorization decision trees
- Metrics and monitoring strategies