Chaos Engineering

Design and execute controlled failure experiments to validate system resilience


by diegosouzapw

View on GitHub

Best use case

Chaos Engineering is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using Chaos Engineering can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/chaos-engineering/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/design/chaos-engineering/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/chaos-engineering/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How Chaos Engineering Compares

Feature / Agent	Chaos Engineering	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Design and execute controlled failure experiments to validate system resilience

Where can I find the source code?

You can find the source code on GitHub in the diegosouzapw/awesome-omni-skill repository, at skills/design/chaos-engineering/SKILL.md on the main branch.

SKILL.md Source

# Chaos Engineering Skill

Design and execute controlled failure experiments to validate system resilience.

## Trigger Conditions
- Pre-release resilience validation needed
- Post-deploy verification of fault tolerance
- User invokes with "chaos experiment" or "resilience test"

## Input Contract
- **Required:** System under test
- **Required:** Steady-state hypothesis (measurable)
- **Optional:** Blast radius constraints, failure types to inject
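The input contract can be captured as a small data structure. The sketch below assumes the hypothesis is a single metric compared against a threshold, and it rejects any comparator it cannot evaluate so that the hypothesis stays measurable. The class names, field names, and the metric name `api_latency_p99_ms` are hypothetical and are not part of the skill.

```python
import operator
from dataclasses import dataclass, field
from typing import Optional

# Comparators a hypothesis may use; anything else is rejected as not measurable.
_COMPARATORS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


@dataclass
class SteadyStateHypothesis:
    metric: str        # hypothetical metric name, e.g. "api_latency_p99_ms"
    comparator: str    # one of "<", "<=", ">", ">="
    threshold: float   # value the observed metric is compared against

    def __post_init__(self) -> None:
        if self.comparator not in _COMPARATORS:
            raise ValueError(f"unsupported comparator: {self.comparator!r}")

    def holds(self, observed: float) -> bool:
        """True if the observed metric value satisfies the hypothesis."""
        return _COMPARATORS[self.comparator](observed, self.threshold)


@dataclass
class ExperimentSpec:
    system_under_test: str                                   # required
    hypothesis: SteadyStateHypothesis                        # required
    blast_radius: Optional[str] = None                       # optional, e.g. "single pod"
    failure_types: list[str] = field(default_factory=list)   # optional, e.g. ["pod kill"]
```

For example, `SteadyStateHypothesis("api_latency_p99_ms", "<", 500.0)` expresses the "p99 latency stays under 500 ms" hypothesis used in the example invocation further down.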

## Output Contract
- Experiment definition with hypothesis
- Results report with pass/fail
- Findings and remediation recommendations
- Updated resilience scorecard
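A results report with a pass/fail verdict and findings could be modeled as follows. This is a minimal sketch: the three severity levels and the plain-text layout of `render()` are assumptions, and the resilience scorecard is not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Finding:
    summary: str         # what was observed
    severity: Severity   # how bad it is (levels are an assumption)
    remediation: str     # recommended fix


@dataclass
class ExperimentReport:
    hypothesis: str
    passed: bool
    findings: list[Finding] = field(default_factory=list)

    def render(self) -> str:
        """Format the report as plain text: hypothesis, verdict, then one line per finding."""
        lines = [
            f"Hypothesis: {self.hypothesis}",
            f"Result: {'PASS' if self.passed else 'FAIL'}",
        ]
        for finding in self.findings:
            lines.append(f"- [{finding.severity.value}] {finding.summary} -> {finding.remediation}")
        return "\n".join(lines)
```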

## Tool Permissions
- **Read:** Service configs, circuit breaker configs, monitoring dashboards
- **Write:** Experiment logs, findings reports
- **Execute:** Failure injection tools (network, compute, storage)

## Execution Steps
1. Define steady-state hypothesis with measurable metrics
2. Select failure injection type (network, pod kill, CPU, disk, dependency)
3. Constrain blast radius (start small: single pod, single AZ)
4. Execute experiment while monitoring steady state
5. Observe and record system behavior
6. Compare actual behavior against hypothesis
7. Document findings and remediation
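Steps 1 through 6 can be sketched as a single loop. The hooks `measure`, `inject`, and `revert` are hypothetical callables that you would wire to your monitoring and fault-injection tooling; the blast radius is whatever scope `inject` targets (for example, one pod). The "abort after N consecutive-or-not violations" rule and the sample count are assumptions, not part of the skill.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExperimentResult:
    baseline: float        # steady-state metric before injection
    during: list[float]    # samples taken while the fault was active
    passed: bool           # True only if every sample satisfied the hypothesis
    aborted: bool          # True if the run was cut short by too many violations


def run_experiment(
    measure: Callable[[], float],     # hypothetical hook: read the steady-state metric
    inject: Callable[[], None],       # hypothetical hook: start the fault (scope = blast radius)
    revert: Callable[[], None],       # hypothetical hook: remove the fault
    holds: Callable[[float], bool],   # the steady-state hypothesis check
    samples: int = 5,
    abort_after_violations: int = 3,
) -> ExperimentResult:
    baseline = measure()
    if not holds(baseline):
        # Do not inject faults into a system that is already outside its steady state.
        raise RuntimeError("steady state does not hold before injection")

    during: list[float] = []
    violations = 0
    aborted = False
    inject()
    try:
        for _ in range(samples):
            value = measure()
            during.append(value)
            if not holds(value):
                violations += 1
                if violations >= abort_after_violations:
                    aborted = True  # stop early to keep impact contained
                    break
    finally:
        revert()  # always remove the fault, even if measuring raises

    return ExperimentResult(baseline, during, passed=not aborted and violations == 0, aborted=aborted)
```

Putting `revert()` in a `finally` clause is the design choice that matters most here: the fault is removed even if a monitoring call fails mid-experiment.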

## Success Criteria
- Hypothesis clearly defined before experiment
- Blast radius contained as planned
- Monitoring remained functional during experiment
- Findings documented with severity and remediation

## Escalation Rules
- Escalate if experiment causes unexpected customer impact
- Escalate if monitoring fails during the experiment
- Escalate if recovery takes longer than the MTTR target
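The three escalation rules translate directly into checks over what was observed during the run. This sketch assumes those observations are already collected elsewhere; the `Observation` fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    unexpected_customer_impact: bool   # hypothetical: set from customer-facing error/latency alerts
    monitoring_healthy: bool           # hypothetical: False if dashboards or metrics went dark
    recovery_seconds: float            # time from fault removal back to steady state
    mttr_target_seconds: float         # the team's MTTR target


def escalation_reasons(obs: Observation) -> list[str]:
    """Return one reason per escalation rule that fired; an empty list means no escalation."""
    reasons = []
    if obs.unexpected_customer_impact:
        reasons.append("unexpected customer impact")
    if not obs.monitoring_healthy:
        reasons.append("monitoring failed during experiment")
    if obs.recovery_seconds > obs.mttr_target_seconds:
        reasons.append("recovery exceeded MTTR target")
    return reasons
```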

## Example Invocations

**Input:** "Test what happens when the Redis cache becomes unavailable"

**Output:** Hypothesis: API latency stays <500ms p99 with cache miss fallback to DB. Experiment: kill Redis pod. Result: FAIL — latency spiked to 3.2s, circuit breaker did not trip (misconfigured threshold). Remediation: lower circuit breaker threshold from 50% to 20% error rate, add cache stampede protection.
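The example result hinges on two numbers: the p99 latency and the circuit breaker's error-rate threshold. The sketch below shows both under stated assumptions. It uses a nearest-rank p99, and an error-rate breaker that only evaluates once a 20-call window is full. The 30% outage error rate is hypothetical, since the example does not report the actual error rate. The class and function names are illustrative and are not taken from any specific library.

```python
import math
from collections import deque


def p99(samples_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of latency samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(len(ordered) * 99 / 100)  # 1-based nearest rank
    return ordered[rank - 1]


class ErrorRateCircuitBreaker:
    """Opens once the error rate over the last `window` calls reaches `threshold`."""

    def __init__(self, threshold: float, window: int = 20) -> None:
        self.threshold = threshold
        self.window = window
        self.results = deque(maxlen=window)  # True means the call failed
        self.is_open = False

    def record(self, failed: bool) -> None:
        self.results.append(failed)
        # Only judge a full window, so the first few calls cannot trip the breaker.
        if len(self.results) == self.window:
            error_rate = sum(self.results) / self.window
            if error_rate >= self.threshold:
                self.is_open = True


# Hypothetical outage traffic: 30% of calls fail while Redis is down.
outage = [i % 10 < 3 for i in range(20)]

for threshold in (0.5, 0.2):
    breaker = ErrorRateCircuitBreaker(threshold)
    for failed in outage:
        breaker.record(failed)
    print(f"threshold={threshold:.0%} open={breaker.is_open}")
# prints:
# threshold=50% open=False
# threshold=20% open=True
```

With this assumed 30% error rate, a 50% threshold never trips, which matches the "circuit breaker did not trip" finding. Lowering the threshold to 20% opens the breaker, so callers can fail fast instead of waiting on the degraded path.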

Related Skills

chaos-engineering-fundamentals

from diegosouzapw/awesome-omni-skill

Use when implementing chaos engineering, designing fault injection experiments, or building resilience testing practices. Covers chaos principles and experiment design.

Prompt Engineering Skill

from diegosouzapw/awesome-omni-skill

Craft effective prompts that get the best results from language models.

prompt-engineering-openai-api-f7c24501

from diegosouzapw/awesome-omni-skill

data-engineering-data-pipeline

from diegosouzapw/awesome-omni-skill

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

context-engineering

from diegosouzapw/awesome-omni-skill

Use when designing agent system prompts, optimizing RAG retrieval, or when context is too expensive or slow. Reduces tokens while maintaining quality through strategic positioning and attention-aware design.

Build Your Data Engineering Skill

from diegosouzapw/awesome-omni-skill

Create your LLMOps data engineering skill in one prompt, then learn to improve it throughout the chapter

ai-engineering-skill

from diegosouzapw/awesome-omni-skill

Practical guide for building production ML systems based on Chip Huyen's AI Engineering book. Use when users ask about model evaluation, deployment strategies, monitoring, data pipelines, feature engineering, cost optimization, or MLOps. Covers metrics, A/B testing, serving patterns, drift detection, and production best practices.

ai-data-engineering

from diegosouzapw/awesome-omni-skill

Data pipelines, feature stores, and embedding generation for AI/ML systems. Use when building RAG pipelines, ML feature serving, or data transformations. Covers feature stores (Feast, Tecton), embedding pipelines, chunking strategies, orchestration (Dagster, Prefect, Airflow), dbt transformations, data versioning (LakeFS), and experiment tracking (MLflow, W&B).

Data Engineering Data Driven Feature

from diegosouzapw/awesome-omni-skill

World-class data science skill for statistical modeling, experimentation, causal inference, and advanced analytics. Expertise in Python (NumPy, Pandas, Scikit-learn), R, SQL, statistical methods, A/B testing, time series, and business intelligence. Includes experiment design, feature engineering, model evaluation, and stakeholder communication.

ai-marketing-engineering

from diegosouzapw/awesome-omni-skill

AI-powered marketing engineering skill based on Alon Huri's framework. Transforms marketing from copywriting to engineering discipline through 10 agentic mechanisms: infinite creative generation, adaptive budget management, LTV signal hunting, contextual data layers, AEO optimization, dynamic quizzes, behavior-driven activation, personalized video at scale, competitor weakness targeting, and active churn prevention. Use when building marketing automation systems, designing growth engineering workflows, creating AI-powered marketing agents, optimizing ad creatives at scale, implementing AEO (Answer Engine Optimization), or architecting data-driven marketing infrastructure.

u0542-engineering-multi-agent-negotiation-mediator

from diegosouzapw/awesome-omni-skill

Operate the "Engineering Multi-Agent Negotiation Mediator" capability in production for workflows. Use when mission execution explicitly requires this capability and outcomes must be reproducible, policy-gated, and handoff-ready.

prompt-engineering

from diegosouzapw/awesome-omni-skill

Write effective prompts for AI coding agents. Use when crafting system prompts, implementing chain-of-thought reasoning, building few-shot examples, adding guardrails, configuring tool use, or designing agentic prompt patterns. Covers CoT, few-shot, guardrails, and function calling.