Chaos Engineering
Design and execute controlled failure experiments to validate system resilience
Best use case
Chaos Engineering is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using Chaos Engineering should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/chaos-engineering/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How Chaos Engineering Compares
| Feature / Agent | Chaos Engineering | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
The skill guides an AI agent through designing and executing controlled failure experiments to validate system resilience.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Chaos Engineering Skill

Design and execute controlled failure experiments to validate system resilience.

## Trigger Conditions

- Pre-release resilience validation needed
- Post-deploy verification of fault tolerance
- User invokes with "chaos experiment" or "resilience test"

## Input Contract

- **Required:** System under test
- **Required:** Steady-state hypothesis (measurable)
- **Optional:** Blast radius constraints, failure types to inject

## Output Contract

- Experiment definition with hypothesis
- Results report with pass/fail
- Findings and remediation recommendations
- Updated resilience scorecard

## Tool Permissions

- **Read:** Service configs, circuit breaker configs, monitoring dashboards
- **Write:** Experiment logs, findings reports
- **Execute:** Failure injection tools (network, compute, storage)

## Execution Steps

1. Define steady-state hypothesis with measurable metrics
2. Select failure injection type (network, pod kill, CPU, disk, dependency)
3. Constrain blast radius (start small: single pod, single AZ)
4. Execute experiment while monitoring steady state
5. Observe and record system behavior
6. Compare actual behavior against hypothesis
7. Document findings and remediation

## Success Criteria

- Hypothesis clearly defined before experiment
- Blast radius contained as planned
- Monitoring remained functional during experiment
- Findings documented with severity and remediation

## Escalation Rules

- Escalate if experiment causes unexpected customer impact
- Escalate if monitoring fails during the experiment
- Escalate if recovery takes longer than MTTR target

## Example Invocations

**Input:** "Test what happens when the Redis cache becomes unavailable"

**Output:** Hypothesis: API latency stays <500ms p99 with cache miss fallback to DB. Experiment: kill Redis pod. Result: FAIL. Latency spiked to 3.2s and the circuit breaker did not trip (misconfigured threshold). Remediation: lower circuit breaker threshold from 50% to 20% error rate, add cache stampede protection.
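
As an illustration only (not part of the SKILL.md itself), the execution steps above can be sketched in Python as a small experiment loop: check the steady state, inject a failure, measure, roll back, and compare against the hypothesis. Everything here is hypothetical: `Hypothesis`, `ExperimentResult`, `run_experiment`, and `FakeService` are names invented for this sketch, and the latency numbers are synthetic values chosen to mirror the Redis example (a 500 ms p99 threshold and a 3.2 s spike). A real experiment would replace `FakeService` with actual failure injection tooling and monitoring queries, and would also enforce the blast radius constraint, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    """Steady-state hypothesis: p99 latency stays below threshold_ms."""
    description: str
    threshold_ms: float


@dataclass
class ExperimentResult:
    baseline_ms: float
    observed_ms: float
    passed: bool


def p99(samples: List[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty sample list."""
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer math
    return ordered[rank - 1]


def run_experiment(
    hypothesis: Hypothesis,
    inject: Callable[[], None],
    rollback: Callable[[], None],
    sample: Callable[[], List[float]],
) -> ExperimentResult:
    # Confirm the steady state holds before injecting any failure.
    baseline = p99(sample())
    if baseline >= hypothesis.threshold_ms:
        raise RuntimeError("steady state not met before injection; aborting")
    # Inject the failure, observe, and always roll back afterwards.
    inject()
    try:
        observed = p99(sample())
    finally:
        rollback()
    # Compare observed behavior against the hypothesis.
    return ExperimentResult(baseline, observed, observed < hypothesis.threshold_ms)


class FakeService:
    """Hypothetical stand-in for a system under test (synthetic latencies)."""

    def __init__(self) -> None:
        self.cache_up = True

    def kill_cache(self) -> None:
        self.cache_up = False

    def restore_cache(self) -> None:
        self.cache_up = True

    def sample_latencies(self) -> List[float]:
        if self.cache_up:
            return [40.0] * 100
        # Assumed behavior: 10% of requests stall when the cache is down.
        return [40.0] * 90 + [3200.0] * 10


service = FakeService()
hypothesis = Hypothesis("API p99 latency stays below 500 ms without cache", 500.0)
result = run_experiment(
    hypothesis, service.kill_cache, service.restore_cache, service.sample_latencies
)
verdict = "PASS" if result.passed else "FAIL"
print(f"{verdict}: p99 {result.observed_ms:.0f} ms vs threshold {hypothesis.threshold_ms:.0f} ms")
```

The rollback sits in a `finally` block so the injected failure is undone even if measurement raises, which is one simple way to keep an experiment from outliving its intended window.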
Related Skills
chaos-engineering-fundamentals
Use when implementing chaos engineering, designing fault injection experiments, or building resilience testing practices. Covers chaos principles and experiment design.
Prompt Engineering Skill
Craft effective prompts that get the best results from language models.
prompt-engineering-openai-api-f7c24501
data-engineering-data-pipeline
You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.
context-engineering
Use when designing agent system prompts, optimizing RAG retrieval, or when context is too expensive or slow. Reduces tokens while maintaining quality through strategic positioning and attention-aware design.
Build Your Data Engineering Skill
Create your LLMOps data engineering skill in one prompt, then learn to improve it throughout the chapter
ai-engineering-skill
Practical guide for building production ML systems based on Chip Huyen's AI Engineering book. Use when users ask about model evaluation, deployment strategies, monitoring, data pipelines, feature engineering, cost optimization, or MLOps. Covers metrics, A/B testing, serving patterns, drift detection, and production best practices.
ai-data-engineering
Data pipelines, feature stores, and embedding generation for AI/ML systems. Use when building RAG pipelines, ML feature serving, or data transformations. Covers feature stores (Feast, Tecton), embedding pipelines, chunking strategies, orchestration (Dagster, Prefect, Airflow), dbt transformations, data versioning (LakeFS), and experiment tracking (MLflow, W&B).
Data Engineering Data Driven Feature
World-class data science skill for statistical modeling, experimentation, causal inference, and advanced analytics. Expertise in Python (NumPy, Pandas, Scikit-learn), R, SQL, statistical methods, A/B testing, time series, and business intelligence. Includes experiment design, feature engineering, model evaluation, and stakeholder communication.
ai-marketing-engineering
AI-powered marketing engineering skill based on Alon Huri's framework. Transforms marketing from copywriting to engineering discipline through 10 agentic mechanisms: infinite creative generation, adaptive budget management, LTV signal hunting, contextual data layers, AEO optimization, dynamic quizzes, behavior-driven activation, personalized video at scale, competitor weakness targeting, and active churn prevention. Use when building marketing automation systems, designing growth engineering workflows, creating AI-powered marketing agents, optimizing ad creatives at scale, implementing AEO (Answer Engine Optimization), or architecting data-driven marketing infrastructure.
u0542-engineering-multi-agent-negotiation-mediator
Operate the "Engineering Multi-Agent Negotiation Mediator" capability in production for workflows. Use when mission execution explicitly requires this capability and outcomes must be reproducible, policy-gated, and handoff-ready.
prompt-engineering
Write effective prompts for AI coding agents. Use when crafting system prompts, implementing chain-of-thought reasoning, building few-shot examples, adding guardrails, configuring tool use, or designing agentic prompt patterns. Covers CoT, few-shot, guardrails, and function calling.