conducting-chaos-engineering

This skill enables Claude to design and execute chaos engineering experiments to test system resilience. It is used when the user requests help with failure injection, latency simulation, resource exhaustion testing, or resilience validation. The skill is triggered by discussions of chaos experiments (GameDays), failure injection strategies, resilience testing, and validation of recovery mechanisms like circuit breakers and retry logic. It leverages tools like Chaos Mesh, Gremlin, Toxiproxy, and AWS FIS to simulate real-world failures and assess system behavior.

1,868 stars

Best use case

conducting-chaos-engineering is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

This skill enables Claude to design and execute chaos engineering experiments to test system resilience. It is used when the user requests help with failure injection, latency simulation, resource exhaustion testing, or resilience validation. The skill is triggered by discussions of chaos experiments (GameDays), failure injection strategies, resilience testing, and validation of recovery mechanisms like circuit breakers and retry logic. It leverages tools like Chaos Mesh, Gremlin, Toxiproxy, and AWS FIS to simulate real-world failures and assess system behavior.

Teams using conducting-chaos-engineering should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/chaos-engineering-toolkit/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/backups/skill-structure-cleanup-20251108-073936/plugins/testing/chaos-engineering-toolkit/skills/chaos-engineering-toolkit/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/chaos-engineering-toolkit/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How conducting-chaos-engineering Compares

Feature / Agentconducting-chaos-engineeringStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

This skill enables Claude to design and execute chaos engineering experiments to test system resilience. It is used when the user requests help with failure injection, latency simulation, resource exhaustion testing, or resilience validation. The skill is triggered by discussions of chaos experiments (GameDays), failure injection strategies, resilience testing, and validation of recovery mechanisms like circuit breakers and retry logic. It leverages tools like Chaos Mesh, Gremlin, Toxiproxy, and AWS FIS to simulate real-world failures and assess system behavior.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

## Overview

This skill empowers Claude to act as a chaos engineering specialist, guiding users through the process of designing and implementing controlled failure scenarios to identify weaknesses and improve the robustness of their systems. It facilitates the creation of chaos experiments to validate system resilience and recovery mechanisms.

## How It Works

1. **Experiment Design**: Claude helps define the scope, target system, and failure scenarios for the chaos experiment based on the user's objectives.
2. **Tool Selection**: Claude recommends appropriate chaos engineering tools (e.g., Chaos Mesh, Gremlin, Toxiproxy, AWS FIS) based on the target environment and desired failure types.
3. **Execution and Monitoring**: Claude assists with configuring and executing the chaos experiment, while monitoring key metrics to observe system behavior under stress.
4. **Analysis and Recommendations**: Claude analyzes the results of the experiment, identifies vulnerabilities, and provides recommendations for improving system resilience.

## When to Use This Skill

This skill activates when you need to:
- Design a chaos experiment to test the resilience of a specific service or application.
- Implement failure injection strategies to simulate real-world outages.
- Validate the effectiveness of circuit breakers and retry mechanisms.
- Analyze system behavior under stress and identify potential vulnerabilities.

## Examples

### Example 1: Database Failover Testing

User request: "Help me design a chaos experiment to test our database failover process."

The skill will:
1. Design a chaos experiment involving simulated database failures and automated failover.
2. Recommend using Chaos Mesh for Kubernetes environments or AWS FIS for AWS-hosted databases.

### Example 2: API Latency Simulation

User request: "Create a latency injection test for our API gateway to simulate network congestion."

The skill will:
1. Design a latency injection test using Toxiproxy to introduce delays in API requests.
2. Monitor API response times and error rates to assess the impact of latency.

## Best Practices

- **Define Clear Objectives**: Clearly define the goals of the chaos experiment and the specific system behavior you want to test.
- **Start Small**: Begin with small-scale experiments and gradually increase the scope and intensity of the failures.
- **Automate and Monitor**: Automate the execution and monitoring of chaos experiments to ensure repeatability and accurate data collection.

## Integration

This skill integrates with various chaos engineering tools, allowing Claude to orchestrate failure injection, latency simulation, and resource exhaustion testing across different environments. It can also be used in conjunction with monitoring tools to track system behavior and identify potential vulnerabilities.

Related Skills

running-chaos-tests

1868
from jeremylongshore/claude-code-plugins-plus-skills

Execute chaos engineering experiments to test system resilience. Use when performing specialized testing. Trigger with phrases like "run chaos tests", "test resilience", or "inject failures".

engineering-features-for-machine-learning

1868
from jeremylongshore/claude-code-plugins-plus-skills

Execute create, select, and transform features to improve machine learning model performance. Handles feature scaling, encoding, and importance analysis. Use when asked to "engineer features" or "select features". Trigger with relevant phrases based on skill purpose.

feature-engineering-helper

1868
from jeremylongshore/claude-code-plugins-plus-skills

Feature Engineering Helper - Auto-activating skill for ML Training. Triggers on: feature engineering helper, feature engineering helper Part of the ML Training skill category.

conducting-browser-compatibility-tests

1868
from jeremylongshore/claude-code-plugins-plus-skills

This skill enables cross-browser compatibility testing for web applications using BrowserStack, Selenium Grid, or Playwright. It tests across Chrome, Firefox, Safari, and Edge, identifying browser-specific bugs and ensuring consistent functionality. It is used when a user requests to "test browser compatibility", "run cross-browser tests", or uses the `/browser-test` or `/bt` command to assess web application behavior across different browsers and devices. The skill generates a report detailing compatibility issues and screenshots for visual verification. Activates when you request "conducting browser compatibility tests" functionality.

schema-optimization-orchestrator

1868
from jeremylongshore/claude-code-plugins-plus-skills

Multi-phase schema optimization workflow orchestrator. Creates session directories, spawns phase agents sequentially, validates outputs, aggregates results. Trigger: "run schema optimization", "optimize schema workflow", "execute schema phases"

test-skill

1868
from jeremylongshore/claude-code-plugins-plus-skills

Test skill for E2E validation. Trigger with "run test skill" or "execute test". Use this skill when testing skill activation and tool permissions.

example-skill

1868
from jeremylongshore/claude-code-plugins-plus-skills

Brief description of what this skill does and when the model should activate it. Use when [describe the user's intent or situation]. Trigger with "example phrase", "another trigger", "/example-skill".

testing-visual-regression

1868
from jeremylongshore/claude-code-plugins-plus-skills

Detect visual changes in UI components using screenshot comparison. Use when detecting unintended UI changes or pixel differences. Trigger with phrases like "test visual changes", "compare screenshots", or "detect UI regressions".

generating-unit-tests

1868
from jeremylongshore/claude-code-plugins-plus-skills

Test automatically generate comprehensive unit tests from source code covering happy paths, edge cases, and error conditions. Use when creating test coverage for functions, classes, or modules. Trigger with phrases like "generate unit tests", "create tests for", or "add test coverage".

generating-test-reports

1868
from jeremylongshore/claude-code-plugins-plus-skills

Generate comprehensive test reports with metrics, coverage, and visualizations. Use when performing specialized testing. Trigger with phrases like "generate test report", "create test documentation", or "show test metrics".

orchestrating-test-execution

1868
from jeremylongshore/claude-code-plugins-plus-skills

Test coordinate parallel test execution across multiple environments and frameworks. Use when performing specialized testing. Trigger with phrases like "orchestrate tests", "run parallel tests", or "coordinate test execution".

managing-test-environments

1868
from jeremylongshore/claude-code-plugins-plus-skills

Test provision and manage isolated test environments with configuration and data. Use when performing specialized testing. Trigger with phrases like "manage test environment", "provision test env", or "setup test infrastructure".