running-performance-tests

Execute load testing, stress testing, and performance benchmarking. Use when performing specialized testing. Trigger with phrases like "run load tests", "test performance", or "benchmark the system".

1,868 stars

by jeremylongshore

Best use case

running-performance-tests is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using running-performance-tests can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/running-performance-tests/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/testing/performance-test-suite/skills/running-performance-tests/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/running-performance-tests/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How running-performance-tests Compares

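| Feature / Agent | running-performance-tests | Standard Approach |
|-----------------|---------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |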

Frequently Asked Questions

What does this skill do?

Execute load testing, stress testing, and performance benchmarking. Use when performing specialized testing. Trigger with phrases like "run load tests", "test performance", or "benchmark the system".

Where can I find the source code?

The source code is in the jeremylongshore/claude-code-plugins-plus-skills repository on GitHub, under plugins/testing/performance-test-suite/skills/running-performance-tests.

SKILL.md Source

# Performance Test Suite

## Overview

Execute load testing, stress testing, and performance benchmarking to identify bottlenecks, establish baseline metrics, and verify SLA compliance. Supports k6 (recommended), Artillery, Apache JMeter, Locust (Python), and autocannon (Node.js).

## Prerequisites

- Performance testing tool installed (`k6`, `artillery`, `locust`, `jmeter`, or `autocannon`)
- Target application deployed in a production-like environment (not local dev)
- Baseline performance metrics or SLA targets (e.g., p95 < 200ms, 99.9% availability)
- Monitoring stack accessible (Grafana, CloudWatch, Datadog) for resource metrics during tests
- Test data sufficient to avoid cache-only responses
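
One way to prepare varied test data is to generate request paths with randomized parameters. The following is a minimal sketch in plain JavaScript; the `/products` query shape and the `category`, `page`, and `userId` parameters are hypothetical and should be replaced with parameters the target API actually accepts.

```javascript
// Return a random integer in the inclusive range [min, max].
function randomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

// Build a request path with randomized query parameters so that repeated
// requests are unlikely to be served from the same cache entry.
// The parameter names are assumptions made for illustration only.
function buildProductQuery(categories, maxPage, maxUserId) {
  const category = categories[randomInt(0, categories.length - 1)];
  const page = randomInt(1, maxPage);
  const userId = randomInt(1, maxUserId);
  return `/products?category=${encodeURIComponent(category)}&page=${page}&userId=${userId}`;
}

// Example: generate a batch of varied request paths.
const samplePaths = Array.from({ length: 200 }, () =>
  buildProductQuery(['books', 'games', 'home & garden'], 50, 1000000)
);
```

The function uses only string building and `encodeURIComponent`, so the same helper could be pasted into a load test script to pick a fresh path on every iteration.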

## Instructions

1. Define performance test scenarios based on production traffic patterns:
   - **Load test**: Simulate expected peak traffic (e.g., 500 concurrent users for 10 minutes).
   - **Stress test**: Ramp beyond expected capacity to find the breaking point.
   - **Spike test**: Sudden burst of traffic (0 to 1000 users in 10 seconds).
   - **Soak test**: Sustained moderate load for extended duration (1-4 hours) to detect memory leaks.
2. Create test scripts targeting critical endpoints:
   - Identify the top 5-10 most-hit API endpoints from production access logs.
   - Include both read (GET) and write (POST/PUT/DELETE) operations.
   - Simulate realistic user behavior with think time between requests.
   - Use parameterized data to avoid cache-only hits (randomize query parameters, user IDs).
3. Configure load profiles:
   - Define virtual user (VU) ramp-up stages (e.g., 10 VUs for 1 minute, then 50 VUs for 5 minutes).
   - Set test duration appropriate to the scenario (load: 10-15 min, soak: 1-4 hours).
   - Configure request timeouts matching production settings.
4. Execute the performance test:
   - Run from a machine with sufficient network bandwidth and CPU.
   - Avoid running from the same host as the application under test.
   - Monitor application metrics (CPU, memory, DB connections) during execution.
5. Analyze results against SLA thresholds:
   - p50, p90, p95, p99 response times.
   - Requests per second (throughput).
   - Error rate (target: < 0.1% for load test, higher tolerance for stress test).
   - Resource utilization (CPU < 80%, memory < 85% at peak load).
6. Identify and document bottlenecks:
   - Slow database queries (check slow query logs).
   - CPU-bound operations (profiling data).
   - Memory leaks (growing RSS over soak test).
   - Connection pool exhaustion (database or HTTP client).
7. Generate a performance report with visualizations and recommendations.
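
The analysis in step 5 can be sketched as a small script that computes percentiles from raw response times and compares them with SLA targets. This is an illustrative sketch, not the output format of any particular tool: it uses the nearest-rank percentile method, assumes `durationsMs` holds one entry per request (including failed ones), and the `sla` values are assumptions taken from the example targets in this document.

```javascript
// Nearest-rank percentile of a list of response times (milliseconds).
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('percentile: no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Assumed SLA targets: p95 < 200 ms, p99 < 500 ms, error rate < 0.1%.
const sla = { p95Ms: 200, p99Ms: 500, maxErrorRate: 0.001 };

// Summarize one test run and list the SLA targets it violates.
function evaluateRun(durationsMs, failedCount, targets) {
  const summary = {
    p50: percentile(durationsMs, 50),
    p90: percentile(durationsMs, 90),
    p95: percentile(durationsMs, 95),
    p99: percentile(durationsMs, 99),
    errorRate: failedCount / durationsMs.length,
  };
  const violations = [];
  if (summary.p95 >= targets.p95Ms) violations.push('p95');
  if (summary.p99 >= targets.p99Ms) violations.push('p99');
  if (summary.errorRate >= targets.maxErrorRate) violations.push('errorRate');
  return { summary, violations };
}
```

Load testing tools usually report these percentiles directly, so a helper like this is mainly useful when post-processing raw request logs or combining results from several load generators.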

## Output

- Performance test scripts (k6 `.js`, Artillery `.yml`, or Locust `.py` files)
- Execution results with response time percentiles, throughput, and error rates
- Performance report comparing results against SLA thresholds
- Bottleneck analysis with specific recommendations
- CI integration configuration for automated performance regression detection
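
Regression detection in CI could work by comparing each run against a stored baseline. The sketch below assumes every metric is "lower is better" (latency percentiles, error rate) and uses a hypothetical 10% tolerance; the metric names and the baseline format are illustrative, not a format produced by any specific tool.

```javascript
// Flag metrics whose current value exceeds the baseline by more than
// the given tolerance (0.10 = 10%). Assumes lower values are better,
// so throughput-style metrics should not be passed in.
function findRegressions(baseline, current, tolerance = 0.10) {
  const regressions = [];
  for (const [metric, base] of Object.entries(baseline)) {
    const value = current[metric];
    if (value === undefined) continue; // Metric missing from this run
    const limit = base * (1 + tolerance);
    if (value > limit) {
      regressions.push({ metric, baseline: base, current: value, limit });
    }
  }
  return regressions;
}

// Example with made-up numbers: p95 regressed, p99 stayed within tolerance.
const exampleRegressions = findRegressions(
  { p95: 180, p99: 400 },
  { p95: 210, p99: 410 }
);
```

A CI job could fail the build when the returned array is non-empty, for example by setting a non-zero exit code.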

## Error Handling

| Error | Cause | Solution |
|-------|-------|---------|
| Connection reset by peer | Server or load balancer dropping connections under load | Check max connections settings; increase connection pool size; verify keep-alive configuration |
| Timeouts spike at certain VU count | Application thread pool or database connection pool exhausted | Profile connection usage; increase pool size; add connection queuing; optimize slow queries |
| Inconsistent results between runs | Cache warming, garbage collection pauses, or noisy neighbor effects | Run a warm-up phase before measurement; use dedicated test infrastructure; average across 3 runs |
| Load generator CPU maxed out | Single machine cannot generate sufficient load | Distribute load generation across multiple machines; use cloud-based load generation services |
| All requests return cached responses | Test data not sufficiently varied | Randomize request parameters; use unique IDs per request; disable CDN caching for test environment |
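
For the "inconsistent results between runs" case, averaging across runs can be paired with a simple stability check. The sketch below is one possible approach: it averages the p95 from each run and reports the relative spread between the fastest and slowest run. The 10% `maxSpread` default is an assumption, not a recommendation from the source.

```javascript
// Arithmetic mean of a non-empty list of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Average a metric (for example p95 in ms) across runs and report how
// far the runs diverge. `stable` is true when (max - min) / mean is
// within maxSpread.
function summarizeRuns(valuesPerRun, maxSpread = 0.10) {
  const avg = mean(valuesPerRun);
  const spread = (Math.max(...valuesPerRun) - Math.min(...valuesPerRun)) / avg;
  return { avg, spread, stable: spread <= maxSpread };
}
```

If `stable` is false, the individual runs probably differ too much for the average to be meaningful, and the warm-up phase or test infrastructure is worth checking first.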

## Examples

**k6 load test script:**
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 50 },   // Ramp up to 50 virtual users (VUs)
    { duration: '5m', target: 50 },   // Sustained load at 50 VUs
    { duration: '2m', target: 200 },  // Stress: ramp to 200 VUs
    { duration: '1m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<200', 'p(99)<500'],  // p95 under 200 ms, p99 under 500 ms
    http_req_failed: ['rate<0.01'],                 // Fewer than 1% of requests fail
  },
};

export default function () {
  const res = http.get('https://api.test.com/products');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time OK': (r) => r.timings.duration < 300,  // Under 300 ms
  });
  sleep(1); // Think time
}
```
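
The script above combines a load phase and a stress phase in one run. The four scenario types from the Instructions could instead be kept as separate stage profiles, as in the sketch below. The VU counts and durations are illustrative values based on the examples given in the Instructions, and `durationToSeconds` is a hypothetical helper that only understands single-unit strings such as `'10s'`, `'2m'`, or `'2h'`.

```javascript
// Illustrative stage profiles in the same shape as the k6 `stages` option.
const profiles = {
  load: [
    { duration: '2m', target: 500 },   // Ramp up to expected peak
    { duration: '10m', target: 500 },  // Hold at peak
    { duration: '2m', target: 0 },     // Ramp down
  ],
  stress: [
    { duration: '5m', target: 500 },   // Expected peak
    { duration: '5m', target: 1000 },  // Beyond expected capacity
    { duration: '5m', target: 1500 },  // Keep ramping to find the breaking point
    { duration: '2m', target: 0 },
  ],
  spike: [
    { duration: '10s', target: 1000 }, // Sudden burst
    { duration: '1m', target: 1000 },
    { duration: '10s', target: 0 },
  ],
  soak: [
    { duration: '5m', target: 100 },   // Moderate load
    { duration: '2h', target: 100 },   // Sustained to expose memory leaks
    { duration: '5m', target: 0 },
  ],
};

// Convert a single-unit duration string ('10s', '2m', '2h') to seconds.
function durationToSeconds(text) {
  const match = /^(\d+)([smh])$/.exec(text);
  if (!match) throw new Error(`unsupported duration: ${text}`);
  const unitSeconds = { s: 1, m: 60, h: 3600 };
  return Number(match[1]) * unitSeconds[match[2]];
}

// Total planned run time of a profile, in seconds.
function totalSeconds(stages) {
  return stages.reduce((sum, stage) => sum + durationToSeconds(stage.duration), 0);
}
```

In a k6 script, one of these arrays would be assigned to `options.stages`; `totalSeconds` is only a convenience for checking how long a run is planned to take before starting it.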

**Artillery test configuration:**
```yaml
config:
  target: "https://api.test.com"
  phases:
    - duration: 120        # seconds
      arrivalRate: 10      # new virtual users per second
      name: "Warm up"
    - duration: 300        # seconds (5 minutes)
      arrivalRate: 50
      name: "Sustained load"
  ensure:
    p95: 200               # p95 response time under 200 ms
    maxErrorRate: 1        # at most 1% errors
scenarios:
  - flow:
      - get:
          url: "/api/products"
      - think: 1           # seconds of think time
      - post:
          url: "/api/cart"
          json: { productId: "{{ $randomString() }}" }
```

## Resources

- k6 documentation: https://grafana.com/docs/k6/latest/
- Artillery: https://www.artillery.io/docs
- Locust (Python): https://docs.locust.io/
- Apache JMeter: https://jmeter.apache.org/
- autocannon (Node.js): https://github.com/mcollina/autocannon
- Performance testing best practices: https://grafana.com/blog/2024/01/30/load-testing-best-practices/
