performance-testing
Profiles application performance under load using k6, Artillery, or JMeter to measure latency, throughput, and error rates. Use when planning load tests, stress tests, soak tests, benchmarking APIs, or identifying performance bottlenecks.
Best use case
performance-testing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using performance-testing should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/performance-testing/SKILL.md` inside your project
- Restart your AI agent — it will auto-discover the skill
How performance-testing Compares
| Feature / Agent | performance-testing | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Profiles application performance under load using k6, Artillery, or JMeter to measure latency, throughput, and error rates. Use when planning load tests, stress tests, soak tests, benchmarking APIs, or identifying performance bottlenecks.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Performance Testing
<default_to_action>
When testing performance or planning load tests:
1. DEFINE SLOs: p95 response time, throughput, error rate targets
2. IDENTIFY critical paths: revenue flows, high-traffic pages, key APIs
3. CREATE realistic scenarios: user journeys, think time, varied data
4. EXECUTE with monitoring: CPU, memory, DB queries, network
5. ANALYZE bottlenecks and fix before production
**Quick Test Type Selection:**
- Expected load validation → Load testing
- Find breaking point → Stress testing
- Sudden traffic spike → Spike testing
- Memory leaks, resource exhaustion → Endurance/soak testing
- Horizontal/vertical scaling → Scalability testing
**Critical Success Factors:**
- Performance is a feature, not an afterthought
- Test early and often, not just before release
- Focus on user-impacting bottlenecks
</default_to_action>
## Quick Reference Card
### When to Use
- Before major releases
- After infrastructure changes
- Before scaling events (Black Friday)
- When setting SLAs/SLOs
### Test Types
| Type | Purpose | When |
|------|---------|------|
| **Load** | Expected traffic | Every release |
| **Stress** | Beyond capacity | Quarterly |
| **Spike** | Sudden surge | Before events |
| **Endurance** | Memory leaks | After code changes |
| **Scalability** | Scaling validation | Infrastructure changes |
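In k6, these test types differ mainly in their stage profile. A minimal spike profile might look like this (targets and durations are illustrative):
```javascript
// Spike test: jump from baseline to a surge, hold briefly, then recover
export const options = {
  stages: [
    { duration: '1m', target: 50 },    // baseline traffic
    { duration: '10s', target: 1000 }, // sudden surge
    { duration: '3m', target: 1000 },  // hold at peak
    { duration: '10s', target: 50 },   // recovery
  ],
};
```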
### Key Metrics
| Metric | Target | Why |
|--------|--------|-----|
| p95 response | < 200ms | User experience |
| Throughput | 10k req/min | Capacity |
| Error rate | < 0.1% | Reliability |
| CPU | < 70% | Headroom |
| Memory | < 80% | Stability |
### Tools
- **k6**: Modern, JS-based, CI/CD friendly
- **JMeter**: Enterprise, feature-rich
- **Artillery**: Simple YAML configs
- **Gatling**: Scala, great reporting
### Agent Coordination
- `qe-performance-tester`: Load test orchestration
- `qe-quality-analyzer`: Results analysis
- `qe-production-intelligence`: Production comparison
---
## Defining SLOs
**Bad:** "The system should be fast"
**Good:** "p95 response time < 200ms under 1,000 concurrent users"
```javascript
export const options = {
  thresholds: {
    http_req_duration: ['p(95)<200'], // 95% of requests < 200ms
    http_req_failed: ['rate<0.01'],   // < 1% of requests fail
  },
};
```
---
## Realistic Scenarios
**Bad:** Every user hits homepage repeatedly
**Good:** Model actual user behavior
```javascript
import { sleep } from 'k6';

// browse(), search(), viewProduct(), and checkout() are your own
// user-journey functions; randomInt is a small helper
const randomInt = (min, max) =>
  Math.floor(Math.random() * (max - min + 1)) + min;

// Realistic distribution:
// 40% browse, 30% search, 20% details, 10% checkout
export default function () {
  const action = Math.random();
  if (action < 0.4) browse();
  else if (action < 0.7) search();
  else if (action < 0.9) viewProduct();
  else checkout();
  sleep(randomInt(1, 5)); // Think time between actions
}
```
---
## Common Bottlenecks
### Database
**Symptoms:** Slow queries under load, connection pool exhaustion
**Fixes:** Add indexes, optimize N+1 queries, increase pool size, read replicas
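Connection pool exhaustion often just means the default pool is too small for the tested concurrency. A minimal sketch using node-postgres (pool sizes are illustrative; tune to your workload):
```javascript
const { Pool } = require('pg');

const pool = new Pool({
  max: 20,                       // raise from the default of 10 if the pool saturates
  idleTimeoutMillis: 30000,      // reclaim idle connections
  connectionTimeoutMillis: 2000, // fail fast instead of queueing indefinitely
});
```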
### N+1 Queries
```javascript
// BAD: 100 orders = 101 queries (one list query + one per order)
const orders = await Order.findAll();
for (const order of orders) {
  const customer = await Customer.findById(order.customerId);
}

// GOOD: 1 query with an eager-loaded association
const orders = await Order.findAll({ include: [Customer] });
```
### Synchronous Processing
**Problem:** Blocking operations in request path (sending email during checkout)
**Fix:** Use message queues, process async, return immediately
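A sketch of the fix, assuming a Redis-backed queue such as BullMQ (`createOrder` and the queue name are hypothetical):
```javascript
import { Queue } from 'bullmq';

const emailQueue = new Queue('order-emails'); // requires a Redis instance

async function checkout(req, res) {
  const order = await createOrder(req.body);                   // critical path only
  await emailQueue.add('confirmation', { orderId: order.id }); // enqueue, don't send
  res.status(201).json(order);                                 // return immediately
}
```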
### Memory Leaks
**Detection:** Endurance testing, memory profiling
**Common causes:** Event listeners not cleaned, caches without eviction
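An illustrative Node.js example of the first cause:
```javascript
const { EventEmitter } = require('events');
const bus = new EventEmitter();

// LEAK: a new listener accumulates on every request and is never removed
function handleRequestLeaky() {
  bus.on('update', (data) => { /* ... */ });
}

// FIX: keep a reference and remove it, or use bus.once() for one-shot handlers
function handleRequest() {
  const onUpdate = (data) => { /* ... */ };
  bus.on('update', onUpdate);
  // ... do work ...
  bus.off('update', onUpdate);
}
```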
### External Dependencies
**Solutions:** Aggressive timeouts, circuit breakers, caching, graceful degradation
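A minimal sketch of an aggressive timeout with graceful degradation (Node 18+; the URL and time budget are illustrative):
```javascript
async function getRecommendations(userId) {
  try {
    const res = await fetch(`https://recs.internal/users/${userId}`, {
      signal: AbortSignal.timeout(300), // aggressive timeout: 300ms budget
    });
    return await res.json();
  } catch {
    return []; // degrade gracefully: empty recommendations beat a multi-second stall
  }
}
```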
---
## k6 CI/CD Example
```javascript
// performance-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
  stages: [
    { duration: '1m', target: 50 }, // Ramp up
    { duration: '3m', target: 50 }, // Steady state
    { duration: '1m', target: 0 },  // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<200'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const res = http.get('https://api.example.com/products');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 200ms': (r) => r.timings.duration < 200,
  });
  sleep(1);
}
```
```yaml
# GitHub Actions
- name: Run k6 test
  uses: grafana/k6-action@v0.3.0
  with:
    filename: performance-test.js
```
---
## Analyzing Results
### Good Results
```
Load: 1,000 users | p95: 180ms | Throughput: 5,000 req/s
Error rate: 0.05% | CPU: 65% | Memory: 70%
```
### Problems
```
Load: 1,000 users | p95: 3,500ms ❌ | Throughput: 500 req/s ❌
Error rate: 5% ❌ | CPU: 95% ❌ | Memory: 90% ❌
```
### Root Cause Analysis
1. Correlate metrics: When response time spikes, what changes?
2. Check logs: Errors, warnings, slow queries
3. Profile code: Where is time spent?
4. Monitor resources: CPU, memory, disk
5. Trace requests: End-to-end flow
---
## Anti-Patterns
| ❌ Anti-Pattern | ✅ Better |
|----------------|-----------|
| Testing too late | Test early and often |
| Unrealistic scenarios | Model real user behavior |
| 0 to 1000 users instantly | Ramp up gradually |
| No monitoring during tests | Monitor everything |
| No baseline | Establish and track trends |
| One-time testing | Continuous performance testing |
---
## Agent-Assisted Performance Testing
```typescript
// Comprehensive load test
await Task("Load Test", {
  target: 'https://api.example.com',
  scenarios: {
    checkout: { vus: 100, duration: '5m' },
    search: { vus: 200, duration: '5m' },
    browse: { vus: 500, duration: '5m' }
  },
  thresholds: {
    'http_req_duration': ['p(95)<200'],
    'http_req_failed': ['rate<0.01']
  }
}, "qe-performance-tester");

// Bottleneck analysis
await Task("Analyze Bottlenecks", {
  testResults: perfTest,
  metrics: ['cpu', 'memory', 'db_queries', 'network']
}, "qe-performance-tester");

// CI integration
await Task("CI Performance Gate", {
  mode: 'smoke',
  duration: '1m',
  vus: 10,
  failOn: { 'p95_response_time': 300, 'error_rate': 0.01 }
}, "qe-performance-tester");
```
---
## Agent Coordination Hints
### Memory Namespace
```
aqe/performance/
├── results/* - Test execution results
├── baselines/* - Performance baselines
├── bottlenecks/* - Identified bottlenecks
└── trends/* - Historical trends
```
### Fleet Coordination
```typescript
const perfFleet = await FleetManager.coordinate({
  strategy: 'performance-testing',
  agents: [
    'qe-performance-tester',
    'qe-quality-analyzer',
    'qe-production-intelligence',
    'qe-deployment-readiness'
  ],
  topology: 'sequential'
});
```
---
## Pre-Production Checklist
- [ ] Load test passed (expected traffic)
- [ ] Stress test passed (2-3x expected)
- [ ] Spike test passed (sudden surge)
- [ ] Endurance test passed (24+ hours)
- [ ] Database indexes in place
- [ ] Caching configured
- [ ] Monitoring and alerting set up
- [ ] Performance baseline established
---
## Related Skills
- [agentic-quality-engineering](../agentic-quality-engineering/) - Agent coordination
- [api-testing-patterns](../api-testing-patterns/) - API performance
- [chaos-engineering-resilience](../chaos-engineering-resilience/) - Resilience testing
---
## Remember
**Performance is a feature:** Test it like functionality
**Test continuously:** Not just before launch
**Monitor production:** Synthetic + real user monitoring
**Fix what matters:** Focus on user-impacting bottlenecks
**Trend over time:** Catch degradation early
**With Agents:** Agents automate load testing, analyze bottlenecks, and compare with production. Use agents to maintain performance at scale.
## Run History
After each performance test run, append results to `run-history.json` in this skill directory:
```bash
node -e "
const fs = require('fs');
const h = JSON.parse(fs.readFileSync('.claude/skills/performance-testing/run-history.json'));
h.runs.push({date: new Date().toISOString().split('T')[0], scenario: 'load', p95_ms: P95, throughput_rps: RPS, error_rate_pct: ERR});
fs.writeFileSync('.claude/skills/performance-testing/run-history.json', JSON.stringify(h, null, 2));
"
```
Read `run-history.json` before each run — compare with baselines. Alert if p95 increases >20% from baseline.
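A sketch of that check, assuming the history file stores a `baseline_p95_ms` field (adjust to your actual schema):
```javascript
const fs = require('fs');
const h = JSON.parse(fs.readFileSync('.claude/skills/performance-testing/run-history.json'));

const latest = h.runs[h.runs.length - 1];
const baseline = h.baseline_p95_ms; // assumed field; adjust to your schema
if (latest.p95_ms > baseline * 1.2) {
  console.error(`p95 regression: ${latest.p95_ms}ms vs baseline ${baseline}ms (+20% threshold)`);
  process.exit(1); // fail the run so CI can gate on it
}
```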
## Gotchas
- k6 scripts generated by the agent often hardcode base URLs; use environment variables for portability (see the sketch after this list)
- Load tests run in containers hit resource limits before app limits; give the load-generator container roughly 2x the resources of the target
- The agent often omits think time between requests; without it, load is unrealistically bursty
- p95 vs p99 matters: the agent defaults to averages, which hide tail-latency problems
- Baseline comparison requires a consistent environment; CI runner variance alone can cause 20%+ noise
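A minimal k6 pattern for the first gotcha, using k6's built-in `__ENV`:
```javascript
import http from 'k6/http';

// Run with: k6 run -e BASE_URL=https://staging.example.com script.js
const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';

export default function () {
  http.get(`${BASE_URL}/products`);
}
```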