perf

Performance profiling, benchmarking, regression detection, and optimization. Triggers: "perf", "performance", "benchmark", "profile", "slow", "optimize", "latency", "throughput", "memory leak", "perf regression".

244 stars

byboshu2

View on GitHub Installation ↓

Best use case

perf is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using perf should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/perf/SKILL.md --create-dirs "https://raw.githubusercontent.com/boshu2/agentops/main/skills-codex/perf/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/perf/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How perf Compares

Feature / Agent	perf	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Perf Skill

> Quick Ref: `$perf profile <target>` | `$perf bench <target>` | `$perf compare <baseline> <candidate>` | `$perf optimize <target>`

**YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.**

Performance profiling, benchmarking, regression detection, and optimization recommendations for any language runtime. Produces actionable metrics, not vague advice.

## Modes

| Mode | Command | Purpose |
|------|---------|---------|
| **Profile** | `$perf profile <target>` | Profile execution, find hotspots |
| **Benchmark** | `$perf bench <target>` | Create or run benchmarks |
| **Compare** | `$perf compare <baseline> <candidate>` | Compare two runs for regression |
| **Optimize** | `$perf optimize <target>` | Analyze and apply optimizations |

If no mode is specified, default to **profile**.

---

## Step 0: Detect Language and Tooling

Identify the language/runtime from file extensions, `go.mod`, `package.json`, `pyproject.toml`, `Cargo.toml`, or explicit user input. Select the profiling stack:

| Language | Benchmarking | CPU Profile | Memory Profile | Comparison |
|----------|-------------|-------------|----------------|------------|
| **Go** | `go test -bench` | `go tool pprof` (cpu) | `go tool pprof` (alloc) | `benchstat` |
| **Python** | `pytest-benchmark`, `timeit` | `cProfile`, `py-spy` | `memory_profiler`, `tracemalloc` | manual diff |
| **Node** | `benchmark.js`, `vitest bench` | `--prof`, `clinic.js` | `--heap-prof`, `0x` | manual diff |
| **Rust** | `criterion`, `cargo bench` | `cargo flamegraph` | `heaptrack`, `DHAT` | `critcmp` |
| **Shell** | `hyperfine` | `time`, `strace` | N/A | `hyperfine` built-in |

Check which tools are actually installed. If a preferred tool is missing, fall back to standard-library alternatives before asking the user to install anything.

---

## Step 1: Establish Baseline

Run existing benchmarks first. If none exist, create them.

### 1a. Find Existing Benchmarks

```bash
# Go
grep -r "func Benchmark" --include="*_test.go" -l .

# Python
find . -name "test_*" -exec grep -l "benchmark\|@pytest.mark.benchmark" {} +

# Rust
grep -r "#\[bench\]" --include="*.rs" -l .

# Node
find . -name "*.bench.*" -o -name "*.benchmark.*"
```

### 1b. Run or Create Benchmarks

If benchmarks exist for the target, run them and capture output. If none exist, write benchmarks covering the target function or module.

**Benchmark requirements:**
- Measure wall-clock time (ops/sec or ns/op)
- Measure memory allocations (bytes/op, allocs/op)
- Run enough iterations for statistical stability (Go: `-benchtime=3s -count=5`)
- Record latency percentiles where applicable: p50, p95, p99

Save raw baseline output to `.agents/perf/baseline-YYYY-MM-DD.txt`.

### 1c. Go Benchmark Template

```go
func BenchmarkTargetFunction(b *testing.B) {
    // Setup outside the loop
    input := prepareInput()
    b.ResetTimer()
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        TargetFunction(input)
    }
}
```

### 1d. Python Benchmark Template

```python
import pytest

@pytest.mark.benchmark(group="target")
def test_target_benchmark(benchmark):
    input_data = prepare_input()
    result = benchmark(target_function, input_data)
    assert result is not None
```

---

## Step 2: Profile and Identify Hotspots

### CPU Profiling

Find functions consuming the most CPU time.

**Go:**
```bash
go test -bench=BenchmarkTarget -cpuprofile=cpu.prof ./...
go tool pprof -top cpu.prof
go tool pprof -text -cum cpu.prof   # cumulative view
```

**Python:**
```bash
python -m cProfile -s cumulative target_script.py
# Or for running processes:
py-spy top --pid <PID>
py-spy record -o profile.svg --pid <PID>
```

### Memory Profiling

Find allocation hotspots and potential leaks.

**Go:**
```bash
go test -bench=BenchmarkTarget -memprofile=mem.prof ./...
go tool pprof -top -alloc_space mem.prof
```

**Python:**
```bash
python -m memory_profiler target_script.py
# Or with tracemalloc in code:
# tracemalloc.start(); ...; snapshot = tracemalloc.take_snapshot()
```

### I/O Profiling

Identify blocking operations in hot paths.

- Check for synchronous file I/O, network calls, or database queries inside loops.
- Look for missing connection pooling, unbuffered writes, or serial HTTP calls that could be concurrent.

### Hotspot Summary

After profiling, produce a ranked list:

```
HOTSPOTS (by cumulative CPU time):
1. pkg/engine.Process       42.3%  (1.2s)   — main processing loop
2. pkg/engine.parseRecord   28.1%  (0.8s)   — record deserialization
3. pkg/io.ReadBatch         15.7%  (0.45s)  — disk reads
```

---

## Step 3: Analyze and Recommend

Classify each finding by estimated impact:

| Impact | Criteria | Action |
|--------|----------|--------|
| **High** | >20% of total time or >50% of allocations | Fix immediately |
| **Medium** | 5-20% of total time or notable allocation waste | Fix in this session |
| **Low** | <5% of total time, minor inefficiency | Log for later |

### Common Anti-Patterns

Check the profiled code against these known performance killers:

1. **Unnecessary allocations** — allocating inside hot loops, string concatenation in loops (use `strings.Builder` / `[]byte` / `io.StringWriter`)
2. **N+1 queries** — database call per item instead of batch query
3. **Missing caching** — recomputing expensive results that rarely change
4. **Blocking I/O in hot path** — synchronous network/disk calls where async or buffered I/O would work
5. **Excessive copying** — passing large structs by value, copying slices instead of slicing
6. **Suboptimal data structures** — linear search where a map lookup works, unbounded slice growth without pre-allocation
7. **Lock contention** — mutex held across I/O or long computation
8. **Regex compilation in loops** — compile once, reuse the compiled pattern
9. **Reflection in hot paths** — replace with code generation or type switches
10. **Unbuffered channels** — causing goroutine scheduling overhead in Go

For each finding, state:
- **What**: the specific code location and pattern
- **Why**: how it hurts performance (with numbers from profiling)
- **Fix**: concrete code change recommendation

---

## Step 4: Optimize (optimize mode only)

**Critical rule: ONE optimization at a time.**

For each optimization:

1. **Describe** the change before making it
2. **Apply** the single change
3. **Re-run** the benchmark suite
4. **Compare** results against baseline using `benchstat` (Go) or manual diff
5. **Keep or revert** — only keep changes that measurably improve metrics
6. **Commit** with message format: `perf(<scope>): <description> (+X% throughput)` or `perf(<scope>): <description> (-X% latency)`

### Acceptance Criteria

- Improvement must be statistically significant (p < 0.05 for `benchstat`, or >5% consistent change for manual comparison)
- No correctness regressions — all existing tests must still pass
- No readability destruction for marginal gains (<2% improvement does not justify obfuscated code)

### Optimization Order

Apply optimizations in this order (highest expected impact first):

1. Algorithmic improvements (O(n^2) to O(n log n), etc.)
2. Allocation reduction (pre-allocate, pool, reuse buffers)
3. I/O batching and buffering
4. Caching and memoization
5. Concurrency improvements (parallelize independent work)
6. Micro-optimizations (only if profiling confirms they matter)

---

## Step 5: Output Report

Write the report to `.agents/perf/YYYY-MM-DD-perf-<target>.md`.

### Report Template

```markdown
# Performance Report: <target>
Date: YYYY-MM-DD
Mode: <profile|bench|compare|optimize>
Language: <detected>

## Summary
<1-2 sentence summary of findings>

## Baseline Metrics
| Metric | Value |
|--------|-------|
| ops/sec | ... |
| ns/op | ... |
| B/op | ... |
| allocs/op | ... |
| p50 latency | ... |
| p95 latency | ... |
| p99 latency | ... |

## Hotspots
<ranked list from Step 2>

## Findings
<classified findings from Step 3>

## Optimizations Applied (if optimize mode)
| Change | Before | After | Improvement |
|--------|--------|-------|-------------|
| ... | ... | ... | +X% |

## After Metrics (if optimize mode)
<same table as baseline, with new values>

## Recommendations
<remaining opportunities not addressed in this session>
```

---

## Compare Mode Details

When running `$perf compare <baseline> <candidate>`:

1. Locate or re-run benchmarks for both versions
2. Use language-native comparison tools:
   - **Go**: `benchstat baseline.txt candidate.txt`
   - **Rust**: `critcmp baseline candidate`
   - **Other**: side-by-side table with percentage deltas
3. Flag regressions (>5% slower or >10% more allocations) as **REGRESSION**
4. Flag improvements (>5% faster or >10% fewer allocations) as **IMPROVEMENT**
5. Flag statistically insignificant changes as **NOISE**

Output a summary table:

```
COMPARISON: baseline vs candidate
| Benchmark | Baseline | Candidate | Delta | Verdict |
|-----------|----------|-----------|-------|---------|
| BenchmarkProcess | 1.2ms | 0.9ms | -25% | IMPROVEMENT |
| BenchmarkParse | 450ns | 480ns | +6.7% | REGRESSION |
| BenchmarkIO | 3.1ms | 3.0ms | -3.2% | NOISE |
```

---

## Edge Cases

- **No benchmarks and no clear target**: Run `$complexity` first to identify hot paths, then benchmark those.
- **Flaky benchmarks**: Increase iteration count, pin to a single core (`GOMAXPROCS=1`), close competing processes.
- **Cannot install profiling tools**: Fall back to `time` for wall-clock and manual instrumentation for allocation counts.
- **Target is a CLI command**: Use `hyperfine` for wall-clock benchmarking across any language.

## See Also

- [complexity](../complexity/SKILL.md) — Find high-complexity code to target
- [standards](../standards/SKILL.md) — Language-specific optimization patterns
- [vibe](../vibe/SKILL.md) — Validate optimized code quality

Related Skills

vibe

244

from boshu2/agentops

Comprehensive code validation. Runs complexity analysis then multi-model council. Answer: Is this code ready to ship? Triggers: "vibe", "validate code", "check code", "review code", "code quality", "is this ready".

validation

244

from boshu2/agentops

Full validation phase orchestrator. Vibe + post-mortem + retro + forge. Reviews implementation quality, extracts learnings, feeds the knowledge flywheel. Triggers: "validation", "validate", "validate work", "review and learn", "validation phase", "post-implementation review".

update

244

from boshu2/agentops

Reinstall all AgentOps skills globally from the latest source. Triggers: "update skills", "reinstall skills", "sync skills".

trace

244

from boshu2/agentops

Trace design decisions and concepts through session history, handoffs, and git. Triggers: "trace decision", "how did we decide", "where did this come from", "design provenance", "decision history".

test

244

from boshu2/agentops

Test generation, coverage analysis, and TDD workflow. Triggers: "test", "generate tests", "test coverage", "write tests", "tdd", "add tests", "test strategy", "missing tests", "coverage gaps".

status

244

from boshu2/agentops

Single-screen dashboard showing current work, recent validations, flywheel health, and suggested next action. Triggers: "status", "dashboard", "what am I working on", "where was I".

standards

244

from boshu2/agentops

Language-specific coding standards and validation rules. Provides Python, Go, Rust, TypeScript, Shell, YAML, JSON, and Markdown standards. Auto-loaded by $vibe, $implement, $doc, $bug-hunt, $complexity based on file types.

shared

244

from boshu2/agentops

Shared reference documents for multi-agent skills (not directly invocable)

security

244

from boshu2/agentops

Continuous repository security scanning and release gating. Triggers: "security scan", "security audit", "pre-release security", "run scanners", "check vulnerabilities".

security-suite

244

from boshu2/agentops

Composable security suite for binary and prompt-surface assurance, static analysis, dynamic tracing, repo-native redteam scans, contract capture, baseline drift, and policy gating. Triggers: "binary security", "reverse engineer binary", "black-box binary test", "behavioral trace", "baseline diff", "prompt redteam", "security suite".

scenario

244

from boshu2/agentops

Author and manage holdout scenarios for behavioral validation. Scenarios are stored in .agents/holdout/ where implementing agents cannot see them. Triggers: "$scenario", "holdout", "behavioral scenario", "create scenario", "list scenarios".

scaffold

244

from boshu2/agentops

Project scaffolding, component generation, and boilerplate setup. Triggers: "scaffold", "new project", "init project", "create project", "generate component", "setup project", "starter", "boilerplate".