perf
Performance profiling, benchmarking, regression detection, and optimization. Triggers: "perf", "performance", "benchmark", "profile", "slow", "optimize", "latency", "throughput", "memory leak", "perf regression".
Best use case
perf is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Performance profiling, benchmarking, regression detection, and optimization. Triggers: "perf", "performance", "benchmark", "profile", "slow", "optimize", "latency", "throughput", "memory leak", "perf regression".
Teams using perf should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/perf/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How perf Compares
| Feature / Agent | perf | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Performance profiling, benchmarking, regression detection, and optimization. Triggers: "perf", "performance", "benchmark", "profile", "slow", "optimize", "latency", "throughput", "memory leak", "perf regression".
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Perf Skill
> Quick Ref: `$perf profile <target>` | `$perf bench <target>` | `$perf compare <baseline> <candidate>` | `$perf optimize <target>`
**YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.**
Performance profiling, benchmarking, regression detection, and optimization recommendations for any language runtime. Produces actionable metrics, not vague advice.
## Modes
| Mode | Command | Purpose |
|------|---------|---------|
| **Profile** | `$perf profile <target>` | Profile execution, find hotspots |
| **Benchmark** | `$perf bench <target>` | Create or run benchmarks |
| **Compare** | `$perf compare <baseline> <candidate>` | Compare two runs for regression |
| **Optimize** | `$perf optimize <target>` | Analyze and apply optimizations |
If no mode is specified, default to **profile**.
---
## Step 0: Detect Language and Tooling
Identify the language/runtime from file extensions, `go.mod`, `package.json`, `pyproject.toml`, `Cargo.toml`, or explicit user input. Select the profiling stack:
| Language | Benchmarking | CPU Profile | Memory Profile | Comparison |
|----------|-------------|-------------|----------------|------------|
| **Go** | `go test -bench` | `go tool pprof` (cpu) | `go tool pprof` (alloc) | `benchstat` |
| **Python** | `pytest-benchmark`, `timeit` | `cProfile`, `py-spy` | `memory_profiler`, `tracemalloc` | manual diff |
| **Node** | `benchmark.js`, `vitest bench` | `--prof`, `clinic.js` | `--heap-prof`, `0x` | manual diff |
| **Rust** | `criterion`, `cargo bench` | `cargo flamegraph` | `heaptrack`, `DHAT` | `critcmp` |
| **Shell** | `hyperfine` | `time`, `strace` | N/A | `hyperfine` built-in |
Check which tools are actually installed. If a preferred tool is missing, fall back to standard-library alternatives before asking the user to install anything.
---
## Step 1: Establish Baseline
Run existing benchmarks first. If none exist, create them.
### 1a. Find Existing Benchmarks
```bash
# Go
grep -r "func Benchmark" --include="*_test.go" -l .
# Python
find . -name "test_*" -exec grep -l "benchmark\|@pytest.mark.benchmark" {} +
# Rust
grep -r "#\[bench\]" --include="*.rs" -l .
# Node
find . -name "*.bench.*" -o -name "*.benchmark.*"
```
### 1b. Run or Create Benchmarks
If benchmarks exist for the target, run them and capture output. If none exist, write benchmarks covering the target function or module.
**Benchmark requirements:**
- Measure wall-clock time (ops/sec or ns/op)
- Measure memory allocations (bytes/op, allocs/op)
- Run enough iterations for statistical stability (Go: `-benchtime=3s -count=5`)
- Record latency percentiles where applicable: p50, p95, p99
Save raw baseline output to `.agents/perf/baseline-YYYY-MM-DD.txt`.
### 1c. Go Benchmark Template
```go
func BenchmarkTargetFunction(b *testing.B) {
// Setup outside the loop
input := prepareInput()
b.ResetTimer()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
TargetFunction(input)
}
}
```
### 1d. Python Benchmark Template
```python
import pytest
@pytest.mark.benchmark(group="target")
def test_target_benchmark(benchmark):
input_data = prepare_input()
result = benchmark(target_function, input_data)
assert result is not None
```
---
## Step 2: Profile and Identify Hotspots
### CPU Profiling
Find functions consuming the most CPU time.
**Go:**
```bash
go test -bench=BenchmarkTarget -cpuprofile=cpu.prof ./...
go tool pprof -top cpu.prof
go tool pprof -text -cum cpu.prof # cumulative view
```
**Python:**
```bash
python -m cProfile -s cumulative target_script.py
# Or for running processes:
py-spy top --pid <PID>
py-spy record -o profile.svg --pid <PID>
```
### Memory Profiling
Find allocation hotspots and potential leaks.
**Go:**
```bash
go test -bench=BenchmarkTarget -memprofile=mem.prof ./...
go tool pprof -top -alloc_space mem.prof
```
**Python:**
```bash
python -m memory_profiler target_script.py
# Or with tracemalloc in code:
# tracemalloc.start(); ...; snapshot = tracemalloc.take_snapshot()
```
### I/O Profiling
Identify blocking operations in hot paths.
- Check for synchronous file I/O, network calls, or database queries inside loops.
- Look for missing connection pooling, unbuffered writes, or serial HTTP calls that could be concurrent.
### Hotspot Summary
After profiling, produce a ranked list:
```
HOTSPOTS (by cumulative CPU time):
1. pkg/engine.Process 42.3% (1.2s) — main processing loop
2. pkg/engine.parseRecord 28.1% (0.8s) — record deserialization
3. pkg/io.ReadBatch 15.7% (0.45s) — disk reads
```
---
## Step 3: Analyze and Recommend
Classify each finding by estimated impact:
| Impact | Criteria | Action |
|--------|----------|--------|
| **High** | >20% of total time or >50% of allocations | Fix immediately |
| **Medium** | 5-20% of total time or notable allocation waste | Fix in this session |
| **Low** | <5% of total time, minor inefficiency | Log for later |
### Common Anti-Patterns
Check the profiled code against these known performance killers:
1. **Unnecessary allocations** — allocating inside hot loops, string concatenation in loops (use `strings.Builder` / `[]byte` / `io.StringWriter`)
2. **N+1 queries** — database call per item instead of batch query
3. **Missing caching** — recomputing expensive results that rarely change
4. **Blocking I/O in hot path** — synchronous network/disk calls where async or buffered I/O would work
5. **Excessive copying** — passing large structs by value, copying slices instead of slicing
6. **Suboptimal data structures** — linear search where a map lookup works, unbounded slice growth without pre-allocation
7. **Lock contention** — mutex held across I/O or long computation
8. **Regex compilation in loops** — compile once, reuse the compiled pattern
9. **Reflection in hot paths** — replace with code generation or type switches
10. **Unbuffered channels** — causing goroutine scheduling overhead in Go
For each finding, state:
- **What**: the specific code location and pattern
- **Why**: how it hurts performance (with numbers from profiling)
- **Fix**: concrete code change recommendation
---
## Step 4: Optimize (optimize mode only)
**Critical rule: ONE optimization at a time.**
For each optimization:
1. **Describe** the change before making it
2. **Apply** the single change
3. **Re-run** the benchmark suite
4. **Compare** results against baseline using `benchstat` (Go) or manual diff
5. **Keep or revert** — only keep changes that measurably improve metrics
6. **Commit** with message format: `perf(<scope>): <description> (+X% throughput)` or `perf(<scope>): <description> (-X% latency)`
### Acceptance Criteria
- Improvement must be statistically significant (p < 0.05 for `benchstat`, or >5% consistent change for manual comparison)
- No correctness regressions — all existing tests must still pass
- No readability destruction for marginal gains (<2% improvement does not justify obfuscated code)
### Optimization Order
Apply optimizations in this order (highest expected impact first):
1. Algorithmic improvements (O(n^2) to O(n log n), etc.)
2. Allocation reduction (pre-allocate, pool, reuse buffers)
3. I/O batching and buffering
4. Caching and memoization
5. Concurrency improvements (parallelize independent work)
6. Micro-optimizations (only if profiling confirms they matter)
---
## Step 5: Output Report
Write the report to `.agents/perf/YYYY-MM-DD-perf-<target>.md`.
### Report Template
```markdown
# Performance Report: <target>
Date: YYYY-MM-DD
Mode: <profile|bench|compare|optimize>
Language: <detected>
## Summary
<1-2 sentence summary of findings>
## Baseline Metrics
| Metric | Value |
|--------|-------|
| ops/sec | ... |
| ns/op | ... |
| B/op | ... |
| allocs/op | ... |
| p50 latency | ... |
| p95 latency | ... |
| p99 latency | ... |
## Hotspots
<ranked list from Step 2>
## Findings
<classified findings from Step 3>
## Optimizations Applied (if optimize mode)
| Change | Before | After | Improvement |
|--------|--------|-------|-------------|
| ... | ... | ... | +X% |
## After Metrics (if optimize mode)
<same table as baseline, with new values>
## Recommendations
<remaining opportunities not addressed in this session>
```
---
## Compare Mode Details
When running `$perf compare <baseline> <candidate>`:
1. Locate or re-run benchmarks for both versions
2. Use language-native comparison tools:
- **Go**: `benchstat baseline.txt candidate.txt`
- **Rust**: `critcmp baseline candidate`
- **Other**: side-by-side table with percentage deltas
3. Flag regressions (>5% slower or >10% more allocations) as **REGRESSION**
4. Flag improvements (>5% faster or >10% fewer allocations) as **IMPROVEMENT**
5. Flag statistically insignificant changes as **NOISE**
Output a summary table:
```
COMPARISON: baseline vs candidate
| Benchmark | Baseline | Candidate | Delta | Verdict |
|-----------|----------|-----------|-------|---------|
| BenchmarkProcess | 1.2ms | 0.9ms | -25% | IMPROVEMENT |
| BenchmarkParse | 450ns | 480ns | +6.7% | REGRESSION |
| BenchmarkIO | 3.1ms | 3.0ms | -3.2% | NOISE |
```
---
## Edge Cases
- **No benchmarks and no clear target**: Run `$complexity` first to identify hot paths, then benchmark those.
- **Flaky benchmarks**: Increase iteration count, pin to a single core (`GOMAXPROCS=1`), close competing processes.
- **Cannot install profiling tools**: Fall back to `time` for wall-clock and manual instrumentation for allocation counts.
- **Target is a CLI command**: Use `hyperfine` for wall-clock benchmarking across any language.
## See Also
- [complexity](../complexity/SKILL.md) — Find high-complexity code to target
- [standards](../standards/SKILL.md) — Language-specific optimization patterns
- [vibe](../vibe/SKILL.md) — Validate optimized code qualityRelated Skills
vibe
Comprehensive code validation. Runs complexity analysis then multi-model council. Answer: Is this code ready to ship? Triggers: "vibe", "validate code", "check code", "review code", "code quality", "is this ready".
validation
Full validation phase orchestrator. Vibe + post-mortem + retro + forge. Reviews implementation quality, extracts learnings, feeds the knowledge flywheel. Triggers: "validation", "validate", "validate work", "review and learn", "validation phase", "post-implementation review".
update
Reinstall all AgentOps skills globally from the latest source. Triggers: "update skills", "reinstall skills", "sync skills".
trace
Trace design decisions and concepts through session history, handoffs, and git. Triggers: "trace decision", "how did we decide", "where did this come from", "design provenance", "decision history".
test
Test generation, coverage analysis, and TDD workflow. Triggers: "test", "generate tests", "test coverage", "write tests", "tdd", "add tests", "test strategy", "missing tests", "coverage gaps".
status
Single-screen dashboard showing current work, recent validations, flywheel health, and suggested next action. Triggers: "status", "dashboard", "what am I working on", "where was I".
standards
Language-specific coding standards and validation rules. Provides Python, Go, Rust, TypeScript, Shell, YAML, JSON, and Markdown standards. Auto-loaded by $vibe, $implement, $doc, $bug-hunt, $complexity based on file types.
shared
Shared reference documents for multi-agent skills (not directly invocable)
security
Continuous repository security scanning and release gating. Triggers: "security scan", "security audit", "pre-release security", "run scanners", "check vulnerabilities".
security-suite
Composable security suite for binary and prompt-surface assurance, static analysis, dynamic tracing, repo-native redteam scans, contract capture, baseline drift, and policy gating. Triggers: "binary security", "reverse engineer binary", "black-box binary test", "behavioral trace", "baseline diff", "prompt redteam", "security suite".
scenario
Author and manage holdout scenarios for behavioral validation. Scenarios are stored in .agents/holdout/ where implementing agents cannot see them. Triggers: "$scenario", "holdout", "behavioral scenario", "create scenario", "list scenarios".
scaffold
Project scaffolding, component generation, and boilerplate setup. Triggers: "scaffold", "new project", "init project", "create project", "generate component", "setup project", "starter", "boilerplate".