AI Agent Skill HUB

ClaudeDevelopment

benchmark

Use this skill to measure performance baselines, detect regressions before/after PRs, and compare stack alternatives.

144,923 stars

Complexity: easy

View on GitHub Installation ↓

About this skill

The Benchmark skill empowers AI agents to conduct comprehensive performance analysis for web applications and code. It allows developers to establish performance baselines, proactively identify and prevent performance regressions before and after code changes (e.g., Pull Requests), and objectively evaluate the performance of different technology stacks or architectural alternatives. The skill operates in modes like "Page Performance" where it navigates target URLs and meticulously measures real browser metrics, including critical Core Web Vitals such as Largest Contentful Paint (LCP), Cumulative Layout Shift (CLS), and Interaction to Next Paint (INP). By providing clear targets and reporting on these metrics, the skill ensures that performance goals are met, user experience remains optimal, and any perceptions of "slowness" are objectively addressed with data-driven insights.

Best use case

Establishing comprehensive performance baselines for new or existing projects. Performing pre-PR and post-PR checks to measure the performance impact of code changes and detect regressions. Investigating and quantifying performance issues when users report slow application behavior. Validating that performance targets are met as part of a pre-launch checklist for new features or applications. Comparing the performance characteristics of different technology stacks, frameworks, or architectural decisions.

Use this skill to measure performance baselines, detect regressions before/after PRs, and compare stack alternatives.

A detailed performance report that includes key metrics like Core Web Vitals (LCP, CLS, INP) for specified web pages, clear identification of performance regressions or improvements before/after code changes, objective validation against predefined performance targets, and comparative data for different system alternatives. The outcome provides actionable insights to optimize application speed and responsiveness.

Practical example

Example input

/benchmark page-performance urls=https://example.com,https://another.com target_lcp=2.5s target_cls=0.1 target_inp=200ms

Example output

```
Performance Benchmark Report:

Mode: Page Performance
Target URLs:
  - https://example.com
  - https://another.com

--- URL: https://example.com ---
  Core Web Vitals:
    - LCP (Largest Contentful Paint): 1.8s (Target: < 2.5s) - PASS
    - CLS (Cumulative Layout Shift): 0.05 (Target: < 0.1) - PASS
    - INP (Interaction to Next Paint): 150ms (Target: < 200ms) - PASS
  Overall Status: Excellent

--- URL: https://another.com ---
  Core Web Vitals:
    - LCP (Largest Contentful Paint): 3.1s (Target: < 2.5s) - FAIL (Regression Detected)
    - CLS (Cumulative Layout Shift): 0.08 (Target: < 0.1) - PASS
    - INP (Interaction to Next Paint): 220ms (Target: < 200ms) - FAIL (Regression Detected)
  Overall Status: Needs Attention

Summary: Performance targets met for https://example.com. Significant regressions detected for https://another.com in LCP and INP.
```

When to use this skill

Before merging a Pull Request, to ensure that new code does not degrade application performance.
After a code deployment, to confirm performance improvements or to quickly detect unexpected regressions.
When initiating a new project or major feature, to define and establish initial performance benchmarks.
In response to user feedback indicating slow application response times or a poor user experience.

When not to use this skill

When the primary task is unrelated to performance measurement, such as generating code, debugging functional errors, or designing UI/UX.
For conducting security audits, vulnerability scanning, or compliance checks.
If the goal is purely to analyze business data or perform complex data transformations (unless the performance of these operations is being benchmarked).
When highly granular, code-level profiling of specific functions is needed, beyond what typical web vitals or high-level code execution benchmarks provide.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/benchmark/SKILL.md --create-dirs "https://raw.githubusercontent.com/affaan-m/everything-claude-code/main/skills/benchmark/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/benchmark/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How benchmark Compares

Feature / Agent	benchmark	Standard Approach
Platform Support	Claude	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	easy	N/A

Frequently Asked Questions

What does this skill do?

Use this skill to measure performance baselines, detect regressions before/after PRs, and compare stack alternatives.

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

ChatGPT vs Claude for Agent Skills

Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

SKILL.md Source

# Benchmark — Performance Baseline & Regression Detection

## When to Use

- Before and after a PR to measure performance impact
- Setting up performance baselines for a project
- When users report "it feels slow"
- Before a launch — ensure you meet performance targets
- Comparing your stack against alternatives

## How It Works

### Mode 1: Page Performance

Measures real browser metrics via browser MCP:

```
1. Navigate to each target URL
2. Measure Core Web Vitals:
   - LCP (Largest Contentful Paint) — target < 2.5s
   - CLS (Cumulative Layout Shift) — target < 0.1
   - INP (Interaction to Next Paint) — target < 200ms
   - FCP (First Contentful Paint) — target < 1.8s
   - TTFB (Time to First Byte) — target < 800ms
3. Measure resource sizes:
   - Total page weight (target < 1MB)
   - JS bundle size (target < 200KB gzipped)
   - CSS size
   - Image weight
   - Third-party script weight
4. Count network requests
5. Check for render-blocking resources
```

### Mode 2: API Performance

Benchmarks API endpoints:

```
1. Hit each endpoint 100 times
2. Measure: p50, p95, p99 latency
3. Track: response size, status codes
4. Test under load: 10 concurrent requests
5. Compare against SLA targets
```

### Mode 3: Build Performance

Measures development feedback loop:

```
1. Cold build time
2. Hot reload time (HMR)
3. Test suite duration
4. TypeScript check time
5. Lint time
6. Docker build time
```

### Mode 4: Before/After Comparison

Run before and after a change to measure impact:

```
/benchmark baseline    # saves current metrics
# ... make changes ...
/benchmark compare     # compares against baseline
```

Output:
```
| Metric | Before | After | Delta | Verdict |
|--------|--------|-------|-------|---------|
| LCP | 1.2s | 1.4s | +200ms | WARNING: WARN |
| Bundle | 180KB | 175KB | -5KB | ✓ BETTER |
| Build | 12s | 14s | +2s | WARNING: WARN |
```

## Output

Stores baselines in `.ecc/benchmarks/` as JSON. Git-tracked so the team shares baselines.

## Integration

- CI: run `/benchmark compare` on every PR
- Pair with `/canary-watch` for post-deploy monitoring
- Pair with `/browser-qa` for full pre-ship checklist

Related Skills

workspace-surface-audit

from affaan-m/everything-claude-code

Audit the active repo, MCP servers, plugins, connectors, env surfaces, and harness setup, then recommend the highest-value ECC-native skills, hooks, agents, and operator workflows. Use when the user wants help setting up Claude Code or understanding what capabilities are actually available in their environment.

DevelopmentClaude

safety-guard

from affaan-m/everything-claude-code

Use this skill to prevent destructive operations when working on production systems or running agents autonomously.

DevelopmentClaude

repo-scan

from affaan-m/everything-claude-code

Cross-stack source code asset audit — classifies every file, detects embedded third-party libraries, and delivers actionable four-level verdicts per module with interactive HTML reports.

DevelopmentClaude

project-flow-ops

from affaan-m/everything-claude-code

Operate execution flow across GitHub and Linear by triaging issues and pull requests, linking active work, and keeping GitHub public-facing while Linear remains the internal execution layer. Use when the user wants backlog control, PR triage, or GitHub-to-Linear coordination.

DevelopmentClaude

manim-video

from affaan-m/everything-claude-code

Build reusable Manim explainers for technical concepts, graphs, system diagrams, and product walkthroughs, then hand off to the wider ECC video stack if needed. Use when the user wants a clean animated explainer rather than a generic talking-head script.

DevelopmentClaude

laravel-plugin-discovery

from affaan-m/everything-claude-code

Discover and evaluate Laravel packages via LaraPlugins.io MCP. Use when the user wants to find plugins, check package health, or assess Laravel/PHP compatibility.

DevelopmentClaude

design-system

from affaan-m/everything-claude-code

Use this skill to generate or audit design systems, check visual consistency, and review PRs that touch styling.

DevelopmentClaude

click-path-audit

from affaan-m/everything-claude-code

Trace every user-facing button/touchpoint through its full state change sequence to find bugs where functions individually work but cancel each other out, produce wrong final state, or leave the UI in an inconsistent state. Use when: systematic debugging found no bugs but users report broken buttons, or after any major refactor touching shared state stores.

DevelopmentClaude

ck

from affaan-m/everything-claude-code

Persistent per-project memory for Claude Code. Auto-loads project context on session start, tracks sessions with git activity, and writes to native memory. Commands run deterministic Node.js scripts — behavior is consistent across model versions.

DevelopmentClaude

canary-watch

from affaan-m/everything-claude-code

Use this skill to monitor a deployed URL for regressions after deploys, merges, or dependency upgrades.

DevelopmentClaude

swiftui-patterns

from affaan-m/everything-claude-code

SwiftUI 架构模式，使用 @Observable 进行状态管理，视图组合，导航，性能优化，以及现代 iOS/macOS UI 最佳实践。

DevelopmentClaude

swift-protocol-di-testing

from affaan-m/everything-claude-code

基于协议的依赖注入，用于可测试的Swift代码——使用聚焦协议和Swift Testing模拟文件系统、网络和外部API。

DevelopmentClaude