benchmark
Use this skill to measure performance baselines, detect regressions before/after PRs, and compare stack alternatives.
About this skill
The Benchmark skill lets AI agents measure the performance of web applications and code. Developers can use it to establish baselines, catch regressions before and after code changes such as pull requests, and compare technology stacks or architectural alternatives. In "Page Performance" mode it navigates to each target URL and measures real browser metrics, including the Core Web Vitals: Largest Contentful Paint (LCP), Cumulative Layout Shift (CLS), and Interaction to Next Paint (INP). Reporting each metric against an explicit target turns a vague complaint of "slowness" into a pass/fail result backed by data.
Best use case
- Establishing performance baselines for new or existing projects.
- Performing pre-PR and post-PR checks to measure the performance impact of code changes and detect regressions.
- Investigating and quantifying performance issues when users report slow application behavior.
- Validating that performance targets are met as part of a pre-launch checklist for new features or applications.
- Comparing the performance characteristics of different technology stacks, frameworks, or architectural decisions.
The skill produces a performance report that lists Core Web Vitals (LCP, CLS, INP) for each specified page, flags regressions or improvements before and after code changes, validates results against predefined performance targets, and provides comparative data for different system alternatives.
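The target validation described above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the `Target` class and the `evaluate` and `report_lines` helpers are hypothetical names, and the thresholds and measurements are copied from the example on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Upper bound for one metric; a measurement passes when strictly below it."""
    limit: float
    unit: str


# Thresholds copied from the example on this page (LCP in seconds, INP in ms).
TARGETS = {
    "LCP": Target(2.5, "s"),
    "CLS": Target(0.1, ""),
    "INP": Target(200, "ms"),
}


def evaluate(measured: dict[str, float]) -> dict[str, str]:
    """Map each measured metric that has a target to "PASS" or "FAIL"."""
    return {
        name: "PASS" if value < TARGETS[name].limit else "FAIL"
        for name, value in measured.items()
        if name in TARGETS
    }


def report_lines(measured: dict[str, float]) -> list[str]:
    """Format one line per metric, loosely following the example report layout."""
    verdicts = evaluate(measured)
    return [
        f"- {name}: {measured[name]:g}{TARGETS[name].unit} "
        f"(Target: < {TARGETS[name].limit:g}{TARGETS[name].unit}) - {verdict}"
        for name, verdict in verdicts.items()
    ]


if __name__ == "__main__":
    # Values taken from the second URL in the example report on this page.
    for line in report_lines({"LCP": 3.1, "CLS": 0.08, "INP": 220}):
        print(line)
```

Note that a FAIL here only means the target was missed. Calling it a regression would additionally require a stored baseline to compare against.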
Practical example
Example input
/benchmark page-performance urls=https://example.com,https://another.com target_lcp=2.5s target_cls=0.1 target_inp=200ms
Example output
```
Performance Benchmark Report:
Mode: Page Performance
Target URLs:
- https://example.com
- https://another.com
--- URL: https://example.com ---
Core Web Vitals:
- LCP (Largest Contentful Paint): 1.8s (Target: < 2.5s) - PASS
- CLS (Cumulative Layout Shift): 0.05 (Target: < 0.1) - PASS
- INP (Interaction to Next Paint): 150ms (Target: < 200ms) - PASS
Overall Status: Excellent
--- URL: https://another.com ---
Core Web Vitals:
- LCP (Largest Contentful Paint): 3.1s (Target: < 2.5s) - FAIL (Regression Detected)
- CLS (Cumulative Layout Shift): 0.08 (Target: < 0.1) - PASS
- INP (Interaction to Next Paint): 220ms (Target: < 200ms) - FAIL (Regression Detected)
Overall Status: Needs Attention
Summary: Performance targets met for https://example.com. Significant regressions detected for https://another.com in LCP and INP.
```
When to use this skill
- Before merging a Pull Request, to ensure that new code does not degrade application performance.
- After a code deployment, to confirm performance improvements or to quickly detect unexpected regressions.
- When initiating a new project or major feature, to define and establish initial performance benchmarks.
- In response to user feedback indicating slow application response times or a poor user experience.
When not to use this skill
- When the primary task is unrelated to performance measurement, such as generating code, debugging functional errors, or designing UI/UX.
- For conducting security audits, vulnerability scanning, or compliance checks.
- If the goal is purely to analyze business data or perform complex data transformations (unless the performance of these operations is being benchmarked).
- When highly granular, code-level profiling of specific functions is needed, beyond what typical web vitals or high-level code execution benchmarks provide.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/benchmark/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How benchmark Compares
| Feature / Agent | benchmark | Standard Approach |
|---|---|---|
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | easy | N/A |
Frequently Asked Questions
What does this skill do?
Use this skill to measure performance baselines, detect regressions before/after PRs, and compare stack alternatives.
Which AI agents support this skill?
This skill is designed for Claude.
How difficult is it to install?
The installation complexity is rated as easy. You can find the installation instructions above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Benchmark — Performance Baseline & Regression Detection

## When to Use

- Before and after a PR to measure performance impact
- Setting up performance baselines for a project
- When users report "it feels slow"
- Before a launch — ensure you meet performance targets
- Comparing your stack against alternatives

## How It Works

### Mode 1: Page Performance

Measures real browser metrics via browser MCP:

```
1. Navigate to each target URL
2. Measure Core Web Vitals:
   - LCP (Largest Contentful Paint) — target < 2.5s
   - CLS (Cumulative Layout Shift) — target < 0.1
   - INP (Interaction to Next Paint) — target < 200ms
   - FCP (First Contentful Paint) — target < 1.8s
   - TTFB (Time to First Byte) — target < 800ms
3. Measure resource sizes:
   - Total page weight (target < 1MB)
   - JS bundle size (target < 200KB gzipped)
   - CSS size
   - Image weight
   - Third-party script weight
4. Count network requests
5. Check for render-blocking resources
```

### Mode 2: API Performance

Benchmarks API endpoints:

```
1. Hit each endpoint 100 times
2. Measure: p50, p95, p99 latency
3. Track: response size, status codes
4. Test under load: 10 concurrent requests
5. Compare against SLA targets
```

### Mode 3: Build Performance

Measures development feedback loop:

```
1. Cold build time
2. Hot reload time (HMR)
3. Test suite duration
4. TypeScript check time
5. Lint time
6. Docker build time
```

### Mode 4: Before/After Comparison

Run before and after a change to measure impact:

```
/benchmark baseline    # saves current metrics
# ... make changes ...
/benchmark compare     # compares against baseline
```

Output:

```
| Metric | Before | After | Delta | Verdict |
|--------|--------|-------|-------|---------|
| LCP | 1.2s | 1.4s | +200ms | ⚠ WARN |
| Bundle | 180KB | 175KB | -5KB | ✓ BETTER |
| Build | 12s | 14s | +2s | ⚠ WARN |
```

## Output

Stores baselines in `.ecc/benchmarks/` as JSON. Git-tracked so the team shares baselines.
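
The baseline and compare steps described above could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the source does not specify the JSON layout, the file name, or how verdicts are decided, so the flat `{metric: value}` schema, the 5% `tolerance`, and the function names are hypothetical. The sketch also assumes every metric is lower-is-better.

```python
import json
from pathlib import Path

Metrics = dict[str, float]


def save_baseline(metrics: Metrics, path: Path) -> None:
    """Write metrics as sorted, indented JSON so diffs stay readable in Git."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True) + "\n")


def load_baseline(path: Path) -> Metrics:
    """Read a baseline previously written by save_baseline."""
    return json.loads(path.read_text())


def verdict(before: float, after: float, tolerance: float = 0.05) -> str:
    """Classify a change, assuming lower values are better.

    The 5% tolerance is a hypothetical default, not a value from the source.
    """
    delta = after - before
    if delta < 0:
        return "BETTER"
    if delta > abs(before) * tolerance:
        return "WARN"
    return "OK"


def compare(baseline: Metrics, current: Metrics, tolerance: float = 0.05):
    """Return (metric, before, after, delta, verdict) rows for shared metrics."""
    rows = []
    for name in sorted(baseline.keys() & current.keys()):
        before, after = baseline[name], current[name]
        rows.append((name, before, after, after - before,
                     verdict(before, after, tolerance)))
    return rows


if __name__ == "__main__":
    # Numbers taken from the comparison table above; metric keys are hypothetical.
    before = {"lcp_s": 1.2, "bundle_kb": 180, "build_s": 12}
    after = {"lcp_s": 1.4, "bundle_kb": 175, "build_s": 14}
    for row in compare(before, after):
        print(row)
```

Sorting keys on write keeps the committed JSON stable between runs, which matters when the baseline is tracked in Git and reviewed in diffs.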

## Integration

- CI: run `/benchmark compare` on every PR
- Pair with `/canary-watch` for post-deploy monitoring
- Pair with `/browser-qa` for full pre-ship checklist
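
For the API mode in the source above (p50, p95, and p99 latency over 100 calls), one possible way to compute the summary is shown below. This is a sketch under stated assumptions: the source does not say which percentile definition the skill uses, so the nearest-rank method is chosen here for simplicity, and `time_calls`, `percentile`, and `summarize` are hypothetical helper names. The endpoint call is passed in as a plain callable so the example runs without network access. The concurrent load step is not covered.

```python
import time
from typing import Callable, Sequence


def time_calls(call: Callable[[], object], runs: int = 100) -> list[float]:
    """Invoke `call` sequentially `runs` times; return latencies in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be an integer in 1..100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(samples: Sequence[float]) -> dict[str, float]:
    """Return the three latency percentiles named in the source."""
    return {f"p{pct}": percentile(samples, pct) for pct in (50, 95, 99)}


if __name__ == "__main__":
    # Stand-in workload; a real run would wrap an HTTP request to the endpoint.
    latencies = time_calls(lambda: sum(range(10_000)), runs=100)
    print(summarize(latencies))
```

With only 100 samples, p99 rests on the two slowest calls, so it is worth repeating the run before treating a change in p99 as significant.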
Related Skills
workspace-surface-audit
Audit the active repo, MCP servers, plugins, connectors, env surfaces, and harness setup, then recommend the highest-value ECC-native skills, hooks, agents, and operator workflows. Use when the user wants help setting up Claude Code or understanding what capabilities are actually available in their environment.
safety-guard
Use this skill to prevent destructive operations when working on production systems or running agents autonomously.
repo-scan
Cross-stack source code asset audit — classifies every file, detects embedded third-party libraries, and delivers actionable four-level verdicts per module with interactive HTML reports.
project-flow-ops
Operate execution flow across GitHub and Linear by triaging issues and pull requests, linking active work, and keeping GitHub public-facing while Linear remains the internal execution layer. Use when the user wants backlog control, PR triage, or GitHub-to-Linear coordination.
manim-video
Build reusable Manim explainers for technical concepts, graphs, system diagrams, and product walkthroughs, then hand off to the wider ECC video stack if needed. Use when the user wants a clean animated explainer rather than a generic talking-head script.
laravel-plugin-discovery
Discover and evaluate Laravel packages via LaraPlugins.io MCP. Use when the user wants to find plugins, check package health, or assess Laravel/PHP compatibility.
design-system
Use this skill to generate or audit design systems, check visual consistency, and review PRs that touch styling.
click-path-audit
Trace every user-facing button/touchpoint through its full state change sequence to find bugs where functions individually work but cancel each other out, produce wrong final state, or leave the UI in an inconsistent state. Use when: systematic debugging found no bugs but users report broken buttons, or after any major refactor touching shared state stores.
ck
Persistent per-project memory for Claude Code. Auto-loads project context on session start, tracks sessions with git activity, and writes to native memory. Commands run deterministic Node.js scripts — behavior is consistent across model versions.
canary-watch
Use this skill to monitor a deployed URL for regressions after deploys, merges, or dependency upgrades.
swiftui-patterns
SwiftUI architecture patterns: state management with @Observable, view composition, navigation, performance optimization, and modern iOS/macOS UI best practices.
swift-protocol-di-testing
Protocol-based dependency injection for testable Swift code: mock the file system, network, and external APIs using focused protocols and Swift Testing.