canary-watch

Use this skill to monitor a deployed URL for regressions after deploys, merges, or dependency upgrades.

144,923 stars
Complexity: easy

About this skill

The `canary-watch` skill helps confirm that an application is still stable and performant after a change. Designed for the Claude AI agent, it monitors a deployed URL and checks for regressions in several areas: HTTP status (for example, 200 OK), new console errors, network failures (failed API calls or 5xx responses), performance metrics (LCP, CLS, INP) measured against a baseline, and the presence of critical content elements. It runs in a loop until the watch window expires or it is stopped manually. This makes it suited to verifying fixes, monitoring during a launch window, and catching unintended side effects of deploys, risky merges, or dependency upgrades before they reach end users.

Best use case

Proactive detection of regressions in web applications or services after deployment, code merges, or infrastructure changes to ensure stability and performance.


The skill produces a monitoring report on the health of the deployed URL. The report highlights any detected regressions: non-200 HTTP status codes, new console errors, failed network requests, significant performance degradation (LCP, CLS, INP) relative to a baseline, and missing or altered critical content elements.

Practical example

Example input

To monitor `https://your-production-app.com` every 5 minutes for one hour:
`/canary-watch https://your-production-app.com --interval 5m --duration 1h`

To compare staging against production:
`/canary-watch --compare https://your-staging-app.com https://your-production-app.com`

Example output

```json
{
  "status": "monitoring_complete",
  "url": "https://your-production-app.com",
  "duration_minutes": 60,
  "regressions_found": false,
  "summary": "No regressions detected across HTTP status, console errors, network failures, performance, or content.",
  "details": {
    "http_status": "200 OK (stable)",
    "console_errors": "No new errors detected.",
    "network_failures": "0 failures.",
    "performance": {
      "LCP_change": "+0ms (stable)",
      "CLS_change": "+0 (stable)",
      "INP_change": "+0ms (stable)"
    },
    "content_check": "Key elements present and stable."
  }
}
```

Or, if regressions are found:

```json
{
  "status": "monitoring_complete_with_issues",
  "url": "https://your-production-app.com",
  "duration_minutes": 60,
  "regressions_found": true,
  "summary": "Regressions detected in console errors, network failures, performance, and content.",
  "details": {
    "http_status": "200 OK (stable)",
    "console_errors": "New error detected: 'Uncaught ReferenceError: foo is not defined' on line 12.",
    "network_failures": "1 failure: GET /api/data returned 500 Internal Server Error.",
    "performance": {
      "LCP_change": "+1500ms (significant degradation)",
      "CLS_change": "+0.15 (degradation)",
      "INP_change": "+50ms (stable)"
    },
    "content_check": "Missing element: 'Product Price' with ID 'product-price'."
  }
}
```
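
A report in this shape can be checked mechanically, for example to fail a deploy step. The following is a minimal sketch, assuming the field names shown in the examples above and assuming that a degraded metric always carries the word "degradation" in its value. Both helper names are hypothetical and not part of the skill.

```python
import json


def exit_code(report):
    """Return 1 when the report flags regressions, else 0 (shell convention)."""
    return 1 if report["regressions_found"] else 0


def degraded_metrics(report):
    """List performance fields whose value mentions a degradation.

    Assumption: the wording '... (degradation)' from the example above is stable.
    """
    performance = report["details"]["performance"]
    return [name for name, change in performance.items() if "degradation" in change]


# Trimmed copy of the regression example above.
raw = """
{
  "regressions_found": true,
  "details": {
    "performance": {
      "LCP_change": "+1500ms (significant degradation)",
      "CLS_change": "+0.15 (degradation)",
      "INP_change": "+50ms (stable)"
    }
  }
}
"""
report = json.loads(raw)
print(exit_code(report))         # 1
print(degraded_metrics(report))  # ['LCP_change', 'CLS_change']
```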

When to use this skill

  • After deploying to production or staging environments.
  • Following the merge of a high-risk Pull Request (PR).
  • To verify that a specific bug fix has effectively resolved the issue.
  • For continuous monitoring during a product launch window.

When not to use this skill

  • For non-web-based applications or services that don't expose a monitored URL.
  • During very early development stages where a stable deployed URL isn't available or expected.
  • When comprehensive, dedicated external monitoring systems are already in place and fully integrated.
  • If the primary goal is functional testing rather than regression monitoring of existing metrics.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/canary-watch/SKILL.md --create-dirs "https://raw.githubusercontent.com/affaan-m/everything-claude-code/main/skills/canary-watch/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/canary-watch/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How canary-watch Compares

| Feature / Agent | canary-watch | Standard Approach |
|-----------------|--------------|-------------------|
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Easy | N/A |

Frequently Asked Questions

What does this skill do?

It monitors a deployed URL for regressions after deploys, merges, or dependency upgrades.

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

The source code is on GitHub in the `affaan-m/everything-claude-code` repository, at `skills/canary-watch/SKILL.md`.

SKILL.md Source

# Canary Watch — Post-Deploy Monitoring

## When to Use

- After deploying to production or staging
- After merging a risky PR
- When you want to verify a fix actually fixed it
- Continuous monitoring during a launch window
- After dependency upgrades

## How It Works

Monitors a deployed URL for regressions. Runs in a loop until stopped or until the watch window expires.

### What It Watches

```
1. HTTP Status — is the page returning 200?
2. Console Errors — new errors that weren't there before?
3. Network Failures — failed API calls, 5xx responses?
4. Performance — LCP/CLS/INP regression vs baseline?
5. Content — did key elements disappear? (h1, nav, footer, CTA)
6. API Health — are critical endpoints responding within SLA?
```
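
As an illustration of check 5, the sketch below tests whether key elements are still present in a page's HTML. It is an approximation under stated assumptions: it works on static markup with Python's standard `html.parser`, whereas the skill may inspect a rendered page, and `missing_key_elements` is a hypothetical helper name. The CTA from the list above is left out because it has no fixed tag.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every opening tag seen in a document."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        self.seen.add(tag)


def missing_key_elements(html, required=("h1", "nav", "footer")):
    """Return the required tags that do not appear in the HTML, in order."""
    collector = TagCollector()
    collector.feed(html)
    return [tag for tag in required if tag not in collector.seen]


healthy = "<nav></nav><h1>Shop</h1><footer></footer>"
broken = "<nav></nav><div>Shop</div>"
print(missing_key_elements(healthy))  # []
print(missing_key_elements(broken))   # ['h1', 'footer']
```

A non-empty result would count as a content regression when the baseline page contained those elements.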

### Watch Modes

**Quick check** (default): single pass, report results
```
/canary-watch https://myapp.com
```

**Sustained watch**: check every N minutes for M hours
```
/canary-watch https://myapp.com --interval 5m --duration 2h
```

**Diff mode**: compare staging vs production
```
/canary-watch --compare https://staging.myapp.com https://myapp.com
```
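
To make the sustained-watch arithmetic concrete, the sketch below converts spans such as `5m` and `2h` to seconds and counts the passes in a watch window. The accepted suffixes (`s`, `m`, `h`) and the choice to run one pass at the start and one at the end of the window are assumptions; the skill itself does not document either detail.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_span(text):
    """Convert a span such as '5m' or '2h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time span: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def planned_checks(interval, duration):
    """Count passes when one runs at t=0 and then once per interval."""
    interval_s = parse_span(interval)
    if interval_s == 0:
        raise ValueError("interval must be greater than zero")
    return parse_span(duration) // interval_s + 1


print(planned_checks("5m", "2h"))  # 25
```

Under these assumptions, the sustained-watch command above would run 25 passes over two hours.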

### Alert Thresholds

```yaml
critical:  # immediate alert
  - HTTP status != 200
  - Console error count > 5 (new errors only)
  - LCP > 4s
  - API endpoint returns 5xx

warning:   # flag in report
  - LCP increased > 500ms from baseline
  - CLS > 0.1
  - New console warnings
  - Response time > 2x baseline

info:      # log only
  - Minor performance variance
  - New network requests (third-party scripts added?)
```
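
The critical and warning tiers above translate directly into comparisons. The following is a minimal sketch of that mapping, not the skill's implementation: the dictionary keys (`lcp_ms`, `api_statuses`, and so on) and the function name `classify` are hypothetical, and the info tier is omitted because its entries have no numeric threshold.

```python
def classify(sample, baseline):
    """Return (severity, reasons) for one monitoring pass.

    Both arguments are plain dicts; the key names are illustrative only.
    """
    critical, warning = [], []

    # Critical tier: any single hit triggers an immediate alert.
    if sample["http_status"] != 200:
        critical.append(f"HTTP status {sample['http_status']}")
    if sample["new_console_errors"] > 5:
        critical.append("more than 5 new console errors")
    if sample["lcp_ms"] > 4000:
        critical.append("LCP above 4s")
    if any(500 <= code <= 599 for code in sample["api_statuses"]):
        critical.append("API endpoint returned 5xx")

    # Warning tier: flagged in the report only.
    if sample["lcp_ms"] - baseline["lcp_ms"] > 500:
        warning.append("LCP up more than 500ms from baseline")
    if sample["cls"] > 0.1:
        warning.append("CLS above 0.1")
    if sample["new_console_warnings"] > 0:
        warning.append("new console warnings")
    if sample["response_ms"] > 2 * baseline["response_ms"]:
        warning.append("response time above 2x baseline")

    if critical:
        return "critical", critical + warning
    if warning:
        return "warning", warning
    return "ok", []


baseline = {"lcp_ms": 1600, "response_ms": 120}
sample = {
    "http_status": 200, "new_console_errors": 0, "lcp_ms": 2300,
    "api_statuses": [200], "cls": 0.15, "new_console_warnings": 0,
    "response_ms": 145,
}
print(classify(sample, baseline))
# ('warning', ['LCP up more than 500ms from baseline', 'CLS above 0.1'])
```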

### Notifications

When a critical threshold is crossed:
- Desktop notification (macOS/Linux)
- Optional: Slack/Discord webhook
- Log to `~/.claude/canary-watch.log`
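
The logging and webhook steps could look roughly like the sketch below. The log line layout and both function names are assumptions, since the skill does not specify them. The `{"text": ...}` body is the basic message shape that Slack incoming webhooks accept; the sketch only builds that body and does not send it. The demo writes to a temporary directory rather than `~/.claude/canary-watch.log`.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_alert(log_path, message, now=None):
    """Append one timestamped CRITICAL line to the log and return it."""
    if now is None:
        now = datetime.now(timezone.utc)
    line = f"{now.isoformat()} CRITICAL {message}\n"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(line)
    return line


def slack_payload(message):
    """Build the JSON body for a Slack incoming webhook (not sent here)."""
    return json.dumps({"text": f"canary-watch: {message}"})


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "canary-watch.log"
    append_alert(log, "HTTP status 503")
    print(log.read_text(encoding="utf-8"), end="")
print(slack_payload("HTTP status 503"))
```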

## Output

```markdown
## Canary Report — myapp.com — 2026-03-23 03:15 PST

### Status: HEALTHY ✓

| Check | Result | Baseline | Delta |
|-------|--------|----------|-------|
| HTTP | 200 ✓ | 200 | — |
| Console errors | 0 ✓ | 0 | — |
| LCP | 1.8s ✓ | 1.6s | +200ms |
| CLS | 0.01 ✓ | 0.01 | — |
| API /health | 145ms ✓ | 120ms | +25ms |

### No regressions detected. Deploy is clean.
```
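
A table like the one in this report can be generated from per-check results. The sketch below assumes each result is a tuple of check name, measured value, baseline, delta, and a pass flag. The tuple layout, the function names, and the `REGRESSED` label are hypothetical; the sample report above only shows the `HEALTHY` case.

```python
def render_table(rows):
    """Render (check, result, baseline, delta, ok) tuples as a markdown table."""
    lines = [
        "| Check | Result | Baseline | Delta |",
        "|-------|--------|----------|-------|",
    ]
    for check, result, baseline, delta, ok in rows:
        mark = "✓" if ok else "✗"
        lines.append(f"| {check} | {result} {mark} | {baseline} | {delta} |")
    return "\n".join(lines)


def overall_status(rows):
    """HEALTHY only when every check passed."""
    return "HEALTHY" if all(row[4] for row in rows) else "REGRESSED"


rows = [
    ("LCP", "1.8s", "1.6s", "+200ms", True),
    ("API /health", "145ms", "120ms", "+25ms", True),
]
print(f"### Status: {overall_status(rows)}")
print(render_table(rows))
```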

## Integration

Pair with:
- `/browser-qa` for pre-deploy verification
- Hooks: add as a PostToolUse hook on `git push` to auto-check after deploys
- CI: run in GitHub Actions after deploy step
