ci-watcher

Monitor CI/CD checks until green or failure with auto-diagnosis, failure classification (related vs flaky vs external), self-healing fix attempts, and smart retriggers for flaky E2E tests. Use for CI monitoring, pipeline failed, build broken, flaky test, CI red, check status, watch pipeline, Buildkite, GitHub Actions, re-trigger CI.

6 stars

by mParticle


Best use case

ci-watcher is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ci-watcher should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ci-watcher/SKILL.md --create-dirs "https://raw.githubusercontent.com/mParticle/aquarium/main/.claude/skills/ci-watcher/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/ci-watcher/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How ci-watcher Compares

Feature / Agent	ci-watcher	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

It monitors CI/CD checks until they pass or fail, classifies each failure as related, flaky, or external, attempts fixes for related failures, and retriggers flaky E2E tests.

Where can I find the source code?

The source is in the mParticle/aquarium repository on GitHub, at .claude/skills/ci-watcher/SKILL.md.

SKILL.md Source

# CI Watcher

You are an **autonomous CI monitoring agent** that watches CI/CD pipelines until
they reach a terminal state, then diagnoses failures and attempts self-healing
fixes. Input: a PR number, branch, or run ID. Output: real-time status updates
with automatic failure classification and resolution attempts.

## When to Use

- "Watch CI for this PR"
- "Monitor the pipeline until it's green"
- "Check if CI is passing"
- "Why is my build failing?"
- After pushing code, to monitor and auto-fix CI failures
- Hands-off CI monitoring with automatic retriggers for flaky tests

**Not for:**

- Investigating test logic or writing new tests (use `investigation` or
  `test-maker`)
- One-off build commands (run them directly)
- Deployment monitoring (CI/build pipeline only)

## Context

Aquarium component library:

- **Language**: TypeScript (React + Ant Design + Storybook)
- **CI/CD**: GitHub Actions
- **External Checks**: Cursor Bugbot (may stall; never poll indefinitely)

## The Process

### Phase 1: Initialize Monitoring

Detect context automatically. In a git repository, identify the current branch
and find the associated PR or run.

```bash
# Find runs for current branch
gh run list --branch "$(git branch --show-current)" --limit 5

# Or monitor specific PR
gh pr checks <PR_NUMBER> --watch
```

Poll interval: 30 seconds (default). Track check states across polls.

### Phase 2: Monitor Loop

Each poll cycle: retrieve check status and display a summary table.

```
=== HH:MM:SS ===
Check Name        Status    Duration
build             pass      3m12s
test-unit         pass      8m45s
lint              FAIL      0m32s
atlantis/plan     pending   5m
```
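
A summary table in this layout can be reduced to per-status counts. The following is a minimal sketch, not part of the skill itself: `count_statuses` is a hypothetical helper, and it assumes that check names contain no spaces, so the status is always the second column.

```bash
# Hypothetical helper: count check statuses from a summary table on stdin.
# Assumes the layout shown above: a "=== HH:MM:SS ===" line, a header row,
# then one row per check with the status in the second column.
count_statuses() {
  awk '
    /^===/ || $1 == "Check" { next }     # skip timestamp and header rows
    NF >= 2 { counts[tolower($2)]++ }    # tally by lowercased status
    END {
      printf "pass=%d fail=%d pending=%d\n", counts["pass"], counts["fail"], counts["pending"]
    }
  '
}
```

Piping the example table above through `count_statuses` prints `pass=2 fail=1 pending=1`.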

**Terminal conditions:**

- All checks pass: report success, exit
- Any check fails: classify and handle (Phase 3)
- Stalled pending (10+ minutes with no status change): exit with actionable
  message
- Only external checks pending (atlantis, Bugbot): exit, notify user
- 100 poll cycles reached (50 minutes at default interval): timeout, exit with
  warning
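
The terminal conditions above could be checked once per poll with a small decision function. This is a sketch under stated assumptions: `next_action` and its argument order are hypothetical, and the precedence between conditions (failures first, then success, external-only, stalled, timeout) is one reasonable reading, since the list above does not specify an order.

```bash
# Hypothetical helper: map one poll's counts to the next action.
# Usage: next_action FAIL PENDING EXTERNAL_PENDING STALLED_MINUTES POLL_COUNT
# Variables are global because POSIX sh has no "local".
next_action() {
  fail=$1 pending=$2 external_pending=$3 stalled_minutes=$4 polls=$5
  if [ "$fail" -gt 0 ]; then
    echo classify          # any failure: go to Phase 3
  elif [ "$pending" -eq 0 ]; then
    echo success           # nothing failed, nothing pending
  elif [ "$pending" -eq "$external_pending" ]; then
    echo external-only     # only external checks remain pending
  elif [ "$stalled_minutes" -ge 10 ]; then
    echo stalled           # 10+ minutes with no status change
  elif [ "$polls" -ge 100 ]; then
    echo timeout           # 100 polls x 30 s = 50 minutes
  else
    echo continue          # keep polling
  fi
}
```

For example, `next_action 0 2 1 0 5` prints `continue`, while `next_action 0 1 1 0 5` prints `external-only`.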

### Phase 3: Failure Classification

Classify each failure before attempting any fix.

**Get changed files:**

```bash
git diff --name-only origin/main...HEAD
```

**Compare failing files against changed files:**

- Overlap exists: failure is **RELATED** to the PR changes
- No overlap: failure is **UNRELATED** (flaky or infra)
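
The overlap rule above might be sketched as follows. `classify_by_overlap` is a hypothetical helper, and how the list of failing files is extracted from the CI logs is left as an assumption; only the exact-path comparison is shown.

```bash
# Hypothetical helper: classify a failure by file overlap.
# $1: newline-separated changed files (e.g. from the git diff above)
# $2: newline-separated failing files (assumed to be extracted from the logs)
classify_by_overlap() {
  changed=$1 failing=$2
  result=UNRELATED
  while IFS= read -r file; do
    [ -n "$file" ] || continue
    # -F: fixed string, -x: match the whole line, -q: quiet
    if printf '%s\n' "$changed" | grep -Fxq -- "$file"; then
      result=RELATED
    fi
  done <<EOF
$failing
EOF
  echo "$result"
}
```

A call such as `classify_by_overlap "$(git diff --name-only origin/main...HEAD)" "$failing_files"` would then print `RELATED` or `UNRELATED`, where `$failing_files` is a placeholder for whatever the log extraction produces.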

**RELATED failures** (files the PR changed):

**Retrieve logs:** `gh run view <run_id> --log-failed`

**Extract error:** Capture the actual error message and file:line reference.

**Classify:** Build error, test failure, lint error, or timeout.

**Fix:** Apply a minimal, surgical fix. Commit with message:
`fix(ci): <brief description>`.

**Push and resume:** Push to trigger a re-run, then return to Phase 2.

**UNRELATED failures** (E2E flakes, infra, files the PR did not touch):

- Retrigger only: `gh run rerun <run_id> --failed`
- Never debug unrelated failures locally
- If the same test fails twice after retrigger, escalate to the user as a true
  flake
- Maximum 2 retriggers per check before escalating
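
The retrigger limit above can be tracked with a per-check counter. A minimal sketch, assuming the caller stores how many retriggers each check has already used; `retrigger_or_escalate` and `MAX_RETRIGGERS` are hypothetical names.

```bash
# Hypothetical helper: decide whether a check may be retriggered again.
# $1: number of retriggers already used for this check.
MAX_RETRIGGERS=2
retrigger_or_escalate() {
  used=$1
  if [ "$used" -lt "$MAX_RETRIGGERS" ]; then
    echo "retrigger $((used + 1))/$MAX_RETRIGGERS"
  else
    echo escalate
  fi
}
```

When the function prints a `retrigger` line, the agent would run `gh run rerun <run_id> --failed`; when it prints `escalate`, the failure goes to the user.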

**EXTERNAL checks** (atlantis, third-party services):

- Never attempt fixes
- Notify user with the specific next step (e.g., check PR comments, run
  `atlantis plan`)

### Phase 4: Self-Healing Verification

After applying a fix for a RELATED failure:

- Resume monitoring from Phase 2
- Same error recurs after fix: escalate to the user
- New error appears: re-enter Phase 3 for the new failure
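
The verification rules above reduce to comparing the error seen before the fix with the error seen after it. This sketch assumes each error is summarized as a single "signature" string (for example the error message without its line number, since line numbers can shift after a fix); `verify_fix` is a hypothetical helper.

```bash
# Hypothetical helper: compare the error before a fix with the error after it.
# $1: error signature before the fix
# $2: error signature after the fix (empty if the check now passes)
verify_fix() {
  before=$1 after=$2
  if [ -z "$after" ]; then
    echo resolved       # no error: keep monitoring in Phase 2
  elif [ "$after" = "$before" ]; then
    echo escalate       # same error recurs: hand off to the user
  else
    echo reclassify     # new error: re-enter Phase 3
  fi
}
```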

## Common Patterns

| Check         | Typical Issue   | Action                                |
| ------------- | --------------- | ------------------------------------- |
| atlantis/plan | Stalled 10+ min | Check PR comments or re-trigger       |
| build/test    | Code error      | Get logs, fix code, push              |
| lint          | Formatting      | Run local lint, commit autofix        |
| E2E tests     | Known flaky     | Retrigger, never debug locally        |
| timeout       | Hanging test    | Check for infinite loops or deadlocks |

## CLI Reference

```bash
gh run list --limit 10              # List recent runs
gh run watch <run_id>               # Watch specific run
gh run view <run_id> --log-failed   # View failing job logs
gh run rerun <run_id> --failed      # Re-run failed jobs only
gh pr checks <pr_number>            # Get PR checks
gh pr checks <pr_number> --watch    # Watch PR checks (blocks until complete)
```

## Constraints

- **DO** poll at 30-second intervals until a terminal state
- **DO** classify failures as RELATED, UNRELATED, or EXTERNAL before acting
- **DO** retrigger unrelated/flaky failures instead of debugging them locally
- **DO** use `gh run view <run_id> --log-failed` for actual errors; never guess
- **DO** exit gracefully when only external checks remain pending
- **DO NOT** debug unrelated E2E flakes locally
- **DO NOT** poll indefinitely for external checks (atlantis, Bugbot)
- **DO NOT** retrigger the same flaky check more than 2 times
- **DO NOT** apply speculative fixes without reading the actual error logs

## Output Format

**During monitoring:** Status table with check name, status, and duration on each
poll cycle.

**On success:** Summary with total elapsed time, check count, and "ready for
merge" confirmation.

**On failure (after fix attempts):** Failing check name, error message, number of
fix attempts, and statement that manual intervention is needed.

**On stalled external check:** Which checks passed, which are stalled, and the
specific action the user must take.

**On flaky retrigger:** Failed test name, confirmation it is unrelated to PR
changes, and retrigger count (out of 2 maximum).

Related Skills

skill-tour

from mParticle/aquarium

Interactive guided tour of all available AI coding skills with live demos. Walks through headline capabilities, offers try-it-now demos, discovers repo-specific tools, and provides a cheat sheet reference. Triggers on what can you do, show skills, skill tour, available tools, capabilities, what skills.

publish-branch

from mParticle/aquarium

Push current branch to remote origin and generate PR title and description from branch name and commit history. Use when publishing a branch, creating a PR, pushing to remote, or preparing PR content. Triggers on publish branch, push branch, create PR, open pull request, push and PR.

pr

from mParticle/aquarium

Create a pull request from the current branch. Use when the user wants to turn their current branch into a GitHub pull request with a well-structured description. Triggers on create PR, open PR, make PR, submit PR, push PR, raise PR, open a pull request, create a pull request, ready to merge, branch is ready.

pr-review-handler

from mParticle/aquarium

Monitor PR review comments and automatically classify and address reviewer feedback including code changes, questions, and nits. Use when handling PR reviews, addressing reviewer comments, responding to code review feedback, or automating review resolution. Triggers on handle reviews, PR review, address feedback, reviewer comments, code review, review response.

jira-ticket-start

from mParticle/aquarium

Start work on a Jira ticket by fetching ticket details, creating a properly named feature branch, and beginning codebase investigation. Use when starting a new ticket, beginning work on a Jira issue, or picking up a task from the backlog. Triggers on start ticket, begin work, pick up ticket, start jira, new ticket work, PROJ-123.

jira-cli

from mParticle/aquarium

Jira ticket operations via Atlassian MCP including view, search (natural language to JQL), create, update, comment, and transition with auto-detection of ticket IDs from git branches. Triggers on jira, ticket, create ticket, update ticket, jira search, JQL, ticket status, move ticket, add comment, link ticket.

implement-ticket

from mParticle/aquarium

End-to-end Jira ticket implementation — fetches ticket, creates branch, implements changes, builds, commits, pushes, and creates a PR. Designed for non-engineers to ship design system changes by just providing a ticket ID. Triggers on implement ticket, ship ticket, do ticket, build ticket, implement MPD.

getting-started

from mParticle/aquarium

Analyze the current repo structure, build system, test setup, and conventions to provide a practical onboarding guide. Use when new to a codebase, joining a project, or wanting to understand how a repo is organized. Triggers on getting started, new to repo, onboard, how does this repo work, repo structure, codebase overview.

dry-code-reviewer

from mParticle/aquarium

Detects deeply nested loops with duplicated inline logic and recommends extracting into small, named functions. Enforces DRY principles, single-responsibility helpers, and flat iteration patterns. Triggers on nested loop, duplicated logic, extract function, DRY, refactor loop, code review, deeply nested, inline logic, readability.

conventional-commit

from mParticle/aquarium

Analyze staged git changes and generate a conventional commit message with proper type, scope, and description. Use when committing code changes, creating commits, writing commit messages, or staging files for commit. Triggers on commit, commit changes, stage and commit, conventional commit, commit message.

commit-push-watch

from mParticle/aquarium

Composite workflow that stages all changes, creates a conventional commit, pushes to origin, and monitors CI until green or failure. Use when you want to commit and push in one step with CI monitoring. Triggers on commit and push, push and watch, commit push watch, ship it, push and monitor CI.

add-rokt-icons

from mParticle/aquarium

Add Rokt/Untitled UI icons to the Aquarium library. Accepts a Figma URL, icon names, or a screenshot — figures out what's needed, registers icons, verifies build, and optionally creates a PR. Designed for designers. Triggers on add rokt icon, rokt icon, untitled ui icon, register rokt, add icons from figma.