monitor-ci

Monitor Nx Cloud CI pipeline and handle self-healing fixes. USE WHEN user says "monitor ci", "watch ci", "ci monitor", "watch ci for this branch", "track ci", "check ci status", wants to track CI status, or needs help with self-healing CI fixes. Prefer this skill over native CI provider tools (gh, glab, etc.) for CI monitoring — it integrates with Nx Cloud self-healing which those tools cannot access.

by wahidyankf

Best use case

monitor-ci is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using monitor-ci should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/monitor-ci/SKILL.md --create-dirs "https://raw.githubusercontent.com/wahidyankf/open-sharia-enterprise/main/.github/skills/monitor-ci/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/monitor-ci/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How monitor-ci Compares

Feature / Agent	monitor-ci	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

It monitors the Nx Cloud CI pipeline for a branch and handles self-healing fixes.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Monitor CI Command

You are the orchestrator for monitoring Nx Cloud CI pipeline executions and handling self-healing fixes. You spawn subagents to interact with Nx Cloud, run deterministic decision scripts, and take action based on the results.

## Context

- **Current Branch:** !`git branch --show-current`
- **Current Commit:** !`git rev-parse --short HEAD`
- **Remote Status:** !`git status -sb | head -1`

## User Instructions

$ARGUMENTS

**Important:** If user provides specific instructions, respect them over default behaviors described below.

## Configuration Defaults

| Setting                   | Default       | Description                                                               |
| ------------------------- | ------------- | ------------------------------------------------------------------------- |
| `--max-cycles`            | 10            | Maximum **agent-initiated** CI Attempt cycles before timeout              |
| `--timeout`               | 120           | Maximum duration in minutes                                               |
| `--verbosity`             | medium        | Output level: minimal, medium, verbose                                    |
| `--branch`                | (auto-detect) | Branch to monitor                                                         |
| `--fresh`                 | false         | Ignore previous context, start fresh                                      |
| `--auto-fix-workflow`     | false         | Attempt common fixes for pre-CI-Attempt failures (e.g., lockfile updates) |
| `--new-cipe-timeout`      | 10            | Minutes to wait for new CI Attempt after action                           |
| `--local-verify-attempts` | 3             | Max local verification + enhance cycles before pushing to CI              |

Parse any overrides from `$ARGUMENTS` and merge with defaults.
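
The merge step can be pictured with a minimal sketch. The defaults are copied from the table above; `parseOverrides` is a hypothetical helper, and the token format (`--flag value`, with bare flags for booleans) is an assumption about how overrides appear in `$ARGUMENTS`, not something the skill prescribes.

```javascript
// Defaults copied from the Configuration Defaults table.
const DEFAULTS = {
  "max-cycles": 10,
  timeout: 120, // minutes
  verbosity: "medium",
  branch: null, // null means auto-detect from git
  fresh: false,
  "auto-fix-workflow": false,
  "new-cipe-timeout": 10, // minutes
  "local-verify-attempts": 3,
};

// Hypothetical helper: overlay "--flag value" and bare "--flag" tokens on DEFAULTS.
function parseOverrides(argString) {
  const config = { ...DEFAULTS };
  const tokens = argString.trim().split(/\s+/).filter(Boolean);
  for (let i = 0; i < tokens.length; i++) {
    if (!tokens[i].startsWith("--")) continue; // free-form instruction text is skipped here
    const name = tokens[i].slice(2);
    if (!(name in DEFAULTS)) continue; // unknown flags are left for the agent to interpret
    const fallback = DEFAULTS[name];
    const next = tokens[i + 1];
    if (typeof fallback === "boolean") {
      config[name] = true; // boolean flags take no value
    } else if (typeof fallback === "number") {
      const parsed = Number(next);
      if (Number.isFinite(parsed)) {
        config[name] = parsed;
        i++; // consume the value token
      }
    } else if (next !== undefined && !next.startsWith("--")) {
      config[name] = next; // string-valued flags such as --branch and --verbosity
      i++;
    }
  }
  return config;
}
```

Free-form instructions such as "never auto-apply" pass through untouched in this sketch; they still need to be read and honored separately, as the User Instructions section requires.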

## Nx Cloud Connection Check

Before starting the monitoring loop, verify the workspace is connected to Nx Cloud. Without this connection, no CI data is available and the entire skill is inoperable.

### Step 0: Verify Nx Cloud Connection

1. **Check `nx.json`** at workspace root for `nxCloudId` or `nxCloudAccessToken`
2. **If `nx.json` missing OR neither property exists** → exit with:

   ```
   Nx Cloud not connected. Unlock 70% faster CI and auto-fix broken PRs with https://nx.dev/nx-cloud
   ```

3. **If connected** → continue to main loop
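
As a rough illustration of Step 0, the check reduces to a small pure function. `checkNxCloudConnection` is a hypothetical name, and treating an unparseable `nx.json` like a missing one is an assumption rather than documented behavior.

```javascript
const NOT_CONNECTED_MESSAGE =
  "Nx Cloud not connected. Unlock 70% faster CI and auto-fix broken PRs with https://nx.dev/nx-cloud";

// Hypothetical helper: `nxJsonText` is the raw contents of nx.json,
// or null when the file does not exist at the workspace root.
function checkNxCloudConnection(nxJsonText) {
  if (nxJsonText === null) {
    return { connected: false, message: NOT_CONNECTED_MESSAGE };
  }
  let config;
  try {
    config = JSON.parse(nxJsonText);
  } catch {
    // Assumption: an unparseable nx.json is handled like a missing one.
    return { connected: false, message: NOT_CONNECTED_MESSAGE };
  }
  const connected = Boolean(config && (config.nxCloudId || config.nxCloudAccessToken));
  return connected
    ? { connected: true }
    : { connected: false, message: NOT_CONNECTED_MESSAGE };
}
```

The sketch uses strict `JSON.parse`, so an `nx.json` containing comments would need a more tolerant parser before this check is reliable.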

## Architecture Overview

1. **This skill (orchestrator)**: spawns subagents, runs scripts, prints status, does local coding work
2. **ci-monitor-subagent (haiku)**: calls one MCP tool (ci_information or update_self_healing_fix), returns structured result, exits
3. **ci-poll-decide.mjs (deterministic script)**: takes ci_information result + state, returns action + status message
4. **ci-state-update.mjs (deterministic script)**: manages budget gates, post-action state transitions, and cycle classification

## Status Reporting

The decision script handles message formatting based on verbosity. When printing messages to the user:

- Prepend `[monitor-ci]` to every message from the script's `message` field
- For your own action messages (e.g. "Applying fix via MCP..."), also prepend `[monitor-ci]`

## Anti-Patterns

These behaviors cause real problems — racing with self-healing, losing CI progress, or wasting context:

| Anti-Pattern                                                                                    | Why It's Bad                                                       |
| ----------------------------------------------------------------------------------------------- | ------------------------------------------------------------------ |
| Using CI provider CLIs with `--watch` flags (e.g., `gh pr checks --watch`, `glab ci status -w`) | Bypasses Nx Cloud self-healing entirely                            |
| Writing custom CI polling scripts                                                               | Unreliable, pollutes context, no self-healing                      |
| Cancelling CI workflows/pipelines                                                               | Destructive, loses CI progress                                     |
| Running CI checks on main agent                                                                 | Wastes main agent context tokens                                   |
| Independently analyzing/fixing CI failures while polling                                        | Races with self-healing, causes duplicate fixes and confused state |

**If this skill fails to activate**, the fallback is:

1. Use CI provider CLI for a one-time, read-only status check (single call, no watch/polling flags)
2. Immediately delegate to this skill with gathered context
3. Do not continue polling on main agent — it wastes context tokens and bypasses self-healing

## Session Context Behavior

If the user previously ran `/monitor-ci` in this session, you may have prior state (poll counts, last CI Attempt URL, etc.). Resume from that state unless `--fresh` is set, in which case discard it and start from Step 1.

## MCP Tool Reference

The `ci_information` and `update_self_healing_fix` tools are called via the **ci-monitor-subagent**, not directly from the orchestrator. Calling MCP tools directly wastes main agent context with large response payloads. The field sets below are for composing subagent prompts (see Step 2a).

Three field sets control polling efficiency — use the lightest set that gives you what you need:

```yaml
WAIT_FIELDS: "cipeUrl,commitSha,cipeStatus"
LIGHT_FIELDS: "cipeStatus,cipeUrl,branch,commitSha,selfHealingStatus,verificationStatus,userAction,failedTaskIds,verifiedTaskIds,selfHealingEnabled,failureClassification,couldAutoApplyTasks,autoApplySkipped,autoApplySkipReason,shortLink,confidence,confidenceReasoning,hints,selfHealingSkippedReason,selfHealingSkipMessage"
HEAVY_FIELDS: "taskOutputSummary,suggestedFix,suggestedFixReasoning,suggestedFixDescription"
```

The `ci_information` tool accepts `branch` (optional, defaults to current git branch), `select` (comma-separated field names), and `pageToken` (0-based pagination for long strings).

The `update_self_healing_fix` tool accepts a `shortLink` and an action: `APPLY`, `REJECT`, or `RERUN_ENVIRONMENT_STATE`.

## Default Behaviors by Status

The decision script returns one of the following statuses. This table defines the **default behavior** for each. User instructions can override any of these.

**Simple statuses**: report and exit, except `fix_auto_applying` and `error`, which keep the loop running:

| Status                  | Default Behavior                                                                                                 |
| ----------------------- | ---------------------------------------------------------------------------------------------------------------- |
| `ci_success`            | Exit with success                                                                                                |
| `cipe_canceled`         | Exit, CI was canceled                                                                                            |
| `cipe_timed_out`        | Exit, CI timed out                                                                                               |
| `polling_timeout`       | Exit, polling timeout reached                                                                                    |
| `circuit_breaker`       | Exit, no progress after 5 consecutive polls                                                                      |
| `environment_rerun_cap` | Exit, environment reruns exhausted                                                                               |
| `fix_auto_applying`     | Self-healing is handling it — just record `last_cipe_url`, enter wait mode. No MCP call or local git ops needed. |
| `error`                 | Wait 60s and loop                                                                                                |

**Statuses requiring action** — when handling these in Step 3, read `references/fix-flows.md` for the detailed flow:

| Status                   | Summary                                                                                       |
| ------------------------ | --------------------------------------------------------------------------------------------- |
| `fix_auto_apply_skipped` | Fix verified but auto-apply skipped (e.g., loop prevention). Inform user, offer manual apply. |
| `fix_apply_ready`        | Fix verified (all tasks or e2e-only). Apply via MCP.                                          |
| `fix_needs_local_verify` | Fix has unverified non-e2e tasks. Run locally, then apply or enhance.                         |
| `fix_needs_review`       | Fix verification failed/not attempted. Analyze and decide.                                    |
| `fix_failed`             | Self-healing failed. Fetch heavy data, attempt local fix (gate check first).                  |
| `no_fix`                 | No fix available. Fetch heavy data, attempt local fix (gate check first) or exit.             |
| `environment_issue`      | Request environment rerun via MCP (gate check first).                                         |
| `self_healing_throttled` | Reject old fixes, attempt local fix.                                                          |
| `no_new_cipe`            | CI Attempt never spawned. Auto-fix workflow or exit with guidance.                            |
| `cipe_no_tasks`          | CI failed with no tasks. Retry once with empty commit.                                        |

**Key rules (always apply):**

- **Git safety**: Stage specific files by name — `git add -A` or `git add .` risks committing the user's unrelated work-in-progress or secrets
- **Environment failures** (OOM, command not found, permission denied): bail immediately. These aren't code bugs, so spending local-fix budget on them is wasteful
- **Gate check**: Run `ci-state-update.mjs gate` before local fix attempts — if budget exhausted, print message and exit

## Main Loop

### Step 1: Initialize Tracking

```
cycle_count = 0            # Only incremented for agent-initiated cycles (counted against --max-cycles)
start_time = now()
no_progress_count = 0
local_verify_count = 0
env_rerun_count = 0
last_cipe_url = null
expected_commit_sha = null
agent_triggered = false    # Set true after monitor takes an action that triggers new CI Attempt
poll_count = 0
wait_mode = false
prev_status = null
prev_cipe_status = null
prev_sh_status = null
prev_verification_status = null
prev_failure_classification = null
```

### Step 2: Polling Loop

Repeat until done:

#### 2a. Spawn subagent (FETCH_STATUS)

Determine select fields based on mode:

- **Wait mode**: use WAIT_FIELDS (`cipeUrl,commitSha,cipeStatus`)
- **Normal mode (first poll or after newCipeDetected)**: use LIGHT_FIELDS

```
Task(
  agent: "ci-monitor-subagent",
  model: haiku,
  prompt: "FETCH_STATUS for branch '<branch>'.
           select: '<fields>'"
)
```

The subagent calls `ci_information` and returns a JSON object with the requested fields. This is a **foreground** call — wait for the result.
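
A minimal sketch of the field selection and prompt composition in 2a follows. The field lists are copied from the MCP Tool Reference; `buildFetchStatusPrompt` is a hypothetical helper, and the exact prompt wording the subagent expects is assumed to match the template above.

```javascript
const WAIT_FIELDS = "cipeUrl,commitSha,cipeStatus";
const LIGHT_FIELDS =
  "cipeStatus,cipeUrl,branch,commitSha,selfHealingStatus,verificationStatus,userAction," +
  "failedTaskIds,verifiedTaskIds,selfHealingEnabled,failureClassification,couldAutoApplyTasks," +
  "autoApplySkipped,autoApplySkipReason,shortLink,confidence,confidenceReasoning,hints," +
  "selfHealingSkippedReason,selfHealingSkipMessage";

// Hypothetical helper: pick the lightest field set for the current mode
// and fill in the FETCH_STATUS prompt template.
function buildFetchStatusPrompt(branch, waitMode) {
  const fields = waitMode ? WAIT_FIELDS : LIGHT_FIELDS;
  return `FETCH_STATUS for branch '${branch}'.\nselect: '${fields}'`;
}
```

Keeping the wait-mode request down to three fields is what keeps repeated polls cheap while a new CI Attempt has not yet appeared.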

#### 2b. Run decision script

```bash
node <skill_dir>/scripts/ci-poll-decide.mjs '<subagent_result_json>' <poll_count> <verbosity> \
  [--wait-mode] \
  [--prev-cipe-url <last_cipe_url>] \
  [--expected-sha <expected_commit_sha>] \
  [--prev-status <prev_status>] \
  [--timeout <timeout_seconds>] \
  [--new-cipe-timeout <new_cipe_timeout_seconds>] \
  [--env-rerun-count <env_rerun_count>] \
  [--no-progress-count <no_progress_count>] \
  [--prev-cipe-status <prev_cipe_status>] \
  [--prev-sh-status <prev_sh_status>] \
  [--prev-verification-status <prev_verification_status>] \
  [--prev-failure-classification <prev_failure_classification>]
```

The script outputs a single JSON line: `{ action, code, message, delay?, noProgressCount, envRerunCount, fields?, newCipeDetected?, verifiableTaskIds? }`
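
One way to assemble that command line from the tracking state is sketched below. `buildDecideArgs` is a hypothetical helper. The minute-to-second conversion is an assumption inferred from the defaults table (minutes) and the `<timeout_seconds>` placeholders, and always passing the counters while omitting null values is also a choice made for this sketch.

```javascript
// Hypothetical helper: returns the argument array that follows
// `node <skill_dir>/scripts/ci-poll-decide.mjs`.
function buildDecideArgs(subagentResult, state, config) {
  const args = [JSON.stringify(subagentResult), String(state.poll_count), config.verbosity];
  if (state.wait_mode) args.push("--wait-mode");

  const optional = [
    ["--prev-cipe-url", state.last_cipe_url],
    ["--expected-sha", state.expected_commit_sha],
    ["--prev-status", state.prev_status],
    // Assumption: configured minutes are converted to the seconds the placeholders name.
    ["--timeout", config.timeout * 60],
    ["--new-cipe-timeout", config["new-cipe-timeout"] * 60],
    ["--env-rerun-count", state.env_rerun_count],
    ["--no-progress-count", state.no_progress_count],
    ["--prev-cipe-status", state.prev_cipe_status],
    ["--prev-sh-status", state.prev_sh_status],
    ["--prev-verification-status", state.prev_verification_status],
    ["--prev-failure-classification", state.prev_failure_classification],
  ];
  for (const [flag, value] of optional) {
    // Flags whose value is still unset (null) are omitted.
    if (value !== null && value !== undefined) args.push(flag, String(value));
  }
  return args;
}
```

Passing the result as an argument array (for example to Node's `child_process.execFile`) rather than as one shell string avoids having to quote the JSON payload by hand.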

#### 2c. Process script output

Parse the JSON output and update tracking state:

- `no_progress_count = output.noProgressCount`
- `env_rerun_count = output.envRerunCount`
- `prev_cipe_status = subagent_result.cipeStatus`
- `prev_sh_status = subagent_result.selfHealingStatus`
- `prev_verification_status = subagent_result.verificationStatus`
- `prev_failure_classification = subagent_result.failureClassification`
- `prev_status = output.action + ":" + (output.code || subagent_result.cipeStatus)`
- `poll_count++`

Based on `action`:

- **`action == "poll"`**: Print `output.message`, sleep `output.delay` seconds, go to 2a
  - If `output.newCipeDetected`: leave wait mode (`wait_mode = false`)
- **`action == "wait"`**: Print `output.message`, sleep `output.delay` seconds, go to 2a
- **`action == "done"`**: Proceed to Step 3 with `output.code`
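
The bookkeeping in 2c can be sketched as a single state transition. `applyDecision` and the `nextStep` labels are hypothetical names introduced for illustration, and defaulting a missing `delay` to 0 is an assumption.

```javascript
// Hypothetical helper: apply one decision-script output to the tracking state.
function applyDecision(state, output, subagentResult) {
  const next = {
    ...state,
    no_progress_count: output.noProgressCount,
    env_rerun_count: output.envRerunCount,
    prev_cipe_status: subagentResult.cipeStatus,
    prev_sh_status: subagentResult.selfHealingStatus,
    prev_verification_status: subagentResult.verificationStatus,
    prev_failure_classification: subagentResult.failureClassification,
    prev_status: `${output.action}:${output.code || subagentResult.cipeStatus}`,
    poll_count: state.poll_count + 1,
  };
  // A newly detected CI Attempt during polling ends wait mode.
  if (output.action === "poll" && output.newCipeDetected) next.wait_mode = false;

  // "poll" and "wait" both sleep and return to 2a; "done" moves on to Step 3.
  const nextStep = output.action === "done" ? "handle" : "sleep-and-poll";
  return { state: next, nextStep, delaySeconds: output.delay ?? 0 };
}
```

Returning a new state object instead of mutating in place keeps each poll iteration easy to log and compare against the previous one.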

### Step 3: Handle Actionable Status

When decision script returns `action == "done"`:

1. Run cycle-check (Step 4) **before** handling the code
2. Check the returned `code`
3. Look up default behavior in the table above
4. Check if user instructions override the default
5. Execute the appropriate action
6. **If action expects new CI Attempt**, update tracking (see Step 3a)
7. If action results in looping, go to Step 2

#### Spawning subagents for actions

Several statuses require fetching heavy data or calling MCP:

- **fix_apply_ready**: Spawn UPDATE_FIX subagent with `APPLY`
- **fix_needs_local_verify**: Spawn FETCH_HEAVY subagent for fix details before local verification
- **fix_needs_review**: Spawn FETCH_HEAVY subagent → get `suggestedFixDescription`, `suggestedFixSummary`, `taskFailureSummaries`
- **fix_failed / no_fix**: Spawn FETCH_HEAVY subagent → get `taskFailureSummaries` for local fix context
- **environment_issue**: Spawn UPDATE_FIX subagent with `RERUN_ENVIRONMENT_STATE`
- **self_healing_throttled**: Spawn FETCH_HEAVY subagent → get `selfHealingSkipMessage`; then FETCH_THROTTLE_INFO + UPDATE_FIX for each old fix

### Step 3a: Track State for New-CI-Attempt Detection

After actions that should trigger a new CI Attempt, run:

```bash
node <skill_dir>/scripts/ci-state-update.mjs post-action \
  --action <type> \
  --cipe-url <current_cipe_url> \
  --commit-sha <git_rev_parse_HEAD>
```

Action types: `fix-auto-applying`, `apply-mcp`, `apply-local-push`, `reject-fix-push`, `local-fix-push`, `env-rerun`, `auto-fix-push`, `empty-commit-push`

The script returns `{ waitMode, pollCount, lastCipeUrl, expectedCommitSha, agentTriggered }`. Update all tracking state from the output, then go to Step 2.
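
A small sketch of both sides of this call is shown here: validating the action type before invoking the script, and folding its output back into the tracking state. Both helper names are hypothetical, and the sketch assumes the script prints its result as JSON on stdout.

```javascript
// Action types listed above for `ci-state-update.mjs post-action`.
const POST_ACTION_TYPES = new Set([
  "fix-auto-applying", "apply-mcp", "apply-local-push", "reject-fix-push",
  "local-fix-push", "env-rerun", "auto-fix-push", "empty-commit-push",
]);

// Hypothetical helper: arguments that follow `node <skill_dir>/scripts/ci-state-update.mjs`.
function buildPostActionArgs(action, cipeUrl, commitSha) {
  if (!POST_ACTION_TYPES.has(action)) {
    throw new Error(`Unknown post-action type: ${action}`);
  }
  return ["post-action", "--action", action, "--cipe-url", cipeUrl, "--commit-sha", commitSha];
}

// Hypothetical helper: map the script's camelCase output onto the tracking variables.
function applyPostActionOutput(state, stdoutText) {
  const out = JSON.parse(stdoutText); // assumption: the result arrives as JSON on stdout
  return {
    ...state,
    wait_mode: out.waitMode,
    poll_count: out.pollCount,
    last_cipe_url: out.lastCipeUrl,
    expected_commit_sha: out.expectedCommitSha,
    agent_triggered: out.agentTriggered,
  };
}
```

Rejecting unknown action types up front surfaces a typo immediately instead of leaving the loop waiting for a CI Attempt that was never registered.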

### Step 4: Cycle Classification and Progress Tracking

When the decision script returns `action == "done"`, run cycle-check **before** handling the code:

```bash
node <skill_dir>/scripts/ci-state-update.mjs cycle-check \
  --code <code> \
  [--agent-triggered] \
  --cycle-count <cycle_count> --max-cycles <max_cycles> \
  --env-rerun-count <env_rerun_count>
```

The script returns `{ cycleCount, agentTriggered, envRerunCount, approachingLimit, message }`. Update tracking state from the output.

- If `approachingLimit` → ask user whether to continue (with 5 or 10 more cycles) or stop monitoring
- If previous cycle was NOT agent-triggered (human pushed), log that human-initiated push was detected

#### Progress Tracking

- `no_progress_count`, circuit breaker (5 polls), and backoff reset are handled by ci-poll-decide.mjs (progress = any change in cipeStatus, selfHealingStatus, verificationStatus, or failureClassification)
- `env_rerun_count` reset on non-environment status is handled by ci-state-update.mjs cycle-check
- On new CI Attempt detected (poll script returns `newCipeDetected`) → reset `local_verify_count = 0`, `env_rerun_count = 0`
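
To make the progress rule concrete, here is an illustrative sketch of the comparison. It is not the code of `ci-poll-decide.mjs`, which is not shown here; `trackProgress` is a hypothetical name, and resetting the counter to zero on progress and tripping once the count reaches 5 are assumptions consistent with the description above.

```javascript
// Fields whose change counts as progress, per the list above.
const PROGRESS_FIELDS = [
  "cipeStatus",
  "selfHealingStatus",
  "verificationStatus",
  "failureClassification",
];
const CIRCUIT_BREAKER_POLLS = 5;

// Hypothetical helper: compare two consecutive poll results and update the counter.
function trackProgress(previous, current, noProgressCount) {
  const progressed = PROGRESS_FIELDS.some((field) => previous[field] !== current[field]);
  const nextCount = progressed ? 0 : noProgressCount + 1;
  return { noProgressCount: nextCount, tripped: nextCount >= CIRCUIT_BREAKER_POLLS };
}
```

Because the orchestrator only forwards the previous values and the counter to the script, a comparison like this stays deterministic and keeps the polling logic out of the main agent's context.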

## Error Handling

| Error                          | Action                                                                                                      |
| ------------------------------ | ----------------------------------------------------------------------------------------------------------- |
| Git rebase conflict            | Report to user, exit                                                                                        |
| `nx-cloud apply-locally` fails | Reject fix via MCP (`action: "REJECT"`), then attempt manual patch (Reject + Fix From Scratch Flow) or exit |
| MCP tool error                 | Retry once, if fails report to user                                                                         |
| Subagent spawn failure         | Retry once, if fails exit with error                                                                        |
| Decision script error          | Treat as `error` status, increment `no_progress_count`                                                      |
| No new CI Attempt detected     | If `--auto-fix-workflow`, try lockfile update; otherwise report to user with guidance                       |
| Lockfile auto-fix fails        | Report to user, exit with guidance to check CI logs                                                         |

## User Instruction Examples

Users can override default behaviors:

| Instruction                                      | Effect                                              |
| ------------------------------------------------ | --------------------------------------------------- |
| "never auto-apply"                               | Always prompt before applying any fix               |
| "always ask before git push"                     | Prompt before each push                             |
| "reject any fix for e2e tasks"                   | Auto-reject if `failedTaskIds` contains e2e         |
| "apply all fixes regardless of verification"     | Skip verification check, apply everything           |
| "if confidence < 70, reject"                     | Check confidence field before applying              |
| "run 'nx affected -t typecheck' before applying" | Add local verification step                         |
| "auto-fix workflow failures"                     | Attempt lockfile updates on pre-CI-Attempt failures |
| "wait 45 min for new CI Attempt"                 | Override new-CI-Attempt timeout (default: 10 min)   |

Related Skills

nx-workspace

from wahidyankf/open-sharia-enterprise

Explore and understand Nx workspaces. USE WHEN answering questions about the workspace, projects, or tasks. ALSO USE WHEN an nx command fails or you need to check available targets/configuration before running a task. EXAMPLES: 'What projects are in this workspace?', 'How is project X configured?', 'What depends on library Y?', 'What targets can I run?', 'Cannot find configuration for task', 'debug nx task failure'.

nx-run-tasks

from wahidyankf/open-sharia-enterprise

Helps with running tasks in an Nx workspace. USE WHEN the user wants to execute build, test, lint, serve, or run any other tasks defined in the workspace.

nx-plugins

from wahidyankf/open-sharia-enterprise

Find and add Nx plugins. USE WHEN user wants to discover available plugins, install a new plugin, or add support for a specific framework or technology to the workspace.

nx-import

from wahidyankf/open-sharia-enterprise

Import, merge, or combine repositories into an Nx workspace using nx import. USE WHEN the user asks to adopt Nx across repos, move projects into a monorepo, or bring code/history from another repository.

nx-generate

from wahidyankf/open-sharia-enterprise

Generate code using nx generators. INVOKE IMMEDIATELY when user mentions scaffolding, setup, structure, creating apps/libs, or setting up project structure. Trigger words - scaffold, setup, create a ... app, create a ... lib, project structure, generate, add a new project. ALWAYS use this BEFORE calling nx_docs or exploring - this skill handles discovery internally.

link-workspace-packages

from wahidyankf/open-sharia-enterprise

Link workspace packages in monorepos (npm, yarn, pnpm, bun). USE WHEN: (1) you just created or generated new packages and need to wire up their dependencies, (2) user imports from a sibling package and needs to add it as a dependency, (3) you get resolution errors for workspace packages (@org/*) like "cannot find module", "failed to resolve import", "TS2307", or "cannot resolve". DO NOT patch around with tsconfig paths or manual package.json edits - use the package manager's workspace commands to fix actual linking.

swe-programming-typescript

from wahidyankf/open-sharia-enterprise

TypeScript coding standards from authoritative docs/explanation/software-engineering/programming-languages/typescript/ documentation

swe-programming-rust

from wahidyankf/open-sharia-enterprise

Rust coding standards from authoritative docs/explanation/software-engineering/programming-languages/rust/ documentation

swe-programming-golang

from wahidyankf/open-sharia-enterprise

Go coding standards from authoritative docs/explanation/software-engineering/programming-languages/golang/ documentation

swe-programming-fsharp

from wahidyankf/open-sharia-enterprise

F# coding standards from authoritative docs/explanation/software-engineering/programming-languages/f-sharp/ documentation

swe-programming-csharp

from wahidyankf/open-sharia-enterprise

C# coding standards from authoritative docs/explanation/software-engineering/programming-languages/c-sharp/ documentation

swe-developing-frontend-ui

from wahidyankf/open-sharia-enterprise

UI development skill covering design token usage, shadcn/ui + Radix composition patterns, accessibility requirements, anti-patterns catalog, and brand context for OrganicLever and OSE Platform. Auto-loads when working on TSX components, CSS, or UI design tasks.