backup-monitor

Track backup jobs via heartbeat pings, alert on missed or failed backups. Use when you need to monitor scheduled backup scripts, get alerted when a backup misses its window, or track backup execution history. Triggers include "backup monitoring", "backup alerts", "missed backup", "backup heartbeat", "backup job tracking", or any task involving backup reliability verification.

7 stars

Best use case

backup-monitor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Track backup jobs via heartbeat pings, alert on missed or failed backups. Use when you need to monitor scheduled backup scripts, get alerted when a backup misses its window, or track backup execution history. Triggers include "backup monitoring", "backup alerts", "missed backup", "backup heartbeat", "backup job tracking", or any task involving backup reliability verification.

Teams using backup-monitor should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/backup-monitor/SKILL.md --create-dirs "https://raw.githubusercontent.com/heldernoid/agentic-build-templates/main/projects/devops-infrastructure/backup-monitor/skills/backup-monitor/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/backup-monitor/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How backup-monitor Compares

Feature / Agentbackup-monitorStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Track backup jobs via heartbeat pings, alert on missed or failed backups. Use when you need to monitor scheduled backup scripts, get alerted when a backup misses its window, or track backup execution history. Triggers include "backup monitoring", "backup alerts", "missed backup", "backup heartbeat", "backup job tracking", or any task involving backup reliability verification.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# backup-monitor

Monitor backup jobs with a heartbeat API. Get alerted when backups miss their window.

## When to Use

- Adding reliability monitoring to existing backup scripts
- Getting email/webhook alerts when a backup job does not run
- Tracking backup execution history and duration trends
- Verifying that backups completed before a restore operation

## Prerequisites

1. backup-monitor server running (self-hosted or Docker)
2. Backup scripts instrumented with curl ping calls
3. SMTP or webhook configured for alerts

## Installation (Docker)

```bash
docker compose up -d
# Dashboard at http://localhost:4000
```

## Quick Start

### Step 1 - Register a job

```bash
backup-monitor job create \
  --name nightly-postgres \
  --schedule "0 2 * * *" \
  --duration 5 \
  --grace 30 \
  --email ops@example.com
```

Output:
```
ok  job registered
--  id       job_a1b2c3d4e5f6
--  api key  bkm_xK9m4Pp8rQjL2TvN6yBd...

    Save this API key now. It will not be shown again.
```

### Step 2 - Instrument your backup script

```bash
#!/bin/bash
JOB_ID="job_a1b2c3d4e5f6"
API_KEY="bkm_xK9m4Pp8rQjL2TvN6yBd..."
BASE_URL="http://localhost:4000"

EXEC_ID=$(curl -s -X POST "$BASE_URL/ping/$JOB_ID/start" \
  -H "Authorization: Bearer $API_KEY" | jq -r '.executionId')

# Your backup command here
pg_dump mydb > /backups/mydb-$(date +%Y%m%d).sql
EXIT_CODE=$?

if [ $EXIT_CODE -eq 0 ]; then
  curl -s -X POST "$BASE_URL/ping/$JOB_ID/complete" \
    -H "Authorization: Bearer $API_KEY" \
    -d "{\"executionId\":\"$EXEC_ID\",\"exitCode\":0}"
else
  curl -s -X POST "$BASE_URL/ping/$JOB_ID/fail" \
    -H "Authorization: Bearer $API_KEY" \
    -d "{\"executionId\":\"$EXEC_ID\",\"exitCode\":$EXIT_CODE}"
fi
```

## CLI Reference

| Command | Description |
|---|---|
| `backup-monitor job create [flags]` | Register a new backup job |
| `backup-monitor job list` | List all jobs with status |
| `backup-monitor job delete <id>` | Delete a job |
| `backup-monitor ping start <id>` | Send start ping |
| `backup-monitor ping complete <id>` | Send completion ping |
| `backup-monitor ping fail <id>` | Send failure ping |
| `backup-monitor --help` | Show help |

## Job Create Flags

| Flag | Description | Default |
|---|---|---|
| `--name <name>` | Job identifier (required) | - |
| `--schedule <cron>` | Cron expression (required) | - |
| `--duration <min>` | Expected duration in minutes | 60 |
| `--grace <min>` | Grace period before alert fires | 30 |
| `--email <addr>` | Alert email address | global setting |
| `--webhook <url>` | Alert webhook URL | global setting |
| `--retention <days>` | History retention | 90 |

## Heartbeat API

All endpoints require `Authorization: Bearer <api-key>`.

```
POST /ping/:jobId/start       # Job started
POST /ping/:jobId/complete    # Job completed successfully
POST /ping/:jobId/fail        # Job failed
GET  /ping/:jobId/status      # Current status (no auth required)
```

## Dashboard API

```
GET  /api/jobs                # All jobs with status
POST /api/jobs                # Register job
GET  /api/jobs/:id/history    # Execution history
GET  /api/alerts              # Alert log
PATCH /api/settings           # Update SMTP/webhook settings
```

## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `PORT` | Server port | 4000 |
| `NODE_ENV` | development or production | development |
| `DATABASE_PATH` | SQLite file path | ./data/backup-monitor.db |
| `DASHBOARD_USER` | Basic auth username | (disabled) |
| `DASHBOARD_PASS` | Basic auth password | (disabled) |
| `SMTP_HOST` | SMTP server | - |
| `SMTP_PORT` | SMTP port | 587 |
| `SMTP_USER` | SMTP username | - |
| `SMTP_PASS` | SMTP password | - |
| `SMTP_FROM` | From address | backup-monitor@localhost |
| `ALERT_CHECK_INTERVAL` | Alert check cron | * * * * * |
| `LOG_LEVEL` | debug, info, warn, error | info |

## Alert Behavior

- Alerts fire when: job missed its window (started time + expected duration + grace period)
- Alerts fire when: job sent a fail ping
- Recovery alert fires when: a previously missed/failed job completes successfully
- Duplicate alerts suppressed: same window = same alert (no spam)
- Job pause: suppress alerts without disabling monitoring

## Troubleshooting

### No alert received for missed backup

Check:
1. Grace period has passed (default 30 minutes after expected completion)
2. SMTP/webhook configured in Settings
3. Alert log in dashboard for delivery errors
4. Server logs: `docker logs backup-monitor`

### "Unauthorized" on ping

API key does not match. Each job has its own key shown once on creation. To reset, delete and recreate the job.

### Job showing "running" but backup is done

The completion ping was not sent (check exit code handling in your script) or the ping failed (check network connectivity to the monitor server).

Related Skills

Skill: Uptime Monitoring

7
from heldernoid/agentic-build-templates

## Overview

serial-monitor

7
from heldernoid/agentic-build-templates

No description provided.

ssl-cert-monitor

7
from heldernoid/agentic-build-templates

Operate ssl-cert-monitor -- add hosts, configure alert rules, trigger checks, review history, and deploy the stack.

cron-monitor

7
from heldernoid/agentic-build-templates

Send heartbeat pings to cron-monitor after cron job completion, check job status, and register new jobs. Use when you need to confirm a scheduled task ran successfully, check if a cron job is healthy, or add monitoring to a new cron script. Triggers include "ping cron-monitor", "check job status", "register cron job", "heartbeat", "cron health check", or any task involving scheduled job monitoring.

database-size-monitor

7
from heldernoid/agentic-build-templates

Dashboard for monitoring PostgreSQL and MySQL table sizes over time, with growth tracking, threshold alerts, and snapshot comparison

data-pipeline-monitor

7
from heldernoid/agentic-build-templates

Track ETL and data pipeline jobs with success/failure status, duration tracking, heartbeat monitoring, and dependency visualization. Use when you need to monitor scheduled jobs, detect failures, track pipeline health over time, or visualize ETL step dependencies. Triggers include "pipeline monitoring", "job tracking", "ETL status", "cron job health", "heartbeat monitor", "pipeline failed", or any task involving monitoring data workflows.

process-monitor

7
from heldernoid/agentic-build-templates

Monitor system processes for resource usage using process-tree watch mode. Use when tracking CPU or memory usage over time, finding resource hogs, or watching a specific process. Triggers include "monitor processes", "watch cpu usage", "process monitor", "top processes", "resource usage", "ptree watch".

Skill: Status Page

7
from heldernoid/agentic-build-templates

## Overview

Skill: unit-conversion

7
from heldernoid/agentic-build-templates

## Overview

Skill: recipe-scaler

7
from heldernoid/agentic-build-templates

## Overview

reading-list

7
from heldernoid/agentic-build-templates

Operate the reading-list API to save, manage, tag, search, and export articles.

email-digest

7
from heldernoid/agentic-build-templates

Configure, test, and troubleshoot the reading-list daily email digest delivered via nodemailer.