alert-rules

Define, store, and evaluate threshold-based alert rules against log entry metrics. Fire alert events with cooldown debounce via a cron-based scheduler.

by heldernoid

Best use case

alert-rules is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using alert-rules should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/alert-rules/SKILL.md --create-dirs "https://raw.githubusercontent.com/heldernoid/agentic-build-templates/main/projects/data-analytics/log-analyzer/skills/alert-rules/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/alert-rules/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How alert-rules Compares

| Feature / Agent | alert-rules | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

Where can I find the source code?

The source code is in the heldernoid/agentic-build-templates repository on GitHub, under projects/data-analytics/log-analyzer/skills/alert-rules.

SKILL.md Source

# alert-rules Skill

## Overview

Alert rules are stored in the `alert_rules` SQLite table. A scheduler built on `node-cron` wakes up every N seconds (default 60) and evaluates all enabled rules. Each rule specifies a metric (`error_rate`, `request_rate`, `status_count`, or `keyword`), an operator (`gt`, `lt`, `gte`, or `lte`), a threshold value, and a look-back window in minutes. When a rule fires, a row is written to `alert_events` and the rule's `last_fired_at` is updated. The rule is then skipped until its `cooldown_minutes` have elapsed, which prevents repeated firings while the condition persists.

## SQLite Tables

```sql
CREATE TABLE alert_rules (
  id               INTEGER PRIMARY KEY AUTOINCREMENT,
  name             TEXT NOT NULL,
  file_id          INTEGER REFERENCES log_files(id) ON DELETE CASCADE,
  metric           TEXT NOT NULL CHECK(metric IN ('error_rate','request_rate','status_count','keyword')),
  operator         TEXT NOT NULL CHECK(operator IN ('gt','lt','gte','lte')),
  threshold        REAL NOT NULL,
  window_minutes   INTEGER NOT NULL DEFAULT 5,
  cooldown_minutes INTEGER NOT NULL DEFAULT 15,
  enabled          INTEGER NOT NULL DEFAULT 1,
  last_fired_at    TEXT,
  created_at       TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE TABLE alert_events (
  id            INTEGER PRIMARY KEY AUTOINCREMENT,
  rule_id       INTEGER NOT NULL REFERENCES alert_rules(id) ON DELETE CASCADE,
  fired_at      TEXT NOT NULL DEFAULT (datetime('now')),
  metric_value  REAL NOT NULL,
  message       TEXT NOT NULL
);
```
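
The TypeScript snippets in this document reference an `AlertRule` type that is never shown. A minimal sketch of what it might look like, assuming it mirrors the `alert_rules` columns one-to-one. The type names `AlertMetric` and `AlertOperator` are hypothetical, and `metric_param` is optional because it only exists once the schema extension recommended in the `status_count` section has been applied.

```typescript
// Hypothetical row type mirroring the alert_rules columns; the original source does not show it.
type AlertMetric = 'error_rate' | 'request_rate' | 'status_count' | 'keyword';
type AlertOperator = 'gt' | 'lt' | 'gte' | 'lte';

interface AlertRule {
  id: number;
  name: string;
  file_id: number | null;        // NULL: the metric queries apply no file filter
  metric: AlertMetric;
  operator: AlertOperator;
  threshold: number;
  window_minutes: number;
  cooldown_minutes: number;
  enabled: 0 | 1;                // stored as an INTEGER flag
  last_fired_at: string | null;  // ISO timestamp of the most recent firing, if any
  created_at: string;
  metric_param?: string | null;  // only present with the optional metric_param column
}

// Illustrative row, shaped like the result of SELECT * FROM alert_rules.
const exampleRule: AlertRule = {
  id: 1,
  name: 'High 5xx rate',
  file_id: null,
  metric: 'error_rate',
  operator: 'gt',
  threshold: 5,
  window_minutes: 5,
  cooldown_minutes: 15,
  enabled: 1,
  last_fired_at: null,
  created_at: '2026-03-20 03:07:42',
};
```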

## Metric Queries

Each metric is computed by a SQL query against the `log_entries` table. Every query filters on a `since` ISO 8601 timestamp, computed as the current time minus `window_minutes`.
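
As a worked example of the window arithmetic, the sketch below computes the `since` value for a fixed clock. The helper name `windowStart` and the injectable `nowMs` parameter are assumptions made for illustration; the functions in this document inline the same expression with `Date.now()`.

```typescript
// Hypothetical helper: start of the look-back window as an ISO 8601 UTC string.
// nowMs is injectable so the result is deterministic in tests.
function windowStart(windowMinutes: number, nowMs: number = Date.now()): string {
  return new Date(nowMs - windowMinutes * 60_000).toISOString();
}

// A fixed clock at 2026-03-20T03:07:42Z (months are zero-based in Date.UTC).
const fixedNow = Date.UTC(2026, 2, 20, 3, 7, 42);

console.log(windowStart(5, fixedNow)); // 2026-03-20T03:02:42.000Z
```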

### error_rate (5xx errors per minute)

```typescript
function computeErrorRate(db: Database, rule: AlertRule): number {
  const since = new Date(Date.now() - rule.window_minutes * 60_000).toISOString();
  const fileFilter = rule.file_id != null ? 'AND file_id = ?' : '';
  const params: unknown[] = rule.file_id != null ? [since, rule.file_id] : [since];

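  // Placeholders bind in order: window length (the divisor), since, then file_id if present.
  // The aggregate FILTER clause requires SQLite 3.30 or later.
  const row = db.prepare(`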
    SELECT
      CAST(COUNT(*) FILTER (WHERE status >= 500) AS REAL) / ? AS rate
    FROM log_entries
    WHERE ts >= ? ${fileFilter}
  `).get(rule.window_minutes, ...params) as { rate: number };

  return row.rate ?? 0;
}
```

### request_rate (total requests per minute)

```typescript
function computeRequestRate(db: Database, rule: AlertRule): number {
  const since = new Date(Date.now() - rule.window_minutes * 60_000).toISOString();
  const fileFilter = rule.file_id != null ? 'AND file_id = ?' : '';
  const params: unknown[] = rule.file_id != null ? [since, rule.file_id] : [since];

  const row = db.prepare(`
    SELECT CAST(COUNT(*) AS REAL) / ? AS rate
    FROM log_entries
    WHERE ts >= ? ${fileFilter}
  `).get(rule.window_minutes, ...params) as { rate: number };

  return row.rate ?? 0;
}
```

### status_count (count of a specific HTTP status code)

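The base schema has only one numeric field, `threshold`, so it has no place to store the target status code alongside the count threshold. One workaround is to pack both into `threshold`, with the integer part as the count threshold and the fractional part as the status code (e.g., `threshold = 200.404` means "count of 404s > 200"), but that encoding is fragile. The cleaner approach, and the one the code below uses, is to add a `metric_param` column that holds the status code as text. If `metric_param` is not set, the code falls back to status 500.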

Recommended schema extension:
```sql
ALTER TABLE alert_rules ADD COLUMN metric_param TEXT;
-- e.g. metric_param = '401' for status_count metric
```

```typescript
function computeStatusCount(db: Database, rule: AlertRule): number {
  const since = new Date(Date.now() - rule.window_minutes * 60_000).toISOString();
  const targetStatus = parseInt(rule.metric_param ?? '500', 10);
  const fileFilter = rule.file_id != null ? 'AND file_id = ?' : '';
  const params: unknown[] = rule.file_id != null ? [since, targetStatus, rule.file_id] : [since, targetStatus];

  const row = db.prepare(`
    SELECT COUNT(*) AS cnt
    FROM log_entries
    WHERE ts >= ? AND status = ? ${fileFilter}
  `).get(...params) as { cnt: number };

  return row.cnt ?? 0;
}
```

### keyword (count of log lines matching a keyword)

```typescript
function computeKeywordCount(db: Database, rule: AlertRule): number {
  const since = new Date(Date.now() - rule.window_minutes * 60_000).toISOString();
  const keyword = rule.metric_param ?? '';
  const fileFilter = rule.file_id != null ? 'AND file_id = ?' : '';
  const params: unknown[] = rule.file_id != null
    ? [since, `%${keyword}%`, `%${keyword}%`, rule.file_id]
    : [since, `%${keyword}%`, `%${keyword}%`];

  const row = db.prepare(`
    SELECT COUNT(*) AS cnt
    FROM log_entries
    WHERE ts >= ?
      AND (message LIKE ? OR raw LIKE ?)
      ${fileFilter}
  `).get(...params) as { cnt: number };

  return row.cnt ?? 0;
}
```
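
The keyword is interpolated into the `LIKE` pattern as-is, so a `%` or `_` in the keyword acts as a wildcard, and an empty keyword produces `%%`, which matches every non-NULL value. A minimal sketch of one way to handle both cases follows. The helper names `escapeLike` and `toContainsPattern` are hypothetical, and the approach assumes the query is changed to `LIKE ? ESCAPE '\'`; without the `ESCAPE` clause the added backslashes have no effect.

```typescript
// Hypothetical helper: escape LIKE wildcards (and the escape character itself)
// so the keyword is matched literally. Requires `LIKE ? ESCAPE '\'` in the SQL.
function escapeLike(keyword: string): string {
  return keyword.replace(/[\\%_]/g, '\\$&');
}

// Builds a "contains" pattern, or returns null when there is nothing to search for.
function toContainsPattern(keyword: string): string | null {
  const trimmed = keyword.trim();
  if (trimmed === '') return null; // '%%' would match every non-NULL value
  return `%${escapeLike(trimmed)}%`;
}

console.log(toContainsPattern('100%_done')); // %100\%\_done%
console.log(toContainsPattern('   '));       // null
```

A caller could treat a `null` pattern as "metric value 0" rather than running the query.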

## evaluateRule

```typescript
export function evaluateRule(db: Database, rule: AlertRule): number {
  switch (rule.metric) {
    case 'error_rate':    return computeErrorRate(db, rule);
    case 'request_rate':  return computeRequestRate(db, rule);
    case 'status_count':  return computeStatusCount(db, rule);
    case 'keyword':       return computeKeywordCount(db, rule);
    default:              return 0;
  }
}
```

## Comparison and Firing

```typescript
function meetsThreshold(value: number, operator: string, threshold: number): boolean {
  switch (operator) {
    case 'gt':  return value > threshold;
    case 'lt':  return value < threshold;
    case 'gte': return value >= threshold;
    case 'lte': return value <= threshold;
    default:    return false;
  }
}

function isInCooldown(rule: AlertRule): boolean {
  if (!rule.last_fired_at) return false;
  const lastFired = new Date(rule.last_fired_at).getTime();
  const cooldownMs = rule.cooldown_minutes * 60_000;
  return Date.now() - lastFired < cooldownMs;
}

export function checkAndFire(db: Database, rule: AlertRule): void {
  if (isInCooldown(rule)) return;

  const value = evaluateRule(db, rule);

  if (!meetsThreshold(value, rule.operator, rule.threshold)) return;

  const fired_at = new Date().toISOString();
  const message = buildMessage(rule, value);

  db.prepare(`INSERT INTO alert_events (rule_id, fired_at, metric_value, message) VALUES (?,?,?,?)`)
    .run(rule.id, fired_at, value, message);

  db.prepare(`UPDATE alert_rules SET last_fired_at = ? WHERE id = ?`)
    .run(fired_at, rule.id);
}

function buildMessage(rule: AlertRule, value: number): string {
  const op = { gt: '>', lt: '<', gte: '>=', lte: '<=' }[rule.operator] ?? rule.operator;
  const rounded = Math.round(value * 100) / 100;
  return `${rule.name}: ${rule.metric} = ${rounded} ${op} ${rule.threshold} (window: ${rule.window_minutes}min)`;
}
```
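
The functions above read `Date.now()` directly, which makes the cooldown logic awkward to unit test. The sketch below folds the cooldown and threshold checks into one pure function with the clock passed in. `decideFiring`, `RuleLike`, and `Decision` are hypothetical names, and the metric value is assumed to have been computed already.

```typescript
type Operator = 'gt' | 'lt' | 'gte' | 'lte';

// Only the fields the decision needs (a hypothetical subset of the rule row).
interface RuleLike {
  operator: Operator;
  threshold: number;
  cooldown_minutes: number;
  last_fired_at: string | null;
}

type Decision = 'fire' | 'cooldown' | 'not_met';

function decideFiring(rule: RuleLike, value: number, nowMs: number): Decision {
  // Cooldown is checked first, in the same order as checkAndFire.
  if (rule.last_fired_at) {
    const elapsedMs = nowMs - new Date(rule.last_fired_at).getTime();
    if (elapsedMs < rule.cooldown_minutes * 60_000) return 'cooldown';
  }
  const comparisons: Record<Operator, boolean> = {
    gt: value > rule.threshold,
    lt: value < rule.threshold,
    gte: value >= rule.threshold,
    lte: value <= rule.threshold,
  };
  return comparisons[rule.operator] ? 'fire' : 'not_met';
}

const sampleRule: RuleLike = {
  operator: 'gt',
  threshold: 5,
  cooldown_minutes: 15,
  last_fired_at: '2026-03-20T03:00:00.000Z',
};

console.log(decideFiring(sampleRule, 7.2, Date.UTC(2026, 2, 20, 3, 10, 0))); // cooldown
console.log(decideFiring(sampleRule, 7.2, Date.UTC(2026, 2, 20, 3, 15, 0))); // fire
console.log(decideFiring(sampleRule, 5, Date.UTC(2026, 2, 20, 3, 15, 0)));   // not_met
```

Note that the cooldown ends exactly at `cooldown_minutes` after the last firing, because the comparison is a strict `<`.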

## Scheduler

```typescript
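// server/lib/scheduler.ts
import cron from 'node-cron';
import { getDb } from './db';
import { checkAndFire } from './alerts';
// Assumption: the AlertRule type is exported from ./alerts; adjust the path if it lives elsewhere.
import type { AlertRule } from './alerts';

let task: cron.ScheduledTask | null = null;
let timer: ReturnType<typeof setInterval> | null = null;

export function startScheduler(intervalSeconds = 60): void {
  stopScheduler(); // avoid stacking schedulers if this is called twice

  // Five-field cron expressions have one-minute granularity, so sub-minute
  // intervals use setInterval here. (node-cron also accepts an optional
  // leading seconds field, which would be an alternative.)
  if (intervalSeconds < 60) {
    timer = setInterval(runAll, intervalSeconds * 1_000);
    return;
  }
  // Any interval of 60 seconds or more runs once per minute.
  task = cron.schedule('* * * * *', runAll);
}

export function stopScheduler(): void {
  task?.stop();
  task = null;
  // Keep the timer handle so the setInterval path can be stopped too.
  if (timer) clearInterval(timer);
  timer = null;
}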

function runAll(): void {
  const db = getDb();
  const rules = db.prepare(`SELECT * FROM alert_rules WHERE enabled = 1`).all() as AlertRule[];
  for (const rule of rules) {
    try {
      checkAndFire(db, rule);
    } catch (err) {
      console.error(`Alert rule ${rule.id} evaluation failed:`, err);
    }
  }
}
```

Start the scheduler after the database is initialized:
```typescript
// server/index.ts
import { startScheduler } from './lib/scheduler';
// ...
app.listen(PORT, () => {
  startScheduler(Number(process.env.ALERT_POLL_INTERVAL ?? 60));
  console.log(`Server running on port ${PORT}`);
});
```
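
`Number(process.env.ALERT_POLL_INTERVAL ?? 60)` yields `NaN` for a non-numeric value and `0` for `"0"` or an empty string. `NaN` silently falls through to the cron branch, and `0` makes `setInterval` fire as fast as the timer allows. A small sketch of a stricter parser follows; `parsePollInterval` is a hypothetical helper, not part of the original skill.

```typescript
// Hypothetical helper: read a poll interval in seconds, falling back to a default
// when the value is missing, blank, non-numeric, or below one second.
function parsePollInterval(raw: string | undefined, fallbackSeconds = 60): number {
  if (raw === undefined || raw.trim() === '') return fallbackSeconds;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed) || parsed < 1) return fallbackSeconds;
  return Math.floor(parsed); // whole seconds only
}

console.log(parsePollInterval('30'));      // 30
console.log(parsePollInterval('abc'));     // 60
console.log(parsePollInterval('0'));       // 60
console.log(parsePollInterval(undefined)); // 60
```

It would slot into the listen callback as `startScheduler(parsePollInterval(process.env.ALERT_POLL_INTERVAL))`.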

## Rule Validation (Zod)

```typescript
import { z } from 'zod';

export const AlertRuleSchema = z.object({
  name: z.string().min(1).max(200),
  file_id: z.number().int().nullable().optional(),
  metric: z.enum(['error_rate', 'request_rate', 'status_count', 'keyword']),
  operator: z.enum(['gt', 'lt', 'gte', 'lte']),
  threshold: z.number().finite(),
  window_minutes: z.number().int().min(1).max(1440),
  cooldown_minutes: z.number().int().min(1).max(1440),
  enabled: z.union([z.literal(0), z.literal(1)]).default(1),
  metric_param: z.string().optional(),
});

export const AlertRulePatchSchema = AlertRuleSchema.partial();
```
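
The schema above accepts a `status_count` or `keyword` rule with no `metric_param`. In that case `computeStatusCount` falls back to status 500 and `computeKeywordCount` searches for an empty string, which matches every non-NULL row. A minimal sketch of a cross-field check, written as plain TypeScript so it can run after schema parsing. `checkMetricParam` is a hypothetical name, and the three-digit status pattern is an assumption about what counts as valid.

```typescript
type Metric = 'error_rate' | 'request_rate' | 'status_count' | 'keyword';

// Hypothetical cross-field check: returns an error message, or null when the
// metric / metric_param combination is usable.
function checkMetricParam(metric: Metric, metricParam: string | undefined): string | null {
  if (metric === 'status_count') {
    const valid = metricParam !== undefined && /^[1-5]\d{2}$/.test(metricParam);
    return valid ? null : 'metric_param must be a three-digit HTTP status code for status_count';
  }
  if (metric === 'keyword') {
    const valid = metricParam !== undefined && metricParam.trim() !== '';
    return valid ? null : 'metric_param must be a non-empty keyword for keyword rules';
  }
  return null; // error_rate and request_rate take no parameter
}

console.log(checkMetricParam('status_count', '404'));   // null
console.log(checkMetricParam('status_count', 'oops'));  // metric_param must be a three-digit HTTP status code for status_count
console.log(checkMetricParam('keyword', undefined));    // metric_param must be a non-empty keyword for keyword rules
```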

## Test Rule Endpoint

The `POST /api/alert-rules/:id/test` endpoint evaluates the rule and returns the current metric value without inserting an event or updating `last_fired_at`:

```typescript
router.post('/:id/test', tryCatch(async (req, res) => {
  const rule = getRule(db, Number(req.params.id));
  if (!rule) return res.status(404).json({ error: 'RULE_NOT_FOUND' });

  const metric_value = evaluateRule(db, rule);
  const would_fire = meetsThreshold(metric_value, rule.operator, rule.threshold);

  res.json({ metric_value, would_fire, rule });
}));
```

## Troubleshooting

**Rule never fires even when threshold is exceeded**
Check `last_fired_at` vs `cooldown_minutes`. If a rule fired recently, it will be skipped. Use the test endpoint to confirm the metric value without the cooldown check.

**error_rate returns 0 but errors exist**
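Verify that the `ts` column stores ISO 8601 strings in one consistent format and in UTC. The metric queries compare `ts` to `since` as plain strings, and `since` comes from `toISOString()`, so timestamps in another layout or with a non-zero offset will compare incorrectly. The nginx parser must produce a value such as `2026-03-20T03:07:42+00:00`, not the raw access-log form `20/Mar/2026:03:07:42 +0000`.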
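
A sketch of one way to normalize the nginx access-log timestamp before it is stored. `nginxTimeToIso` is a hypothetical helper, not the skill's actual parser. It emits the `toISOString()` form (`...42.000Z`), which is the same layout the metric queries use for `since`; whichever layout is chosen, every row should use the same one.

```typescript
const MONTHS: Record<string, number> = {
  Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
  Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11,
};

// Hypothetical helper: convert "20/Mar/2026:03:07:42 +0000" to an ISO 8601 UTC string.
// Returns null when the input does not match the expected layout.
function nginxTimeToIso(value: string): string | null {
  const m = /^(\d{2})\/([A-Za-z]{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/.exec(value.trim());
  if (!m) return null;
  const month = MONTHS[m[2]];
  if (month === undefined) return null;

  // Interpret the clock fields as if they were UTC, then subtract the offset
  // to get the real UTC instant (local time = UTC + offset).
  const asUtcMs = Date.UTC(Number(m[3]), month, Number(m[1]), Number(m[4]), Number(m[5]), Number(m[6]));
  const offsetMinutes = (m[7] === '-' ? -1 : 1) * (Number(m[8]) * 60 + Number(m[9]));
  return new Date(asUtcMs - offsetMinutes * 60_000).toISOString();
}

console.log(nginxTimeToIso('20/Mar/2026:03:07:42 +0000')); // 2026-03-20T03:07:42.000Z
console.log(nginxTimeToIso('20/Mar/2026:05:07:42 +0200')); // 2026-03-20T03:07:42.000Z
console.log(nginxTimeToIso('not a timestamp'));            // null
```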

**Scheduler not running**
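Confirm that `startScheduler()` is called once the server is up (in the example, inside the `app.listen()` callback). Check that `node-cron` is installed (`pnpm add node-cron`). If `intervalSeconds < 60`, the `setInterval` path is used instead of cron. Any value of 60 or more runs once per minute, because the cron expression is fixed at `* * * * *`.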

**High CPU on large log files**
The metric queries scan `log_entries` by `ts`. Ensure the index `idx_log_entries_ts` exists. For `status_count`, add a composite index `(file_id, status, ts)`. For `keyword`, full-scan LIKE is unavoidable without FTS5.
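
The two indexes mentioned above could be created at startup along these lines. This is a sketch under assumptions: the `SqlExecutor` interface stands in for whatever SQLite handle the project uses, the definition of `idx_log_entries_ts` is assumed to be a single-column index on `ts`, and the composite index name is hypothetical.

```typescript
// Minimal interface for any SQLite handle that can run raw SQL (an assumption).
interface SqlExecutor {
  exec(sql: string): unknown;
}

// The composite index name is hypothetical; only idx_log_entries_ts is named in this document.
const INDEX_STATEMENTS: readonly string[] = [
  'CREATE INDEX IF NOT EXISTS idx_log_entries_ts ON log_entries (ts)',
  'CREATE INDEX IF NOT EXISTS idx_log_entries_file_status_ts ON log_entries (file_id, status, ts)',
];

// IF NOT EXISTS makes this safe to call on every startup.
function ensureAlertIndexes(db: SqlExecutor): void {
  for (const sql of INDEX_STATEMENTS) {
    db.exec(sql);
  }
}

// Usage with a stand-in executor that records what it is asked to run.
const executed: string[] = [];
ensureAlertIndexes({ exec: (sql) => executed.push(sql) });
console.log(executed.length); // 2
```

Because `file_id` is the leading column, the composite index may help less for rules whose `file_id` is NULL.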
