alert-rules

Define, store, and evaluate threshold-based alert rules against log entry metrics. Fire alert events with cooldown debounce via a cron-based scheduler.

Best use case

alert-rules is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using alert-rules should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/alert-rules/SKILL.md --create-dirs "https://raw.githubusercontent.com/heldernoid/agentic-build-templates/main/projects/data-analytics/log-analyzer/skills/alert-rules/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/alert-rules/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How alert-rules Compares

| Feature / Agent | alert-rules | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Define, store, and evaluate threshold-based alert rules against log entry metrics. Fire alert events with cooldown debounce via a cron-based scheduler.

Where can I find the source code?

The source code is on GitHub in the heldernoid/agentic-build-templates repository, under projects/data-analytics/log-analyzer/skills/alert-rules/.

SKILL.md Source

# alert-rules Skill

## Overview

Alert rules are stored in the `alert_rules` SQLite table. A `node-cron` scheduler wakes up every N seconds (default 60) and evaluates all enabled rules. Each rule specifies a metric (error_rate, request_rate, status_count, keyword), an operator (gt/lt/gte/lte), a threshold value, and a look-back window in minutes. When a rule fires, an `alert_events` row is written. A cooldown period prevents repeated firings.

## SQLite Tables

```sql
CREATE TABLE alert_rules (
  id               INTEGER PRIMARY KEY AUTOINCREMENT,
  name             TEXT NOT NULL,
  file_id          INTEGER REFERENCES log_files(id) ON DELETE CASCADE,
  metric           TEXT NOT NULL CHECK(metric IN ('error_rate','request_rate','status_count','keyword')),
  operator         TEXT NOT NULL CHECK(operator IN ('gt','lt','gte','lte')),
  threshold        REAL NOT NULL,
  window_minutes   INTEGER NOT NULL DEFAULT 5,
  cooldown_minutes INTEGER NOT NULL DEFAULT 15,
  enabled          INTEGER NOT NULL DEFAULT 1,
  last_fired_at    TEXT,
  created_at       TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE TABLE alert_events (
  id            INTEGER PRIMARY KEY AUTOINCREMENT,
  rule_id       INTEGER NOT NULL REFERENCES alert_rules(id) ON DELETE CASCADE,
  fired_at      TEXT NOT NULL DEFAULT (datetime('now')),
  metric_value  REAL NOT NULL,
  message       TEXT NOT NULL
);
```
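The TypeScript snippets in this skill refer to an `AlertRule` type without defining it. A minimal sketch of what that type might look like, mirroring the `alert_rules` columns above, is shown here. The type names `AlertMetric` and `AlertOperator`, the optional `metric_param` field, and the reading of a null `file_id` as "all files" are assumptions drawn from how the later code uses the rule, not part of the original skill.

```typescript
// Assumed row shape for alert_rules; field names follow the table columns.
type AlertMetric = 'error_rate' | 'request_rate' | 'status_count' | 'keyword';
type AlertOperator = 'gt' | 'lt' | 'gte' | 'lte';

interface AlertRule {
  id: number;
  name: string;
  file_id: number | null;        // assumption: null means "evaluate across all log files"
  metric: AlertMetric;
  operator: AlertOperator;
  threshold: number;
  window_minutes: number;
  cooldown_minutes: number;
  enabled: 0 | 1;                // SQLite has no boolean type, so 0/1 integers are used
  last_fired_at: string | null;  // ISO timestamp of the last firing, if any
  created_at: string;
  metric_param?: string | null;  // only present if the optional column is added
}

// Example rule: fire when 5xx errors exceed 10 per minute over a 5-minute window.
const exampleRule: AlertRule = {
  id: 1,
  name: 'High 5xx rate',
  file_id: null,
  metric: 'error_rate',
  operator: 'gt',
  threshold: 10,
  window_minutes: 5,
  cooldown_minutes: 15,
  enabled: 1,
  last_fired_at: null,
  created_at: '2026-03-20T03:07:42.000Z',
};
```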

## Metric Queries

Each metric is computed by a SQL query against the `log_entries` table. All queries accept a `since` ISO timestamp computed as `NOW - window_minutes`.
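As a small illustration of the `since` computation, the sketch below factors it into a helper with an injectable clock so it can be checked against a fixed time. The name `windowStart` is hypothetical; the metric functions in this skill inline the same expression.

```typescript
// Hypothetical helper: ISO timestamp marking the start of the look-back window.
function windowStart(windowMinutes: number, nowMs: number = Date.now()): string {
  return new Date(nowMs - windowMinutes * 60_000).toISOString();
}

// Fixed clock for the example: 2026-03-20T03:07:42Z (months are zero-based in Date.UTC).
const fixedNow = Date.UTC(2026, 2, 20, 3, 7, 42);
const since = windowStart(5, fixedNow); // "2026-03-20T03:02:42.000Z"
```

Note that `toISOString()` always emits UTC with millisecond precision and a trailing `Z`, which matters because SQLite compares `ts` against this value as a plain string.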

### error_rate (5xx errors per minute)

```typescript
function computeErrorRate(db: Database, rule: AlertRule): number {
  const since = new Date(Date.now() - rule.window_minutes * 60_000).toISOString();
  const fileFilter = rule.file_id != null ? 'AND file_id = ?' : '';
  const params: unknown[] = rule.file_id != null ? [since, rule.file_id] : [since];

  const row = db.prepare(`
    SELECT
      CAST(COUNT(*) FILTER (WHERE status >= 500) AS REAL) / ? AS rate
    FROM log_entries
    WHERE ts >= ? ${fileFilter}
  `).get(rule.window_minutes, ...params) as { rate: number };

  return row.rate ?? 0;
}
```

### request_rate (total requests per minute)

```typescript
function computeRequestRate(db: Database, rule: AlertRule): number {
  const since = new Date(Date.now() - rule.window_minutes * 60_000).toISOString();
  const fileFilter = rule.file_id != null ? 'AND file_id = ?' : '';
  const params: unknown[] = rule.file_id != null ? [since, rule.file_id] : [since];

  const row = db.prepare(`
    SELECT CAST(COUNT(*) AS REAL) / ? AS rate
    FROM log_entries
    WHERE ts >= ? ${fileFilter}
  `).get(rule.window_minutes, ...params) as { rate: number };

  return row.rate ?? 0;
}
```

### status_count (count of a specific HTTP status code)

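`status_count` needs two values: the HTTP status code to count and the count threshold to compare against. The base `alert_rules` schema has only `threshold`, so one workaround is to pack both into it: the integer part is the count threshold and the fractional part encodes the status code (e.g., `threshold = 200.404` means "count of 404s > 200"). That encoding is fragile, so a cleaner approach is to add a `metric_param` column for the status code and keep `threshold` as the plain count threshold. The code below assumes this column exists.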

Recommended schema extension:
```sql
ALTER TABLE alert_rules ADD COLUMN metric_param TEXT;
-- e.g. metric_param = '401' for status_count metric
```

```typescript
function computeStatusCount(db: Database, rule: AlertRule): number {
  const since = new Date(Date.now() - rule.window_minutes * 60_000).toISOString();
  const targetStatus = parseInt(rule.metric_param ?? '500', 10);
  const fileFilter = rule.file_id != null ? 'AND file_id = ?' : '';
  const params: unknown[] = rule.file_id != null ? [since, targetStatus, rule.file_id] : [since, targetStatus];

  const row = db.prepare(`
    SELECT COUNT(*) AS cnt
    FROM log_entries
    WHERE ts >= ? AND status = ? ${fileFilter}
  `).get(...params) as { cnt: number };

  return row.cnt ?? 0;
}
```
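`parseInt(rule.metric_param ?? '500', 10)` yields `NaN` for a non-numeric value, in which case the query is unlikely to behave as intended. A stricter parser, sketched below under the assumption that only codes 100 to 599 are meaningful, fails loudly instead. The name `parseStatusParam` is hypothetical.

```typescript
// Hypothetical helper: parse metric_param into a valid HTTP status code.
// Falls back to 500 when the parameter is missing, matching the default above.
function parseStatusParam(param: string | null | undefined, fallback = 500): number {
  if (param == null || param.trim() === '') return fallback;
  const code = Number(param);
  if (!Number.isInteger(code) || code < 100 || code > 599) {
    throw new Error(`Invalid status code in metric_param: ${param}`);
  }
  return code;
}
```

Because `runAll` wraps each rule in `try`/`catch`, a thrown error here would be logged for that rule without stopping the other rules from being evaluated.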

### keyword (count of log lines matching a keyword)

```typescript
function computeKeywordCount(db: Database, rule: AlertRule): number {
  const since = new Date(Date.now() - rule.window_minutes * 60_000).toISOString();
  const keyword = rule.metric_param ?? '';
  const fileFilter = rule.file_id != null ? 'AND file_id = ?' : '';
  const params: unknown[] = rule.file_id != null
    ? [since, `%${keyword}%`, `%${keyword}%`, rule.file_id]
    : [since, `%${keyword}%`, `%${keyword}%`];

  const row = db.prepare(`
    SELECT COUNT(*) AS cnt
    FROM log_entries
    WHERE ts >= ?
      AND (message LIKE ? OR raw LIKE ?)
      ${fileFilter}
  `).get(...params) as { cnt: number };

  return row.cnt ?? 0;
}
```
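Two edge cases are worth noting in the keyword query: an empty keyword produces the pattern `%%`, which matches every non-NULL row, and a keyword containing `%` or `_` is interpreted as a LIKE wildcard. One possible mitigation, sketched here, is to escape those characters and add an `ESCAPE` clause to the query. The helper names and the modified clause are illustrative assumptions, not part of the original skill.

```typescript
// Hypothetical helper: escape LIKE wildcards so the keyword matches literally.
// Backslash is escaped too, since it is used as the ESCAPE character.
function escapeLike(keyword: string): string {
  return keyword.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// Wrap the escaped keyword in wildcards for a "contains" match.
function likePattern(keyword: string): string {
  return `%${escapeLike(keyword)}%`;
}

// The WHERE clause would then declare the escape character explicitly.
// In SQL this reads: message LIKE ? ESCAPE '\' OR raw LIKE ? ESCAPE '\'
const keywordClause = "(message LIKE ? ESCAPE '\\' OR raw LIKE ? ESCAPE '\\')";
```

With this in place, a keyword such as `50%_off` is bound as `%50\%\_off%` and only matches rows that contain that literal text.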

## evaluateRule

```typescript
export function evaluateRule(db: Database, rule: AlertRule): number {
  switch (rule.metric) {
    case 'error_rate':    return computeErrorRate(db, rule);
    case 'request_rate':  return computeRequestRate(db, rule);
    case 'status_count':  return computeStatusCount(db, rule);
    case 'keyword':       return computeKeywordCount(db, rule);
    default:              return 0;
  }
}
```

## Comparison and Firing

```typescript
function meetsThreshold(value: number, operator: string, threshold: number): boolean {
  switch (operator) {
    case 'gt':  return value > threshold;
    case 'lt':  return value < threshold;
    case 'gte': return value >= threshold;
    case 'lte': return value <= threshold;
    default:    return false;
  }
}

function isInCooldown(rule: AlertRule): boolean {
  if (!rule.last_fired_at) return false;
  const lastFired = new Date(rule.last_fired_at).getTime();
  const cooldownMs = rule.cooldown_minutes * 60_000;
  return Date.now() - lastFired < cooldownMs;
}

export function checkAndFire(db: Database, rule: AlertRule): void {
  if (isInCooldown(rule)) return;

  const value = evaluateRule(db, rule);

  if (!meetsThreshold(value, rule.operator, rule.threshold)) return;

  const fired_at = new Date().toISOString();
  const message = buildMessage(rule, value);

  db.prepare(`INSERT INTO alert_events (rule_id, fired_at, metric_value, message) VALUES (?,?,?,?)`)
    .run(rule.id, fired_at, value, message);

  db.prepare(`UPDATE alert_rules SET last_fired_at = ? WHERE id = ?`)
    .run(fired_at, rule.id);
}

function buildMessage(rule: AlertRule, value: number): string {
  const op = { gt: '>', lt: '<', gte: '>=', lte: '<=' }[rule.operator] ?? rule.operator;
  const rounded = Math.round(value * 100) / 100;
  return `${rule.name}: ${rule.metric} = ${rounded} ${op} ${rule.threshold} (window: ${rule.window_minutes}min)`;
}
```
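To make the cooldown boundary concrete, the sketch below restates `isInCooldown` with an injected clock so it can be checked against fixed timestamps. The name `isInCooldownAt` and the `CooldownFields` type are hypothetical; the comparison itself is the same as above.

```typescript
// Only the two fields the cooldown check needs.
interface CooldownFields {
  last_fired_at: string | null;
  cooldown_minutes: number;
}

// Same logic as isInCooldown, but the current time is passed in for testability.
function isInCooldownAt(rule: CooldownFields, nowMs: number): boolean {
  if (!rule.last_fired_at) return false;
  const lastFired = new Date(rule.last_fired_at).getTime();
  return nowMs - lastFired < rule.cooldown_minutes * 60_000;
}

const firedRule: CooldownFields = {
  last_fired_at: '2026-03-20T03:00:00.000Z',
  cooldown_minutes: 15,
};
const at = (iso: string): number => new Date(iso).getTime();

const stillCooling = isInCooldownAt(firedRule, at('2026-03-20T03:14:59.000Z')); // true
const cooledDown = isInCooldownAt(firedRule, at('2026-03-20T03:15:00.000Z'));   // false
```

Because the comparison is strict (`<`), a rule becomes eligible again exactly `cooldown_minutes` after its last firing.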

## Scheduler

```typescript
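// server/lib/scheduler.ts
import cron from 'node-cron';
import { getDb } from './db';
import { checkAndFire } from './alerts';
import type { AlertRule } from './alerts'; // assumption: the AlertRule type is exported alongside checkAndFire

let task: cron.ScheduledTask | null = null;
let timer: ReturnType<typeof setInterval> | null = null;

export function startScheduler(intervalSeconds = 60): void {
  // Stop any previous scheduler so repeated calls do not stack up timers.
  stopScheduler();

  // Sub-minute intervals use setInterval; otherwise a five-field cron
  // expression runs once per minute (values above 60 are not honored).
  if (intervalSeconds < 60) {
    timer = setInterval(runAll, intervalSeconds * 1_000);
    return;
  }
  task = cron.schedule('* * * * *', runAll);
}

export function stopScheduler(): void {
  task?.stop();
  task = null;
  // Also cancel the setInterval path, which the cron task handle does not cover.
  if (timer) clearInterval(timer);
  timer = null;
}

// Evaluate every enabled rule; one failing rule must not block the others.
function runAll(): void {
  const db = getDb();
  const rules = db.prepare(`SELECT * FROM alert_rules WHERE enabled = 1`).all() as AlertRule[];
  for (const rule of rules) {
    try {
      checkAndFire(db, rule);
    } catch (err) {
      console.error(`Alert rule ${rule.id} evaluation failed:`, err);
    }
  }
}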
```

Start the scheduler after the database is initialized:
```typescript
// server/index.ts
import { startScheduler } from './lib/scheduler';
// ...
app.listen(PORT, () => {
  startScheduler(Number(process.env.ALERT_POLL_INTERVAL ?? 60));
  console.log(`Server running on port ${PORT}`);
});
```
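`Number(process.env.ALERT_POLL_INTERVAL ?? 60)` returns `NaN` for a non-numeric value and accepts `0` or negative numbers, which would make the `setInterval` path fire almost continuously. A defensive parser, sketched below with the hypothetical name `parsePollInterval`, falls back to the default in those cases.

```typescript
// Hypothetical helper: parse the poll interval in seconds, falling back to a
// default when the value is missing, non-numeric, or less than one second.
function parsePollInterval(raw: string | undefined, fallback = 60): number {
  const n = Number(raw);
  return Number.isFinite(n) && n >= 1 ? Math.floor(n) : fallback;
}
```

It could be used in place of the inline conversion, for example `startScheduler(parsePollInterval(process.env.ALERT_POLL_INTERVAL))`.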

## Rule Validation (Zod)

```typescript
import { z } from 'zod';

export const AlertRuleSchema = z.object({
  name: z.string().min(1).max(200),
  file_id: z.number().int().nullable().optional(),
  metric: z.enum(['error_rate', 'request_rate', 'status_count', 'keyword']),
  operator: z.enum(['gt', 'lt', 'gte', 'lte']),
  threshold: z.number().finite(),
  window_minutes: z.number().int().min(1).max(1440),
  cooldown_minutes: z.number().int().min(1).max(1440),
  enabled: z.union([z.literal(0), z.literal(1)]).default(1),
  metric_param: z.string().optional(),
});

export const AlertRulePatchSchema = AlertRuleSchema.partial();
```
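The schema above treats `metric_param` as optional for every metric, although `status_count` and `keyword` depend on it. If a stricter policy is wanted, a cross-field check along these lines could be added. This is a dependency-free sketch with a hypothetical name (`metricParamIssues`); in a Zod setup it could be called from `superRefine`. Note that it is stricter than `computeStatusCount`, which silently defaults to 500 when the parameter is missing.

```typescript
type RuleMetric = 'error_rate' | 'request_rate' | 'status_count' | 'keyword';

// Hypothetical cross-field check: returns a list of human-readable problems,
// or an empty list when metric and metric_param are consistent.
function metricParamIssues(input: { metric: RuleMetric; metric_param?: string }): string[] {
  const issues: string[] = [];
  const param = input.metric_param?.trim() ?? '';

  if (input.metric === 'status_count' && !/^[1-5]\d{2}$/.test(param)) {
    issues.push('status_count requires metric_param to be a status code between 100 and 599');
  }
  if (input.metric === 'keyword' && param === '') {
    issues.push('keyword requires a non-empty metric_param');
  }
  return issues;
}
```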

## Test Rule Endpoint

The `POST /api/alert-rules/:id/test` endpoint evaluates the rule and returns the current metric value without inserting an event or updating `last_fired_at`:

```typescript
router.post('/:id/test', tryCatch(async (req, res) => {
  const rule = getRule(db, Number(req.params.id));
  if (!rule) return res.status(404).json({ error: 'RULE_NOT_FOUND' });

  const metric_value = evaluateRule(db, rule);
  const would_fire = meetsThreshold(metric_value, rule.operator, rule.threshold);

  res.json({ metric_value, would_fire, rule });
}));
```

## Troubleshooting

**Rule never fires even when threshold is exceeded**
Check `last_fired_at` vs `cooldown_minutes`. If a rule fired recently, it will be skipped. Use the test endpoint to confirm the metric value without the cooldown check.

**error_rate returns 0 but errors exist**
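Verify that the `ts` column stores valid ISO 8601 strings in UTC. SQLite compares these values as plain strings, so the formatting must be consistent with the `since` value, which the metric functions build with `toISOString()` (for example `2026-03-20T03:07:42.000Z`). The nginx parser must produce an ISO form such as `2026-03-20T03:07:42+00:00`, not the raw `20/Mar/2026:03:07:42 +0000`; timestamps stored with a non-UTC offset will compare incorrectly.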
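As an illustration of that normalization, the sketch below converts an nginx-style timestamp into the same UTC form that `toISOString()` emits. The function name `nginxTimeToIso` is hypothetical, and the accepted input format is assumed to be the `dd/Mon/yyyy:HH:mm:ss ±hhmm` shape shown above.

```typescript
const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

// Hypothetical helper: "20/Mar/2026:03:07:42 +0000" -> "2026-03-20T03:07:42.000Z".
function nginxTimeToIso(input: string): string {
  const m = /^(\d{2})\/([A-Za-z]{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/.exec(input);
  if (!m) throw new Error(`Unrecognized nginx timestamp: ${input}`);

  const month = MONTHS.indexOf(m[2]);
  if (month < 0) throw new Error(`Unknown month: ${m[2]}`);

  // Offset of the local time from UTC, in minutes (positive means ahead of UTC).
  const offsetMinutes = (m[7] === '-' ? -1 : 1) * (Number(m[8]) * 60 + Number(m[9]));

  // Interpret the wall-clock fields as UTC, then subtract the offset to get true UTC.
  const wallClockMs = Date.UTC(
    Number(m[3]), month, Number(m[1]),
    Number(m[4]), Number(m[5]), Number(m[6]),
  );
  return new Date(wallClockMs - offsetMinutes * 60_000).toISOString();
}
```

Storing every `ts` in this single form keeps string comparison equivalent to chronological comparison.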

**Scheduler not running**
Confirm `startScheduler()` is called once the database is initialized and the server is up, for example inside the `app.listen()` callback. Check that `node-cron` is installed (`pnpm add node-cron`). If `intervalSeconds < 60`, the `setInterval` path is used instead of `node-cron`.

**High CPU on large log files**
The metric queries scan `log_entries` by `ts`. Ensure the index `idx_log_entries_ts` exists. For `status_count`, add a composite index `(file_id, status, ts)`. For `keyword`, full-scan LIKE is unavoidable without FTS5.
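The index suggestions above could be applied at startup with statements like the following. This is a sketch: `idx_log_entries_ts` is the name used in this skill, while `idx_log_entries_file_status_ts` is a hypothetical name for the composite index, and the column names are taken from the metric queries.

```typescript
// DDL for the indexes discussed above; IF NOT EXISTS makes the statements safe to re-run.
const indexStatements: string[] = [
  'CREATE INDEX IF NOT EXISTS idx_log_entries_ts ON log_entries (ts)',
  // Hypothetical index name; supports status_count lookups scoped to a file.
  'CREATE INDEX IF NOT EXISTS idx_log_entries_file_status_ts ON log_entries (file_id, status, ts)',
];
```

Assuming the same database handle used elsewhere in this skill exposes an `exec` method, each statement could be run once after the tables are created, for example `for (const sql of indexStatements) db.exec(sql);`.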
