alert-rules
Define, store, and evaluate threshold-based alert rules against log entry metrics. Fire alert events with cooldown debounce via a cron-based scheduler.
Best use case
alert-rules is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Define, store, and evaluate threshold-based alert rules against log entry metrics. Fire alert events with cooldown debounce via a cron-based scheduler.
Teams using alert-rules should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/alert-rules/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How alert-rules Compares
| Feature / Agent | alert-rules | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Define, store, and evaluate threshold-based alert rules against log entry metrics. Fire alert events with cooldown debounce via a cron-based scheduler.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# alert-rules Skill
## Overview
Alert rules are stored in the `alert_rules` SQLite table. A `node-cron` scheduler wakes up every N seconds (default 60) and evaluates all enabled rules. Each rule specifies a metric (error_rate, request_rate, status_count, keyword), an operator (gt/lt/gte/lte), a threshold value, and a look-back window in minutes. When a rule fires, an `alert_events` row is written. A cooldown period prevents repeated firings.
## SQLite Tables
```sql
CREATE TABLE alert_rules (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
file_id INTEGER REFERENCES log_files(id) ON DELETE CASCADE,
metric TEXT NOT NULL CHECK(metric IN ('error_rate','request_rate','status_count','keyword')),
operator TEXT NOT NULL CHECK(operator IN ('gt','lt','gte','lte')),
threshold REAL NOT NULL,
window_minutes INTEGER NOT NULL DEFAULT 5,
cooldown_minutes INTEGER NOT NULL DEFAULT 15,
enabled INTEGER NOT NULL DEFAULT 1,
last_fired_at TEXT,
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE alert_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
rule_id INTEGER NOT NULL REFERENCES alert_rules(id) ON DELETE CASCADE,
fired_at TEXT NOT NULL DEFAULT (datetime('now')),
metric_value REAL NOT NULL,
message TEXT NOT NULL
);
```
## Metric Queries
Each metric is computed by a SQL query against the `log_entries` table. All queries accept a `since` ISO timestamp computed as `NOW - window_minutes`.
### error_rate (5xx errors per minute)
```typescript
function computeErrorRate(db: Database, rule: AlertRule): number {
const since = new Date(Date.now() - rule.window_minutes * 60_000).toISOString();
const fileFilter = rule.file_id != null ? 'AND file_id = ?' : '';
const params: unknown[] = rule.file_id != null ? [since, rule.file_id] : [since];
const row = db.prepare(`
SELECT
CAST(COUNT(*) FILTER (WHERE status >= 500) AS REAL) / ? AS rate
FROM log_entries
WHERE ts >= ? ${fileFilter}
`).get(rule.window_minutes, ...params) as { rate: number };
return row.rate ?? 0;
}
```
### request_rate (total requests per minute)
```typescript
function computeRequestRate(db: Database, rule: AlertRule): number {
const since = new Date(Date.now() - rule.window_minutes * 60_000).toISOString();
const fileFilter = rule.file_id != null ? 'AND file_id = ?' : '';
const params: unknown[] = rule.file_id != null ? [since, rule.file_id] : [since];
const row = db.prepare(`
SELECT CAST(COUNT(*) AS REAL) / ? AS rate
FROM log_entries
WHERE ts >= ? ${fileFilter}
`).get(rule.window_minutes, ...params) as { rate: number };
return row.rate ?? 0;
}
```
### status_count (count of a specific HTTP status code)
The rule stores the target status code in `threshold` and the count threshold in a separate field. For simplicity, `threshold` is split: the integer part is the count threshold, and the fractional part encodes the status code (e.g., `threshold = 200.404` means "count of 404s > 200"). A cleaner approach is to add a `metric_param` column.
Recommended schema extension:
```sql
ALTER TABLE alert_rules ADD COLUMN metric_param TEXT;
-- e.g. metric_param = '401' for status_count metric
```
```typescript
function computeStatusCount(db: Database, rule: AlertRule): number {
const since = new Date(Date.now() - rule.window_minutes * 60_000).toISOString();
const targetStatus = parseInt(rule.metric_param ?? '500', 10);
const fileFilter = rule.file_id != null ? 'AND file_id = ?' : '';
const params: unknown[] = rule.file_id != null ? [since, targetStatus, rule.file_id] : [since, targetStatus];
const row = db.prepare(`
SELECT COUNT(*) AS cnt
FROM log_entries
WHERE ts >= ? AND status = ? ${fileFilter}
`).get(...params) as { cnt: number };
return row.cnt ?? 0;
}
```
### keyword (count of log lines matching a keyword)
```typescript
function computeKeywordCount(db: Database, rule: AlertRule): number {
const since = new Date(Date.now() - rule.window_minutes * 60_000).toISOString();
const keyword = rule.metric_param ?? '';
const fileFilter = rule.file_id != null ? 'AND file_id = ?' : '';
const params: unknown[] = rule.file_id != null
? [since, `%${keyword}%`, `%${keyword}%`, rule.file_id]
: [since, `%${keyword}%`, `%${keyword}%`];
const row = db.prepare(`
SELECT COUNT(*) AS cnt
FROM log_entries
WHERE ts >= ?
AND (message LIKE ? OR raw LIKE ?)
${fileFilter}
`).get(...params) as { cnt: number };
return row.cnt ?? 0;
}
```
## evaluateRule
```typescript
export function evaluateRule(db: Database, rule: AlertRule): number {
switch (rule.metric) {
case 'error_rate': return computeErrorRate(db, rule);
case 'request_rate': return computeRequestRate(db, rule);
case 'status_count': return computeStatusCount(db, rule);
case 'keyword': return computeKeywordCount(db, rule);
default: return 0;
}
}
```
## Comparison and Firing
```typescript
function meetsThreshold(value: number, operator: string, threshold: number): boolean {
switch (operator) {
case 'gt': return value > threshold;
case 'lt': return value < threshold;
case 'gte': return value >= threshold;
case 'lte': return value <= threshold;
default: return false;
}
}
function isInCooldown(rule: AlertRule): boolean {
if (!rule.last_fired_at) return false;
const lastFired = new Date(rule.last_fired_at).getTime();
const cooldownMs = rule.cooldown_minutes * 60_000;
return Date.now() - lastFired < cooldownMs;
}
export function checkAndFire(db: Database, rule: AlertRule): void {
if (isInCooldown(rule)) return;
const value = evaluateRule(db, rule);
if (!meetsThreshold(value, rule.operator, rule.threshold)) return;
const fired_at = new Date().toISOString();
const message = buildMessage(rule, value);
db.prepare(`INSERT INTO alert_events (rule_id, fired_at, metric_value, message) VALUES (?,?,?,?)`)
.run(rule.id, fired_at, value, message);
db.prepare(`UPDATE alert_rules SET last_fired_at = ? WHERE id = ?`)
.run(fired_at, rule.id);
}
function buildMessage(rule: AlertRule, value: number): string {
const op = { gt: '>', lt: '<', gte: '>=', lte: '<=' }[rule.operator] ?? rule.operator;
const rounded = Math.round(value * 100) / 100;
return `${rule.name}: ${rule.metric} = ${rounded} ${op} ${rule.threshold} (window: ${rule.window_minutes}min)`;
}
```
## Scheduler
```typescript
// server/lib/scheduler.ts
import cron from 'node-cron';
import { getDb } from './db';
import { checkAndFire } from './alerts';
let task: cron.ScheduledTask | null = null;
export function startScheduler(intervalSeconds = 60): void {
// node-cron minimum granularity is 1 minute; for sub-minute use setInterval
if (intervalSeconds < 60) {
setInterval(runAll, intervalSeconds * 1_000);
return;
}
task = cron.schedule('* * * * *', runAll);
}
export function stopScheduler(): void {
task?.stop();
task = null;
}
function runAll(): void {
const db = getDb();
const rules = db.prepare(`SELECT * FROM alert_rules WHERE enabled = 1`).all() as AlertRule[];
for (const rule of rules) {
try {
checkAndFire(db, rule);
} catch (err) {
console.error(`Alert rule ${rule.id} evaluation failed:`, err);
}
}
}
```
Start the scheduler after the database is initialized:
```typescript
// server/index.ts
import { startScheduler } from './lib/scheduler';
// ...
app.listen(PORT, () => {
startScheduler(Number(process.env.ALERT_POLL_INTERVAL ?? 60));
console.log(`Server running on port ${PORT}`);
});
```
## Rule Validation (Zod)
```typescript
import { z } from 'zod';
export const AlertRuleSchema = z.object({
name: z.string().min(1).max(200),
file_id: z.number().int().nullable().optional(),
metric: z.enum(['error_rate', 'request_rate', 'status_count', 'keyword']),
operator: z.enum(['gt', 'lt', 'gte', 'lte']),
threshold: z.number().finite(),
window_minutes: z.number().int().min(1).max(1440),
cooldown_minutes: z.number().int().min(1).max(1440),
enabled: z.union([z.literal(0), z.literal(1)]).default(1),
metric_param: z.string().optional(),
});
export const AlertRulePatchSchema = AlertRuleSchema.partial();
```
## Test Rule Endpoint
The `POST /api/alert-rules/:id/test` endpoint evaluates the rule and returns the current metric value without inserting an event or updating `last_fired_at`:
```typescript
router.post('/:id/test', tryCatch(async (req, res) => {
const rule = getRule(db, Number(req.params.id));
if (!rule) return res.status(404).json({ error: 'RULE_NOT_FOUND' });
const metric_value = evaluateRule(db, rule);
const would_fire = meetsThreshold(metric_value, rule.operator, rule.threshold);
res.json({ metric_value, would_fire, rule });
}));
```
## Troubleshooting
**Rule never fires even when threshold is exceeded**
Check `last_fired_at` vs `cooldown_minutes`. If a rule fired recently, it will be skipped. Use the test endpoint to confirm the metric value without the cooldown check.
**error_rate returns 0 but errors exist**
Verify the `ts` column stores valid ISO 8601 strings. SQLite string comparison for dates requires consistent formatting. The nginx parser must produce `2026-03-20T03:07:42+00:00` format, not `20/Mar/2026:03:07:42 +0000`.
**Scheduler not running**
Confirm `startScheduler()` is called after `app.listen()`. Check that `node-cron` is installed (`pnpm add node-cron`). If `intervalSeconds < 60`, the setInterval path is used instead.
**High CPU on large log files**
The metric queries scan `log_entries` by `ts`. Ensure the index `idx_log_entries_ts` exists. For `status_count`, add a composite index `(file_id, status, ts)`. For `keyword`, full-scan LIKE is unavoidable without FTS5.Related Skills
alerting
Configure and manage cron-monitor alert delivery to Slack, email, or webhook endpoints. Use when you need to set up notifications for failed or missed cron jobs, test alert delivery, or manage existing alert configurations. Triggers include "configure alerts", "set up Slack notification", "webhook alert", "email notification", "notify on failure", or any task involving alert routing for cron jobs.
Skill: Uptime Monitoring
## Overview
Skill: Status Page
## Overview
Skill: unit-conversion
## Overview
Skill: recipe-scaler
## Overview
reading-list
Operate the reading-list API to save, manage, tag, search, and export articles.
email-digest
Configure, test, and troubleshoot the reading-list daily email digest delivered via nodemailer.
websocket-realtime
Use the WebSocket connection in poll-builder to receive live vote updates. Use when you need to stream real-time poll results, monitor a poll for new votes, or build a live dashboard. Triggers include "live results", "real-time updates", "stream votes", "watch poll", or "WebSocket".
poll-builder
Self-hosted poll creation tool with real-time results. Use when you need to create a poll, check vote counts, close a poll, export results, or get the shareable link for a poll. Triggers include "create poll", "vote", "poll results", "survey", "collect votes", "share poll", or any task involving polling or voting.
Skill: personal-finance
## Overview
Skill: csv-import
## Overview
Skill: Syntax Highlighting
## Purpose