replit-observability

Monitor Replit deployments with health checks, uptime tracking, resource usage, and alerting. Use when setting up monitoring for Replit apps, building health dashboards, or configuring alerting for deployment health and performance. Trigger with phrases like "replit monitoring", "replit metrics", "replit observability", "monitor replit", "replit alerts", "replit uptime".

1,868 stars

Best use case

replit-observability is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Monitor Replit deployments with health checks, uptime tracking, resource usage, and alerting. Use when setting up monitoring for Replit apps, building health dashboards, or configuring alerting for deployment health and performance. Trigger with phrases like "replit monitoring", "replit metrics", "replit observability", "monitor replit", "replit alerts", "replit uptime".

Teams using replit-observability should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/replit-observability/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/saas-packs/replit-pack/skills/replit-observability/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/replit-observability/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How replit-observability Compares

Feature / Agentreplit-observabilityStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Monitor Replit deployments with health checks, uptime tracking, resource usage, and alerting. Use when setting up monitoring for Replit apps, building health dashboards, or configuring alerting for deployment health and performance. Trigger with phrases like "replit monitoring", "replit metrics", "replit observability", "monitor replit", "replit alerts", "replit uptime".

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Replit Observability

## Overview
Monitor Replit deployment health, track cold starts, measure resource usage, and set up alerting. Covers Replit's built-in monitoring, external health checking, structured logging, and integration with monitoring services.

## Prerequisites
- Replit app deployed (Autoscale or Reserved VM)
- Health endpoint implemented (`/health`)
- External monitoring service (UptimeRobot, Better Stack, or Prometheus)

## Instructions

### Step 1: Health Endpoint with Detailed Metrics
```typescript
// src/routes/health.ts — comprehensive health check
import { Router } from 'express';
import { pool } from '../services/postgres';

const router = Router();
const startTime = Date.now();

router.get('/health', async (req, res) => {
  const checks: Record<string, any> = {
    status: 'ok',
    uptime: process.uptime(),
    bootTime: ((Date.now() - startTime) / 1000).toFixed(1) + 's ago',
    timestamp: new Date().toISOString(),
    repl: process.env.REPL_SLUG,
    region: process.env.REPLIT_DEPLOYMENT_REGION,
    env: process.env.NODE_ENV,
  };

  // Database check
  if (process.env.DATABASE_URL) {
    const dbStart = Date.now();
    try {
      await pool.query('SELECT 1');
      checks.database = {
        status: 'connected',
        latencyMs: Date.now() - dbStart,
        pool: { total: pool.totalCount, idle: pool.idleCount },
      };
    } catch (err: any) {
      checks.database = { status: 'disconnected', error: err.message };
      checks.status = 'degraded';
    }
  }

  // Memory metrics
  const mem = process.memoryUsage();
  checks.memory = {
    heapMB: Math.round(mem.heapUsed / 1024 / 1024),
    totalMB: Math.round(mem.heapTotal / 1024 / 1024),
    rssMB: Math.round(mem.rss / 1024 / 1024),
    percent: ((mem.heapUsed / mem.heapTotal) * 100).toFixed(1),
  };

  // Node.js info
  checks.runtime = {
    node: process.version,
    platform: process.platform,
    pid: process.pid,
  };

  res.status(checks.status === 'ok' ? 200 : 503).json(checks);
});

// Lightweight ping for uptime monitors
router.get('/ping', (req, res) => res.send('pong'));

export default router;
```

### Step 2: Structured Logging
```typescript
// src/utils/logger.ts — structured JSON logging
const IS_PROD = process.env.NODE_ENV === 'production';

type LogLevel = 'debug' | 'info' | 'warn' | 'error';

function log(level: LogLevel, message: string, data?: Record<string, any>) {
  if (level === 'debug' && IS_PROD) return;

  const entry = {
    timestamp: new Date().toISOString(),
    level,
    message,
    repl: process.env.REPL_SLUG,
    ...data,
  };

  // JSON format for machine parsing, human-readable in dev
  if (IS_PROD) {
    console[level === 'error' ? 'error' : 'log'](JSON.stringify(entry));
  } else {
    console[level === 'error' ? 'error' : 'log'](
      `[${level.toUpperCase()}] ${message}`,
      data || ''
    );
  }
}

export const logger = {
  debug: (msg: string, data?: any) => log('debug', msg, data),
  info: (msg: string, data?: any) => log('info', msg, data),
  warn: (msg: string, data?: any) => log('warn', msg, data),
  error: (msg: string, data?: any) => log('error', msg, data),
};

// Request logging middleware
export function requestLogger(req: any, res: any, next: any) {
  const start = Date.now();
  res.on('finish', () => {
    logger.info('request', {
      method: req.method,
      path: req.path,
      status: res.statusCode,
      durationMs: Date.now() - start,
      userId: req.headers['x-replit-user-id'] || 'anonymous',
    });
  });
  next();
}
```

### Step 3: External Uptime Monitoring
Set up external monitors to detect Autoscale cold starts and outages:

```markdown
UptimeRobot (free tier: 50 monitors):
1. Create new monitor: HTTP(s)
2. URL: https://your-app.replit.app/ping
3. Interval: 5 minutes
4. Alert contacts: email, Slack webhook

Better Stack / Datadog / Grafana Cloud:
- Same setup, more features
- Track response time trends
- Detect cold start patterns
- Set up PagerDuty integration

Key metrics to monitor externally:
- Uptime percentage (target: 99.9%)
- Response time P95 (target: < 2s)
- Cold start frequency (Autoscale only)
- SSL certificate expiry
```

### Step 4: Cold Start Detection
```typescript
// Track cold starts for Autoscale deployments
const COLD_START_THRESHOLD_MS = 5000;
let firstRequestTime: number | null = null;

app.use((req, res, next) => {
  if (!firstRequestTime) {
    firstRequestTime = Date.now();
    const bootTime = process.uptime();
    if (bootTime < 30) { // Just started
      logger.info('cold_start_detected', {
        bootTimeMs: Math.round(bootTime * 1000),
        path: req.path,
      });
    }
  }
  next();
});
```

### Step 5: Alerting Rules
```typescript
// src/utils/alerts.ts — send alerts to Slack on issues
async function alertSlack(message: string, severity: 'info' | 'warning' | 'critical') {
  const webhookUrl = process.env.SLACK_WEBHOOK_URL;
  if (!webhookUrl) return;

  const emoji = { info: 'information_source', warning: 'warning', critical: 'rotating_light' };
  await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: `:${emoji[severity]}: [${severity.toUpperCase()}] ${process.env.REPL_SLUG}\n${message}`,
    }),
  });
}

// Monitor memory usage
setInterval(async () => {
  const mem = process.memoryUsage();
  const heapPercent = (mem.heapUsed / mem.heapTotal) * 100;

  if (heapPercent > 90) {
    await alertSlack(`Memory critical: ${heapPercent.toFixed(1)}% heap used`, 'critical');
  } else if (heapPercent > 75) {
    await alertSlack(`Memory warning: ${heapPercent.toFixed(1)}% heap used`, 'warning');
  }
}, 60000);

// Monitor error rate
let errorCount = 0;
let requestCount = 0;

app.use((req, res, next) => {
  requestCount++;
  res.on('finish', () => {
    if (res.statusCode >= 500) errorCount++;
  });
  next();
});

setInterval(async () => {
  if (requestCount > 0) {
    const errorRate = (errorCount / requestCount) * 100;
    if (errorRate > 5) {
      await alertSlack(`Error rate: ${errorRate.toFixed(1)}% (${errorCount}/${requestCount})`, 'critical');
    }
  }
  errorCount = 0;
  requestCount = 0;
}, 300000); // Check every 5 minutes
```

### Step 6: Replit Dashboard Monitoring
```markdown
Built-in monitoring in Replit:
1. Deployment Settings > Logs: real-time stdout/stderr
2. Deployment Settings > History: deploy timeline + rollbacks
3. Database pane > Settings: storage usage + connection info
4. Billing > Usage: compute, egress, and storage costs

Check deployment logs:
- Click on active deployment
- View real-time log stream
- Filter by error/warning
- Logs persist across container restarts
```

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Cold starts undetected | No external monitor | Set up UptimeRobot or similar |
| Deployment logs missing | Container restarted | Use external log aggregator |
| Memory leak unnoticed | No memory monitoring | Add heap tracking + alerts |
| DB pool exhaustion | Too many connections | Monitor pool.totalCount in health |

## Resources
- [Monitoring Deployments](https://docs.replit.com/cloud-services/deployments/monitoring-a-deployment)
- [Replit Status Page](https://status.replit.com)
- [UptimeRobot](https://uptimerobot.com)

## Next Steps
For incident response, see `replit-incident-runbook`.

Related Skills

windsurf-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Monitor Windsurf AI adoption, feature usage, and team productivity metrics. Use when tracking AI feature usage, measuring ROI, setting up dashboards, or analyzing Cascade effectiveness across your team. Trigger with phrases like "windsurf monitoring", "windsurf metrics", "windsurf analytics", "windsurf usage", "windsurf adoption".

webflow-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Set up observability for Webflow integrations — Prometheus metrics for API calls, OpenTelemetry tracing, structured logging with pino, Grafana dashboards, and alerting for rate limits, errors, and latency. Trigger with phrases like "webflow monitoring", "webflow metrics", "webflow observability", "monitor webflow", "webflow alerts", "webflow tracing".

vercel-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Set up Vercel observability with runtime logs, analytics, log drains, and OpenTelemetry tracing. Use when implementing monitoring for Vercel deployments, setting up log drains, or configuring alerting for function errors and performance. Trigger with phrases like "vercel monitoring", "vercel metrics", "vercel observability", "vercel logs", "vercel alerts", "vercel tracing".

veeva-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Veeva Vault observability for enterprise operations. Use when implementing advanced Veeva Vault patterns. Trigger: "veeva observability".

vastai-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Monitor Vast.ai GPU instance health, utilization, and costs. Use when setting up monitoring dashboards, configuring alerts, or tracking GPU utilization and spending. Trigger with phrases like "vastai monitoring", "vastai metrics", "vastai observability", "monitor vastai", "vastai alerts".

twinmind-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Monitor TwinMind transcription quality, meeting coverage, action item extraction rates, and memory vault health. Use when implementing observability, or managing TwinMind meeting AI operations. Trigger with phrases like "twinmind observability", "twinmind observability".

speak-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Monitor Speak API health, assessment latency, session metrics, and pronunciation score distributions. Use when implementing observability, or managing Speak language learning platform operations. Trigger with phrases like "speak observability", "speak observability".

snowflake-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Set up Snowflake observability using ACCOUNT_USAGE views, alerts, and external monitoring. Use when implementing Snowflake monitoring dashboards, setting up query performance tracking, or configuring alerting for warehouse and pipeline health. Trigger with phrases like "snowflake monitoring", "snowflake metrics", "snowflake observability", "snowflake dashboard", "snowflake alerts".

shopify-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Set up observability for Shopify app integrations with query cost tracking, rate limit monitoring, webhook delivery metrics, and structured logging. Trigger with phrases like "shopify monitoring", "shopify metrics", "shopify observability", "monitor shopify API", "shopify alerts", "shopify dashboard".

salesforce-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Set up observability for Salesforce integrations with API limit monitoring, error tracking, and alerting. Use when implementing monitoring for Salesforce operations, tracking API consumption, or configuring alerting for Salesforce integration health. Trigger with phrases like "salesforce monitoring", "salesforce metrics", "salesforce observability", "monitor salesforce", "salesforce alerts", "salesforce API usage dashboard".

retellai-observability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Retell AI observability — AI voice agent and phone call automation. Use when working with Retell AI for voice agents, phone calls, or telephony. Trigger with phrases like "retell observability", "retellai-observability", "voice agent".

replit-webhooks-events

1868
from jeremylongshore/claude-code-plugins-plus-skills

Handle Replit deployment events, build Replit Extensions, and set up Agents & Automations. Use when integrating with Replit deployment lifecycle, building workspace extensions, or creating automated workflows with Replit Agent. Trigger with phrases like "replit webhook", "replit events", "replit extension", "replit automation", "replit notifications", "replit agent automation".