aws-ecs-monitor

AWS ECS production health monitoring with CloudWatch log analysis — monitors ECS service health, ALB targets, SSL certificates, and provides deep CloudWatch log analysis for error categorization, restart detection, and production alerts.

7 stars

byDemerzels-lab

View on GitHub Installation ↓

Best use case

aws-ecs-monitor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using aws-ecs-monitor should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/aws-ecs-monitor/SKILL.md --create-dirs "https://raw.githubusercontent.com/Demerzels-lab/elsamultiskillagent/main/public/skills/briancolinger/aws-ecs-monitor/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/aws-ecs-monitor/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How aws-ecs-monitor Compares

Feature / Agent	aws-ecs-monitor	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# AWS ECS Monitor

Production health monitoring and log analysis for AWS ECS services.

## What It Does

- **Health Checks**: HTTP probes against your domain, ECS service status (desired vs running), ALB target group health, SSL certificate expiry
- **Log Analysis**: Pulls CloudWatch logs, categorizes errors (panics, fatals, OOM, timeouts, 5xx), detects container restarts, filters health check noise
- **Auto-Diagnosis**: Reads health status and automatically investigates failing services via log analysis

## Prerequisites

- `aws` CLI configured with appropriate IAM permissions:
  - `ecs:ListServices`, `ecs:DescribeServices`
  - `elasticloadbalancing:DescribeTargetGroups`, `elasticloadbalancing:DescribeTargetHealth`
  - `logs:FilterLogEvents`, `logs:DescribeLogGroups`
- `curl` for HTTP health checks
- `python3` for JSON processing and log analysis
- `openssl` for SSL certificate checks (optional)

## Configuration

All configuration is via environment variables:

| Variable | Required | Default | Description |
|---|---|---|---|
| `ECS_CLUSTER` | **Yes** | — | ECS cluster name |
| `ECS_REGION` | No | `us-east-1` | AWS region |
| `ECS_DOMAIN` | No | — | Domain for HTTP/SSL checks (skip if unset) |
| `ECS_SERVICES` | No | auto-detect | Comma-separated service names to monitor |
| `ECS_HEALTH_STATE` | No | `./data/ecs-health.json` | Path to write health state JSON |
| `ECS_HEALTH_OUTDIR` | No | `./data/` | Output directory for logs and alerts |
| `ECS_LOG_PATTERN` | No | `/ecs/{service}` | CloudWatch log group pattern (`{service}` is replaced) |
| `ECS_HTTP_ENDPOINTS` | No | — | Comma-separated `name=url` pairs for HTTP probes |

## Scripts

### `scripts/ecs-health.sh` — Health Monitor

```bash
# Full check
ECS_CLUSTER=my-cluster ECS_DOMAIN=example.com ./scripts/ecs-health.sh

# JSON output only
ECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --json

# Quiet mode (no alerts, just status file)
ECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --quiet
```

Exit codes: `0` = healthy, `1` = unhealthy/degraded, `2` = script error

### `scripts/cloudwatch-logs.sh` — Log Analyzer

```bash
# Pull raw logs from a service
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh pull my-api --minutes 30

# Show errors across all services
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh errors all --minutes 120

# Deep analysis with error categorization
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh diagnose --minutes 60

# Detect container restarts
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh restarts my-api

# Auto-diagnose from health state file
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh auto-diagnose

# Summary across all services
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh summary --minutes 120
```

Options: `--minutes N` (default: 60), `--json`, `--limit N` (default: 200), `--verbose`

## Auto-Detection

When `ECS_SERVICES` is not set, both scripts auto-detect services from the cluster:

```bash
aws ecs list-services --cluster $ECS_CLUSTER
```

Log groups are resolved by pattern (default `/ecs/{service}`). Override with `ECS_LOG_PATTERN`:

```bash
# If your log groups are /ecs/prod/my-api, /ecs/prod/my-frontend, etc.
ECS_LOG_PATTERN="/ecs/prod/{service}" ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh diagnose
```

## Integration

The health monitor can trigger the log analyzer for auto-diagnosis when issues are detected. Set `ECS_HEALTH_OUTDIR` to a shared directory and run both scripts together:

```bash
export ECS_CLUSTER=my-cluster
export ECS_DOMAIN=example.com
export ECS_HEALTH_OUTDIR=./data

# Run health check (auto-triggers log analysis on failure)
./scripts/ecs-health.sh

# Or run log analysis independently
./scripts/cloudwatch-logs.sh auto-diagnose --minutes 30
```

## Error Categories

The log analyzer classifies errors into:

- `panic` — Go panics
- `fatal` — Fatal errors
- `oom` — Out of memory
- `timeout` — Connection/request timeouts
- `connection_error` — Connection refused/reset
- `http_5xx` — HTTP 500-level responses
- `python_traceback` — Python tracebacks
- `exception` — Generic exceptions
- `auth_error` — Permission/authorization failures
- `structured_error` — JSON-structured error logs
- `error` — Generic ERROR-level messages

Health check noise (GET/HEAD `/health` from ALB) is automatically filtered from error counts and HTTP status distribution.

Related Skills

task-monitor

from Demerzels-lab/elsamultiskillagent

Real-time web dashboard for OpenClaw sessions and background tasks. Mobile-responsive with auto-refresh.

testflight-monitor

from Demerzels-lab/elsamultiskillagent

Monitor available TestFlight beta slots with smart app lookups and silent batch checking.

aave-liquidation-monitor

from Demerzels-lab/elsamultiskillagent

Proactive monitoring of Aave V3 borrow positions with liquidation alerts.

web-monitor-pro

from Demerzels-lab/elsamultiskillagent

Monitor web pages for changes, price drops, stock availability, and custom conditions.

bilibili-monitor

from Demerzels-lab/elsamultiskillagent

生成B站热门视频日报并发送邮件。触发词：B站热门、bilibili日报、视频日报、热门视频

price-monitor

from Demerzels-lab/elsamultiskillagent

Surveille les prix de produits sur des sites e-commerce et alerte quand ils baissent.

ecommerce-price-monitor

from Demerzels-lab/elsamultiskillagent

This skill enables Claude to monitor and track **product prices across major e-commerce platforms**.

crypto-gold-monitor

from Demerzels-lab/elsamultiskillagent

加密货币与贵金属价格监控 / Crypto & Precious Metals Price Monitor - 监控BTC/ETH实时价格、黄金(XAU)/白银(XAG)走势，免费API无需Key

bluetooth-monitor

from Demerzels-lab/elsamultiskillagent

蓝牙设备监控 / Bluetooth Device Monitor - 查看Mac已连接的蓝牙设备列表，支持配对、连接、断开操作

uniswap-monitor-teneo

from Demerzels-lab/elsamultiskillagent

AI-powered blockchain monitoring agent with real-time monitoring of Uniswap V2, V3, and V4 most known pools. Track swaps, monitor specific liquidity pools by address, and receive intelligent insights

btc-indicator-monitor-teneo

from Demerzels-lab/elsamultiskillagent

Real-time BTC price monitoring agent that sends alerts when price crosses key technical indicator levels. Monitors SMA, EMA, RSI, Bollinger Bands, and 15+ other indicators. Checks every 20 seconds w

monitored-ralph-loop

from Demerzels-lab/elsamultiskillagent

Generate copy-paste bash scripts for Ralph Wiggum/AI agent loops (Codex, Claude Code, OpenCode, Goose)