cloudwatch

AWS CloudWatch monitoring for logs, metrics, alarms, and dashboards. Use when setting up monitoring, creating alarms, querying logs with Insights, configuring metric filters, building dashboards, or troubleshooting application issues.

25 stars

byComeOnOliver

View on GitHub Installation ↓

Best use case

cloudwatch is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using cloudwatch should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/cloudwatch/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/itsmostafa/aws-agent-skills/cloudwatch/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/cloudwatch/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How cloudwatch Compares

Feature / Agent	cloudwatch	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# AWS CloudWatch

Amazon CloudWatch provides monitoring and observability for AWS resources and applications. It collects metrics, logs, and events, enabling you to monitor, troubleshoot, and optimize your AWS environment.

## Table of Contents

- [Core Concepts](#core-concepts)
- [Common Patterns](#common-patterns)
- [CLI Reference](#cli-reference)
- [Best Practices](#best-practices)
- [Troubleshooting](#troubleshooting)
- [References](#references)

## Core Concepts

### Metrics

Time-ordered data points published to CloudWatch. Key components:
- **Namespace**: Container for metrics (e.g., `AWS/Lambda`)
- **Metric name**: Name of the measurement (e.g., `Invocations`)
- **Dimensions**: Name-value pairs for filtering (e.g., `FunctionName=MyFunc`)
- **Statistics**: Aggregations (Sum, Average, Min, Max, SampleCount, pN)

### Logs

Log data from AWS services and applications:
- **Log groups**: Collections of log streams
- **Log streams**: Sequences of log events from same source
- **Log events**: Individual log entries with timestamp and message

### Alarms

Automated actions based on metric thresholds:
- **States**: OK, ALARM, INSUFFICIENT_DATA
- **Actions**: SNS notifications, Auto Scaling, EC2 actions

## Common Patterns

### Create a Metric Alarm

**AWS CLI:**

```bash
# CPU utilization alarm for EC2
aws cloudwatch put-metric-alarm \
  --alarm-name "HighCPU-i-1234567890abcdef0" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts \
  --ok-actions arn:aws:sns:us-east-1:123456789012:alerts
```

**boto3:**

```python
import boto3

cloudwatch = boto3.client('cloudwatch')

cloudwatch.put_metric_alarm(
    AlarmName='HighCPU-i-1234567890abcdef0',
    MetricName='CPUUtilization',
    Namespace='AWS/EC2',
    Statistic='Average',
    Period=300,
    Threshold=80.0,
    ComparisonOperator='GreaterThanThreshold',
    EvaluationPeriods=2,
    Dimensions=[
        {'Name': 'InstanceId', 'Value': 'i-1234567890abcdef0'}
    ],
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:alerts'],
    OKActions=['arn:aws:sns:us-east-1:123456789012:alerts']
)
```

### Lambda Error Rate Alarm

```bash
aws cloudwatch put-metric-alarm \
  --alarm-name "LambdaErrorRate-MyFunction" \
  --metrics '[
    {
      "Id": "errors",
      "MetricStat": {
        "Metric": {
          "Namespace": "AWS/Lambda",
          "MetricName": "Errors",
          "Dimensions": [{"Name": "FunctionName", "Value": "MyFunction"}]
        },
        "Period": 60,
        "Stat": "Sum"
      },
      "ReturnData": false
    },
    {
      "Id": "invocations",
      "MetricStat": {
        "Metric": {
          "Namespace": "AWS/Lambda",
          "MetricName": "Invocations",
          "Dimensions": [{"Name": "FunctionName", "Value": "MyFunction"}]
        },
        "Period": 60,
        "Stat": "Sum"
      },
      "ReturnData": false
    },
    {
      "Id": "errorRate",
      "Expression": "errors/invocations*100",
      "Label": "Error Rate",
      "ReturnData": true
    }
  ]' \
  --threshold 5 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 3 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
```

### Query Logs with Insights

```bash
# Find errors in Lambda logs
aws logs start-query \
  --log-group-name /aws/lambda/MyFunction \
  --start-time $(date -d '1 hour ago' +%s) \
  --end-time $(date +%s) \
  --query-string '
    fields @timestamp, @message
    | filter @message like /ERROR/
    | sort @timestamp desc
    | limit 50
  '

# Get query results
aws logs get-query-results --query-id <query-id>
```

**boto3:**

```python
import boto3
import time

logs = boto3.client('logs')

# Start query
response = logs.start_query(
    logGroupName='/aws/lambda/MyFunction',
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString='''
        fields @timestamp, @message
        | filter @message like /ERROR/
        | sort @timestamp desc
        | limit 50
    '''
)

query_id = response['queryId']

# Wait for results
while True:
    result = logs.get_query_results(queryId=query_id)
    if result['status'] == 'Complete':
        break
    time.sleep(1)

for row in result['results']:
    print(row)
```

### Create Metric Filter

Extract metrics from log patterns:

```bash
# Create metric filter for error count
aws logs put-metric-filter \
  --log-group-name /aws/lambda/MyFunction \
  --filter-name ErrorCount \
  --filter-pattern "ERROR" \
  --metric-transformations \
    metricName=ErrorCount,metricNamespace=MyApp,metricValue=1,defaultValue=0
```

### Publish Custom Metrics

```python
import boto3

cloudwatch = boto3.client('cloudwatch')

cloudwatch.put_metric_data(
    Namespace='MyApp',
    MetricData=[
        {
            'MetricName': 'OrdersProcessed',
            'Value': 1,
            'Unit': 'Count',
            'Dimensions': [
                {'Name': 'Environment', 'Value': 'Production'},
                {'Name': 'OrderType', 'Value': 'Standard'}
            ]
        }
    ]
)
```

### Create Dashboard

```bash
cat > dashboard.json << 'EOF'
{
  "widgets": [
    {
      "type": "metric",
      "x": 0, "y": 0, "width": 12, "height": 6,
      "properties": {
        "title": "Lambda Invocations",
        "metrics": [
          ["AWS/Lambda", "Invocations", "FunctionName", "MyFunction"]
        ],
        "period": 60,
        "stat": "Sum",
        "region": "us-east-1"
      }
    },
    {
      "type": "log",
      "x": 12, "y": 0, "width": 12, "height": 6,
      "properties": {
        "title": "Recent Errors",
        "query": "SOURCE '/aws/lambda/MyFunction' | filter @message like /ERROR/ | limit 20",
        "region": "us-east-1"
      }
    }
  ]
}
EOF

aws cloudwatch put-dashboard \
  --dashboard-name MyAppDashboard \
  --dashboard-body file://dashboard.json
```

## CLI Reference

### Metrics Commands

| Command | Description |
|---------|-------------|
| `aws cloudwatch put-metric-data` | Publish custom metrics |
| `aws cloudwatch get-metric-data` | Retrieve metric values |
| `aws cloudwatch get-metric-statistics` | Get aggregated statistics |
| `aws cloudwatch list-metrics` | List available metrics |

### Alarms Commands

| Command | Description |
|---------|-------------|
| `aws cloudwatch put-metric-alarm` | Create or update alarm |
| `aws cloudwatch describe-alarms` | List alarms |
| `aws cloudwatch set-alarm-state` | Manually set alarm state |
| `aws cloudwatch delete-alarms` | Delete alarms |

### Logs Commands

| Command | Description |
|---------|-------------|
| `aws logs create-log-group` | Create log group |
| `aws logs put-log-events` | Write log events |
| `aws logs filter-log-events` | Search log events |
| `aws logs start-query` | Start Insights query |
| `aws logs put-metric-filter` | Create metric filter |
| `aws logs put-retention-policy` | Set log retention |

## Best Practices

### Metrics

- **Use dimensions wisely** — too many creates metric explosion
- **Aggregate before publishing** — batch custom metrics
- **Use high-resolution metrics** (1-second) only when needed
- **Set meaningful units** for custom metrics

### Alarms

- **Use composite alarms** for complex conditions
- **Set appropriate evaluation periods** to avoid flapping
- **Include OK actions** to track recovery
- **Use anomaly detection** for dynamic thresholds

### Logs

- **Set retention policies** — don't keep logs forever
- **Use structured logging** (JSON) for better querying
- **Create metric filters** for key events
- **Use Contributor Insights** for top-N analysis

### Cost Optimization

- **Delete unused dashboards**
- **Reduce log retention** for non-critical logs
- **Avoid high-resolution metrics** unless necessary
- **Use log subscription filters** instead of polling

## Troubleshooting

### Missing Metrics

**Causes:**
- Service not publishing yet (wait 1-5 minutes)
- Wrong namespace/dimensions
- Detailed monitoring not enabled (EC2)

**Debug:**

```bash
# List metrics for a namespace
aws cloudwatch list-metrics \
  --namespace AWS/Lambda \
  --dimensions Name=FunctionName,Value=MyFunction
```

### Alarm Stuck in INSUFFICIENT_DATA

**Causes:**
- Metric not being published
- Dimensions mismatch
- Evaluation period too short

**Debug:**

```bash
# Check if metric has data
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Invocations \
  --dimensions Name=FunctionName,Value=MyFunction \
  --start-time $(date -d '1 hour ago' -u +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Sum
```

### Log Events Not Appearing

**Causes:**
- IAM permissions missing
- CloudWatch Logs agent not running
- Log group doesn't exist

**Debug:**

```bash
# Check log streams
aws logs describe-log-streams \
  --log-group-name /aws/lambda/MyFunction \
  --order-by LastEventTime \
  --descending \
  --limit 5
```

### High CloudWatch Costs

**Check usage:**

```bash
# Get PutLogEvents usage
aws cloudwatch get-metric-statistics \
  --namespace AWS/Logs \
  --metric-name IncomingBytes \
  --dimensions Name=LogGroupName,Value=/aws/lambda/MyFunction \
  --start-time $(date -d '7 days ago' -u +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Sum
```

## References

- [CloudWatch User Guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/)
- [CloudWatch Logs User Guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/)
- [CloudWatch API Reference](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/)
- [CloudWatch CLI Reference](https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/)
- [Logs Insights Query Syntax](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html)
- [boto3 CloudWatch](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudwatch.html)

Related Skills

cloudwatch-alarm-creator

from ComeOnOliver/skillshub

Cloudwatch Alarm Creator - Auto-activating skill for AWS Skills. Triggers on: cloudwatch alarm creator, cloudwatch alarm creator Part of the AWS Skills skill category.

Daily Logs

from ComeOnOliver/skillshub

Record the user's daily activities, progress, decisions, and learnings in a structured, chronological format.

Socratic Method: The Dialectic Engine

from ComeOnOliver/skillshub

This skill transforms Claude into a Socratic agent — a cognitive partner who guides

Sokratische Methode: Die Dialektik-Maschine

from ComeOnOliver/skillshub

Dieser Skill verwandelt Claude in einen sokratischen Agenten — einen kognitiven Partner, der Nutzende durch systematisches Fragen zur Wissensentdeckung führt, anstatt direkt zu instruieren.

College Football Data (CFB)

from ComeOnOliver/skillshub

Before writing queries, consult `references/api-reference.md` for endpoints, conference IDs, team IDs, and data shapes.

College Basketball Data (CBB)

from ComeOnOliver/skillshub

Before writing queries, consult `references/api-reference.md` for endpoints, conference IDs, team IDs, and data shapes.

Betting Analysis

from ComeOnOliver/skillshub

Before writing queries, consult `references/api-reference.md` for odds formats, command parameters, and key concepts.

Research Proposal Generator

from ComeOnOliver/skillshub

Generate high-quality academic research proposals for PhD applications following Nature Reviews-style academic writing conventions.

Paper Slide Deck Generator

from ComeOnOliver/skillshub

Transform academic papers and content into professional slide deck images with automatic figure extraction.

Medical Imaging AI Literature Review Skill

from ComeOnOliver/skillshub

Write comprehensive literature reviews following a systematic 7-phase workflow.

Meeting Briefing Skill

from ComeOnOliver/skillshub

You are a meeting preparation assistant for an in-house legal team. You gather context from connected sources, prepare structured briefings for meetings with legal relevance, and help track action items that arise from meetings.

Canned Responses Skill

from ComeOnOliver/skillshub

You are a response template assistant for an in-house legal team. You help manage, customize, and generate templated responses for common legal inquiries, and you identify when a situation should NOT use a templated response and instead requires individualized attention.