datadog-automation

Automate Datadog tasks via Rube MCP (Composio): query metrics, search logs, manage monitors/dashboards, create events and downtimes. Always search tools first for current schemas.

31,355 stars
Complexity: medium

About this skill

This skill lets an AI agent automate operations in Datadog, a monitoring and observability platform, through Composio's Rube MCP (a Model Context Protocol server). The agent can query metrics, search logs, create and manage monitors and dashboards, post custom events, and schedule downtimes for maintenance windows. These tools support incident response, routine observability management, performance reporting, and troubleshooting. Before executing any operation, the agent searches Rube tools for current schemas so that requests stay valid and up to date.

Best use case

  • Automating incident response workflows by creating events or searching logs and metrics during an alert.
  • Proactively setting up or modifying Datadog monitors for new services.
  • Generating system health or performance reports by querying metrics.
  • Scheduling downtimes to prevent false alerts during planned maintenance.
  • Troubleshooting production issues by letting the AI agent search logs and metrics based on user queries.

Requested Datadog operations complete successfully (e.g., metrics retrieved, logs found, monitor created or updated, event posted, downtime scheduled), and the AI agent provides a clear summary or confirmation of the action taken within Datadog, reducing manual overhead.

Practical example

Example input

Can you check the average CPU utilization for the 'prod-api-server' host group over the last 30 minutes in Datadog? If it's above 75%, create a warning event with the tag 'urgent-review'.

Example output

Searching Datadog for average CPU utilization for 'prod-api-server' over the last 30 minutes... The average CPU utilization was 68%. This is below the 75% threshold, so no warning event has been created. Would you like me to perform any other Datadog-related tasks?

When to use this skill

  • When an AI agent needs to programmatically interact with Datadog for monitoring, logging, or alerting purposes. Ideal for automating repetitive Datadog management tasks, assisting in incident resolution, or enabling AI-driven observability insights. Use when you need to quickly retrieve operational data or manage Datadog resources without manual intervention.

When not to use this skill

  • Do not use this skill if your organization does not use Datadog as its primary monitoring and observability platform. It is not suitable for tasks requiring deep human judgment or a final manual approval step for critical changes, unless the agent's output is integrated into a human-in-the-loop workflow. It also cannot work if Rube MCP is not set up or if an active Datadog connection via Rube cannot be established.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/datadog-automation/SKILL.md --create-dirs "https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/plugins/antigravity-awesome-skills-claude/skills/datadog-automation/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/datadog-automation/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How datadog-automation Compares

| Feature / Agent | datadog-automation | Standard Approach |
|-----------------|--------------------|-------------------|
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | medium | N/A |

Frequently Asked Questions

What does this skill do?

Automate Datadog tasks via Rube MCP (Composio): query metrics, search logs, manage monitors/dashboards, create events and downtimes. Always search tools first for current schemas.

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as medium. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Datadog Automation via Rube MCP

Automate Datadog monitoring and observability operations through Composio's Datadog toolkit via Rube MCP.

## Prerequisites

- Rube MCP must be connected (`RUBE_SEARCH_TOOLS` available)
- Active Datadog connection via `RUBE_MANAGE_CONNECTIONS` with toolkit `datadog`
- Always call `RUBE_SEARCH_TOOLS` first to get current tool schemas

## Setup

**Get Rube MCP**: Add `https://rube.app/mcp` as an MCP server in your client configuration. No API keys needed — just add the endpoint and it works.


1. Verify Rube MCP is available by confirming `RUBE_SEARCH_TOOLS` responds
2. Call `RUBE_MANAGE_CONNECTIONS` with toolkit `datadog`
3. If connection is not ACTIVE, follow the returned auth link to complete Datadog authentication
4. Confirm connection status shows ACTIVE before running any workflows

## Core Workflows

### 1. Query and Explore Metrics

**When to use**: User wants to query metric data or list available metrics

**Tool sequence**:
1. `DATADOG_LIST_METRICS` - List available metric names [Optional]
2. `DATADOG_QUERY_METRICS` - Query metric time series data [Required]

**Key parameters**:
- `query`: Datadog metric query string (e.g., `avg:system.cpu.user{host:web01}`)
- `from`: Start timestamp (Unix epoch seconds)
- `to`: End timestamp (Unix epoch seconds)
- `q`: Search string for filtering metric names (used by `DATADOG_LIST_METRICS`)

**Pitfalls**:
- Query syntax follows Datadog's metric query format: `aggregation:metric_name{tag_filters}`
- `from` and `to` are Unix epoch timestamps in seconds, not milliseconds
- Valid aggregations: `avg`, `sum`, `min`, `max`, `count`
- Tag filters use curly braces: `{host:web01,env:prod}`
- Time range should not exceed Datadog's retention limits for the metric type
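
As a minimal sketch of the parameters above, the following Python helper builds an argument dictionary for `DATADOG_QUERY_METRICS` covering a recent time window. The helper name `metric_query_args` is hypothetical, and the exact argument shape is an assumption; confirm it against the schema returned by `RUBE_SEARCH_TOOLS`.

```python
import time
from typing import Optional


def metric_query_args(query: str, minutes: int, now: Optional[float] = None) -> dict:
    """Build arguments for a metric query over the last `minutes` minutes.

    Hypothetical helper: the keys mirror the `query`, `from`, and `to`
    parameters listed above, but the real schema may differ.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # int() truncates to whole seconds; the tool expects seconds, not milliseconds.
    end = int(time.time() if now is None else now)
    return {"query": query, "from": end - minutes * 60, "to": end}


# A fixed `now` keeps the example reproducible.
args = metric_query_args("avg:system.cpu.user{host:web01}", 30, now=1_700_000_000)
print(args)
```

Passing `now` explicitly is only for reproducibility; in normal use it can be omitted so the window ends at the current time.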

### 2. Search and Analyze Logs

**When to use**: User wants to search log entries or list log indexes

**Tool sequence**:
1. `DATADOG_LIST_LOG_INDEXES` - List available log indexes [Optional]
2. `DATADOG_SEARCH_LOGS` - Search logs with query and filters [Required]

**Key parameters**:
- `query`: Log search query using Datadog log query syntax
- `from`: Start time (ISO 8601 or Unix timestamp)
- `to`: End time (ISO 8601 or Unix timestamp)
- `sort`: Sort order ('asc' or 'desc')
- `limit`: Number of log entries to return

**Pitfalls**:
- Log queries use Datadog's log search syntax: `service:web status:error`
- Search is limited to retained logs within the configured retention period
- Large result sets require pagination; check for cursor/page tokens
- Log indexes control routing and retention; filter by index if known
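
A small Python sketch of how the log search arguments might be assembled with ISO 8601 timestamps. The helper `log_search_args` is hypothetical, and whether the tool accepts this exact timestamp form is an assumption to verify in the schema from `RUBE_SEARCH_TOOLS`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def log_search_args(query: str, minutes: int, limit: int = 50,
                    now: Optional[datetime] = None) -> dict:
    """Build arguments for a log search over the last `minutes` minutes.

    Hypothetical helper: keys follow the parameter list above.
    """
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    return {
        "query": query,
        "from": start.isoformat(),  # ISO 8601 with an explicit UTC offset
        "to": end.isoformat(),
        "sort": "desc",             # newest entries first
        "limit": limit,
    }


fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
log_args = log_search_args("service:web status:error", 15, now=fixed_now)
print(log_args)
```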

### 3. Manage Monitors

**When to use**: User wants to create, update, mute, or inspect monitors

**Tool sequence**:
1. `DATADOG_LIST_MONITORS` - List all monitors with filters [Required]
2. `DATADOG_GET_MONITOR` - Get specific monitor details [Optional]
3. `DATADOG_CREATE_MONITOR` - Create a new monitor [Optional]
4. `DATADOG_UPDATE_MONITOR` - Update monitor configuration [Optional]
5. `DATADOG_MUTE_MONITOR` - Silence a monitor temporarily [Optional]
6. `DATADOG_UNMUTE_MONITOR` - Re-enable a muted monitor [Optional]

**Key parameters**:
- `monitor_id`: Numeric monitor ID
- `name`: Monitor display name
- `type`: Monitor type ('metric alert', 'service check', 'log alert', 'query alert', etc.)
- `query`: Monitor query defining the alert condition
- `message`: Notification message with @mentions
- `tags`: Array of tag strings
- `thresholds`: Alert threshold values (`critical`, `warning`, `ok`)

**Pitfalls**:
- Monitor `type` must match the query type; mismatches cause creation failures
- `message` supports @mentions for notifications (e.g., `@slack-channel`, `@pagerduty`)
- Thresholds vary by monitor type; metric monitors need `critical` at minimum
- Muting a monitor suppresses notifications but the monitor still evaluates
- Monitor IDs are numeric integers
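
The following Python sketch shows one way to assemble arguments for `DATADOG_CREATE_MONITOR` while enforcing the `critical` threshold requirement noted above. The helper `monitor_args` is hypothetical, and placing `thresholds` at the top level is an assumption based on the parameter list; the actual tool schema may nest it differently.

```python
from typing import Optional


def monitor_args(name: str, query: str, message: str, critical: float,
                 warning: Optional[float] = None,
                 tags: Optional[list] = None,
                 monitor_type: str = "metric alert") -> dict:
    """Build arguments for creating a metric monitor (hypothetical helper)."""
    if not name or not query:
        raise ValueError("name and query are required")
    thresholds = {"critical": critical}  # metric monitors need at least this
    if warning is not None:
        thresholds["warning"] = warning
    return {
        "name": name,
        "type": monitor_type,
        "query": query,
        "message": message,
        "tags": list(tags or []),
        "thresholds": thresholds,  # assumption: top-level key, per the list above
    }


cpu_monitor = monitor_args(
    name="High CPU on prod",
    query="avg(last_5m):avg:system.cpu.user{env:prod} > 90",
    message="CPU is high on prod hosts @slack-channel",
    critical=90,
    warning=80,
    tags=["env:prod"],
)
print(cpu_monitor)
```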

### 4. Manage Dashboards

**When to use**: User wants to list, view, update, or delete dashboards

**Tool sequence**:
1. `DATADOG_LIST_DASHBOARDS` - List all dashboards [Required]
2. `DATADOG_GET_DASHBOARD` - Get full dashboard definition [Optional]
3. `DATADOG_UPDATE_DASHBOARD` - Update dashboard layout or widgets [Optional]
4. `DATADOG_DELETE_DASHBOARD` - Remove a dashboard (irreversible) [Optional]

**Key parameters**:
- `dashboard_id`: Dashboard identifier string
- `title`: Dashboard title
- `layout_type`: 'ordered' (grid) or 'free' (freeform positioning)
- `widgets`: Array of widget definition objects
- `description`: Dashboard description

**Pitfalls**:
- Dashboard IDs are alphanumeric strings (e.g., 'abc-def-ghi'), not numeric
- `layout_type` cannot be changed after creation; must recreate the dashboard
- Widget definitions are complex nested objects; get existing dashboard first to understand structure
- DELETE is permanent; there is no undo

### 5. Create Events and Manage Downtimes

**When to use**: User wants to post events or schedule maintenance downtimes

**Tool sequence**:
1. `DATADOG_LIST_EVENTS` - List existing events [Optional]
2. `DATADOG_CREATE_EVENT` - Post a new event [Required]
3. `DATADOG_CREATE_DOWNTIME` - Schedule a maintenance downtime [Optional]

**Key parameters for events**:
- `title`: Event title
- `text`: Event body text (supports markdown)
- `alert_type`: Event severity ('error', 'warning', 'info', 'success')
- `tags`: Array of tag strings

**Key parameters for downtimes**:
- `scope`: Tag scope for the downtime (e.g., `host:web01`)
- `start`: Start time (Unix epoch)
- `end`: End time (Unix epoch; omit for indefinite)
- `message`: Downtime description
- `monitor_id`: Specific monitor to downtime (optional, omit for scope-based)

**Pitfalls**:
- Event `text` supports Datadog's markdown format including @mentions
- Downtime `scope` uses tag syntax: `host:web01`, `env:staging`
- Omitting `end` creates an indefinite downtime; always set an end time for maintenance
- Downtime `monitor_id` narrows to a single monitor; scope applies to all matching monitors
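
To avoid accidentally creating an indefinite downtime, a helper can require a duration and always compute `end`. This is a sketch under assumptions: `downtime_args` is a hypothetical name, and the keys mirror the parameter list above rather than a confirmed schema.

```python
import time
from typing import Optional


def downtime_args(scope: str, duration_minutes: int, message: str,
                  start: Optional[int] = None) -> dict:
    """Build downtime arguments that always include an end time."""
    if duration_minutes <= 0:
        raise ValueError("duration_minutes must be positive")
    begin = int(time.time()) if start is None else int(start)
    return {
        "scope": scope,                        # tag syntax, e.g. host:web01
        "start": begin,                        # Unix epoch seconds
        "end": begin + duration_minutes * 60,  # never omitted
        "message": message,
    }


maintenance = downtime_args("env:staging", 45, "Planned database upgrade",
                            start=1_700_000_000)
print(maintenance)
```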

### 6. Manage Hosts and Traces

**When to use**: User wants to list infrastructure hosts or inspect distributed traces

**Tool sequence**:
1. `DATADOG_LIST_HOSTS` - List all reporting hosts [Required]
2. `DATADOG_GET_TRACE_BY_ID` - Get a specific distributed trace [Optional]

**Key parameters**:
- `filter`: Host search filter string
- `sort_field`: Sort hosts by field (e.g., 'name', 'apps', 'cpu')
- `sort_dir`: Sort direction ('asc' or 'desc')
- `trace_id`: Distributed trace ID for trace lookup

**Pitfalls**:
- Host list includes all hosts reporting to Datadog within the retention window
- Trace IDs are long numeric strings; ensure exact match
- Hosts that stop reporting are retained for a configured period before removal

## Common Patterns

### Monitor Query Syntax

**Metric alerts**:
```
avg(last_5m):avg:system.cpu.user{env:prod} > 90
```

**Log alerts**:
```
logs("service:web status:error").index("main").rollup("count").last("5m") > 10
```
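
The metric alert form above can be composed from its parts. This Python sketch uses a hypothetical helper, `metric_alert_query`, and covers only the simple `aggregation(last_window):metric_query comparator threshold` shape; other monitor types use different query formats.

```python
def metric_alert_query(metric_query: str, window: str = "5m",
                       time_aggregation: str = "avg",
                       comparator: str = ">", threshold: float = 90) -> str:
    """Compose a metric alert query string from its components."""
    if comparator not in {">", ">=", "<", "<="}:
        raise ValueError(f"unsupported comparator: {comparator}")
    # ':g' prints 90 as '90' and 90.5 as '90.5'.
    return f"{time_aggregation}(last_{window}):{metric_query} {comparator} {threshold:g}"


alert_query = metric_alert_query("avg:system.cpu.user{env:prod}")
print(alert_query)
```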

### Tag Filtering

- Tags use `key:value` format: `host:web01`, `env:prod`, `service:api`
- Multiple tags: `{host:web01,env:prod}` (AND logic)
- Wildcard: `host:web*`
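
A short Python sketch of building a tag filter from key/value pairs and attaching it to a metric query. Both helper names are hypothetical; the `{*}` fallback relies on Datadog's convention that `*` matches all sources.

```python
from typing import Optional


def tag_filter(tags: Optional[dict] = None) -> str:
    """Render tags as a curly-brace filter; comma-separated tags are ANDed."""
    if not tags:
        return "{*}"  # no filter: match everything
    return "{" + ",".join(f"{key}:{value}" for key, value in tags.items()) + "}"


def metric_query(aggregation: str, metric: str, tags: Optional[dict] = None) -> str:
    """Compose `aggregation:metric_name{tag_filters}`."""
    return f"{aggregation}:{metric}{tag_filter(tags)}"


print(tag_filter({"host": "web01", "env": "prod"}))
print(metric_query("avg", "system.cpu.user", {"host": "web*"}))
```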

### Pagination

- Use `page` and `page_size` or offset-based pagination depending on endpoint
- Check response for total count to determine if more pages exist
- Continue until all results are retrieved
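
The loop described above might look like the following Python sketch. Everything about the response shape is an assumption: `fetch_page` is a hypothetical callable standing in for a tool call, and the `items` and `total` keys are placeholders for whatever the real endpoint returns.

```python
from typing import Any, Callable, Iterator


def iter_all(fetch_page: Callable[[int, int], dict],
             page_size: int = 100) -> Iterator[Any]:
    """Yield items from every page until the results are exhausted."""
    page = 0
    seen = 0
    while True:
        response = fetch_page(page, page_size)
        items = response.get("items", [])  # assumed key name
        yield from items
        seen += len(items)
        total = response.get("total")      # assumed key name; may be absent
        # Stop on an empty or short page, or once the reported total is reached.
        if not items or len(items) < page_size or (total is not None and seen >= total):
            break
        page += 1


# Stand-in data source used only to exercise the loop.
_records = list(range(250))


def fake_fetch(page: int, page_size: int) -> dict:
    chunk = _records[page * page_size:(page + 1) * page_size]
    return {"items": chunk, "total": len(_records)}


results = list(iter_all(fake_fetch, page_size=100))
print(len(results))
```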

## Known Pitfalls

**Timestamps**:
- Most endpoints use Unix epoch seconds (not milliseconds)
- Some endpoints accept ISO 8601; check tool schema
- Time ranges should be reasonable (not years of data)
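
One defensive approach is to normalize every timestamp to epoch seconds before calling a tool. The sketch below uses a hypothetical helper, `to_epoch_seconds`, and a heuristic (an assumption, not a Datadog rule): numeric values of 1e11 or more are treated as milliseconds.

```python
from datetime import datetime, timezone


def to_epoch_seconds(value) -> int:
    """Convert a datetime, ISO 8601 string, or numeric timestamp to epoch seconds."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)  # treat naive values as UTC
        return int(value.timestamp())
    if isinstance(value, str):
        # Older Python versions do not parse a trailing 'Z', so rewrite it.
        return to_epoch_seconds(datetime.fromisoformat(value.replace("Z", "+00:00")))
    number = float(value)
    if number >= 1e11:  # heuristic: too large for seconds, assume milliseconds
        number /= 1000
    return int(number)


print(to_epoch_seconds(1_700_000_000_000))
print(to_epoch_seconds("2023-11-14T22:13:20Z"))
```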

**Query Syntax**:
- Metric queries: `aggregation:metric{tags}`
- Log queries: `field:value` pairs
- Monitor queries vary by type; check Datadog documentation

**Rate Limits**:
- Datadog API has per-endpoint rate limits
- Implement backoff on 429 responses
- Batch operations where possible
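
A minimal sketch of exponential backoff with jitter in Python. The `RateLimited` exception is hypothetical: it stands in for however your client surfaces a 429 response, which this document does not specify.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error representing an HTTP 429 response."""


def call_with_backoff(call: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimited, doubling the delay after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential delay plus random jitter to avoid synchronized retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise ValueError("max_attempts must be at least 1")


# Demo: fail twice, then succeed; delays are recorded instead of slept.
delays = []
state = {"calls": 0}


def flaky() -> str:
    state["calls"] += 1
    if state["calls"] <= 2:
        raise RateLimited()
    return "ok"


outcome = call_with_backoff(flaky, sleep=delays.append)
print(outcome, len(delays))
```

Injecting `sleep` as a parameter keeps the function testable without real waiting.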

## Quick Reference

| Task | Tool Slug | Key Params |
|------|-----------|------------|
| Query metrics | DATADOG_QUERY_METRICS | query, from, to |
| List metrics | DATADOG_LIST_METRICS | q |
| Search logs | DATADOG_SEARCH_LOGS | query, from, to, limit |
| List log indexes | DATADOG_LIST_LOG_INDEXES | (none) |
| List monitors | DATADOG_LIST_MONITORS | tags |
| Get monitor | DATADOG_GET_MONITOR | monitor_id |
| Create monitor | DATADOG_CREATE_MONITOR | name, type, query, message |
| Update monitor | DATADOG_UPDATE_MONITOR | monitor_id |
| Mute monitor | DATADOG_MUTE_MONITOR | monitor_id |
| Unmute monitor | DATADOG_UNMUTE_MONITOR | monitor_id |
| List dashboards | DATADOG_LIST_DASHBOARDS | (none) |
| Get dashboard | DATADOG_GET_DASHBOARD | dashboard_id |
| Update dashboard | DATADOG_UPDATE_DASHBOARD | dashboard_id, title, widgets |
| Delete dashboard | DATADOG_DELETE_DASHBOARD | dashboard_id |
| List events | DATADOG_LIST_EVENTS | start, end |
| Create event | DATADOG_CREATE_EVENT | title, text, alert_type |
| Create downtime | DATADOG_CREATE_DOWNTIME | scope, start, end |
| List hosts | DATADOG_LIST_HOSTS | filter, sort_field |
| Get trace | DATADOG_GET_TRACE_BY_ID | trace_id |

## When to Use
Use this skill when a task calls for any of the Datadog workflows or actions described in the overview above.
