collecting-infrastructure-metrics

This skill enables Claude to collect comprehensive infrastructure performance metrics across compute, storage, network, containers, load balancers, and databases. It is triggered when the user requests "collect infrastructure metrics", "monitor server performance", "set up performance dashboards", or needs to analyze system resource utilization. The skill configures metrics collection, sets up aggregation, and helps create infrastructure dashboards for health monitoring and capacity tracking. It supports configuration for Prometheus, Datadog, and CloudWatch.

25 stars

Best use case

collecting-infrastructure-metrics is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

This skill enables Claude to collect comprehensive infrastructure performance metrics across compute, storage, network, containers, load balancers, and databases. It is triggered when the user requests "collect infrastructure metrics", "monitor server performance", "set up performance dashboards", or needs to analyze system resource utilization. The skill configures metrics collection, sets up aggregation, and helps create infrastructure dashboards for health monitoring and capacity tracking. It supports configuration for Prometheus, Datadog, and CloudWatch.

Teams using collecting-infrastructure-metrics should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/infrastructure-metrics-collector/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/jeremylongshore/claude-code-plugins-plus-skills/infrastructure-metrics-collector/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/infrastructure-metrics-collector/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How collecting-infrastructure-metrics Compares

Feature / Agentcollecting-infrastructure-metricsStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

This skill enables Claude to collect comprehensive infrastructure performance metrics across compute, storage, network, containers, load balancers, and databases. It is triggered when the user requests "collect infrastructure metrics", "monitor server performance", "set up performance dashboards", or needs to analyze system resource utilization. The skill configures metrics collection, sets up aggregation, and helps create infrastructure dashboards for health monitoring and capacity tracking. It supports configuration for Prometheus, Datadog, and CloudWatch.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

## Overview

This skill automates the process of setting up infrastructure metrics collection. It identifies key performance indicators (KPIs) across various infrastructure layers, configures agents to collect these metrics, and assists in setting up central aggregation and visualization.

## How It Works

1. **Identify Infrastructure Layers**: Determines the infrastructure layers to monitor (compute, storage, network, containers, load balancers, databases).
2. **Configure Metrics Collection**: Sets up agents (Prometheus, Datadog, CloudWatch) to collect metrics from the identified layers.
3. **Aggregate Metrics**: Configures central aggregation of the collected metrics for analysis and visualization.
4. **Create Dashboards**: Generates infrastructure dashboards for health monitoring, performance analysis, and capacity tracking.

## When to Use This Skill

This skill activates when you need to:
- Monitor the performance of your infrastructure.
- Identify bottlenecks in your system.
- Set up dashboards for real-time monitoring.

## Examples

### Example 1: Setting up basic monitoring

User request: "Collect infrastructure metrics for my web server."

The skill will:
1. Identify compute, storage, and network layers relevant to the web server.
2. Configure Prometheus to collect CPU, memory, disk I/O, and network bandwidth metrics.

### Example 2: Troubleshooting database performance

User request: "I'm seeing slow database queries. Can you help me monitor the database performance?"

The skill will:
1. Identify the database layer and relevant metrics such as connection pool usage, replication lag, and cache hit rates.
2. Configure Datadog to collect these metrics and create a dashboard to visualize performance trends.

## Best Practices

- **Agent Selection**: Choose the appropriate agent (Prometheus, Datadog, CloudWatch) based on your existing infrastructure and monitoring tools.
- **Metric Granularity**: Balance the granularity of metrics collection with the storage and processing overhead. Collect only the essential metrics for your use case.
- **Alerting**: Configure alerts based on thresholds for key metrics to proactively identify and address performance issues.

## Integration

This skill can be integrated with other Claude Code plugins for deployment, configuration management, and alerting to provide a comprehensive infrastructure management solution. For example, it can be used with a deployment plugin to automatically configure metrics collection after deploying new infrastructure.

Related Skills

model-evaluation-metrics

25
from ComeOnOliver/skillshub

Model Evaluation Metrics - Auto-activating skill for ML Training. Triggers on: model evaluation metrics, model evaluation metrics Part of the ML Training skill category.

aggregating-performance-metrics

25
from ComeOnOliver/skillshub

This skill enables Claude to aggregate and centralize performance metrics from various sources. It is used when the user needs to consolidate metrics from applications, systems, databases, caches, queues, and external services into a central location for monitoring and analysis. The skill is triggered by requests to "aggregate metrics", "centralize performance metrics", or similar phrases related to metrics aggregation and monitoring. It facilitates designing a metrics taxonomy, choosing appropriate aggregation tools, and setting up dashboards and alerts.

detecting-infrastructure-drift

25
from ComeOnOliver/skillshub

This skill enables Claude to detect infrastructure drift from a desired state. It uses the `drift-detect` command to identify discrepancies between the current infrastructure configuration and the intended configuration, as defined in infrastructure-as-code tools like Terraform. Use this skill when the user asks to check for infrastructure drift, identify configuration changes, or ensure that the current infrastructure matches the desired state. It is particularly useful in DevOps workflows for maintaining infrastructure consistency and preventing configuration errors. Trigger this skill when the user mentions "drift detection," "infrastructure changes," "configuration drift," or requests a "drift report."

generating-infrastructure-as-code

25
from ComeOnOliver/skillshub

This skill enables Claude to generate Infrastructure as Code (IaC) configurations. It uses the infrastructure-as-code-generator plugin to create production-ready IaC for Terraform, CloudFormation, Pulumi, ARM Templates, and CDK. Use this skill when the user requests IaC configurations for cloud infrastructure, specifying the platform (e.g., Terraform, CloudFormation) and cloud provider (e.g., AWS, Azure, GCP), or when the user needs help automating infrastructure deployment. Trigger terms include: "generate IaC", "create Terraform", "CloudFormation template", "Pulumi program", "infrastructure code".

checking-infrastructure-compliance

25
from ComeOnOliver/skillshub

Execute use when you need to work with compliance checking. This skill provides compliance monitoring and validation with comprehensive guidance and automation. Trigger with phrases like "check compliance", "validate policies", or "audit compliance".

import-infrastructure-as-code

25
from ComeOnOliver/skillshub

Import existing Azure resources into Terraform using Azure CLI discovery and Azure Verified Modules (AVM). Use when asked to reverse-engineer live Azure infrastructure, generate Infrastructure as Code from existing subscriptions/resource groups/resource IDs, map dependencies, derive exact import addresses from downloaded module source, prevent configuration drift, and produce AVM-based Terraform files ready for validation and planning across any Azure resource type.

copilot-usage-metrics

25
from ComeOnOliver/skillshub

Retrieve and display GitHub Copilot usage metrics for organizations and enterprises using the GitHub CLI and REST API.

saas-metrics-coach

25
from ComeOnOliver/skillshub

SaaS financial health advisor. Use when a user shares revenue or customer numbers, or mentions ARR, MRR, churn, LTV, CAC, NRR, or asks how their SaaS business is doing.

terraform-infrastructure

25
from ComeOnOliver/skillshub

Terraform infrastructure as code workflow for provisioning cloud resources, creating reusable modules, and managing infrastructure at scale.

startup-metrics-framework

25
from ComeOnOliver/skillshub

This skill should be used when the user asks about "key startup metrics", "SaaS metrics", "CAC and LTV", "unit economics", "burn multiple", "rule of 40", "marketplace metrics", or requests guidance on tracking and optimizing business performance metrics.

risk-metrics-calculation

25
from ComeOnOliver/skillshub

Calculate portfolio risk metrics including VaR, CVaR, Sharpe, Sortino, and drawdown analysis. Use when measuring portfolio risk, implementing risk limits, or building risk monitoring systems.

infrastructure-reporting

25
from ComeOnOliver/skillshub

Generate comprehensive network infrastructure reports including health status, performance analysis, security audits, and capacity planning recommendations.