infrastructure-cost

Analyze and reduce cloud infrastructure costs — right-size resources, eliminate waste, optimize reserved capacity. Use this skill when reviewing cloud bills, planning infrastructure, or auditing resource usage.

16 stars

by diegosouzapw


Best use case

infrastructure-cost is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using infrastructure-cost should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/infrastructure-cost/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/devops/infrastructure-cost/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/infrastructure-cost/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How infrastructure-cost Compares

| Feature / Agent | infrastructure-cost | Standard Approach |
|-----------------|---------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

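What does this skill do?

It analyzes cloud infrastructure costs and suggests ways to reduce them: right-sizing resources, eliminating waste, and optimizing reserved capacity.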

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Infrastructure Cost Optimization

You are a cloud cost optimization expert. Apply these strategies to reduce infrastructure spending without compromising reliability.

## Cost Audit Framework

### Step 1: Identify Top Spenders
Review the cloud bill by:
1. **Service** — which services cost the most? (compute, storage, network, database)
2. **Environment** — dev/staging spending as much as prod? (common waste)
3. **Team/Project** — which teams consume the most? (use tags)
4. **Trend** — is spending growing faster than usage?
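
As a rough sketch of the breakdown above, the following groups billing line items by one dimension and ranks the totals. The record fields (`service`, `environment`, `team`, `cost`) and the sample figures are hypothetical; a real billing export will use different column names.

```python
from collections import defaultdict


def top_spenders(line_items, dimension, limit=5):
    """Sum cost per value of `dimension` and return the largest totals first."""
    totals = defaultdict(float)
    for item in line_items:
        # Items with a missing tag are grouped so the gap in tagging stays visible.
        totals[item.get(dimension) or "untagged"] += item["cost"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]


# Hypothetical line items, for illustration only.
line_items = [
    {"service": "compute", "environment": "prod", "team": "api", "cost": 1200.0},
    {"service": "compute", "environment": "dev", "team": "api", "cost": 900.0},
    {"service": "database", "environment": "prod", "team": "data", "cost": 700.0},
    {"service": "storage", "environment": "prod", "team": None, "cost": 150.0},
]

by_service = top_spenders(line_items, "service")
by_environment = top_spenders(line_items, "environment")
by_team = top_spenders(line_items, "team")
```

Running the same function over each dimension gives the service, environment, and team views from the list above; a large "untagged" total is itself a finding.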

### Step 2: Find Waste

#### Idle Resources
| Resource | How to Detect | Action |
|----------|--------------|--------|
| Unused EC2/VMs | CPU < 5% for 14 days | Terminate or downsize |
| Unattached EBS volumes | No attached instance | Delete (after backup) |
| Unused Elastic IPs | Not associated with instance | Release |
| Old snapshots | > 90 days, no restore | Delete or archive |
| Unused load balancers | 0 healthy targets | Delete |
| Idle RDS instances | 0 connections for 7 days | Delete or stop |
| Oversized dev environments | Same size as prod | Downsize |
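
The "CPU < 5% for 14 days" rule from the table can be sketched as a simple check over daily averages. The input format (one average CPU percentage per day, oldest first) is an assumption; how you export those numbers depends on your monitoring tool.

```python
def is_idle(daily_avg_cpu, threshold_pct=5.0, window_days=14):
    """Return True if CPU stayed below the threshold on every day of the window."""
    if len(daily_avg_cpu) < window_days:
        return False  # not enough history to make a decision
    recent = daily_avg_cpu[-window_days:]
    return all(sample < threshold_pct for sample in recent)


# Hypothetical samples: one average CPU percentage per day.
quiet_vm = [2.1] * 14
busy_vm = [2.1] * 13 + [40.0]

quiet_is_idle = is_idle(quiet_vm)
busy_is_idle = is_idle(busy_vm)
```

Daily averages can hide short bursts, so it may be worth checking the daily maximum as well before terminating anything.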

#### Over-Provisioned Resources
```
Actual Usage     Provisioned     Waste
CPU: 15%    →    4 vCPU     →    ~3 vCPU wasted
Memory: 2GB →    16 GB      →    ~14 GB wasted
Disk: 20 GB →    500 GB     →    ~480 GB wasted
```
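
The arithmetic behind the table above, as a small sketch. The function name is illustrative; the inputs are the figures from the table.

```python
def unused_capacity(provisioned, used):
    """Return (unused amount, unused share of provisioned capacity)."""
    if provisioned <= 0:
        raise ValueError("provisioned must be positive")
    unused = max(provisioned - used, 0)
    return unused, unused / provisioned


# 15% of 4 vCPU is 0.6 vCPU in use.
cpu_unused, cpu_share = unused_capacity(provisioned=4, used=4 * 0.15)
mem_unused, mem_share = unused_capacity(provisioned=16, used=2)      # GB
disk_unused, disk_share = unused_capacity(provisioned=500, used=20)  # GB
```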

### Step 3: Right-Size

#### Compute
- Monitor actual CPU and memory usage for 2 weeks
- Choose instance type that matches the 95th percentile usage + 20% buffer
- Consider burstable instances (t3/t4g) for variable workloads
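
One way to apply the "95th percentile + 20% buffer" rule is sketched below. It uses a nearest-rank percentile, which is one of several common definitions, and the sample data and the list of available sizes are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


def target_capacity(samples, pct=95, buffer=0.20):
    """Percentile usage plus a proportional safety buffer."""
    return percentile(samples, pct) * (1 + buffer)


def pick_size(required, sizes):
    """Smallest offered size that covers the requirement, or None if none does."""
    fitting = [size for size in sorted(sizes) if size >= required]
    return fitting[0] if fitting else None


# Hypothetical two-week sample of vCPUs in use.
cpu_samples = [0.4] * 90 + [1.0] * 8 + [3.5] * 2
needed = target_capacity(cpu_samples)
chosen = pick_size(needed, sizes=[1, 2, 4, 8])
```

In this sample the two 3.5 vCPU spikes sit above the 95th percentile, so they do not drive the size; whether that is acceptable depends on how the workload behaves when it is briefly CPU-constrained.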

#### Database
- Use read replicas for read-heavy workloads instead of scaling up
- Consider serverless options (Aurora Serverless, PlanetScale) for variable traffic
- Downsize dev/staging databases — they don't need production specs

#### Storage
- Use appropriate storage tiers:
  - **Hot**: Frequently accessed (SSD, S3 Standard)
  - **Warm**: Occasional access (S3 IA, cheaper disk)
  - **Cold**: Rarely accessed (S3 Glacier, archive)
  - **Delete**: Never accessed (set lifecycle policies)
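
The tiers above can be expressed as a lookup on how long an object has gone unaccessed. The day thresholds below (30, 90, 365) are assumptions chosen for illustration, not values from any provider; in practice they would be set through the storage service's lifecycle policy and checked against retention requirements before anything is deleted.

```python
def storage_tier(days_since_last_access, warm_after=30, cold_after=90, delete_after=365):
    """Map idle time to a tier name. All thresholds are illustrative assumptions."""
    if days_since_last_access >= delete_after:
        return "delete"
    if days_since_last_access >= cold_after:
        return "cold"
    if days_since_last_access >= warm_after:
        return "warm"
    return "hot"


# Hypothetical objects with their idle time in days.
tiers = {name: storage_tier(days) for name, days in
         {"logo.png": 3, "q1-report.pdf": 45, "2022-logs.tar": 120, "old-export.csv": 400}.items()}
```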

## Pricing Strategies

### Reserved / Committed Use
| Strategy | Savings | Risk | Best For |
|----------|---------|------|----------|
| On-demand | 0% | None | Variable, unpredictable workloads |
| Savings Plans (1yr) | 30-40% | Medium | Steady-state compute |
| Reserved (1yr) | 35-45% | Medium | Known instance types |
| Reserved (3yr) | 55-65% | High | Long-term stable workloads |
| Spot/Preemptible | 60-90% | High | Fault-tolerant, batch jobs |
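
The risk column follows from simple arithmetic: a commitment is billed for every hour of the term whether or not the resource runs, so a discount of `d` only pays off if the resource would otherwise run on-demand for more than `1 - d` of the term. The sketch below assumes that simple model (no upfront payment options, no resale) and uses a hypothetical $0.10/hour price with a 40% discount.

```python
HOURS_PER_YEAR = 8760


def break_even_utilization(discount):
    """Fraction of the term a resource must run for the commitment to beat on-demand."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def commitment_savings(on_demand_hourly, discount, hours_used, hours_in_term):
    """Positive result: committing is cheaper than paying on-demand for hours_used."""
    on_demand_cost = on_demand_hourly * hours_used
    committed_cost = on_demand_hourly * (1 - discount) * hours_in_term
    return on_demand_cost - committed_cost


threshold = break_even_utilization(0.40)
steady = commitment_savings(0.10, 0.40, hours_used=8760, hours_in_term=HOURS_PER_YEAR)
part_time = commitment_savings(0.10, 0.40, hours_used=4000, hours_in_term=HOURS_PER_YEAR)
```

Under these assumptions the always-on workload saves money, while the one running 4000 hours a year (about 46% of the term, below the 60% break-even) loses money on the commitment.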

### When to Use Spot Instances
- CI/CD runners
- Batch processing jobs
- Data pipeline workers
- Stateless web servers (with proper auto-scaling)
- Development/testing environments

**Never use Spot for**: Databases, single-instance services, stateful workloads

### Serverless Where It Fits
Consider serverless (Lambda, Cloud Functions, Cloud Run) when:
- Traffic is bursty (scale to zero when idle)
- Execution time < 15 minutes
- Requests are independent (no shared state)
- Cost per request < cost of always-on instances
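
The last criterion can be checked with a break-even calculation. The per-request cost and the always-on monthly cost below are hypothetical inputs; real per-request costs depend on memory size, duration, and the provider's pricing.

```python
def break_even_requests(cost_per_request, always_on_monthly_cost):
    """Monthly request volume at which serverless and always-on cost the same."""
    if cost_per_request <= 0:
        raise ValueError("cost_per_request must be positive")
    return always_on_monthly_cost / cost_per_request


def serverless_is_cheaper(requests_per_month, cost_per_request, always_on_monthly_cost):
    """True when the total serverless bill is below the always-on bill."""
    return requests_per_month * cost_per_request < always_on_monthly_cost


# Hypothetical figures: $0.000004 per request versus a $60/month instance.
threshold = break_even_requests(0.000004, 60.0)
low_traffic = serverless_is_cheaper(2_000_000, 0.000004, 60.0)
high_traffic = serverless_is_cheaper(50_000_000, 0.000004, 60.0)
```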

## Environment Optimization

### Development / Staging
| Strategy | Savings |
|----------|---------|
| Shut down nights and weekends | 65% |
| Use smallest instance sizes | 50-75% |
| Share databases (not per-developer) | 80% |
| Delete unused feature branches | 100% of unused |
| Use spot instances for dev | 60-90% |
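
The nights-and-weekends figure can be derived from the schedule itself. The sketch below assumes an environment that runs 8 AM to 7 PM on weekdays (11 hours a day, 5 days a week) and is billed only while running; storage and other always-billed components are not reduced by this.

```python
HOURS_PER_WEEK = 24 * 7


def schedule_savings(hours_on_per_day, days_on_per_week):
    """Share of always-on cost avoided by running only during the given schedule."""
    hours_on = hours_on_per_day * days_on_per_week
    if not 0 <= hours_on <= HOURS_PER_WEEK:
        raise ValueError("schedule does not fit in one week")
    return 1 - hours_on / HOURS_PER_WEEK


# 55 of 168 hours running leaves roughly two thirds of the cost avoided.
weekday_office_hours = schedule_savings(hours_on_per_day=11, days_on_per_week=5)
```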

### Auto-Scaling Scripts
```bash
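# Crontab entries (not shell commands): install with `crontab -e`.
# Times use the cron host's local time zone.

# Stop dev environment at 7 PM, Monday to Friday
0 19 * * 1-5 kubectl scale deployment --all --replicas=0 -n dev

# Start dev environment at 8 AM, Monday to Friday
# Note: this sets every deployment to 1 replica, not its previous count
0 8 * * 1-5 kubectl scale deployment --all --replicas=1 -n dev

# AWS: stop running dev instances (a shell command; run it from a Friday-evening job)
# Fails with an error if no instances match, because --instance-ids is then empty
aws ec2 stop-instances --instance-ids $(aws ec2 describe-instances \
  --filters "Name=tag:Environment,Values=dev" \
            "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].InstanceId" --output text)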
```

## Network Cost Reduction

### Data Transfer Costs (Often Overlooked)
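- **Same region**: usually free between services in the same availability zone; traffic between zones is often billed (about $0.01/GB in each direction on AWS)
- **Cross-region**: $0.01-0.02/GB; avoid unless necessary
- **Internet egress**: $0.09-0.12/GB; use a CDN for static assets
- **NAT Gateway**: $0.045/GB processed, on top of an hourly charge; expensive, so use VPC endpoints for AWS services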
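
To see which path dominates, multiply monthly volume by the per-GB rate. The rates below are taken from the list above (using the low end of the egress range and the high end of the cross-region range); the traffic volumes are hypothetical, and actual rates vary by provider, region, and volume tier.

```python
# Per-GB rates in USD, from the list above; treat them as rough planning figures.
RATES_PER_GB = {
    "cross_region": 0.02,
    "internet_egress": 0.09,
    "nat_gateway": 0.045,
}


def transfer_cost(gb_by_path, rates=RATES_PER_GB):
    """Return the monthly cost per path for the given traffic volumes in GB."""
    return {path: gb * rates[path] for path, gb in gb_by_path.items()}


# Hypothetical monthly volumes in GB.
costs = transfer_cost({"cross_region": 500, "internet_egress": 2000, "nat_gateway": 3000})
total = sum(costs.values())
largest_path = max(costs, key=costs.get)
```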

### Optimization
- Use a CDN (CloudFront, Cloudflare) for static content
- Compress API responses (gzip/brotli)
- Use VPC endpoints for S3, DynamoDB, etc.
- Keep services in the same region/zone when possible
- Use private networking instead of public IPs for internal communication

## Cost Report Template

```markdown
# Infrastructure Cost Report

**Period**: [Month/Quarter]
**Total Spend**: $X,XXX
**Change from Last Period**: +/-X%

## Top 5 Cost Drivers
| Service | Cost | % of Total | Trend |
|---------|------|-----------|-------|
| EC2/Compute | $X | X% | ↑/↓/→ |
| RDS/Database | $X | X% | ↑/↓/→ |
| S3/Storage | $X | X% | ↑/↓/→ |
| Data Transfer | $X | X% | ↑/↓/→ |
| Other | $X | X% | ↑/↓/→ |

## Waste Identified
| Resource | Monthly Cost | Action |
|----------|-------------|--------|
| [specific resource] | $X | [downsize/delete/reserve] |

## Recommendations
1. [Action] — estimated savings: $X/month
2. [Action] — estimated savings: $X/month
3. [Action] — estimated savings: $X/month

## Total Potential Savings: $X/month (X% of current spend)
```

## Key Metrics to Track
- **Cost per request** — are you getting cheaper or more expensive per unit?
- **Cost per customer** — infrastructure cost divided by active users
- **Utilization rate** — what % of provisioned capacity is actually used?
- **Waste ratio** — idle or over-provisioned resources as % of total spend
- **Reserved coverage** — what % of steady-state compute is reserved?
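
A minimal sketch of how these metrics could be computed from period totals. All parameter names and sample values are hypothetical, and measuring reserved coverage in cost terms (rather than instance-hours) is an assumption.

```python
def ratio(numerator, denominator):
    """Divide, returning None when the denominator is zero (metric undefined)."""
    return numerator / denominator if denominator else None


def cost_metrics(total_cost, requests, active_users, used_capacity,
                 provisioned_capacity, wasted_cost,
                 reserved_compute_cost, steady_state_compute_cost):
    """Compute the five tracking metrics from one period's totals."""
    return {
        "cost_per_request": ratio(total_cost, requests),
        "cost_per_customer": ratio(total_cost, active_users),
        "utilization_rate": ratio(used_capacity, provisioned_capacity),
        "waste_ratio": ratio(wasted_cost, total_cost),
        "reserved_coverage": ratio(reserved_compute_cost, steady_state_compute_cost),
    }


# Hypothetical monthly totals.
metrics = cost_metrics(
    total_cost=10_000, requests=5_000_000, active_users=2_000,
    used_capacity=30, provisioned_capacity=100, wasted_cost=1_500,
    reserved_compute_cost=3_000, steady_state_compute_cost=4_000,
)
```

Tracking these per period, rather than as one-off numbers, is what shows whether unit costs are improving.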

Related Skills

infrastructure

from diegosouzapw/awesome-omni-skill

Principal DevOps and infrastructure for FFP AWS serverless stack. Use when working with SST, Lambda configuration, API Gateway, Cognito, RDS, S3, CloudFront, VPC, CI/CD pipelines, monitoring, or environment management. Enforces security best practices and cost-conscious architecture.

infrastructure-verification

from diegosouzapw/awesome-omni-skill

Verify AWS infrastructure configuration before deployment. Use when validating VPC endpoints, NAT Gateway capacity, security groups, or debugging network path issues that cause Lambda connection timeouts.

infrastructure-diagrams

from diegosouzapw/awesome-omni-skill

Create professional Azure, hybrid, and on-premises infrastructure architecture diagrams using Python's Diagrams library. Use when asked to create architecture diagrams, infrastructure diagrams, cloud diagrams, network diagrams, system architecture visualizations, or data center layouts. Supports Azure (VMs, networking, storage, databases, containers, security), on-premises (servers, databases, networking equipment, monitoring), Kubernetes, and hybrid cloud scenarios. Outputs PNG, SVG, or PDF files.

infrastructure-as-code

from diegosouzapw/awesome-omni-skill

Define, deploy, and manage cloud infrastructure as code using tools like Terraform, Pulumi, CloudFormation, and CDK, ensuring consistency, repeatability, and version control.

devops-infrastructure

from diegosouzapw/awesome-omni-skill

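Cloud infrastructure design, IaC implementation, monitoring configuration, and container orchestration. Covers resource provisioning on AWS, GCP, and Azure, Terraform/Pulumi, Kubernetes, Docker, and Prometheus/Grafana monitoring. Use for questions about infrastructure, cloud, Terraform, Kubernetes, monitoring, or Docker.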

design-infrastructure

from diegosouzapw/awesome-omni-skill

Infrastructure platform design agent: designs and generates Kubernetes and IaC configurations for AWS/Azure/GCP/OpenShift. Invoke with /design-infrastructure.

deployment-infrastructure

from diegosouzapw/awesome-omni-skill

Kubernetes deployment and infrastructure patterns

database-cloud-optimization-cost-optimize

from diegosouzapw/awesome-omni-skill

You are a cloud cost optimization expert specializing in reducing infrastructure expenses while maintaining performance and reliability. Analyze cloud spending, identify savings opportunities, and ...

cost-optimization

from diegosouzapw/awesome-omni-skill

Optimize cloud costs through resource rightsizing, tagging strategies, reserved instances, and spending analysis. Use when reducing cloud expenses, analyzing infrastructure costs, or implementing c...

Cost Analysis

from diegosouzapw/awesome-omni-skill

Analyze infrastructure and operational costs with optimization recommendations

cloud-infrastructure-network-engineer

from diegosouzapw/awesome-omni-skill

Expert network engineer specializing in modern cloud networking, security architectures, and performance optimization. Masters multi-cloud connectivity, service mesh, zero-trust networking, SSL/TLS, global load balancing, and advanced troubleshooting. Handles CDN optimization, network automation, and compliance. Use PROACTIVELY for network design, connectivity issues, or performance optimization. Use when: the task directly matches network engineer responsibilities within plugin cloud-infrastructure. Do not use when: a more specific framework or task-focused skill is clearly a better match.

cloud-infrastructure-istio-traffic-management

from diegosouzapw/awesome-omni-skill

Configure Istio traffic management including routing, load balancing, circuit breakers, and canary deployments. Use when implementing service mesh traffic policies, progressive delivery, or resilience patterns. Use when: the task directly matches istio traffic management responsibilities within plugin cloud-infrastructure. Do not use when: a more specific framework or task-focused skill is clearly a better match.