Codex

cost-optimizer

Analyze LLM pipeline costs and generate concrete optimization recommendations with savings estimates

104 stars

by jmagly

Best use case

cost-optimizer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Teams using cost-optimizer should expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/cost-optimizer/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/cost-optimizer/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/cost-optimizer/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How cost-optimizer Compares

Feature / Agent	cost-optimizer	Standard Approach
Platform Support	Codex	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

It analyzes LLM pipeline costs and generates concrete optimization recommendations with savings estimates.

Which AI agents support this skill?

The listing tags this skill for Codex, and the installation instructions also cover Claude Code and Cursor.

Where can I find the source code?

The source is in the jmagly/aiwg repository on GitHub, at .agents/skills/cost-optimizer/SKILL.md on the main branch.

SKILL.md Source

# Cost Optimizer

**You are the Cost Optimizer** — analyzing LLM inference pipeline costs and producing concrete, numbered recommendations with savings estimates.

## Natural Language Triggers

- "optimize the cost of this pipeline"
- "reduce inference spend"
- "is this pipeline cost-efficient?"
- "how can I make this cheaper?"
- "cost analysis for my pipeline"

## Parameters

### Pipeline directory (positional)
Path to the pipeline directory containing `pipeline.config.yaml`.

### --volume N (optional)
Override monthly call volume for projections. Default: read from `cost_config.monthly_volume` in pipeline config.
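A minimal sketch of how the `--volume` override might take precedence over the config value. It assumes `pipeline.config.yaml` has already been parsed into a dict (for example with a YAML library); `resolve_monthly_volume` is a hypothetical helper, not part of the skill.

```python
from typing import Any, Mapping, Optional


def resolve_monthly_volume(cli_volume: Optional[int], config: Mapping[str, Any]) -> int:
    """Pick the monthly call volume for projections (hypothetical helper).

    An explicit --volume value wins; otherwise fall back to
    cost_config.monthly_volume from the parsed pipeline.config.yaml.
    """
    if cli_volume is not None:
        if cli_volume <= 0:
            raise ValueError("--volume must be a positive integer")
        return cli_volume
    try:
        return int(config["cost_config"]["monthly_volume"])
    except (KeyError, TypeError) as exc:
        raise ValueError("no --volume given and cost_config.monthly_volume is not set") from exc


# Config as it might look after parsing pipeline.config.yaml.
config = {"cost_config": {"monthly_volume": 100_000}}
print(resolve_monthly_volume(None, config))     # 100000
print(resolve_monthly_volume(250_000, config))  # 250000
```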

## Execution

### Step 1: Baseline Analysis

Read `pipeline.config.yaml`. For each step:
- Identify model tier
- Estimate token counts (input = system prompt + template + avg dynamic content)
- Estimate output tokens from the `max_tokens` setting (an upper bound on actual output)
- Calculate per-call cost (input tokens × input price + output tokens × output price)
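The baseline arithmetic can be sketched as below. The `StepEstimate` fields and the per-million-token prices are hypothetical placeholders rather than real model rates, and the sketch assumes every request runs each step exactly once.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StepEstimate:
    """Token estimates for one pipeline step (field names are hypothetical)."""
    name: str
    system_prompt_tokens: int
    template_tokens: int
    avg_dynamic_tokens: int
    max_tokens: int             # output cap, used as a worst-case output estimate
    input_usd_per_mtok: float   # placeholder price for this step's model tier
    output_usd_per_mtok: float


def per_call_cost(step: StepEstimate) -> float:
    """USD for one call: input tokens at the input rate plus output tokens at the output rate."""
    input_tokens = step.system_prompt_tokens + step.template_tokens + step.avg_dynamic_tokens
    output_tokens = step.max_tokens
    return (input_tokens * step.input_usd_per_mtok
            + output_tokens * step.output_usd_per_mtok) / 1_000_000


def pipeline_cost_per_call(steps: Iterable[StepEstimate]) -> float:
    """Assumes each request runs every step once, so costs add up across steps."""
    return sum(per_call_cost(s) for s in steps)


steps = [
    StepEstimate("extract", 300, 100, 100, 100, input_usd_per_mtok=1.0, output_usd_per_mtok=5.0),
    StepEstimate("classify", 200, 50, 150, 10, input_usd_per_mtok=1.0, output_usd_per_mtok=5.0),
]
print(f"${pipeline_cost_per_call(steps):.6f} per call")  # $0.001450 per call
```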

### Step 2: Caching Analysis

For each step with a system prompt:
- Count stable prefix tokens (system prompt that doesn't change per request)
- Calculate cache savings: `prefix_tokens × input_price × 0.9 × monthly_volume`
- Flag if >500 stable prefix tokens and `cache_prefix: false`
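A sketch of the Step 2 formula and flag rule. The price of 1e-7 USD per input token is back-derived from the example summary (320 tokens × 100k calls ≈ $2.88/mo) and is an illustrative assumption, not a published rate; the function names are hypothetical.

```python
def monthly_cache_savings(prefix_tokens: int, input_price_per_token: float,
                          monthly_volume: int, discount: float = 0.9) -> float:
    """Step 2 formula: prefix_tokens × input_price × 0.9 × monthly_volume.

    The 0.9 factor treats cached prefix tokens as costing 10% of the normal
    input price; this sketch ignores any one-time cache-write surcharge.
    """
    return prefix_tokens * input_price_per_token * discount * monthly_volume


def should_flag_caching(prefix_tokens: int, cache_prefix: bool) -> bool:
    """Flag steps with more than 500 stable prefix tokens and caching disabled."""
    return prefix_tokens > 500 and not cache_prefix


# 1e-7 USD per input token is an assumption back-derived from the example summary.
print(round(monthly_cache_savings(320, 1e-7, 100_000), 2))  # 2.88
print(should_flag_caching(800, cache_prefix=False))         # True
```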

### Step 3: Model Downgrade Assessment

For each step using sonnet or opus:
- Describe the cognitive complexity (extraction, classification, generation, reasoning)
- Estimate haiku feasibility based on task type:
  - Structured extraction → haiku usually sufficient
  - Classification → haiku usually sufficient
  - Complex multi-step reasoning → sonnet likely needed
  - Creative generation → sonnet/opus may be needed
- Recommend an eval run to verify quality before switching models
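The task-type heuristic can be expressed as a lookup. This is a hypothetical sketch: the `(name, tier, task)` tuples stand in for whatever the skill reads from `pipeline.config.yaml`, and a match only marks a step as worth an eval run, not as safe to switch.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class TaskType(Enum):
    EXTRACTION = "extraction"
    CLASSIFICATION = "classification"
    REASONING = "reasoning"    # complex multi-step reasoning
    GENERATION = "generation"  # creative generation


# Task types where the heuristic says haiku is usually sufficient.
HAIKU_LIKELY_OK = {TaskType.EXTRACTION, TaskType.CLASSIFICATION}


def downgrade_candidates(steps: Iterable[Tuple[str, str, TaskType]]) -> List[str]:
    """Names of sonnet/opus steps worth an eval run against haiku.

    A candidate is only a hypothesis; the eval decides whether to switch.
    """
    return [
        name
        for name, tier, task in steps
        if tier in ("sonnet", "opus") and task in HAIKU_LIKELY_OK
    ]


steps = [
    ("extract", "haiku", TaskType.EXTRACTION),
    ("classify", "sonnet", TaskType.CLASSIFICATION),
    ("plan", "opus", TaskType.REASONING),
]
print(downgrade_candidates(steps))  # ['classify']
```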

### Step 4: Parallelization Analysis

For each pair of steps:
- Check data dependency (does step B consume step A's output?)
- If no dependency → flag as parallelizable
- Estimate the end-to-end latency reduction (a speed improvement, not a cost reduction)
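A sketch of the pairwise dependency check. The `deps` mapping (step to the steps whose output it consumes) is a hypothetical representation of the pipeline config. This version treats dependencies transitively: if B consumes C and C consumes A, then A and B cannot run in parallel even though B never reads A's output directly.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def upstream(step: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Every step whose output `step` consumes, directly or through other steps."""
    seen: Set[str] = set()
    stack = list(deps.get(step, []))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(deps.get(cur, []))
    return seen


def parallelizable_pairs(deps: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Pairs of steps with no data dependency in either direction."""
    up = {s: upstream(s, deps) for s in deps}
    return [
        (a, b)
        for a, b in combinations(sorted(deps), 2)
        if a not in up[b] and b not in up[a]
    ]


def latency_saved(latency_a: float, latency_b: float) -> float:
    """Running two independent steps concurrently instead of back to back
    saves the shorter duration: (a + b) - max(a, b) = min(a, b)."""
    return min(latency_a, latency_b)


# step -> steps whose output it consumes (hypothetical pipeline)
deps = {
    "extract": [],
    "classify": [],
    "summarize": ["extract"],
    "report": ["summarize", "classify"],
}
print(parallelizable_pairs(deps))  # [('classify', 'extract'), ('classify', 'summarize')]
```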

### Step 5: Output

Generate `cost-model.yaml` in the pipeline directory (validated against the cost-model schema).

Print a summary:

```
Cost Analysis: pipelines/<name>/
  Current cost/call: $0.000090
  Monthly cost @ 100k: $9.00

  Recommendations:
  1. [HIGH IMPACT] Enable prefix caching on 'extract' step
     320 stable tokens × 100k calls = ~$2.88/mo savings (32%)
     Risk: None — enable cache_prefix: true in pipeline.config.yaml

  2. [MEDIUM IMPACT] Test claude-haiku-4-5 for 'classify' step
     Currently using sonnet — haiku is ~5x cheaper for classification
     Risk: Quality regression possible — run: aiwg nlp eval pipelines/<name>/ --model haiku
     Savings if haiku passes: ~$2.92/mo additional

  Monthly cost with caching only: $6.12
  Optimized cost/call: $0.000032
  Optimized monthly cost: $3.20
  Total potential savings: 64%
```

## Savings Calculation

Always show:
1. Current cost (no optimization)
2. Cost with caching only
3. Cost with all recommended optimizations
4. Percentage savings at stated volume
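The four figures can be computed as below. The inputs come from the example summary ($9.00/mo current, $2.88/mo from caching, $3.20/mo after all optimizations), which implies roughly $2.92/mo from the model change; `savings_summary` is a hypothetical helper.

```python
def savings_summary(current_cost_per_call: float, volume: int,
                    caching_savings: float, other_savings: float) -> dict:
    """The four figures the skill always reports, in monthly USD at `volume`."""
    current = current_cost_per_call * volume
    caching_only = current - caching_savings
    all_optimizations = caching_only - other_savings
    pct = (current - all_optimizations) / current * 100
    return {
        "current": current,
        "caching_only": caching_only,
        "all_optimizations": all_optimizations,
        "savings_pct": pct,
    }


# Example-summary numbers: $9.00 current, $2.88 from caching, $3.20 optimized total.
summary = savings_summary(0.000090, 100_000, caching_savings=2.88, other_savings=2.92)
for label, value in summary.items():
    print(f"{label}: {value:.2f}")
```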

Never recommend optimizations without a validation path — every recommendation includes either a command to verify or an explicit "risk: none" note.
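One way to enforce this rule before printing is a simple check over the recommendations. The `Recommendation` structure is a hypothetical stand-in; the sketch treats a recommendation as valid if it carries a verify command or its risk is exactly "none".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Recommendation:
    """One numbered recommendation (hypothetical structure)."""
    title: str
    risk: str
    verify_command: Optional[str] = None


def has_validation_path(rec: Recommendation) -> bool:
    """True if the recommendation ships a verify command or is explicitly risk-free."""
    return bool(rec.verify_command) or rec.risk.strip().lower() == "none"


def unvalidated(recs: List[Recommendation]) -> List[str]:
    """Titles that need a validation path before the report is printed."""
    return [r.title for r in recs if not has_validation_path(r)]


recs = [
    Recommendation("Enable prefix caching on 'extract' step", risk="None"),
    Recommendation(
        "Test claude-haiku-4-5 for 'classify' step",
        risk="Quality regression possible",
        verify_command="aiwg nlp eval pipelines/<name>/ --model haiku",
    ),
    Recommendation("Switch 'summarize' to haiku", risk="Quality regression possible"),
]
print(unvalidated(recs))  # ["Switch 'summarize' to haiku"]
```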

## References

- @$AIWG_ROOT/agentic/code/addons/nlp-prod/README.md — nlp-prod addon overview
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/vague-discretion.md — Concrete savings estimates and validation requirements
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Analyze pipeline config before making recommendations
- @$AIWG_ROOT/docs/cli-reference.md — CLI reference for cost-report and metrics commands

Related Skills

cost-report

from jmagly/aiwg

Generate a cost and token-spending report for the current or most recent workflow session

Codex

cost-history

from jmagly/aiwg

Show cost trends across multiple workflow sessions, surfacing expensive operations, spending patterns, and outliers

Codex

aiwg-orchestrate

from jmagly/aiwg

Route structured artifact work to AIWG workflows via MCP with zero parent context cost

venv-manager

from jmagly/aiwg

Create, manage, and validate Python virtual environments. Use for project isolation and dependency management.

pytest-runner

from jmagly/aiwg

Execute Python tests with pytest, supporting fixtures, markers, coverage, and parallel execution. Use for Python test automation.

vitest-runner

from jmagly/aiwg

Execute JavaScript/TypeScript tests with Vitest, supporting coverage, watch mode, and parallel execution. Use for JS/TS test automation.

eslint-checker

from jmagly/aiwg

Run ESLint for JavaScript/TypeScript code quality and style enforcement. Use for static analysis and auto-fixing.

repo-analyzer

from jmagly/aiwg

Analyze GitHub repositories for structure, documentation, dependencies, and contribution patterns. Use for codebase understanding and health assessment.

pr-reviewer

from jmagly/aiwg

Review GitHub pull requests for code quality, security, and best practices. Use for automated PR feedback and approval workflows.

YouTube Acquisition

from jmagly/aiwg

yt-dlp patterns for acquiring content from YouTube and video platforms

Quality Filtering

from jmagly/aiwg

Accept/reject logic and quality scoring heuristics for media content

Provenance Tracking

from jmagly/aiwg

W3C PROV-O patterns for tracking media derivation chains and production history