ln-814-optimization-executor

Executes optimization hypotheses with keep/discard testing loop. Use when applying validated performance improvements.


Best use case

ln-814-optimization-executor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ln-814-optimization-executor can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ln-814-optimization-executor/SKILL.md --create-dirs "https://raw.githubusercontent.com/levnikolaevich/claude-code-skills/main/skills-catalog/ln-814-optimization-executor/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/ln-814-optimization-executor/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How ln-814-optimization-executor Compares

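| Feature / Agent | ln-814-optimization-executor | Standard Approach |
|-----------------|------------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |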

Frequently Asked Questions

What does this skill do?

Executes optimization hypotheses with keep/discard testing loop. Use when applying validated performance improvements.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

> **Paths:** File paths (`shared/`, `references/`, `../ln-*`) are relative to skills repo root. If not found at CWD, locate this SKILL.md directory and go up one level for repo root. If `shared/` is missing, fetch files via WebFetch from `https://raw.githubusercontent.com/levnikolaevich/claude-code-skills/master/skills/{path}`.

# ln-814-optimization-executor

**Type:** L3 Worker
**Category:** 8XX Optimization

Executes optimization hypotheses from the researcher using a keep/discard autoresearch loop. Supports multi-file changes, compound baselines, and any optimization type (algorithm, architecture, query, caching, batching).

---

## Overview

| Aspect | Details |
|--------|---------|
| **Input** | `.hex-skills/optimization/{slug}/context.md` OR conversation context (standalone invocation) |
| **Output** | Optimized code on isolated branch, per-hypothesis results, experiment log |
| **Pattern** | Strike-first: apply all → test → measure. Bisect only on failure. A/B only for contested alternatives |

---

## Workflow

**Phases:** Pre-flight → Baseline → Strike-First Execution → Report → Gap Analysis

---

## Phase 0: Pre-flight Checks

### Slug Resolution

- If invoked via Agent with contextStore containing `slug` — use directly.
- If invoked standalone — derive slug from context_file path or ask user.
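
For the standalone case, the slug can be read off the context file path. The following is a minimal sketch, not part of the skill itself: `derive_slug` is a hypothetical helper, and it assumes the path follows the `.hex-skills/optimization/{slug}/context.md` layout used throughout this document.

```python
from pathlib import PurePosixPath
from typing import Optional


def derive_slug(context_file: str) -> Optional[str]:
    """Return the {slug} segment of .hex-skills/optimization/{slug}/context.md.

    Returns None when the path does not match, in which case the caller
    should ask the user for a slug.
    """
    # Normalize Windows separators so the same logic works on any platform.
    parts = PurePosixPath(context_file.replace("\\", "/")).parts
    for i in range(len(parts) - 3):
        is_prefix = (parts[i], parts[i + 1]) == (".hex-skills", "optimization")
        if is_prefix and parts[i + 3] == "context.md":
            return parts[i + 2]
    return None
```

A `None` result corresponds to the "ask user" branch above.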

### Step 1: Load Context

Read `.hex-skills/optimization/{slug}/context.md` from project root. Contains problem statement, profiling results, research hypotheses, and target metric.

If file not found: check conversation context for the same data (standalone invocation).

### Step 2: Pre-flight Validation

| Check | Required | Action if Missing |
|-------|----------|-------------------|
| Hypotheses provided (H1..H7) | Yes | Block — nothing to execute |
| Test infrastructure | Yes | Block (see ci_tool_detection.md) |
| Git clean state | Yes | Block (need clean baseline for revert) |
| Worktree isolation | Yes | Create per git_worktree_fallback.md |
| E2E safety test | No (recommended) | Read from context; WARN if null — full test suite as fallback gate |

**MANDATORY READ:** Load `shared/references/git_worktree_fallback.md` — use optimization rows.
**MANDATORY READ:** Load `shared/references/ci_tool_detection.md` — use Test Frameworks + Benchmarks sections.

**MANDATORY READ:** Load `shared/references/mcp_tool_preferences.md` and `shared/references/mcp_integration_patterns.md`.

Use `hex-line` as the primary path for code/config/script edits in this worker. Profilers and benchmarks stay the source of truth; do not treat `hex-graph` as runtime evidence here.
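
The blocking rows of the validation table can be sketched as a single check that returns the reasons to stop. This is an illustration under assumptions: `preflight_blockers` and `git_porcelain` are hypothetical names, and "clean state" is interpreted here as `git status --porcelain` printing nothing.

```python
import subprocess
from typing import List, Optional, Sequence


def git_porcelain(repo_dir: str = ".") -> str:
    """Return the output of `git status --porcelain` (empty string when clean)."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def preflight_blockers(hypotheses: Sequence[str],
                       test_command: Optional[str],
                       porcelain_output: str) -> List[str]:
    """Collect every blocking condition; an empty list means execution may start."""
    blockers: List[str] = []
    if not hypotheses:
        blockers.append("no hypotheses: nothing to execute")
    if not test_command:
        blockers.append("no test infrastructure")
    if porcelain_output.strip():
        blockers.append("git working tree is not clean")
    return blockers
```

Passing the porcelain output in as a string keeps the decision logic separate from the git call, so the check can be exercised without a repository.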

### E2E Safety Test

Read `e2e_test_command` from context file (discovered by profiler during test discovery phase).

| Source | Action |
|--------|--------|
| Context has `e2e_test_command` | Use as functional safety gate in Phase 2 |
| Context has `e2e_test_command = null` | WARN: full test suite is the fallback gate |
| Standalone (no context) | User must provide test command; block if missing |
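
The three rows above amount to a small decision function. A minimal sketch, assuming the context has already been parsed; `resolve_safety_gate` and its status strings are hypothetical:

```python
from typing import Optional, Tuple


def resolve_safety_gate(has_context: bool,
                        e2e_test_command: Optional[str],
                        user_command: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (status, command) where status is 'ok', 'warn', or 'block'."""
    if has_context:
        if e2e_test_command:
            return ("ok", e2e_test_command)   # use as the functional safety gate
        return ("warn", None)                 # full test suite is the fallback gate
    if user_command:
        return ("ok", user_command)           # standalone: user supplied the command
    return ("block", None)                    # standalone with no command
```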

---

## Phase 1: Establish Baseline

Reuse baseline from performance map (already measured with real metrics).

### From Context File

Read `performance_map.baseline` and `performance_map.test_command` from `.hex-skills/optimization/{slug}/context.md`.

| Field | Source |
|-------|--------|
| `test_command` | Discovered/created test command |
| `baseline` | Multi-metric snapshot: wall time, CPU, memory, I/O |

### Verification Run

Run `test_command` once to confirm baseline is still valid (code unchanged since profiling):

| Step | Action |
|------|--------|
| 1 | Run `test_command` |
| 2 | IF result within 10% of `baseline.wall_time_ms` → baseline confirmed |
| 3 | IF result diverges > 10% → re-measure (3 runs, median) as new baseline |
| 4 | IF test FAILS → BLOCK: "test fails on unmodified code" |
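
Steps 1 to 3 of the verification run can be sketched as follows. This is an illustration only: `verify_baseline` and `run_once` are hypothetical names, `run_once` is assumed to return wall time in milliseconds and to raise if the test fails (step 4), and the re-measurement is assumed to use three fresh runs rather than reusing the first one.

```python
from statistics import median
from typing import Callable


def verify_baseline(baseline_ms: float,
                    run_once: Callable[[], float],
                    tolerance: float = 0.10,
                    reruns: int = 3) -> float:
    """Return the baseline to use for the rest of the run."""
    observed = run_once()
    # Within 10% of the stored value: the profiler's baseline is still valid.
    if abs(observed - baseline_ms) <= tolerance * baseline_ms:
        return baseline_ms
    # Diverged: re-measure and take the median as the new baseline.
    return median(run_once() for _ in range(reruns))
```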

---

## Phase 2: Strike-First Execution

**MANDATORY READ:** Load [optimization_categories.md](references/optimization_categories.md) for pattern reference during implementation.

Apply maximum changes at once. Only fall back to A/B testing where sources genuinely disagree on approach.

### Step 1: Triage Hypotheses

Split hypotheses from researcher into two groups:

| Group | Criteria | Action |
|-------|----------|--------|
| **Uncontested** | Clear best approach, no conflicting alternatives | Apply directly in the strike |
| **Contested** | Multiple approaches exist (e.g., source A says cache, source B says batch) OR `conflicts_with` another hypothesis | A/B test each alternative on top of full implementation |

Most hypotheses should be uncontested — the researcher already ranked them by evidence.
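
The triage rule can be sketched as below. The `Hypothesis` shape is an assumption made for illustration: only `conflicts_with` appears in the source, while the `alternatives` field is a hypothetical way to record competing approaches.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Hypothesis:
    id: str
    conflicts_with: List[str] = field(default_factory=list)
    alternatives: List[str] = field(default_factory=list)  # hypothetical field


def triage(hypotheses: List[Hypothesis]) -> Tuple[List[str], List[str]]:
    """Split hypothesis IDs into (uncontested, contested), preserving input order."""
    # A conflict marks both sides as contested, even if only one side declares it.
    conflicted = {h.id for h in hypotheses if h.conflicts_with}
    conflicted |= {other for h in hypotheses for other in h.conflicts_with}
    uncontested: List[str] = []
    contested: List[str] = []
    for h in hypotheses:
        if h.id in conflicted or len(h.alternatives) > 1:
            contested.append(h.id)
        else:
            uncontested.append(h.id)
    return uncontested, contested
```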

### Step 2: Strike (Apply All Uncontested)

```
1. APPLY all uncontested hypotheses at once (all file edits)
2. VERIFY: Run full test suite
   IF tests FAIL:
     - IF fixable (typo, missing import) → fix & re-run ONCE
     - IF fundamental → BISECT (see Step 4)
3. E2E GATE (if e2e_test_command not null):
   IF FAIL → BISECT
4. MEASURE: 5 runs, median
5. COMPARE: improvement vs baseline
   IF improvement meets target → DONE. Commit all:
     git add {all_files}
     git commit -m "perf: apply optimizations H1,H2,H3,... (+{improvement}%)"
   IF no improvement → BISECT
```
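
The MEASURE and COMPARE steps can be sketched in Python. This is a minimal illustration under stated assumptions: all function names are hypothetical, the metric is wall time in milliseconds (lower is better), and the target is an absolute value. The pseudocode above does not say what happens when the strike improves on the baseline but misses the target, so the sketch returns a separate label for that case and leaves the decision to the caller.

```python
import subprocess
import time
from statistics import median
from typing import List, Optional


def time_command_ms(command: List[str]) -> float:
    """Wall time of one run of `command` in ms. Raises if it exits non-zero."""
    start = time.perf_counter()
    subprocess.run(command, check=True, capture_output=True)
    return (time.perf_counter() - start) * 1000.0


def measure_median_ms(command: List[str], runs: int = 5) -> float:
    """MEASURE step: median wall time over `runs` runs."""
    return median(time_command_ms(command) for _ in range(runs))


def strike_decision(tests_pass: bool,
                    e2e_pass: Optional[bool],
                    baseline_ms: float,
                    measured_ms: float,
                    target_ms: float) -> str:
    """Map strike outcomes to the next action. e2e_pass=None means no E2E command."""
    if not tests_pass:
        return "bisect"  # assumes the single trivial-fix retry was already used
    if e2e_pass is False:
        return "bisect"
    if measured_ms <= target_ms:
        return "commit"
    if measured_ms >= baseline_ms:
        return "bisect"  # no improvement
    return "improved_below_target"  # not covered by the source
```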

### Step 3: Contested Alternatives (A/B on top of strike)

For each contested pair/group, with ALL uncontested changes already applied:

```
FOR each contested hypothesis group:
  1. Apply alternative A → test → measure (5 runs, median)
  2. Revert alternative A, apply alternative B → test → measure (5 runs, median)
  3. KEEP the winner. Commit.
  4. Winner becomes part of the baseline for next contested group.
```
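
Choosing the winner of one contested group can be sketched as a comparison of medians. This is an illustration: `pick_winner` is a hypothetical helper, timings are assumed to be in milliseconds with lower being better, and alternatives that failed tests are assumed to have been excluded before this point.

```python
from statistics import median
from typing import Dict, List, Tuple


def pick_winner(samples_ms: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return (alternative name, median ms) for the fastest alternative."""
    medians = {name: median(runs) for name, runs in samples_ms.items()}
    winner = min(medians, key=medians.get)
    return winner, medians[winner]
```

The returned median would then serve as the baseline when measuring the next contested group.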

### Step 4: Bisect (only on strike failure)

If strike fails tests or shows no improvement:

```
1. Revert all changes: git checkout -- . && git clean -fd
2. Binary search: apply first half of hypotheses → test
   - IF passes → problem in second half
   - IF fails → problem in first half
3. Narrow down to the breaking hypothesis
4. Remove it from strike, re-apply remaining → test → measure
5. Log removed hypothesis with reason
```
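
The binary search in steps 2 and 3 can be sketched as below. It is an illustration under strong assumptions: `find_breaking` and `passes` are hypothetical names, there is exactly one breaking hypothesis, and that hypothesis fails on its own, so each half can be applied to a clean tree and tested in isolation. `passes` stands for "revert, apply this subset, run the tests".

```python
from typing import Callable, List, Sequence


def find_breaking(hypotheses: Sequence[str],
                  passes: Callable[[List[str]], bool]) -> str:
    """Binary-search for the single hypothesis that makes `passes` return False."""
    candidates = list(hypotheses)
    if not candidates:
        raise ValueError("no hypotheses to bisect")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if passes(first_half):
            candidates = candidates[mid:]  # problem is in the second half
        else:
            candidates = first_half        # problem is in the first half
    return candidates[0]
```

With seven hypotheses this needs at most three test runs, compared with up to seven when testing each hypothesis individually.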

### Scope Rules

| Rule | Description |
|------|-------------|
| File scope | Multiple files allowed (not limited to single function) |
| Signature changes | Allowed if tests still pass |
| New files | Allowed (cache wrapper, batch adapter, utility) |
| New dependencies | Allowed if already in project ecosystem (e.g., using configured Redis) |
| Time budget | 45 minutes total |

### Revert Protocol

| Scope | Command |
|-------|---------|
| Full revert | `git checkout -- . && git clean -fd` (safe in worktree) |
| Single hypothesis | `git checkout -- {files}` (only during bisect) |

### Safety Rules

| Rule | Description |
|------|-------------|
| Traceability | Commit message lists all applied hypothesis IDs |
| Isolation | All work in isolated worktree; never modify main worktree |
| Bisect only on failure | Do NOT test hypotheses individually unless strike fails or alternatives genuinely conflict |
| Crash triage | Runtime crash → fix once if trivial (typo, import), else bisect to find cause |

### Stop Conditions (Execution Loop)

| Condition | Action |
|-----------|--------|
| Strike passes + improvement meets target | STOP — commit, proceed to Report |
| All contested alternatives tested | STOP — commit winner, proceed to Report |
| Bisect removes all hypotheses | STOP — report "all hypotheses failed" with profiling data |
| Time budget exceeded (45 min) | STOP — report partial results with remaining hypotheses |
| All tests fail after strike + bisect | STOP — full revert, report diagnostic value only |
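
The time-budget stop condition can be checked between rounds with a monotonic clock. A minimal sketch; `TimeBudget` is a hypothetical helper, and the injectable `clock` parameter exists only so the behaviour can be tested without waiting.

```python
import time
from typing import Callable


class TimeBudget:
    """Tracks a fixed budget (45 minutes by default) from construction time."""

    def __init__(self, minutes: float = 45.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._deadline = clock() + minutes * 60.0

    def exceeded(self) -> bool:
        """True once the budget is used up; the loop should stop and report."""
        return self._clock() >= self._deadline

    def remaining_seconds(self) -> float:
        return max(0.0, self._deadline - self._clock())
```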

---

## Phase 3: Report Results

### Report Schema

| Field | Description |
|-------|-------------|
| baseline | Original measurement (metric + value) |
| final | Final measurement after optimizations |
| total_improvement_pct | Overall percentage improvement |
| target_met | Boolean — did we reach the target metric? |
| strike_result | `clean` (all applied) / `bisected` (some removed) / `failed` |
| hypotheses_applied | List of hypothesis IDs applied in strike |
| hypotheses_removed | List removed during bisect (with reasons) |
| contested_results | Per-contested group: alternatives tested, winner, measurement |
| branch | Worktree branch name |
| files_modified | All changed files |
| e2e_test | `{ command, source, baseline_passed, final_passed }` or null |
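
One possible shape for a report that follows this schema is sketched below. The nesting of `baseline`, `final`, and `hypotheses_removed` is an assumption, as is the reading of `failed` as "no hypothesis could be applied"; the source defines only the field names. `build_report` and `strike_result` are hypothetical helpers.

```python
import json
from typing import Dict, List, Optional


def strike_result(applied: List[str], removed: Dict[str, str]) -> str:
    """'clean' if nothing was removed, 'bisected' if some were, 'failed' if none applied."""
    if not applied:
        return "failed"
    return "bisected" if removed else "clean"


def build_report(baseline_ms: float, final_ms: float, target_ms: float,
                 applied: List[str], removed: Dict[str, str],
                 branch: str, files_modified: List[str],
                 e2e_test: Optional[dict] = None) -> dict:
    """Assemble the report fields; wall time in ms is the assumed metric."""
    return {
        "baseline": {"metric": "wall_time_ms", "value": baseline_ms},
        "final": {"metric": "wall_time_ms", "value": final_ms},
        "total_improvement_pct": round((baseline_ms - final_ms) / baseline_ms * 100.0, 1),
        "target_met": final_ms <= target_ms,
        "strike_result": strike_result(applied, removed),
        "hypotheses_applied": applied,
        "hypotheses_removed": [{"id": h, "reason": r} for h, r in removed.items()],
        "contested_results": [],
        "branch": branch,
        "files_modified": files_modified,
        "e2e_test": e2e_test,
    }
```

The result is plain data, so `json.dumps(report, indent=2)` is enough to serialize it.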

### Results Comparison (mandatory)

Show baseline vs final for EVERY metric from `performance_map.baseline`. Include both percentage and multiplier.

```
| Metric | Baseline | After Strike | Improvement |
|--------|----------|-------------|-------------|
| Wall time | 7280ms | 3800ms | 47.8% (1.9x) |
| CPU time | 850ms | 720ms | 15.3% (1.2x) |
| Memory peak | 256MB | 245MB | 4.3% (1.0x) |
| HTTP round-trips | 13 | 2 | 84.6% (6.5x) |

Target: 5000ms → Achieved: 3800ms ✓ TARGET MET
```
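
The Improvement column follows from two ratios: percentage is `(baseline - final) / baseline` and the multiplier is `baseline / final`. A minimal sketch for lower-is-better metrics; the function names are hypothetical.

```python
from typing import Tuple


def improvement(baseline: float, final: float) -> Tuple[float, float]:
    """Return (percentage reduction, speed-up multiplier), each rounded to 1 decimal."""
    pct = (baseline - final) / baseline * 100.0
    multiplier = baseline / final
    return round(pct, 1), round(multiplier, 1)


def format_improvement(baseline: float, final: float) -> str:
    """Render one cell of the Improvement column, e.g. '47.8% (1.9x)'."""
    pct, multiplier = improvement(baseline, final)
    return f"{pct}% ({multiplier}x)"
```

For the wall-time row, `format_improvement(7280, 3800)` gives `47.8% (1.9x)`, matching the table.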

### Per-Function Delta (if instrumentation available)

If `instrumented_files` from context is non-empty, run `test_command` once more AFTER strike to capture per-function timing with the same instrumentation the profiler placed:

```
| Function | Before (ms) | After (ms) | Delta |
|----------|------------|------------|-------|
| mt_translate | 3500 | 450 | -87% (7.8x) |
| tikal_extract | 2800 | 2800 | 0% (unchanged) |
```

Then clean up: `git checkout -- {instrumented_files}` — remove all profiling instrumentation before final commit.

Present both tables to user. This is the primary deliverable — numbers the user sees first.

### Experiment Log

Write to `{project_root}/.hex-skills/optimization/{slug}/ln-814-log.tsv`:

| Column | Description |
|--------|-------------|
| timestamp | ISO 8601 |
| phase | `strike` / `bisect` / `contested` |
| hypotheses | Comma-separated IDs applied in this round |
| baseline_ms | Baseline before this round |
| result_ms | Measurement after changes |
| improvement_pct | Percentage change |
| status | `applied` / `removed` / `alternative_a` / `alternative_b` |
| commit | Git commit hash |
| files | Comma-separated modified files |
| e2e_status | pass / fail / skipped |

Append to existing file if present (enables tracking across multiple runs).
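
Appending a row to the log can be sketched with the standard `csv` module. This is an illustration: `append_log_row` is a hypothetical helper, and writing a header row when the file is new or empty is an assumption, since the source lists the columns but does not say whether the file carries a header.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = [
    "timestamp", "phase", "hypotheses", "baseline_ms", "result_ms",
    "improvement_pct", "status", "commit", "files", "e2e_status",
]


def utc_timestamp() -> str:
    """Current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


def append_log_row(log_path: str, row: dict) -> None:
    """Append one tab-separated row, creating the file and directories if needed."""
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS, delimiter="\t")
        if write_header:
            writer.writeheader()  # assumption: header only on first write
        writer.writerow(row)
```

Opening in append mode keeps earlier rows intact, which is what allows tracking across multiple runs.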

---

## Phase 4: Gap Analysis (If Target Not Met)

If target metric not reached after all hypotheses:

| Section | Content |
|---------|---------|
| Achievement | What was achieved (original → final, improvement %) |
| Remaining bottlenecks | From time map: which steps still dominate |
| Remaining cycles | If coordinator runs multi-cycle: "{remaining} optimization cycles available for remaining bottlenecks" |
| Infrastructure recommendations | If bottleneck requires infra changes (scaling, caching layer, CDN) |
| Further research | Optimization directions not explored in this run |

---

## Error Handling

| Error | Recovery |
|-------|----------|
| Strike fails all tests | Bisect to find breaking hypothesis, remove it, retry |
| Strike shows no improvement | Bisect to identify ineffective hypotheses |
| Measurement inconsistent (high variance) | Increase runs to 10, use median |
| Worktree creation fails | Fall back to branch per git_worktree_fallback.md |
| Time budget exceeded | Stop loop, report partial results with hypotheses remaining |
| Multi-file revert fails | `git checkout -- .` in worktree (safe — worktree is isolated) |

---

## References

- [optimization_categories.md](references/optimization_categories.md) — optimization pattern checklist
- `shared/references/ci_tool_detection.md` (test + benchmark detection)
- `shared/references/git_worktree_fallback.md` (worktree isolation)

---

## Runtime Summary Artifact

**MANDATORY READ:** Load `shared/references/coordinator_summary_contract.md`

Write `.hex-skills/runtime-artifacts/runs/{run_id}/optimization-execution/{slug}.json` before finishing.

## Definition of Done

- [ ] Baseline established using same metric type as observed problem
- [ ] Hypotheses triaged: uncontested vs contested
- [ ] Strike applied: all uncontested hypotheses implemented at once
- [ ] Tests pass after strike
- [ ] Contested alternatives A/B tested on top of full implementation
- [ ] Bisect performed only if strike fails (not preemptively)
- [ ] E2E safety test passes (or documented as unavailable)
- [ ] Experiment log written to `.hex-skills/optimization/{slug}/ln-814-log.tsv`
- [ ] Report returned with baseline, final, improvement%, strike result
- [ ] All changes on isolated branch, pushed to remote
- [ ] Gap analysis provided if target metric not met
- [ ] Optimization execution artifact written to the shared location

---

**Version:** 2.0.0
**Last Updated:** 2026-03-14

Related Skills

All from levnikolaevich/claude-code-skills:

- ln-820-dependency-optimization-coordinator: Upgrades dependencies across all detected package managers. Use when updating npm, NuGet, or pip packages project-wide.
- ln-813-optimization-plan-validator: Validates optimization plan via multi-agent review before execution. Use when verifying feasibility of optimization hypotheses.
- ln-812-optimization-researcher: Researches competitive benchmarks and generates optimization hypotheses for identified bottlenecks. Use after profiling.
- ln-404-test-executor: Executes test tasks (label 'tests') through Todo to To Review with risk-based limits. Use for test task execution. Not for implementation tasks.
- ln-401-task-executor: Executes implementation tasks through Todo, In Progress, To Review. Use when task needs coding with KISS/YAGNI. Not for test tasks.
- ln-400-story-executor: Executes Story tasks in priority order (To Review, To Rework, Todo). Use when Story has planned tasks ready for implementation.
- ln-914-community-responder: Responds to unanswered GitHub discussions and issues with codebase-informed replies. Use when clearing community question backlog.
- ln-913-community-debater: Launches RFC and debate discussions on GitHub. Use when proposing changes that need community input or voting.
- ln-912-community-announcer: Composes and publishes announcements to GitHub Discussions. Use when sharing releases, updates, or news with the community.
- ln-911-github-triager: Produces prioritized triage report from open GitHub issues, PRs, and discussions. Use when reviewing community backlog.
- ln-910-community-engagement: Analyzes community health and delegates engagement tasks. Use when managing GitHub issues, discussions, and announcements.
- ln-840-benchmark-compare: Runs built-in vs hex-line benchmark with scenario manifests, activation checks, and diff-based correctness. Use when measuring hex-line MCP performance against built-in tools.