nw-outcome-kpi-framework

Outcome KPI definition methodology - synthesizes Who Does What By How Much (Gothelf/Seiden), Running Lean (Maurya), and Measure What Matters (Doerr) into a practical framework for measurable outcome KPIs

322 stars

bynWave-ai

View on GitHub Installation ↓

Best use case

nw-outcome-kpi-framework is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using nw-outcome-kpi-framework should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/nw-outcome-kpi-framework/SKILL.md --create-dirs "https://raw.githubusercontent.com/nWave-ai/nWave/main/nWave/skills/nw-outcome-kpi-framework/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/nw-outcome-kpi-framework/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How nw-outcome-kpi-framework Compares

Feature / Agent	nw-outcome-kpi-framework	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Outcome KPI Framework

"Doing stuff isn't the point. Achieving stuff is." -- Jeff Gothelf

Defines measurable outcome KPIs for user stories and features. Loaded during Phase 4 (Requirements Crafting) to produce `outcome-kpis.md`. Synthesizes three frameworks: customer-centric OKRs, lean metrics, and OKR methodology.

## The Outcome KPI Formula

Primary template from Gothelf/Seiden. Every KPI answers five questions:

| Component | Question | Example |
|-----------|----------|---------|
| **Who** | Which user segment? | Returning customers with 2+ orders |
| **Does What** | What observable behavior changes? | Complete checkout without contacting support |
| **By How Much** | What is the measurable target? | 40% reduction in support tickets |
| **Measured By** | How do we collect the data? | Support ticket system + checkout analytics |
| **Timeframe** | When do we measure? | 30 days post-release, then weekly |

Formula: **[Who] [Does what] [By how much]**

Apply as litmus test: if a KPI cannot answer all five components, it measures an output (feature delivery), not an outcome (behavior change).

### Good vs Bad KPIs

| Bad (Output) | Good (Outcome) |
|-------------|----------------|
| Launch mobile app v2 | Mobile users complete purchases 40% more often |
| Build recommendation engine | Users purchase from recommendations, increasing from 10% to 25% |
| Deploy onboarding redesign | New users complete onboarding within 24 hours 30% more often |
| Ship CSV export | Analysts resolve data questions without engineering support 60% of the time |

## Leading vs Lagging Indicators

From Gothelf/Seiden: business results are lagging -- teams cannot directly influence them. Target leading indicators instead.

| Type | Definition | Examples | Actionable? |
|------|-----------|----------|-------------|
| **Lagging** (Impact) | Business results already happened | Revenue, NPS, market share, churn rate | No -- too slow, too many variables |
| **Leading** (Outcome) | Behavior changes predicting business results | Purchase completion rate, feature adoption, retention | Yes -- teams can run experiments |
| **Leading** (Secondary) | Behaviors predicting primary leading indicators | Page visits, trial starts, onboarding steps completed | Yes -- most granular, fastest signal |

### Outcome Mapping Chain

Map every KPI through this chain to ensure traceability:

```
Business KPI (Lagging/Impact)
    Example: "Increase quarterly revenue by 15%"
        |
        v
    Customer Behavior (Leading/Outcome)
        +-- Users complete purchases from recommendations (+25%)
        +-- Users return within 7 days (+20%)
        |
        v
    Secondary Behavior (Leading/Secondary)
        +-- Users browse recommendation pages (+30%)
        +-- Users enable push notifications (+15%)
```

Each layer decomposes into more granular behavioral metrics. Teams target the highest-leverage behavior.

## Actionable vs Vanity Metrics

From Maurya (Running Lean): actionable metrics "tie specific and repeatable actions to observed results."

| Dimension | Vanity | Actionable |
|-----------|--------|------------|
| Measures | Business size (totals) | Individual behavior (rates) |
| Data type | Gross aggregates | Ratios and unit economics |
| Cause/effect | No insight into why | Directly signal product-market fit |
| Examples | Total users, page views, downloads | Activation rate, retention cohort, churn rate |
| Decision value | Cannot inform action | Drives specific experiments |

### The OMTM (One Metric That Matters)

Pick ONE metric per product stage. Optimizing one metric reveals the next.

| Stage | Focus | Example OMTM |
|-------|-------|--------------|
| Empathy | Problem validation | Interview pain intensity (qualitative) |
| Stickiness | Retention | Churn rate, DAU/MAU ratio |
| Virality | Organic growth | Viral coefficient, referral rate |
| Revenue | Monetization | Customer Lifetime Value, MRR |
| Scale | Growth efficiency | CAC/LTV ratio, payback period |

**Good metric characteristics**: rate or ratio (not absolute number) | comparable across time | simple enough to remember | predictive | behavior-changing.

### Customer Factory (AARRR) Constraint Mapping

From Maurya: model the business as a production line. Identify the bottleneck, then focus KPIs there.

| Stage | Key Question | Example Metric |
|-------|-------------|----------------|
| **Acquisition** | Are we reaching the right people? | Visitor-to-signup conversion rate |
| **Activation** | Do users get the "aha moment"? | % completing core action in first session |
| **Retention** | Do users come back? | Week-1 return rate, DAU/MAU |
| **Revenue** | Do users pay? | Trial-to-paid conversion rate |
| **Referral** | Do users tell others? | Referral rate, viral coefficient |

Activation is causal -- it drives retention, revenue, and referral. Prioritize activation KPIs when uncertain.

## OKR Integration

From Doerr (Measure What Matters): connect KPIs to strategic objectives.

### Writing Key Results

Every Key Result uses the outcome formula. Quality criteria:

1. **Measurable**: "It's not a Key Result unless it has a number" (Marissa Mayer)
2. **Outcome-focused**: "Increase email subscribers by 20%" not "Launch newsletter"
3. **Time-bound**: deadline (typically end of quarter)
4. **Verifiable**: no ambiguity about whether met
5. **Aggressive yet realistic**: stretch without demoralizing

### Committed vs Aspirational

| Type | Expected Score | Resource Allocation | Failure Response |
|------|---------------|--------------------|-----------------|
| **Committed** | 1.0 (must deliver) | Consume most available resources | Requires explanation, replanning |
| **Aspirational** | 0.7 (stretch goal) | Overcommit slightly beyond capacity | Expected -- carry forward |

Sweet spot: blended aggregate of 0.6-0.7. Consistently hitting 1.0 = not ambitious enough.

### OKR Anti-Patterns

| Anti-Pattern | Signal | Fix |
|-------------|--------|-----|
| Output-based KRs | "Launch X", "Build Y", "Ship Z" | Rewrite as behavior: "Users [do what] [by how much]" |
| Too many KRs | >5 KRs per Objective | Cut to 2-4 per Objective, max 3-5 Objectives |
| Vague KRs | No numeric target | Add baseline + target + deadline |
| Sandbagging | Consistently scoring 1.0 | Increase ambition level |
| Backlog retrofitting | OKRs match existing backlog 1:1 | OKRs filter backlog, not justify it |

### Mapping: Objective to User Stories

```
Objective (qualitative, inspirational, timeboxed)
    |
    Key Results (2-4 per Objective, [Who][Does what][By how much])
    |
    Epics (weeks of work, aligned to Key Results)
    |
    User Stories (days of work, with measurable acceptance criteria)
```

Every story traces back to a Key Result. Orphan stories (no KR link) are potential waste.

## KPI Template

Use this exact structure in `outcome-kpis.md`:

```markdown
## Feature: {feature-name}

### Objective
{What success looks like in one sentence -- qualitative, inspirational, timeboxed}

### Outcome KPIs

| # | Who | Does What | By How Much | Baseline | Measured By | Type |
|---|-----|-----------|-------------|----------|-------------|------|
| 1 | {segment} | {behavior} | {target} | {current} | {method} | Leading/Lagging |

### Metric Hierarchy
- **North Star**: {the ONE metric that matters most for this feature}
- **Leading Indicators**: {behaviors that predict the north star}
- **Guardrail Metrics**: {metrics that must NOT degrade}

### Measurement Plan
| KPI | Data Source | Collection Method | Frequency | Owner |
|-----|------------|-------------------|-----------|-------|

### Hypothesis
We believe that {proposed solution} for {user segment} will achieve {key result}.
We will know this is true when {who} {does what} {by how much}.
```

## KPI Granularity

- **Per Epic**: Define 2-3 north-star KPIs that all contributing stories aim to move
- **Per Story**: Add story-level success criteria tied to the epic-level KPIs
- **Guardrails**: Define at epic level, apply consistently across all stories
- **Rule of thumb**: If the feature has 1-3 stories, one KPI table suffices. If 4+, group by epic.

## Smell Tests

Before finalizing KPIs, verify each one passes:

| Check | Question | If No |
|-------|----------|-------|
| Measurable today? | Can you measure it with current instrumentation? | Add instrumentation to requirements |
| Rate not total? | Is it a ratio/rate, not a gross count? | Convert to rate (vanity -> actionable) |
| Outcome not output? | Does it describe user behavior, not feature delivery? | Rewrite as "[Who] [Does what] [By how much]" |
| Has baseline? | Do you know the current value? | Establish baseline before setting target |
| Team can influence? | Can the team directly affect this metric? | Decompose into more granular leading indicator |
| Has guardrails? | Are there metrics that must not degrade? | Add guardrail metrics (e.g., error rate, load time) |

## Handoff to DEVOPS

The platform-architect needs these from `outcome-kpis.md` to plan instrumentation:

1. **Data collection requirements**: what events/behaviors to instrument, what data points to capture
2. **Dashboard/monitoring needs**: which metrics need real-time dashboards vs. weekly reports
3. **Alerting thresholds**: guardrail metric boundaries that trigger alerts when breached
4. **Baseline measurement**: any metrics needing baseline collection before feature release

## References

For deeper reading on source frameworks:
- Running Lean (Maurya): `docs/research/running-lean-research.md`
- Measure What Matters (Doerr): `docs/research/measure-what-matters-research.md`
- Who Does What By How Much (Gothelf/Seiden): `docs/research/who-does-what-research.md`

Related Skills

nw-sd-framework

322

from nWave-ai/nWave

4-step system design framework with back-of-envelope estimation, scaling ladder, and common pitfalls

nw-quality-framework

322

from nWave-ai/nWave

Quality gates - 11 commit readiness gates, build/test protocol, validation checkpoints, and quality metrics

nw-post-mortem-framework

322

from nWave-ai/nWave

Blameless post-mortem structure, incident timeline reconstruction, response evaluation, and organizational learning

nw-divio-framework

322

from nWave-ai/nWave

DIVIO/Diataxis four-quadrant documentation framework - type definitions, classification decision tree, and signal catalog

nw-ux-web-patterns

322

from nWave-ai/nWave

Web UI design patterns for product owners. Load when designing web application interfaces, writing web-specific acceptance criteria, or evaluating responsive designs.

nw-ux-tui-patterns

322

from nWave-ai/nWave

Terminal UI and CLI design patterns for product owners. Load when designing command-line tools, interactive terminal applications, or writing CLI-specific acceptance criteria.

nw-ux-principles

322

from nWave-ai/nWave

Core UX principles for product owners. Load when evaluating interface designs, writing acceptance criteria with UX requirements, or reviewing wireframes and mockups.

nw-ux-emotional-design

322

from nWave-ai/nWave

Emotional design and delight patterns for product owners. Load when designing onboarding flows, empty states, first-run experiences, or evaluating the emotional quality of an interface.

nw-ux-desktop-patterns

322

from nWave-ai/nWave

Desktop application UI patterns for product owners. Load when designing native or cross-platform desktop applications, writing desktop-specific acceptance criteria, or evaluating panel layouts and keyboard workflows.

nw-user-story-mapping

322

from nWave-ai/nWave

User story mapping for backlog management and outcome-based prioritization. Load during Phase 2.5 (User Story Mapping) to produce story-map.md and prioritization.md.

nw-tr-review-criteria

322

from nWave-ai/nWave

Review dimensions and scoring for root cause analysis quality assessment

nw-tlaplus-verification

322

from nWave-ai/nWave

TLA+ formal verification for design correctness and PBT pipeline integration