scenario

Author and manage holdout scenarios for behavioral validation. Scenarios are stored in .agents/holdout/ where implementing agents cannot see them. Triggers: "$scenario", "holdout", "behavioral scenario", "create scenario", "list scenarios".

244 stars

Best use case

scenario is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using scenario should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/scenario/SKILL.md --create-dirs "https://raw.githubusercontent.com/boshu2/agentops/main/skills-codex/scenario/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/scenario/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How scenario Compares

Feature / Agent         | scenario      | Standard Approach
Platform Support        | Not specified | Limited / Varies
Context Awareness       | High          | Baseline
Installation Complexity | Unknown       | N/A

Frequently Asked Questions

What does this skill do?

Author and manage holdout scenarios for behavioral validation. Scenarios are stored in .agents/holdout/ where implementing agents cannot see them. Triggers: "$scenario", "holdout", "behavioral scenario", "create scenario", "list scenarios".

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Scenario Skill

Author and manage holdout scenarios for Stage 4 behavioral validation.

Scenarios are **holdout** — implementing agents cannot see them (enforced by hook).
Evaluator agents validate code against scenarios during STEP 1.8 in `$validation`.
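
The hook that enforces the holdout is not shown in this skill, so the following is only a minimal sketch of the kind of path check such a hook might perform. The function names `is_holdout_path` and `allow_read`, and the role label `"implementer"`, are hypothetical and chosen for illustration; only the `.agents/holdout/` directory comes from this document.

```python
from pathlib import PurePosixPath

HOLDOUT_DIR = PurePosixPath(".agents/holdout")  # directory named in this skill


def is_holdout_path(path: str) -> bool:
    """Return True if a repo-relative path points into the holdout directory."""
    # Drop "." segments and resolve ".." lexically, without touching the filesystem.
    parts: list[str] = []
    for part in PurePosixPath(path).parts:
        if part == "..":
            if parts:
                parts.pop()
        elif part != ".":
            parts.append(part)
    normalised = PurePosixPath(*parts) if parts else PurePosixPath(".")
    return normalised == HOLDOUT_DIR or HOLDOUT_DIR in normalised.parents


def allow_read(path: str, role: str) -> bool:
    """Hypothetical hook decision: the implementing role may not read holdout files."""
    if is_holdout_path(path):
        return role != "implementer"  # assumed role label, not from the source
    return True
```

The check is purely lexical, so it assumes paths are given relative to the repository root; a real hook would also need to consider symlinks and absolute paths.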

## Execution Steps

### Step 1: Initialize

```bash
ao scenario init   # Creates .agents/holdout/ with README
```

### Step 2: Author Scenario

Write a scenario JSON to `.agents/holdout/<id>.json` following `schemas/scenario.v1.schema.json`:

```json
{
    "id": "s-2026-04-05-001",
    "version": 1,
    "date": "2026-04-05",
    "goal": "User can authenticate with valid credentials",
    "narrative": "A user visits login, enters valid credentials, expects dashboard redirect.",
    "expected_outcome": "Dashboard loads, session cookie is HttpOnly and Secure.",
    "acceptance_vectors": [
        {"dimension": "correctness", "threshold": 0.9, "check": "grep -q 'HttpOnly' headers"},
        {"dimension": "performance", "threshold": 0.7}
    ],
    "satisfaction_threshold": 0.8,
    "scope": {
        "files": ["src/auth/middleware.go"],
        "functions": ["Authenticate"],
        "behaviors": ["login flow"]
    },
    "source": "human",
    "status": "active"
}
```
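
Before running the schema validation, a quick structural pre-check can catch obvious mistakes. The sketch below is not the schema: `schemas/scenario.v1.schema.json` is not reproduced here, so `EXAMPLE_FIELDS` simply mirrors the keys of the example above, and treating all of them as required is an assumption. `check_scenario` is a hypothetical helper name; the allowed `source` values are the ones listed under Key Rules.

```python
# Keys that appear in the example scenario; the real schema may differ (assumption).
EXAMPLE_FIELDS = {
    "id", "version", "date", "goal", "narrative", "expected_outcome",
    "acceptance_vectors", "satisfaction_threshold", "scope", "source", "status",
}
ALLOWED_SOURCES = {"human", "agent", "prod-telemetry"}  # from Key Rules


def _is_unit_score(value) -> bool:
    """True for a real number in the closed interval [0.0, 1.0]."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and 0.0 <= value <= 1.0
    )


def check_scenario(scenario: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    missing = sorted(EXAMPLE_FIELDS - scenario.keys())
    if missing:
        problems.append(f"missing fields: {', '.join(missing)}")
    if not _is_unit_score(scenario.get("satisfaction_threshold")):
        problems.append("satisfaction_threshold must be a number in [0.0, 1.0]")
    for i, vector in enumerate(scenario.get("acceptance_vectors", [])):
        if "dimension" not in vector:
            problems.append(f"acceptance_vectors[{i}] has no dimension")
        if not _is_unit_score(vector.get("threshold")):
            problems.append(f"acceptance_vectors[{i}].threshold must be in [0.0, 1.0]")
    if scenario.get("source") not in ALLOWED_SOURCES:
        problems.append(f"source must be one of {sorted(ALLOWED_SOURCES)}")
    return problems
```

Calling `check_scenario(json.load(f))` on the example file would be expected to return an empty list; the authoritative check remains the schema validation in Step 3.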

### Step 3: Validate

```bash
ao scenario validate   # Checks all scenarios against schema
```

### Step 4: List

```bash
ao scenario list                  # All scenarios
ao scenario list --status active  # Active only
```

## Key Rules

- Scenarios use **satisfaction scoring** (0.0-1.0), not boolean pass/fail
- Scenarios should be written by humans or evaluator agents, NEVER by the implementing agent
- `source` field tracks provenance: `human`, `agent`, `prod-telemetry`
- Agent-built specs (from `$implement` Step 5c) use `auto-*` id prefix and live in `.agents/specs/`
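
To make the contrast with boolean pass/fail concrete, here is one way satisfaction scoring could work. The aggregation rule is an assumption: this skill does not say how per-dimension scores are combined, so the sketch requires each dimension to meet its own `threshold` and the unweighted mean to meet `satisfaction_threshold`. `is_satisfied` is a hypothetical name.

```python
def is_satisfied(scores: dict[str, float], scenario: dict) -> bool:
    """Decide whether per-dimension scores (0.0-1.0) satisfy a scenario.

    Assumed rule, not taken from the source: every acceptance vector must meet
    its own threshold, and the unweighted mean must meet satisfaction_threshold.
    """
    vectors = scenario["acceptance_vectors"]
    if not vectors:
        raise ValueError("scenario has no acceptance_vectors to score")
    # A dimension the evaluator did not score counts as 0.0.
    per_dimension = [scores.get(v["dimension"], 0.0) for v in vectors]
    for score, vector in zip(per_dimension, vectors):
        if score < vector["threshold"]:
            return False
    overall = sum(per_dimension) / len(per_dimension)
    return overall >= scenario["satisfaction_threshold"]
```

With the example scenario from Step 2, scores of 0.95 for correctness and 0.75 for performance would pass under this rule, while a performance score of 0.6 would fail its 0.7 threshold regardless of the mean.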

## See Also

- `$validation` — STEP 1.8 consumes scenarios
- `$vibe` — Exposes satisfaction_score
- `$implement` — Step 5c generates agent-built specs

Related Skills

vibe

244
from boshu2/agentops

Comprehensive code validation. Runs complexity analysis then multi-model council. Answer: Is this code ready to ship? Triggers: "vibe", "validate code", "check code", "review code", "code quality", "is this ready".

validation

244
from boshu2/agentops

Full validation phase orchestrator. Vibe + post-mortem + retro + forge. Reviews implementation quality, extracts learnings, feeds the knowledge flywheel. Triggers: "validation", "validate", "validate work", "review and learn", "validation phase", "post-implementation review".

update

244
from boshu2/agentops

Reinstall all AgentOps skills globally from the latest source. Triggers: "update skills", "reinstall skills", "sync skills".

trace

244
from boshu2/agentops

Trace design decisions and concepts through session history, handoffs, and git. Triggers: "trace decision", "how did we decide", "where did this come from", "design provenance", "decision history".

test

244
from boshu2/agentops

Test generation, coverage analysis, and TDD workflow. Triggers: "test", "generate tests", "test coverage", "write tests", "tdd", "add tests", "test strategy", "missing tests", "coverage gaps".

status

244
from boshu2/agentops

Single-screen dashboard showing current work, recent validations, flywheel health, and suggested next action. Triggers: "status", "dashboard", "what am I working on", "where was I".

standards

244
from boshu2/agentops

Language-specific coding standards and validation rules. Provides Python, Go, Rust, TypeScript, Shell, YAML, JSON, and Markdown standards. Auto-loaded by $vibe, $implement, $doc, $bug-hunt, $complexity based on file types.

shared

244
from boshu2/agentops

Shared reference documents for multi-agent skills (not directly invocable)

security

244
from boshu2/agentops

Continuous repository security scanning and release gating. Triggers: "security scan", "security audit", "pre-release security", "run scanners", "check vulnerabilities".

security-suite

244
from boshu2/agentops

Composable security suite for binary and prompt-surface assurance, static analysis, dynamic tracing, repo-native redteam scans, contract capture, baseline drift, and policy gating. Triggers: "binary security", "reverse engineer binary", "black-box binary test", "behavioral trace", "baseline diff", "prompt redteam", "security suite".

scaffold

244
from boshu2/agentops

Project scaffolding, component generation, and boilerplate setup. Triggers: "scaffold", "new project", "init project", "create project", "generate component", "setup project", "starter", "boilerplate".

rpi

244
from boshu2/agentops

Full RPI lifecycle orchestrator. Delegates to $discovery, $crank, $validation phase skills. One command, full lifecycle with complexity classification, --from routing, and optional loop. Triggers: "rpi", "full lifecycle", "research plan implement", "end to end".