ln-404-test-executor

Executes test tasks (label 'tests') through Todo to To Review with risk-based limits. Use for test task execution. Not for implementation tasks.

310 stars

bylevnikolaevich

View on GitHub Installation ↓

Best use case

ln-404-test-executor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Executes test tasks (label 'tests') through Todo to To Review with risk-based limits. Use for test task execution. Not for implementation tasks.

Teams using ln-404-test-executor should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/ln-404-test-executor/SKILL.md --create-dirs "https://raw.githubusercontent.com/levnikolaevich/claude-code-skills/main/skills-catalog/ln-404-test-executor/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/ln-404-test-executor/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How ln-404-test-executor Compares

Feature / Agent	ln-404-test-executor	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Executes test tasks (label 'tests') through Todo to To Review with risk-based limits. Use for test task execution. Not for implementation tasks.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

SKILL.md Source

> **Paths:** File paths (`shared/`, `references/`, `../ln-*`) are relative to skills repo root. If not found at CWD, locate this SKILL.md directory and go up one level for repo root. If `shared/` is missing, fetch files via WebFetch from `https://raw.githubusercontent.com/levnikolaevich/claude-code-skills/master/skills/{path}`.

# Test Task Executor

**Type:** L3 Worker

Runs a single Story final test task (label "tests") through implementation/execution to To Review.

## Purpose & Scope
- Handle only tasks labeled "tests"; other tasks go to ln-401.
- Follow the 11-section test task plan (E2E/Integration/Unit, infra/docs/cleanup).
- Enforce risk-based constraints: Priority ≥15 scenarios covered; each test passes Usefulness Criteria; no framework/DB/library/performance tests.
- Update Linear/kanban for this task only: Todo -> In Progress -> To Review.

**Hex-line acceleration (if available):** Use `outline(path)` before reading test targets. Use `inspect_path(path="tests/")` to understand test structure.
Use `read_file()` and `edit_file()` as the primary path for test/code/config files. Use `verify()` and `changes()` before handoff. Built-in Read/Edit are fallback only when hex-line is unavailable.

## Inputs

| Input | Required | Source | Description |
|-------|----------|--------|-------------|
| `taskId` | Yes | args, parent Story, kanban, user | Test task to execute |

**Resolution:** Task Resolution Chain.
**Status filter:** Todo (label: tests)

## Task Storage Mode

**MANDATORY READ:** Load `shared/references/tools_config_guide.md`, `shared/references/storage_mode_detection.md`, and `shared/references/input_resolution_pattern.md`

Extract: `task_provider` = Task Management → Provider (`linear` | `file`).

| Aspect | Linear Mode | File Mode |
|--------|-------------|-----------|
| **Load task** | `get_issue(task_id)` | `Read("docs/tasks/epics/.../tasks/T{NNN}-*.md")` |
| **Load Story** | `get_issue(parent_id)` | `Read("docs/tasks/epics/.../story.md")` |
| **Update status** | `save_issue(id, state)` | `Edit` the `**Status:**` line in file |
| **Test results** | `create_comment({issueId, body})` | `Write` comment to `.../comments/{ISO-timestamp}.md` |

**File Mode transitions:** Todo → In Progress → To Review

**MANDATORY READ:** Load `shared/references/mcp_tool_preferences.md` — ALWAYS use hex-line MCP for code files when available. No fallback to standard Read/Edit unless hex-line is down.

## Workflow (concise)
1) **Resolve taskId:** Run Task Resolution Chain per guide (status filter: [Todo, label: tests]).
2) **Load task:** Fetch full test task description (Linear: get_issue; File: Read task file); read linked guides/manuals/ADRs/research; review parent Story and manual test results if provided.
2b) **Goal gate:** **MANDATORY READ:** `shared/references/goal_articulation_gate.md` — State REAL GOAL of these tests (which business behavior must be verified, not "write tests"). NOT THE GOAL: testing infrastructure or framework behavior instead of business logic. HIDDEN CONSTRAINT: which existing tests might break from implementation changes.
3) **Read environment docs:** **Read `docs/project/infrastructure.md`** — get server IPs, ports, service endpoints. **Read `docs/project/runbook.md`** — understand test environment setup, Docker commands, test execution prerequisites. Use exact commands from runbook.
4) **Validate plan:** Check Priority ≥15 coverage and Usefulness Criteria; ensure focus on business flows (no infra-only tests).
5) **Start work:** Set task In Progress (Linear: update_issue; File: Edit status line); move in kanban.
6) **Implement & run:** **MANDATORY READ:** `shared/references/code_efficiency_criterion.md` — Author/update tests per plan; reuse existing fixtures/helpers; run tests; fix failing existing tests; update infra/doc sections as required. Before handoff, verify 3 efficiency self-checks (especially: reuse fixtures instead of duplicating setup).
7) **Complete:** Ensure counts/priority still within limits; set task To Review; move in kanban; add comment summarizing coverage, commands run, and any deviations.

## Critical Rules
- Single-task only; no bulk updates.
- Do not mark Done; the reviewer approves. Task must end in To Review.
- Keep language (EN/RU) consistent with task.
- No framework/library/DB/performance/load tests; focus on business logic correctness (not infrastructure throughput).
- Respect limits and priority; if violated, stop and return with findings.
- **Do NOT commit.** Leave all changes uncommitted — the reviewer reviews and commits.

## Runtime Summary Artifact

**MANDATORY READ:** Load `shared/references/coordinator_summary_contract.md`

Write `.hex-skills/runtime-artifacts/runs/{run_id}/task-status/{task_id}.json` before finishing.

## Definition of Done
- [ ] Task identified as test task and set to In Progress; kanban updated
- [ ] Plan validated (priority/limits) and guides read
- [ ] Tests implemented/updated and executed; existing failures fixed
- [ ] Docs/infra updates applied per task plan
- [ ] Task set to To Review; kanban moved; summary comment added with commands and coverage
- [ ] Runtime summary artifact written to the shared task-status location.

## Test Failure Analysis Protocol

**CRITICAL:** When a **newly written test** fails, STOP and analyze BEFORE changing anything (failing new tests often indicate implementation bugs, not test issues — fixing blindly masks root cause).

**Step 1: Verify Test Correctness**
- Does test match AC requirements exactly? (Given/When/Then from Story)
- Is expected value correct per business logic?
- If uncertain: Query `ref_search_documentation(query="[domain] expected behavior")`

**Step 2: Decision**
| Test matches AC? | Action |
|------------------|--------|
| YES | **BUG IN CODE** → Fix implementation, not test |
| NO | Test is wrong → Fix test assertion |
| UNCERTAIN | **MANDATORY:** Query MCP Ref + ask user before changing |

**Step 3: Document in Linear comment**
"Test [name] failed. Analysis: [test correct / test wrong]. Action: [fixed code / fixed test]. Reason: [justification]"

**RED FLAGS (require user confirmation):**
- ⚠️ Changing assertion to match actual output ("make test green")
- ⚠️ Removing test case that "doesn't work"
- ⚠️ Weakening expectations (e.g., `toContain` instead of `toEqual`)

**GREEN LIGHTS (safe to proceed):**
- ✅ Fixing typo in test setup/mock data
- ✅ Fixing code to match AC requirements
- ✅ Adding missing test setup step

## Test Writing Principles

### 1. Strict Assertions - Fail on Any Mismatch

**Use exact match assertions by default:**

| Strict (PREFER) | Loose (AVOID unless justified) |
|-----------------|--------------------------------|
| Exact equality check | Partial/substring match |
| Exact length check | "Has any length" check |
| Full object comparison | Partial object match |
| Exact type check | Truthy/falsy check |

**WARN-level assertions FORBIDDEN** - test either PASS or FAIL, no warnings.

### 2. Expected-Based Testing for Deterministic Output

**For deterministic responses (API, transformations):**
- Use **snapshot/golden file testing** for complex deterministic output
- Compare actual output vs expected reference file
- Normalize dynamic data before comparison (timestamps → fixed, UUIDs → placeholder)

### 3. Golden Rule

> "If you know the expected value, assert the exact value."

**Forbidden:** Using loose assertions to "make test pass" when exact value is known.

## Reference Files
- **Tools config:** `shared/references/tools_config_guide.md`
- **Storage mode operations:** `shared/references/storage_mode_detection.md`
- Kanban format: `docs/tasks/kanban_board.md`
- **MANDATORY READ:** `shared/references/research_tool_fallback.md`

---
**Version:** 3.2.0
**Last Updated:** 2026-01-15

Related Skills

ln-814-optimization-executor

310

from levnikolaevich/claude-code-skills

Executes optimization hypotheses with keep/discard testing loop. Use when applying validated performance improvements.

ln-782-test-runner

310

from levnikolaevich/claude-code-skills

Executes all test suites and reports results with coverage. Use when verifying that test infrastructure works after bootstrap.

ln-743-test-infrastructure

310

from levnikolaevich/claude-code-skills

Sets up test infrastructure with Vitest, xUnit, and pytest. Use when adding testing frameworks and sample tests to a project.

ln-637-test-structure-auditor

310

from levnikolaevich/claude-code-skills

Checks test file organization, directory layout, test-to-source mapping, domain grouping, co-location. Use when auditing test structure.

ln-636-manual-test-auditor

310

from levnikolaevich/claude-code-skills

Checks manual test scripts for harness adoption, golden files, fail-fast, config sourcing, idempotency. Use when auditing manual test quality.

ln-635-test-isolation-auditor

310

from levnikolaevich/claude-code-skills

Checks test isolation (API/DB/FS/Time/Network), determinism, flaky tests, order-dependency, anti-patterns. Use when auditing test isolation.

ln-634-test-coverage-auditor

310

from levnikolaevich/claude-code-skills

Identifies missing tests for critical paths (money, security, data integrity, core flows). Use when auditing test coverage gaps.

ln-633-test-value-auditor

310

from levnikolaevich/claude-code-skills

Scores each test by Impact x Probability, returns KEEP/REVIEW/REMOVE decisions. Use when auditing test value and pruning low-value tests.

ln-632-test-e2e-priority-auditor

310

from levnikolaevich/claude-code-skills

Validates E2E coverage for critical paths (money, security, data integrity). Risk-based prioritization. Use when auditing E2E test coverage.

ln-631-test-business-logic-auditor

310

from levnikolaevich/claude-code-skills

Detects tests validating framework/library behavior instead of project code. Use when auditing test business logic focus.

ln-630-test-auditor

310

from levnikolaevich/claude-code-skills

Coordinates test suite audit across business logic, E2E coverage, value, isolation, manual quality, and structure. Use when auditing entire test suite.

ln-523-auto-test-planner

310

from levnikolaevich/claude-code-skills

Plans automated tests (E2E/Integration/Unit) using Risk-Based Testing after manual testing. Use when Story needs a test task with prioritized scenarios.