ln-636-manual-test-auditor

Checks manual test scripts for harness adoption, golden files, fail-fast, config sourcing, idempotency. Use when auditing manual test quality.

310 stars

Best use case

ln-636-manual-test-auditor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ln-636-manual-test-auditor can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ln-636-manual-test-auditor/SKILL.md --create-dirs "https://raw.githubusercontent.com/levnikolaevich/claude-code-skills/main/skills-catalog/ln-636-manual-test-auditor/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/ln-636-manual-test-auditor/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How ln-636-manual-test-auditor Compares

| Feature / Agent | ln-636-manual-test-auditor | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

SKILL.md Source

> **Paths:** File paths (`shared/`, `references/`, `../ln-*`) are relative to skills repo root. If not found at CWD, locate this SKILL.md directory and go up one level for repo root. If `shared/` is missing, fetch files via WebFetch from `https://raw.githubusercontent.com/levnikolaevich/claude-code-skills/master/skills/{path}`.

# Manual Test Quality Auditor (L3 Worker)

**Type:** L3 Worker

Specialized worker auditing manual test scripts for quality and best-practice compliance.

## Purpose & Scope

- **Worker in ln-630 coordinator pipeline**
- Audit **Manual Test Quality** (Category 7: Medium Priority)
- Evaluate bash test scripts in `tests/manual/` against quality dimensions
- Calculate compliance score (X/10)

## Inputs (from Coordinator)

**MANDATORY READ:** Load `shared/references/audit_worker_core_contract.md`.

Receives `contextStore` with: `tech_stack`, `testFilesMetadata` (filtered to `type: "manual"`), `codebase_root`, `output_dir`.

Manual test metadata includes: `suite_dir`, `has_expected_dir`, `harness_sourced`.

## Workflow

**MANDATORY READ:** Load `shared/references/two_layer_detection.md` for detection methodology.

1) **Parse Context:** Extract manual test file list, output_dir, codebase_root from contextStore
2) **Discover Infrastructure:** Detect shared infrastructure files:
   - `tests/manual/config.sh` -- shared configuration
   - `tests/manual/test_harness.sh` -- shared test framework (if exists)
   - `tests/manual/test-all.sh` -- master runner
   - `tests/manual/TEMPLATE-*.sh` -- test templates (if exist)
   - `tests/manual/regenerate-golden.sh` -- golden file regeneration (if exists)
3) **Scan Scripts (Layer 1):** For each manual test script, check 7 quality dimensions (see Audit Rules)
3b) **Context Analysis (Layer 2 -- MANDATORY):** For each candidate finding, ask:
   - Is this a setup/utility script (e.g., `00-setup/*.sh`, `tools/*.sh`)? Setup scripts have different requirements -- skip harness/golden checks
   - Is this a master runner (`test-all.sh`)? Master runners orchestrate, not test -- skip all checks except fail-fast
   - Does the project not use a shared harness at all? If no `test_harness.sh` exists, harness adoption check is N/A
4) **Collect Findings:** Record violations with severity, location (file:line), effort, recommendation
5) **Calculate Score:** Count violations by severity, calculate compliance score (X/10)
6) **Write Report:** Build full markdown report in memory per `shared/templates/audit_worker_report_template.md`, write to `{output_dir}/636-manual-test-quality.md` in single Write call
7) **Return Summary:** Return minimal summary to coordinator (see Output Format)
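
Step 2 of the workflow above can be sketched as a small shell function. This is a minimal illustration, not the worker's actual implementation: the function name `discover_infrastructure` and the `name=present|absent` output format are assumptions made for the example.

```shell
# Sketch of "Discover Infrastructure" (workflow step 2).
# discover_infrastructure is a hypothetical helper; the output format is illustrative.
discover_infrastructure() {
  local manual_dir="$1" name t templates=absent

  # Fixed-name infrastructure files listed in the workflow.
  for name in config.sh test_harness.sh test-all.sh regenerate-golden.sh; do
    if [ -f "$manual_dir/$name" ]; then
      echo "$name=present"
    else
      echo "$name=absent"
    fi
  done

  # Templates are matched by glob; an unmatched glob stays literal, hence the -e test.
  for t in "$manual_dir"/TEMPLATE-*.sh; do
    [ -e "$t" ] && templates=present && break
  done
  echo "templates=$templates"
}
```

A caller could run `discover_infrastructure "$codebase_root/tests/manual"` and use the `test_harness.sh` and `templates` lines to decide whether the harness and template checks apply at all.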

## Audit Rules

### 1. Harness Adoption

**What:** Test script uses shared framework (`run_test`, `init_test_state`) instead of custom assertion logic

**Detection:**
- Grep for `run_test`, `init_test_state` in script
- If absent AND script contains custom test loops/assertions -> custom logic
- If `test_harness.sh` does not exist in project -> skip this check entirely

**Severity:** **HIGH** (custom logic = maintenance burden, inconsistent reporting)

**Recommendation:** Refactor to use shared `run_test` from test_harness.sh

**Effort:** M
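
The detection steps for this rule might look roughly like the sketch below. The function name and the `skip` / `ok` / `finding` result strings are hypothetical, and the `PASS|FAIL|assert` pattern is only an assumed heuristic for "custom test loops/assertions"; a real audit would still apply the Layer 2 review to each candidate.

```shell
# Sketch of the Harness Adoption check (Layer 1 only).
# check_harness_adoption and its result strings are illustrative assumptions.
check_harness_adoption() {
  local script="$1" harness="$2"

  # No shared harness in the project: the check is N/A.
  if [ ! -f "$harness" ]; then
    echo "skip"
    return 0
  fi

  # Script calls the shared framework functions (whole-word match).
  if grep -qwE 'run_test|init_test_state' "$script"; then
    echo "ok"
    return 0
  fi

  # Assumed heuristic: hand-rolled PASS/FAIL reporting or assert helpers
  # suggest custom assertion logic.
  if grep -qE 'PASS|FAIL|assert' "$script"; then
    echo "finding"
  else
    echo "ok"
  fi
}
```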

### 2. Golden File Completeness

**What:** Test suite has `expected/` directory with reference files matching test scenarios

**Detection:**
- Check if suite directory has `expected/` subdirectory
- Compare: number of test scenarios (grep `run_test` calls) vs number of expected files
- If test uses `diff` against expected files but expected dir is missing -> finding

**Layer 2:** Not all tests need golden files. Tests validating HTTP status codes, timing, or dynamic data may legitimately skip golden comparison -> skip if test has no `diff` or comparison against files

**Severity:** **HIGH** (no golden files = no regression detection for output correctness)

**Recommendation:** Add expected/ directory with reference output files

**Effort:** M
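
As an illustration of this rule, the sketch below counts `run_test` calls and compares them with the number of files under `expected/`. The function name and result strings are assumptions, and treating "fewer expected files than scenarios" as the finding condition is one possible reading of the comparison, not something the rule spells out.

```shell
# Sketch of the Golden File Completeness check for one suite directory.
# check_golden_files and its result strings are illustrative assumptions.
check_golden_files() {
  local suite="$1" scenarios expected

  # Layer 2: suites that never diff against files do not need golden files.
  if ! grep -qsw 'diff' "$suite"/*.sh; then
    echo "skip"
    return 0
  fi

  if [ ! -d "$suite/expected" ]; then
    echo "finding: expected/ missing"
    return 0
  fi

  # grep -c exits non-zero when the count is 0, so the status is ignored here.
  scenarios=$(cat "$suite"/*.sh 2>/dev/null | grep -cE '^[[:space:]]*run_test[[:space:]]' || true)
  expected=$(find "$suite/expected" -type f | wc -l | tr -d ' ')

  if [ "$expected" -lt "$scenarios" ]; then
    echo "finding: $scenarios scenarios, $expected expected files"
  else
    echo "ok"
  fi
}
```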

### 3. Config Sourcing

**What:** Script sources shared `config.sh` for consistent configuration

**Detection:**
- Grep for a line that sources the file: `source .*config\.sh` or `\. .*config\.sh` (escape the dots so `.` is not treated as a regex wildcard)
- If absent -> script manages its own BASE_URL, tokens, etc.

**Layer 2:** If script is self-contained utility (e.g., `tools/*.sh`) -> skip

**Severity:** **MEDIUM**

**Recommendation:** Add `source "$THIS_DIR/../config.sh"` for shared configuration

**Effort:** S
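
A minimal sketch of the detection, assuming the sourcing statement sits at the start of a line. The helper name `sources_config` is hypothetical, and the `THIS_DIR` definition shown in the comment is a common shell idiom rather than something the rule prescribes.

```shell
# Sketch of the Config Sourcing check.
# sources_config is a hypothetical helper: exit status 0 means the script
# sources config.sh via `source` or the POSIX `.` builtin.
sources_config() {
  grep -Eq '^[[:space:]]*(source|\.)[[:space:]]+[^#]*config\.sh' "$1"
}

# Pattern the check looks for (THIS_DIR idiom is an assumption):
#   THIS_DIR="$(cd "$(dirname "$0")" && pwd)"
#   source "$THIS_DIR/../config.sh"
```

Anchoring the pattern to the start of the line avoids counting a script that merely mentions `config.sh` in an `echo` or a trailing comment.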

### 4. Fail-Fast Compliance

**What:** Script uses `set -e` and returns exit code 1 on failure

**Detection:**
- Grep for `set -e` (or `set -eo pipefail`)
- Check that failure paths lead to non-zero exit (not swallowed by `|| true` everywhere)

**Severity:** **HIGH** (silent failures mask broken tests)

**Recommendation:** Add `set -e` at script start, ensure test failures propagate

**Effort:** S
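
The two detection bullets could be approximated as follows. The function name and result strings are assumptions, and counting `|| true` occurrences only surfaces candidates for review; whether failures are really swallowed still needs a human or Layer 2 judgment.

```shell
# Sketch of the Fail-Fast Compliance check.
# check_fail_fast and its result strings are illustrative assumptions.
check_fail_fast() {
  local script="$1" swallowed

  # Matches `set -e`, `set -eu`, `set -eo pipefail`, `set -euo pipefail`,
  # and the long form `set -o errexit`.
  if ! grep -Eq '^[[:space:]]*set[[:space:]]+(-[a-z]*e|-o[[:space:]]+errexit)' "$script"; then
    echo "finding: no set -e"
    return 0
  fi

  # Count lines where a failure may be swallowed; grep -c exits 1 on zero matches.
  swallowed=$(grep -cE '\|\|[[:space:]]*true' "$script" || true)
  if [ "$swallowed" -gt 0 ]; then
    echo "review: $swallowed uses of || true"
  else
    echo "ok"
  fi
}
```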

### 5. Template Compliance

**What:** Script follows project test templates (TEMPLATE-api-endpoint.sh, TEMPLATE-document-format.sh)

**Detection:**
- If TEMPLATE files exist in `tests/manual/`, check structural alignment:
  - Header comment block with description, ACs tested, prerequisites
  - Standard variable naming (`THIS_DIR`, `EXPECTED_DIR`)
  - Standard setup pattern (`source config.sh`, `check_jq`, `setup_auth`)
- If NO templates exist in project -> skip this check entirely

**Layer 2:** Older scripts written before templates may diverge. Flag as MEDIUM, not HIGH

**Severity:** **MEDIUM**

**Recommendation:** Align script structure with project TEMPLATE files

**Effort:** M
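
One way to approximate the structural alignment check is to look for the standard names listed above. This sketch assumes that the presence of `THIS_DIR`, `EXPECTED_DIR`, `check_jq`, and `setup_auth` is a reasonable proxy for template compliance; the function name `template_gaps` is hypothetical, and the header comment block is not checked here.

```shell
# Sketch of the Template Compliance check.
# template_gaps is a hypothetical helper: prints "skip" when the project has no
# templates, otherwise one "missing: NAME" line per absent standard element.
template_gaps() {
  local script="$1" manual_dir="$2" t name have_templates=0

  for t in "$manual_dir"/TEMPLATE-*.sh; do
    [ -e "$t" ] && have_templates=1 && break
  done
  if [ "$have_templates" -eq 0 ]; then
    echo "skip"
    return 0
  fi

  for name in THIS_DIR EXPECTED_DIR check_jq setup_auth; do
    grep -qw "$name" "$script" || echo "missing: $name"
  done
}
```

A missing `EXPECTED_DIR` is only a candidate finding: a script that legitimately has no golden files would not define it.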

### 6. Idempotency

**What:** Script can be rerun safely without side effects from previous runs

**Detection:**
- Grep for cleanup patterns: `trap.*EXIT`, `rm -f`, `cleanup` functions
- Check for temp file creation without cleanup
- Check for hardcoded resource names that would conflict on rerun (e.g., creating user with fixed email without checking existence)

**Layer 2:** Scripts that only READ data (GET requests, queries) are inherently idempotent -> skip

**Severity:** **MEDIUM**

**Recommendation:** Add cleanup trap or use unique identifiers per run

**Effort:** S-M
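
The recommendation can be illustrated with the pattern the check looks for: a cleanup trap plus a per-run unique identifier. The names `RUN_ID`, `WORK_DIR`, `TEST_EMAIL`, and the `example.test` address are assumptions made for this sketch.

```shell
# Sketch of a rerun-safe test script preamble (all names are illustrative).
RUN_ID="$(date +%s)-$$"      # epoch seconds + PID: distinct per run
WORK_DIR="$(mktemp -d)"      # fresh temp directory instead of a fixed path

cleanup() {
  rm -rf "$WORK_DIR"
}
trap cleanup EXIT            # runs on normal exit and when the script aborts

# Unique resource name, so a rerun does not collide with a user
# created by an earlier run.
TEST_EMAIL="manual-test-${RUN_ID}@example.test"

echo "response body" > "$WORK_DIR/response.json"
```

An auditor grepping for `trap.*EXIT` would treat a script written this way as having cleanup in place; a fixed address such as a hardcoded email would instead be flagged as a rerun conflict.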

### 7. Documentation

**What:** Test suite directory has README.md explaining purpose and prerequisites

**Detection:**
- Check if suite directory (`NN-feature/`) contains README.md
- If missing -> finding

**Layer 2:** Setup directories (`00-setup/`) and utility directories (`tools/`) may not need README -> skip

**Severity:** **LOW**

**Recommendation:** Add README.md with test purpose, prerequisites, usage

**Effort:** S
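
A short sketch of this check, assuming suites follow the `NN-feature/` naming shown above. The function name is hypothetical. `tools/` never matches the `NN-*` glob, and `00-setup` is skipped explicitly per the Layer 2 note.

```shell
# Sketch of the Documentation check.
# find_suites_without_readme is a hypothetical helper: prints the name of each
# NN-feature/ suite directory that has no README.md.
find_suites_without_readme() {
  local manual_dir="$1" dir name

  for dir in "$manual_dir"/[0-9][0-9]-*/; do
    [ -d "$dir" ] || continue          # unmatched glob stays literal
    name=$(basename "$dir")
    case "$name" in
      00-setup) continue ;;            # Layer 2: setup dirs may skip README
    esac
    [ -f "$dir/README.md" ] || echo "$name"
  done
}
```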

## Scoring Algorithm

**MANDATORY READ:** Load `shared/references/audit_worker_core_contract.md` and `shared/references/audit_scoring.md`.

**Severity mapping:**
- Missing harness adoption (when harness exists), No golden files (when expected-based), No fail-fast -> HIGH
- Missing config sourcing, Template divergence, No idempotency -> MEDIUM
- Missing README -> LOW
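
The severity mapping above can be expressed as a lookup keyed by the check names used in the report. The function name is an assumption. The penalty formula itself lives in `shared/references/audit_scoring.md` and is deliberately not reproduced here.

```shell
# Sketch of the severity mapping (severity_for_check is a hypothetical helper).
# Check names follow the list used in the report's `checks` field.
severity_for_check() {
  case "$1" in
    harness_adoption|golden_file_completeness|fail_fast_compliance) echo "HIGH" ;;
    config_sourcing|template_compliance|idempotency)                echo "MEDIUM" ;;
    documentation)                                                  echo "LOW" ;;
    *) echo "unknown check: $1" >&2; return 1 ;;
  esac
}
```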

## Output Format

**MANDATORY READ:** Load `shared/references/audit_worker_core_contract.md` and `shared/templates/audit_worker_report_template.md`.

If `summaryArtifactPath` is present, write the JSON summary per `shared/references/audit_summary_contract.md`. Compact text output is a fallback only.

Write report to `{output_dir}/636-manual-test-quality.md` with `category: "Manual Test Quality"` and checks: harness_adoption, golden_file_completeness, config_sourcing, fail_fast_compliance, template_compliance, idempotency, documentation.

Return summary per `shared/references/audit_summary_contract.md`.

Legacy compact text output is allowed only when `summaryArtifactPath` is absent:
```
Report written: .hex-skills/runtime-artifacts/runs/{run_id}/audit-report/636-manual-test-quality.md
Score: X.X/10 | Issues: N (C:N H:N M:N L:N)
```
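
The legacy compact output above could be produced by a helper along these lines. The function name and the input convention (one severity per line on stdin) are assumptions, as is reading `C` as CRITICAL. The score is passed in as an argument because its calculation is defined in `shared/references/audit_scoring.md`.

```shell
# Sketch of the legacy compact summary (format_summary is a hypothetical helper).
# Usage: format_summary REPORT_PATH SCORE < severities, one per line
format_summary() {
  local report="$1" score="$2" sev c=0 h=0 m=0 l=0

  while IFS= read -r sev; do
    case "$sev" in
      CRITICAL) c=$((c + 1)) ;;
      HIGH)     h=$((h + 1)) ;;
      MEDIUM)   m=$((m + 1)) ;;
      LOW)      l=$((l + 1)) ;;
    esac
  done

  echo "Report written: $report"
  echo "Score: $score/10 | Issues: $((c + h + m + l)) (C:$c H:$h M:$m L:$l)"
}
```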

## Critical Rules

**MANDATORY READ:** Load `shared/references/audit_worker_core_contract.md`.

- **Do not auto-fix:** Report only
- **Effort realism:** S = <1h, M = 1-4h, L = >4h
- **Skip when empty:** If no `tests/manual/` directory exists, return score 10/10 with zero findings
- **Exclude non-test files:** Skip `config.sh`, `test_harness.sh`, `test-all.sh`, `regenerate-golden.sh`, `TEMPLATE-*.sh`, files in `tools/`, `results/`, `test-runs/`
- **Context-aware:** Setup scripts (`00-setup/`) have relaxed requirements (no golden files, no harness needed)
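
The exclusion rule can be sketched as a path filter. The function name is hypothetical, and the argument is assumed to be a path relative to `tests/manual/`.

```shell
# Sketch of the "Exclude non-test files" rule (is_excluded is a hypothetical helper).
# Exit status 0 means the file should be skipped by the audit.
is_excluded() {
  local rel="$1"

  # Anything under tools/, results/, or test-runs/ at any depth.
  case "$rel" in
    tools/*|*/tools/*|results/*|*/results/*|test-runs/*|*/test-runs/*) return 0 ;;
  esac

  # Shared infrastructure files, matched by base name.
  case "$(basename "$rel")" in
    config.sh|test_harness.sh|test-all.sh|regenerate-golden.sh|TEMPLATE-*.sh) return 0 ;;
  esac

  return 1
}
```

Scripts under `00-setup/` are intentionally not excluded here: they are still audited, only with the relaxed requirements described in the last rule.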

## Definition of Done

**MANDATORY READ:** Load `shared/references/audit_worker_core_contract.md`.

- [ ] contextStore parsed successfully (including output_dir)
- [ ] Manual test infrastructure discovered (config.sh, harness, templates)
- [ ] All 7 checks completed per test script
- [ ] Layer 2 context analysis applied (setup/utility exclusions)
- [ ] Findings collected with severity, location, effort, recommendation
- [ ] Score calculated using penalty algorithm
- [ ] Report written to `{output_dir}/636-manual-test-quality.md` (atomic single Write call)
- [ ] Summary written per contract

## Reference Files

- **Audit output schema:** `shared/references/audit_output_schema.md`

---
**Version:** 1.0.0
**Last Updated:** 2026-03-13

Related Skills

All from levnikolaevich/claude-code-skills.

  • ln-782-test-runner: Executes all test suites and reports results with coverage. Use when verifying that test infrastructure works after bootstrap.
  • ln-743-test-infrastructure: Sets up test infrastructure with Vitest, xUnit, and pytest. Use when adding testing frameworks and sample tests to a project.
  • ln-654-resource-lifecycle-auditor: Checks session scope mismatch, missing cleanup, pool config, error path leaks, resource holding. Use when auditing resource lifecycle.
  • ln-653-runtime-performance-auditor: Checks blocking IO in async, unnecessary allocations, sync sleep, string concat in loops, redundant copies. Use when auditing runtime performance.
  • ln-652-transaction-correctness-auditor: Checks transaction scope, missing rollback handling, long-held transactions, trigger/notify interaction. Use when auditing transaction correctness.
  • ln-651-query-efficiency-auditor: Checks redundant fetches, N+1 loops, over-fetching, missing bulk operations, wrong caching scope. Use when auditing query efficiency.
  • ln-650-persistence-performance-auditor: Coordinates persistence and performance audit across queries, transactions, runtime, and resource lifecycle. Use when auditing data layer performance.
  • ln-647-env-config-auditor: Checks env var config sync, missing defaults, naming conventions, startup validation. Use when auditing environment configuration.
  • ln-646-project-structure-auditor: Checks file hygiene, ignore files, framework conventions, domain/layer organization, naming. Use when auditing project structure.
  • ln-644-dependency-graph-auditor: Builds dependency graph, detects cycles, validates boundary rules, calculates coupling metrics (Ca/Ce/I). Use when auditing dependency structure.
  • ln-643-api-contract-auditor: Checks layer leakage in method signatures, missing DTOs, entity leakage to API, inconsistent error contracts. Use when auditing API contracts.
  • ln-642-layer-boundary-auditor: Checks layer boundary violations, transaction boundaries, session ownership, cross-layer consistency. Use when auditing architecture layers.