ln-636-manual-test-auditor

Checks manual test scripts for harness adoption, golden files, fail-fast, config sourcing, idempotency. Use when auditing manual test quality.

310 stars

bylevnikolaevich

View on GitHub Installation ↓

Best use case

ln-636-manual-test-auditor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Checks manual test scripts for harness adoption, golden files, fail-fast, config sourcing, idempotency. Use when auditing manual test quality.

Teams using ln-636-manual-test-auditor should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/ln-636-manual-test-auditor/SKILL.md --create-dirs "https://raw.githubusercontent.com/levnikolaevich/claude-code-skills/main/skills-catalog/ln-636-manual-test-auditor/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/ln-636-manual-test-auditor/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How ln-636-manual-test-auditor Compares

Feature / Agent	ln-636-manual-test-auditor	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Checks manual test scripts for harness adoption, golden files, fail-fast, config sourcing, idempotency. Use when auditing manual test quality.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

SKILL.md Source

> **Paths:** File paths (`shared/`, `references/`, `../ln-*`) are relative to skills repo root. If not found at CWD, locate this SKILL.md directory and go up one level for repo root. If `shared/` is missing, fetch files via WebFetch from `https://raw.githubusercontent.com/levnikolaevich/claude-code-skills/master/skills/{path}`.

# Manual Test Quality Auditor (L3 Worker)

**Type:** L3 Worker

Specialized worker auditing manual test scripts for quality and best-practice compliance.

## Purpose & Scope

- **Worker in ln-630 coordinator pipeline**
- Audit **Manual Test Quality** (Category 7: Medium Priority)
- Evaluate bash test scripts in `tests/manual/` against quality dimensions
- Calculate compliance score (X/10)

## Inputs (from Coordinator)

**MANDATORY READ:** Load `shared/references/audit_worker_core_contract.md`.

Receives `contextStore` with: `tech_stack`, `testFilesMetadata` (filtered to `type: "manual"`), `codebase_root`, `output_dir`.

Manual test metadata includes: `suite_dir`, `has_expected_dir`, `harness_sourced`.

## Workflow

**MANDATORY READ:** Load `shared/references/two_layer_detection.md` for detection methodology.

1) **Parse Context:** Extract manual test file list, output_dir, codebase_root from contextStore
2) **Discover Infrastructure:** Detect shared infrastructure files:
   - `tests/manual/config.sh` -- shared configuration
   - `tests/manual/test_harness.sh` -- shared test framework (if exists)
   - `tests/manual/test-all.sh` -- master runner
   - `tests/manual/TEMPLATE-*.sh` -- test templates (if exist)
   - `tests/manual/regenerate-golden.sh` -- golden file regeneration (if exists)
3) **Scan Scripts (Layer 1):** For each manual test script, check 7 quality dimensions (see Audit Rules)
3b) **Context Analysis (Layer 2 -- MANDATORY):** For each candidate finding, ask:
   - Is this a setup/utility script (e.g., `00-setup/*.sh`, `tools/*.sh`)? Setup scripts have different requirements -- skip harness/golden checks
   - Is this a master runner (`test-all.sh`)? Master runners orchestrate, not test -- skip all checks except fail-fast
   - Does the project not use a shared harness at all? If no `test_harness.sh` exists, harness adoption check is N/A
4) **Collect Findings:** Record violations with severity, location (file:line), effort, recommendation
5) **Calculate Score:** Count violations by severity, calculate compliance score (X/10)
6) **Write Report:** Build full markdown report in memory per `shared/templates/audit_worker_report_template.md`, write to `{output_dir}/636-manual-test-quality.md` in single Write call
7) **Return Summary:** Return minimal summary to coordinator (see Output Format)

## Audit Rules

### 1. Harness Adoption

**What:** Test script uses shared framework (`run_test`, `init_test_state`) instead of custom assertion logic

**Detection:**
- Grep for `run_test`, `init_test_state` in script
- If absent AND script contains custom test loops/assertions -> custom logic
- If `test_harness.sh` does not exist in project -> skip this check entirely

**Severity:** **HIGH** (custom logic = maintenance burden, inconsistent reporting)

**Recommendation:** Refactor to use shared `run_test` from test_harness.sh

**Effort:** M

### 2. Golden File Completeness

**What:** Test suite has `expected/` directory with reference files matching test scenarios

**Detection:**
- Check if suite directory has `expected/` subdirectory
- Compare: number of test scenarios (grep `run_test` calls) vs number of expected files
- If test uses `diff` against expected files but expected dir is missing -> finding

**Layer 2:** Not all tests need golden files. Tests validating HTTP status codes, timing, or dynamic data may legitimately skip golden comparison -> skip if test has no `diff` or comparison against files

**Severity:** **HIGH** (no golden files = no regression detection for output correctness)

**Recommendation:** Add expected/ directory with reference output files

**Effort:** M

### 3. Config Sourcing

**What:** Script sources shared `config.sh` for consistent configuration

**Detection:**
- Grep for `source.*config.sh` or `. .*config.sh`
- If absent -> script manages its own BASE_URL, tokens, etc.

**Layer 2:** If script is self-contained utility (e.g., `tools/*.sh`) -> skip

**Severity:** **MEDIUM**

**Recommendation:** Add `source "$THIS_DIR/../config.sh"` for shared configuration

**Effort:** S

### 4. Fail-Fast Compliance

**What:** Script uses `set -e` and returns exit code 1 on failure

**Detection:**
- Grep for `set -e` (or `set -eo pipefail`)
- Check that failure paths lead to non-zero exit (not swallowed by `|| true` everywhere)

**Severity:** **HIGH** (silent failures mask broken tests)

**Recommendation:** Add `set -e` at script start, ensure test failures propagate

**Effort:** S

### 5. Template Compliance

**What:** Script follows project test templates (TEMPLATE-api-endpoint.sh, TEMPLATE-document-format.sh)

**Detection:**
- If TEMPLATE files exist in `tests/manual/`, check structural alignment:
  - Header comment block with description, ACs tested, prerequisites
  - Standard variable naming (`THIS_DIR`, `EXPECTED_DIR`)
  - Standard setup pattern (`source config.sh`, `check_jq`, `setup_auth`)
- If NO templates exist in project -> skip this check entirely

**Layer 2:** Older scripts written before templates may diverge. Flag as MEDIUM, not HIGH

**Severity:** **MEDIUM**

**Recommendation:** Align script structure with project TEMPLATE files

**Effort:** M

### 6. Idempotency

**What:** Script can be rerun safely without side effects from previous runs

**Detection:**
- Grep for cleanup patterns: `trap.*EXIT`, `rm -f`, `cleanup` functions
- Check for temp file creation without cleanup
- Check for hardcoded resource names that would conflict on rerun (e.g., creating user with fixed email without checking existence)

**Layer 2:** Scripts that only READ data (GET requests, queries) are inherently idempotent -> skip

**Severity:** **MEDIUM**

**Recommendation:** Add cleanup trap or use unique identifiers per run

**Effort:** S-M

### 7. Documentation

**What:** Test suite directory has README.md explaining purpose and prerequisites

**Detection:**
- Check if suite directory (`NN-feature/`) contains README.md
- If missing -> finding

**Layer 2:** Setup directories (`00-setup/`) and utility directories (`tools/`) may not need README -> skip

**Severity:** **LOW**

**Recommendation:** Add README.md with test purpose, prerequisites, usage

**Effort:** S

## Scoring Algorithm

**MANDATORY READ:** Load `shared/references/audit_worker_core_contract.md` and `shared/references/audit_scoring.md`.

**Severity mapping:**
- Missing harness adoption (when harness exists), No golden files (when expected-based), No fail-fast -> HIGH
- Missing config sourcing, Template divergence, No idempotency -> MEDIUM
- Missing README -> LOW

## Output Format

**MANDATORY READ:** Load `shared/references/audit_worker_core_contract.md` and `shared/templates/audit_worker_report_template.md`.

If summaryArtifactPath is present, write JSON summary per shared/references/audit_summary_contract.md. Compact text output is fallback only.

Write report to `{output_dir}/636-manual-test-quality.md` with `category: "Manual Test Quality"` and checks: harness_adoption, golden_file_completeness, config_sourcing, fail_fast_compliance, template_compliance, idempotency, documentation.

Return summary per `shared/references/audit_summary_contract.md`.

Legacy compact text output is allowed only when `summaryArtifactPath` is absent:
```
Report written: .hex-skills/runtime-artifacts/runs/{run_id}/audit-report/636-manual-test-quality.md
Score: X.X/10 | Issues: N (C:N H:N M:N L:N)
```

## Critical Rules

**MANDATORY READ:** Load `shared/references/audit_worker_core_contract.md`.

- **Do not auto-fix:** Report only
- **Effort realism:** S = <1h, M = 1-4h, L = >4h
- **Skip when empty:** If no `tests/manual/` directory exists, return score 10/10 with zero findings
- **Exclude non-test files:** Skip `config.sh`, `test_harness.sh`, `test-all.sh`, `regenerate-golden.sh`, `TEMPLATE-*.sh`, files in `tools/`, `results/`, `test-runs/`
- **Context-aware:** Setup scripts (`00-setup/`) have relaxed requirements (no golden files, no harness needed)

## Definition of Done

**MANDATORY READ:** Load `shared/references/audit_worker_core_contract.md`.

- [ ] contextStore parsed successfully (including output_dir)
- [ ] Manual test infrastructure discovered (config.sh, harness, templates)
- [ ] All 7 checks completed per test script
- [ ] Layer 2 context analysis applied (setup/utility exclusions)
- [ ] Findings collected with severity, location, effort, recommendation
- [ ] Score calculated using penalty algorithm
- [ ] Report written to `{output_dir}/636-manual-test-quality.md` (atomic single Write call)
- [ ] Summary written per contract

## Reference Files

- **Audit output schema:** `shared/references/audit_output_schema.md`

---
**Version:** 1.0.0
**Last Updated:** 2026-03-13

Related Skills

ln-782-test-runner

310

from levnikolaevich/claude-code-skills

Executes all test suites and reports results with coverage. Use when verifying that test infrastructure works after bootstrap.

ln-743-test-infrastructure

310

from levnikolaevich/claude-code-skills

Sets up test infrastructure with Vitest, xUnit, and pytest. Use when adding testing frameworks and sample tests to a project.

ln-654-resource-lifecycle-auditor

310

from levnikolaevich/claude-code-skills

Checks session scope mismatch, missing cleanup, pool config, error path leaks, resource holding. Use when auditing resource lifecycle.

ln-653-runtime-performance-auditor

310

from levnikolaevich/claude-code-skills

Checks blocking IO in async, unnecessary allocations, sync sleep, string concat in loops, redundant copies. Use when auditing runtime performance.

ln-652-transaction-correctness-auditor

310

from levnikolaevich/claude-code-skills

Checks transaction scope, missing rollback handling, long-held transactions, trigger/notify interaction. Use when auditing transaction correctness.

ln-651-query-efficiency-auditor

310

from levnikolaevich/claude-code-skills

Checks redundant fetches, N+1 loops, over-fetching, missing bulk operations, wrong caching scope. Use when auditing query efficiency.

ln-650-persistence-performance-auditor

310

from levnikolaevich/claude-code-skills

Coordinates persistence and performance audit across queries, transactions, runtime, and resource lifecycle. Use when auditing data layer performance.

ln-647-env-config-auditor

310

from levnikolaevich/claude-code-skills

Checks env var config sync, missing defaults, naming conventions, startup validation. Use when auditing environment configuration.

ln-646-project-structure-auditor

310

from levnikolaevich/claude-code-skills

Checks file hygiene, ignore files, framework conventions, domain/layer organization, naming. Use when auditing project structure.

ln-644-dependency-graph-auditor

310

from levnikolaevich/claude-code-skills

Builds dependency graph, detects cycles, validates boundary rules, calculates coupling metrics (Ca/Ce/I). Use when auditing dependency structure.

ln-643-api-contract-auditor

310

from levnikolaevich/claude-code-skills

Checks layer leakage in method signatures, missing DTOs, entity leakage to API, inconsistent error contracts. Use when auditing API contracts.

ln-642-layer-boundary-auditor

310

from levnikolaevich/claude-code-skills

Checks layer boundary violations, transaction boundaries, session ownership, cross-layer consistency. Use when auditing architecture layers.