evidence-verification

This skill teaches agents how to collect and verify evidence before marking tasks complete. Inspired by production-grade development practices, it ensures all claims are backed by executable proof:...

242 stars

by aiskillstore


Best use case

evidence-verification is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi.

Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "evidence-verification" skill to help with this workflow task. Context: This skill teaches agents how to collect and verify evidence before marking tasks complete. Inspired by production-grade development practices, it ensures all claims are backed by executable proof:...

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

Do not use this when you only need a one-off answer and do not need a reusable workflow.
Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/evidence-verification/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/ariegoldkin/evidence-verification/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub
2. Place it in .claude/skills/evidence-verification/SKILL.md inside your project
3. Restart your AI agent; it will auto-discover the skill

How evidence-verification Compares

| Feature / Agent | evidence-verification | Standard Approach |
|-----------------|-----------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

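What does this skill do?

It teaches agents how to collect and verify evidence before marking tasks complete, so that claims are backed by executable proof: test results, coverage metrics, build success, and deployment verification.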

Where can I find the source code?

The source code is on GitHub in the aiskillstore/marketplace repository, under skills/ariegoldkin/evidence-verification/.

SKILL.md Source

# Evidence-Based Verification Skill

**Version:** 1.0.0
**Type:** Quality Assurance
**Auto-activate:** Code review, task completion, production deployment

## Overview

This skill teaches agents how to collect and verify evidence before marking tasks complete. Inspired by production-grade development practices, it ensures all claims are backed by executable proof: test results, coverage metrics, build success, and deployment verification.

**Key Principle:** Show, don't tell. No task is complete without verifiable evidence.

---

## When to Use This Skill

### Auto-Activate Triggers
- Completing code implementation
- Finishing code review
- Marking tasks complete in Squad mode
- Before agent handoff
- Production deployment verification

### Manual Activation
- When user requests "verify this works"
- Before creating pull requests
- During quality assurance reviews
- When troubleshooting failures

---

## Core Concepts

### 1. Evidence Types

**Test Evidence**
- Exit code (must be 0 for success)
- Test suite results (passed/failed/skipped)
- Coverage percentage (if available)
- Test duration

**Build Evidence**
- Build exit code (0 = success)
- Compilation errors/warnings
- Build artifacts created
- Build duration

**Deployment Evidence**
- Deployment status (success/failed)
- Environment deployed to
- Health check results
- Rollback capability verified

**Code Quality Evidence**
- Linter results (errors/warnings)
- Type checker results
- Security scan results
- Accessibility audit results

### 2. Evidence Collection Protocol

```markdown
## Evidence Collection Steps

1. **Identify Verification Points**
   - What needs to be proven?
   - What could go wrong?
   - What does "complete" mean?

2. **Execute Verification**
   - Run tests
   - Run build
   - Run linters
   - Check deployments

3. **Capture Results**
   - Record exit codes
   - Save output snippets
   - Note timestamps
   - Document environment

4. **Store Evidence**
   - Add to shared context
   - Reference in task completion
   - Link to artifacts
```
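
Steps 2 and 3 of the protocol (execute, then capture) can be sketched in TypeScript. This is a minimal sketch, assuming a Node.js runtime; `collectEvidence` and the `CommandEvidence` fields are hypothetical names chosen for illustration and are not defined by the skill.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical record shape; the field names are illustrative, not defined by the skill.
interface CommandEvidence {
  command: string;
  exitCode: number;
  passed: boolean;
  outputSnippet: string[];
  durationMs: number;
  timestamp: string; // ISO 8601, UTC
  environment: string;
}

// Run one verification command and capture the results the protocol asks for.
function collectEvidence(command: string, args: string[] = [], snippetLines = 10): CommandEvidence {
  const started = Date.now();
  const result = spawnSync(command, args, { encoding: "utf8" });
  const durationMs = Date.now() - started;
  // status is null if the process could not start or was killed by a signal;
  // record that as -1 so it can never be mistaken for success.
  const exitCode = result.status ?? -1;
  const output = `${result.stdout ?? ""}${result.stderr ?? ""}`;
  return {
    command: [command, ...args].join(" "),
    exitCode,
    passed: exitCode === 0,
    outputSnippet: output.split("\n").filter((line) => line.length > 0).slice(0, snippetLines),
    durationMs,
    timestamp: new Date().toISOString(),
    environment: `Node ${process.version}, ${process.platform}`,
  };
}

// Example call (not executed here): collectEvidence("npm", ["test"])
```

The returned object covers the exit code, output snippet, timestamp, and environment in one place, so it can be stored in shared context or written to an evidence file without re-running the command.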

### 3. Verification Standards

**Minimum Evidence Requirements:**
- ✅ At least ONE verification type executed
- ✅ Exit code captured (0 = pass, non-zero = fail)
- ✅ Timestamp recorded
- ✅ Evidence stored in context

**Production-Grade Requirements:**
- ✅ Tests run with exit code 0
- ✅ Coverage ≥70% (or project standard)
- ✅ Build succeeds with exit code 0
- ✅ No critical linter errors
- ✅ Security scan passes

---

## Evidence Collection Templates

### Template 1: Test Evidence

Use this template when running tests:

````markdown
## Test Evidence

**Command:** `npm test` (or equivalent)
**Exit Code:** 0 ✅ / non-zero ❌
**Duration:** X seconds
**Results:**
- Tests passed: X
- Tests failed: X
- Tests skipped: X
- Coverage: X%

**Output Snippet:**
```
[First 10 lines of test output]
```

**Timestamp:** YYYY-MM-DD HH:MM:SS
**Environment:** Node vX.X.X, OS, etc.
````
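
The passed, failed, and skipped counts in this template can be read from the runner's output instead of being typed by hand. The sketch below assumes a Jest-style summary line such as `Tests: 1 failed, 45 passed, 46 total`; other runners print different formats and would need their own pattern. `parseTestSummary` is a hypothetical helper name.

```typescript
interface TestCounts {
  passed: number;
  failed: number;
  skipped: number;
}

// Extract counts from a Jest-style "Tests: ..." summary line.
// Returns null when no such line exists, so missing data is never reported as zero.
function parseTestSummary(output: string): TestCounts | null {
  const line = output.split("\n").find((l) => l.trim().startsWith("Tests:"));
  if (line === undefined) return null;

  const counts: TestCounts = { passed: 0, failed: 0, skipped: 0 };
  const pattern = /(\d+) (passed|failed|skipped)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    counts[match[2] as keyof TestCounts] = Number(match[1]);
  }
  return counts;
}
```

Returning `null` when no summary is found keeps the report honest: an unparsed run shows up as missing evidence rather than as "0 failed".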

### Template 2: Build Evidence

Use this template when building:

````markdown
## Build Evidence

**Command:** `npm run build` (or equivalent)
**Exit Code:** 0 ✅ / non-zero ❌
**Duration:** X seconds
**Artifacts Created:**
- dist/bundle.js (XXX KB)
- dist/styles.css (XXX KB)

**Errors:** X
**Warnings:** X

**Output Snippet:**
```
[First 10 lines of build output]
```

**Timestamp:** YYYY-MM-DD HH:MM:SS
````

### Template 3: Code Quality Evidence

Use this template for linting and type checking:

```markdown
## Code Quality Evidence

**Linter:** ESLint / Ruff / etc.
**Command:** `npm run lint`
**Exit Code:** 0 ✅ / non-zero ❌
**Errors:** X
**Warnings:** X

**Type Checker:** TypeScript / mypy / etc.
**Command:** `npm run typecheck`
**Exit Code:** 0 ✅ / non-zero ❌
**Type Errors:** X

**Timestamp:** YYYY-MM-DD HH:MM:SS
```

### Template 4: Combined Evidence Report

Use this comprehensive template for task completion:

```markdown
## Task Completion Evidence

### Task: [Task description]
### Agent: [Agent name]
### Completed: YYYY-MM-DD HH:MM:SS

### Verification Results

| Check | Command | Exit Code | Result |
|-------|---------|-----------|--------|
| Tests | `npm test` | 0 | ✅ 45 passed, 0 failed |
| Build | `npm run build` | 0 | ✅ Bundle created (234 KB) |
| Linter | `npm run lint` | 0 | ✅ No errors, 2 warnings |
| Types | `npm run typecheck` | 0 | ✅ No type errors |

### Coverage
- Statements: 87%
- Branches: 82%
- Functions: 90%
- Lines: 86%

### Evidence Files
- Test output: `.claude/quality-gates/evidence/tests-YYYY-MM-DD-HHMMSS.log`
- Build output: `.claude/quality-gates/evidence/build-YYYY-MM-DD-HHMMSS.log`

### Conclusion
All verification checks passed. Task ready for review.
```
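
The Verification Results table in this template can be generated from recorded check results, which avoids a hand-written ✅ next to a non-zero exit code. This is a minimal sketch; `CheckResult` and `renderVerificationTable` are hypothetical names.

```typescript
interface CheckResult {
  check: string;    // e.g. "Tests"
  command: string;  // e.g. "npm test"
  exitCode: number;
  summary: string;  // e.g. "45 passed, 0 failed"
}

// Render the Markdown table from Template 4; the mark is derived from the exit code only.
function renderVerificationTable(results: CheckResult[]): string {
  const header = [
    "| Check | Command | Exit Code | Result |",
    "|-------|---------|-----------|--------|",
  ];
  const rows = results.map((r) => {
    const mark = r.exitCode === 0 ? "✅" : "❌";
    return `| ${r.check} | \`${r.command}\` | ${r.exitCode} | ${mark} ${r.summary} |`;
  });
  return [...header, ...rows].join("\n");
}
```

Deriving the mark from the exit code keeps the summary text from overriding what the command actually returned.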

---

## Step-by-Step Workflows

### Workflow 1: Code Implementation Verification

**When:** After writing code for a feature or bug fix

**Steps:**

1. **Save all files** - Ensure changes are written

2. **Run tests**
   ```bash
   npm test
   # or: pytest, cargo test, go test ./..., etc.
   ```
   - Capture exit code
   - Note passed/failed counts
   - Record coverage if available

3. **Run build** (if applicable)
   ```bash
   npm run build
   # or: cargo build, go build, etc.
   ```
   - Capture exit code
   - Note any errors/warnings
   - Verify artifacts created

4. **Run linter**
   ```bash
   npm run lint
   # or: ruff check, cargo clippy, golangci-lint run
   ```
   - Capture exit code
   - Note errors/warnings

5. **Run type checker** (if applicable)
   ```bash
   npm run typecheck
   # or: mypy . (Python), tsc --noEmit (TypeScript)
   ```
   - Capture exit code
   - Note type errors

6. **Document evidence**
   - Use Template 4 (Combined Evidence Report)
   - Add to shared context under `quality_evidence`
   - Reference in task completion message

7. **Mark task complete** (only if all evidence passes)
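
The rule in step 7 (complete only if all evidence passes) can be expressed as a small gate over the applicable steps. The sketch below injects the command runner so that it stays independent of any particular shell or test framework; `Step`, `Runner`, and `verifyImplementation` are hypothetical names, and the caller is assumed to list only the steps that apply to the project.

```typescript
// A runner executes a command and returns its exit code (injected for testability).
type Runner = (command: string) => number;

interface Step {
  name: string;
  command: string;
}

interface VerificationOutcome {
  results: { name: string; command: string; exitCode: number }[];
  failed: string[];
  canMarkComplete: boolean;
}

function verifyImplementation(steps: Step[], run: Runner): VerificationOutcome {
  // Run every step, even after a failure, so the report shows the full picture.
  const results = steps.map((step) => ({ ...step, exitCode: run(step.command) }));
  const failed = results.filter((r) => r.exitCode !== 0).map((r) => r.name);
  // An empty step list proves nothing, so it must not count as success.
  return { results, failed, canMarkComplete: steps.length > 0 && failed.length === 0 };
}
```

In real use the runner would wrap a process call; in tests it can be a stub that returns fixed exit codes.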

### Workflow 2: Code Review Verification

**When:** Reviewing another agent's code or user's PR

**Steps:**

1. **Read the code changes**

2. **Verify tests exist**
   - Are there tests for new functionality?
   - Do tests cover edge cases?
   - Are existing tests updated?

3. **Run tests**
   - Execute test suite
   - Verify exit code 0
   - Check coverage didn't decrease

4. **Check build**
   - Ensure project still builds
   - No new build errors

5. **Verify code quality**
   - Run linter
   - Run type checker
   - Check for security issues

6. **Document review evidence**
   - Use Template 3 (Code Quality Evidence)
   - Note any issues found
   - Add to context

7. **Approve or request changes**
   - Approve only if all evidence passes
   - If issues found, document them with evidence
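
The approval rule, including the check that coverage did not decrease, can be sketched as follows. `ReviewEvidence` and `reviewDecision` are hypothetical names, and the coverage figures are assumed to be percentages measured on the base branch and on the branch under review.

```typescript
interface ReviewEvidence {
  exitCodes: Record<string, number>; // e.g. { tests: 0, build: 0, lint: 0, typecheck: 0 }
  coverageBefore: number;            // percent on the base branch
  coverageAfter: number;             // percent with the change applied
}

// Approve only when every check exited 0 and coverage did not drop.
function reviewDecision(evidence: ReviewEvidence): { approve: boolean; issues: string[] } {
  const issues: string[] = [];
  for (const [check, code] of Object.entries(evidence.exitCodes)) {
    if (code !== 0) issues.push(`${check} exited with code ${code}`);
  }
  if (evidence.coverageAfter < evidence.coverageBefore) {
    issues.push(`coverage decreased from ${evidence.coverageBefore}% to ${evidence.coverageAfter}%`);
  }
  return { approve: issues.length === 0, issues };
}
```

The `issues` list doubles as the documented evidence for a change request.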

### Workflow 3: Production Deployment Verification

**When:** Deploying to production or staging

**Steps:**

1. **Pre-deployment checks**
   - All tests pass (exit code 0)
   - Build succeeds
   - No critical linter errors
   - Security scan passes

2. **Execute deployment**
   - Run deployment command
   - Capture output

3. **Post-deployment checks**
   - Health check endpoint responds
   - Application starts successfully
   - No immediate errors in logs
   - Smoke tests pass

4. **Document deployment evidence**
   ```markdown
   ## Deployment Evidence

   **Environment:** production
   **Timestamp:** YYYY-MM-DD HH:MM:SS
   **Version:** vX.X.X

   **Pre-Deployment:**
   - Tests: ✅ Exit 0
   - Build: ✅ Exit 0
   - Security: ✅ No critical issues

   **Deployment:**
   - Command: `kubectl apply -f deployment.yaml`
   - Exit Code: 0 ✅

   **Post-Deployment:**
   - Health Check: ✅ 200 OK
   - Smoke Tests: ✅ All passed
   - Error Rate: <0.1%
   ```

5. **Verify rollback capability**
   - Ensure previous version can be restored
   - Document rollback procedure

---

## Evidence Storage

### Where to Store Evidence

**Shared Context** (Primary)
```json
{
  "quality_evidence": {
    "tests_run": true,
    "test_exit_code": 0,
    "coverage_percent": 87,
    "build_success": true,
    "build_exit_code": 0,
    "linter_errors": 0,
    "linter_warnings": 2,
    "timestamp": "2025-11-02T10:30:00Z"
  }
}
```

**Evidence Files** (Secondary)
- `.claude/quality-gates/evidence/` directory
- One file per verification run
- Format: `{type}-{timestamp}.log`
- Example: `tests-2025-11-02-103000.log`
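
The `{type}-{timestamp}.log` naming can be produced by a short helper. This sketch assumes timestamps are taken in UTC, which the document does not specify; `evidenceFileName` and `evidenceFilePath` are hypothetical names.

```typescript
// Build a file name of the form {type}-YYYY-MM-DD-HHMMSS.log (UTC is an assumption).
function evidenceFileName(type: string, at: Date): string {
  const pad = (n: number): string => String(n).padStart(2, "0");
  const date = `${at.getUTCFullYear()}-${pad(at.getUTCMonth() + 1)}-${pad(at.getUTCDate())}`;
  const time = `${pad(at.getUTCHours())}${pad(at.getUTCMinutes())}${pad(at.getUTCSeconds())}`;
  return `${type}-${date}-${time}.log`;
}

// Place the file under the evidence directory named in this section.
function evidenceFilePath(type: string, at: Date): string {
  return `.claude/quality-gates/evidence/${evidenceFileName(type, at)}`;
}
```

For example, `evidenceFileName("tests", new Date("2025-11-02T10:30:00Z"))` yields `tests-2025-11-02-103000.log`, matching the example above.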

**Task Completion Messages**
- Include evidence summary
- Link to detailed evidence files
- Example: "Task complete. Tests passed (exit 0, 87% coverage), build succeeded."

---

## Quality Standards

### Minimum Acceptable

✅ **Tests executed** with captured exit code
✅ **Timestamp** recorded
✅ **Evidence stored** in context

### Production-Grade

✅ **Tests pass** (exit code 0)
✅ **Coverage ≥70%** (or project standard)
✅ **Build succeeds** (exit code 0)
✅ **No critical linter errors**
✅ **Type checker passes**
✅ **Security scan** shows no critical issues

### Gold Standard

✅ All production-grade requirements
✅ **Coverage ≥80%**
✅ **No linter warnings**
✅ **Performance benchmarks** within thresholds
✅ **Accessibility audit** passes (WCAG 2.1 AA)
✅ **Integration tests** pass
✅ **Deployment verification** complete
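
The three tiers can be checked mechanically. The sketch below is one possible reading of the lists above; the `QualityEvidence` field names and the `classify` function are hypothetical, and the 70 and 80 coverage floors are defaults that a project standard may override.

```typescript
type Tier = "insufficient" | "minimum" | "production" | "gold";

interface QualityEvidence {
  testExitCode?: number; // undefined means tests were not run
  timestamp?: string;
  storedInContext: boolean;
  coveragePercent?: number;
  buildExitCode?: number;
  criticalLinterErrors?: number;
  linterWarnings?: number;
  typeCheckPassed?: boolean;
  criticalSecurityIssues?: number;
  benchmarksWithinThresholds?: boolean;
  accessibilityAuditPassed?: boolean;
  integrationTestsPassed?: boolean;
  deploymentVerified?: boolean;
}

function classify(e: QualityEvidence, floors = { production: 70, gold: 80 }): Tier {
  // Minimum: tests executed with a captured exit code, timestamp recorded, evidence stored.
  const minimum = e.testExitCode !== undefined && !!e.timestamp && e.storedInContext;
  if (!minimum) return "insufficient";

  // Missing fields count as unmet: absence of evidence is not evidence of success.
  const coverage = e.coveragePercent ?? 0;
  const production =
    e.testExitCode === 0 &&
    coverage >= floors.production &&
    e.buildExitCode === 0 &&
    e.criticalLinterErrors === 0 &&
    e.typeCheckPassed === true &&
    e.criticalSecurityIssues === 0;
  if (!production) return "minimum";

  const gold =
    coverage >= floors.gold &&
    e.linterWarnings === 0 &&
    e.benchmarksWithinThresholds === true &&
    e.accessibilityAuditPassed === true &&
    e.integrationTestsPassed === true &&
    e.deploymentVerified === true;
  return gold ? "gold" : "production";
}
```

Treating every missing field as unmet keeps an incomplete report from being classified higher than the evidence supports.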

---

## Common Pitfalls

### ❌ Don't Skip Evidence Collection

**Bad:**
```
"I've implemented the login feature. It should work correctly."
```

**Good:**
```
"I've implemented the login feature. Evidence:
- Tests: Exit code 0, 12 tests passed, 0 failed
- Build: Exit code 0, no errors
- Coverage: 89%
Task complete with verification."
```

### ❌ Don't Fake Evidence

**Bad:**
```
"Tests passed" (without actually running them)
```

**Good:**
```
"Tests passed. Exit code: 0
Command: npm test
Output: Test Suites: 3 passed, 3 total
Timestamp: 2025-11-02 10:30:15"
```

### ❌ Don't Ignore Failed Evidence

**Bad:**
```
"Build failed with exit code 1, but the code looks correct so marking complete."
```

**Good:**
```
"Build failed with exit code 1. Errors:
- TypeError: Cannot read property 'id' of undefined (line 42)
Fixing the error now before marking complete."
```

### ❌ Don't Collect Evidence Only Once

**Bad:**
```
"Tests passed yesterday, so the code is still good."
```

**Good:**
```
"Re-running tests after today's changes.
New evidence: Exit code 0, 45 tests passed, coverage 87%"
```
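
One way to catch stale evidence is to compare the evidence timestamp with the time of the most recent change. This is a minimal sketch, assuming both values are ISO 8601 strings; `isEvidenceStale` is a hypothetical helper, and how the last-change time is obtained (for example from version control) is left to the caller.

```typescript
// Evidence is stale when it was collected before the most recent change.
function isEvidenceStale(evidenceTimestamp: string, lastChangeTimestamp: string): boolean {
  const evidenceAt = Date.parse(evidenceTimestamp);
  const changedAt = Date.parse(lastChangeTimestamp);
  // If either timestamp cannot be parsed, assume the evidence is stale and re-run.
  if (Number.isNaN(evidenceAt) || Number.isNaN(changedAt)) return true;
  return evidenceAt < changedAt;
}
```

When the function returns `true`, the checks should be re-run before the earlier results are cited again.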

---

## Integration with Other Systems

### Context System Integration

Evidence is automatically tracked in shared context:

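```typescript
// Context structure includes the following (the interface name is illustrative):
interface SharedContext {
  quality_evidence?: {
    tests_run: boolean;
    test_exit_code?: number;
    coverage_percent?: number;
    build_success?: boolean;
    build_exit_code?: number;  // appears in the JSON example under Evidence Storage
    linter_errors?: number;
    linter_warnings?: number;  // appears in the JSON example under Evidence Storage
    timestamp: string;         // ISO 8601, e.g. "2025-11-02T10:30:00Z"
  };
}
```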

### Quality Gates Integration

Evidence collection feeds into quality gates:
- Quality gates check if evidence exists
- Block task completion if evidence missing
- Escalate if evidence shows failures
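
The three gate behaviors above can be sketched as a single decision function over the `quality_evidence` fields. This is an illustration, not the actual quality-gate implementation; `qualityGate` and `GateDecision` are hypothetical names, and treating any non-zero `linter_errors` count as a failure is an assumption.

```typescript
type GateDecision = "allow" | "block" | "escalate";

// Subset of the quality_evidence fields used by this sketch.
interface QualityEvidence {
  tests_run: boolean;
  test_exit_code?: number;
  build_success?: boolean;
  linter_errors?: number;
  timestamp: string;
}

function qualityGate(evidence: QualityEvidence | undefined): GateDecision {
  // Missing or incomplete evidence blocks completion.
  if (evidence === undefined || !evidence.tests_run || evidence.test_exit_code === undefined) {
    return "block";
  }
  // Evidence that shows a failure is escalated rather than silently blocked.
  const failed =
    evidence.test_exit_code !== 0 ||
    evidence.build_success === false ||
    (evidence.linter_errors ?? 0) > 0;
  return failed ? "escalate" : "allow";
}
```

Separating "block" from "escalate" distinguishes work that was never verified from work that was verified and found failing.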

### Squad Mode Integration

In parallel execution:
- Each agent collects evidence independently
- Studio Coach validates evidence before sync
- Blocked tasks don't waste parallel cycles

---

## Quick Reference

### Evidence Collection Checklist

```markdown
Before marking task complete:

- [ ] Tests executed
- [ ] Test exit code captured (0 = pass)
- [ ] Build executed (if applicable)
- [ ] Build exit code captured (0 = pass)
- [ ] Code quality checks run (linter, types)
- [ ] Evidence documented with timestamp
- [ ] Evidence added to shared context
- [ ] Evidence summary in completion message
```

### Common Commands by Language/Framework

**JavaScript/TypeScript:**
```bash
npm test                 # Run tests
npm run build           # Build project
npm run lint            # Run ESLint
npm run typecheck       # Run TypeScript compiler
```

**Python:**
```bash
pytest                  # Run tests
pytest --cov           # Run tests with coverage
ruff check .           # Run linter
mypy .                 # Run type checker
```

**Rust:**
```bash
cargo test             # Run tests
cargo build            # Build project
cargo clippy           # Run linter
```

**Go:**
```bash
go test ./...          # Run tests
go build               # Build project
golangci-lint run      # Run linter
```

---

## Examples

See `/skills/evidence-verification/examples/` for:
- Sample evidence reports
- Real-world verification scenarios
- Integration examples

---

## Version History

**v1.0.0** - Initial release
- Core evidence collection templates
- Verification workflows
- Quality standards
- Integration with context system

---

**Remember:** Evidence-first development prevents hallucinations, ensures production quality, and builds confidence. When in doubt, collect more evidence, not less.
