proof-of-work
Proof artifact generation patterns for task validation. Covers screenshots, test results, deployments, and confidence scoring.
Best use case
proof-of-work is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using proof-of-work should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/proof-of-work/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How proof-of-work Compares
| Feature / Agent | proof-of-work | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Proof artifact generation patterns for task validation. Covers screenshots, test results, deployments, and confidence scoring.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
plugin: autopilot
updated: 2026-01-20
# Proof-of-Work
**Version:** 0.1.0
**Purpose:** Generate validation artifacts for autonomous task completion
**Status:** Phase 1
## When to Use
Use this skill when you need to:
- Generate proof artifacts after task completion
- Capture screenshots for UI verification
- Parse and report test results
- Calculate confidence scores for task validation
- Determine if a task can be auto-approved
## Overview
Proof-of-work is the mechanism that validates task completion. Every finished task must include verifiable artifacts that demonstrate the work was done correctly.
## Proof Types by Task
### Bug Fix Proof
| Artifact | Required | Purpose |
|----------|----------|---------|
| Git diff | Yes | Show minimal, focused changes |
| Test results | Yes | All tests passing |
| Regression test | Yes | Specific test for the bug |
| Error log (before/after) | Optional | Evidence the error no longer occurs |
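The "minimal, focused changes" requirement can be checked mechanically. Below is a minimal sketch, assuming the diff is captured with `git diff --numstat` (one `<added>\t<deleted>\t<path>` line per file, with `-` counts for binary files). The names `summarizeNumstat` and `isFocusedFix` and the 50-line limit are hypothetical choices, not part of this skill.

```typescript
// Totals extracted from a `git diff --numstat` capture.
interface DiffSummary {
  files: number;
  added: number;
  deleted: number;
  binaryFiles: number;
}

// Parses numstat output: "<added>\t<deleted>\t<path>" per line.
// Binary files report "-" for both counts, so they are tallied separately.
function summarizeNumstat(output: string): DiffSummary {
  const summary: DiffSummary = { files: 0, added: 0, deleted: 0, binaryFiles: 0 };
  for (const line of output.split('\n')) {
    if (line.trim() === '') continue;
    const [added, deleted] = line.split('\t');
    summary.files += 1;
    if (added === '-' || deleted === '-') {
      summary.binaryFiles += 1;
      continue;
    }
    summary.added += Number(added);
    summary.deleted += Number(deleted);
  }
  return summary;
}

// Hypothetical policy: a bug fix changing more than maxLines lines needs a closer look.
function isFocusedFix(summary: DiffSummary, maxLines = 50): boolean {
  return summary.added + summary.deleted <= maxLines;
}
```

The threshold is deliberately a parameter: what counts as "minimal" depends on the codebase, so a team would tune it rather than treat 50 as a rule.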
### Feature Proof
| Artifact | Required | Purpose |
|----------|----------|---------|
| Screenshots | Yes | Visual verification |
| Test results | Yes | Functionality works |
| Coverage report | Yes | >= 80% coverage |
| Build output | Yes | Builds successfully |
| Deployment URL | Optional | Live demo |
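The 80% coverage requirement needs a single number to compare against. Below is a minimal sketch, assuming the report is Istanbul's `json-summary` output (`coverage-summary.json`), whose `total` entry holds `lines`, `statements`, `functions`, and `branches` objects with a `pct` field. Gating on line coverage, and the helper names, are assumptions.

```typescript
// One metric as it appears in Istanbul's json-summary report.
interface CoverageMetric {
  total: number;
  covered: number;
  skipped: number;
  pct: number;
}

// Only the aggregate "total" entry is read; per-file entries are ignored.
interface CoverageSummary {
  total: {
    lines: CoverageMetric;
    statements: CoverageMetric;
    functions: CoverageMetric;
    branches: CoverageMetric;
  };
}

// Extracts overall line coverage (0-100) from the raw JSON text of the summary file.
function lineCoverageFromSummary(json: string): number {
  const summary = JSON.parse(json) as CoverageSummary;
  return summary.total.lines.pct;
}

// Feature proofs require at least 80% coverage per the Feature Proof table.
function meetsCoverageRequirement(pct: number, minimum = 80): boolean {
  return pct >= minimum;
}
```

The returned percentage can also be passed straight into the `testCoverage` field used for confidence scoring.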
### UI Change Proof
| Artifact | Required | Purpose |
|----------|----------|---------|
| Desktop screenshot | Yes | 1920x1080 view |
| Mobile screenshot | Yes | 375x667 view |
| Tablet screenshot | Yes | 768x1024 view |
| Accessibility score | Yes | Lighthouse accessibility score >= 80 |
| Visual regression | Optional | BackstopJS diff |
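The three tables can be encoded as data so an agent can list what is still missing before it attempts approval. This is a sketch: the `TaskType` values and artifact keys are hypothetical labels for the table rows, and only rows marked Required are included.

```typescript
type TaskType = 'bug-fix' | 'feature' | 'ui-change';

// Required rows from the Proof Types by Task tables; optional rows are omitted.
const REQUIRED_ARTIFACTS: Record<TaskType, readonly string[]> = {
  'bug-fix': ['git-diff', 'test-results', 'regression-test'],
  'feature': ['screenshots', 'test-results', 'coverage-report', 'build-output'],
  'ui-change': ['desktop-screenshot', 'mobile-screenshot', 'tablet-screenshot', 'accessibility-score'],
};

// Returns the required artifacts not yet produced, in table order.
function missingArtifacts(type: TaskType, produced: readonly string[]): string[] {
  const have = new Set(produced);
  return REQUIRED_ARTIFACTS[type].filter((artifact) => !have.has(artifact));
}
```

Extra artifacts (for example an optional deployment URL) are simply ignored, so an empty result means every required row is covered.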
## Screenshot Capture
**Playwright Pattern:**
```typescript
import { chromium } from 'playwright';

// Viewports from the UI Change Proof table.
const VIEWPORTS = [
  { name: 'desktop', width: 1920, height: 1080 },
  { name: 'mobile', width: 375, height: 667 },
  { name: 'tablet', width: 768, height: 1024 },
];

async function captureScreenshots(url: string, outputDir: string): Promise<string[]> {
  const browser = await chromium.launch({ headless: true });
  const paths: string[] = [];
  try {
    const context = await browser.newContext();
    const page = await context.newPage();
    for (const { name, width, height } of VIEWPORTS) {
      await page.setViewportSize({ width, height });
      // Reload at each size so layout-dependent code runs for that viewport
      await page.goto(url, { waitUntil: 'networkidle' });
      const path = `${outputDir}/${name}.png`;
      await page.screenshot({ path, fullPage: true });
      paths.push(path);
    }
  } finally {
    // Always release the browser, even if navigation or capture fails
    await browser.close();
  }
  return paths;
}
```
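`calculateConfidence` only counts screenshot paths, so three desktop captures would score the same as one per viewport. Below is a minimal sketch of a stricter check, assuming files are named `<viewport>.png` as in `captureScreenshots`; `missingViewports` and `baseName` are hypothetical helpers.

```typescript
const REQUIRED_VIEWPORTS = ['desktop', 'mobile', 'tablet'] as const;

// File name without directory or extension, e.g. "proof/mobile.png" -> "mobile".
function baseName(path: string): string {
  const file = path.split(/[\\/]/).pop() ?? '';
  return file.replace(/\.[^.]+$/, '');
}

// Lists required viewports that have no matching screenshot.
function missingViewports(paths: readonly string[]): string[] {
  const captured = new Set(paths.map(baseName));
  return REQUIRED_VIEWPORTS.filter((viewport) => !captured.has(viewport));
}
```

Running this before scoring lets an agent recapture a missing viewport instead of accepting a screenshot count that looks complete.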
## Confidence Scoring
**Algorithm:**
```typescript
interface ProofArtifacts {
  testResults?: { passed: number; total: number };
  buildSuccessful?: boolean;
  lintErrors?: number;
  screenshots?: string[];
  testCoverage?: number; // percent, 0-100
  performanceScore?: number; // collected but not scored below
}

// Returns a 0-100 score; each category contributes at most the points noted.
function calculateConfidence(artifacts: ProofArtifacts): number {
  let score = 0;

  // Tests (40 points): every test must pass, and an empty suite proves nothing
  const tests = artifacts.testResults;
  if (tests && tests.total > 0 && tests.passed === tests.total) {
    score += 40;
  }

  // Build (20 points)
  if (artifacts.buildSuccessful) {
    score += 20;
  }

  // Coverage (20 points), tiered by percentage
  if (artifacts.testCoverage) {
    if (artifacts.testCoverage >= 80) score += 20;
    else if (artifacts.testCoverage >= 60) score += 15;
    else if (artifacts.testCoverage >= 40) score += 10;
    else score += 5;
  }

  // Screenshots (10 points)
  const shots = artifacts.screenshots?.length ?? 0;
  if (shots >= 3) score += 10;
  else if (shots >= 1) score += 5;

  // Lint (10 points): only an explicit zero counts; a missing lint run earns nothing
  if (artifacts.lintErrors === 0) {
    score += 10;
  }

  return score;
}
```
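The `testResults` input is easiest to populate from a runner's machine-readable report. Below is a minimal sketch, assuming Jest's `--json` output and reading its top-level `numTotalTests`, `numPassedTests`, and `numFailedTests` fields. Other runners need their own adapter, and `testResultsFromJest` is a hypothetical name.

```typescript
// Subset of the top-level fields in Jest's `--json` report.
interface JestJsonReport {
  numTotalTests: number;
  numPassedTests: number;
  numFailedTests: number;
}

// Maps a Jest report to the testResults shape used by calculateConfidence,
// plus a failed count for the proof summary template.
// Skipped/pending tests are included in numTotalTests, so they make passed < total.
function testResultsFromJest(json: string): { passed: number; total: number; failed: number } {
  const report = JSON.parse(json) as JestJsonReport;
  return {
    passed: report.numPassedTests,
    total: report.numTotalTests,
    failed: report.numFailedTests,
  };
}
```

Keeping skipped tests in the total is a conservative choice: under the scoring above, a run with skipped tests does not earn the 40 test points.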
## Confidence Thresholds
| Confidence | Action |
|------------|--------|
| >= 95% | Auto-approve (In Review -> Done) |
| 80-94% | Manual review required |
| < 80% | Validation failed, iterate |
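The thresholds translate directly into a decision function. This is a sketch: the `ProofDecision` labels are hypothetical names for the three actions in the table.

```typescript
type ProofDecision = 'auto-approve' | 'manual-review' | 'iterate';

// Maps a 0-100 confidence score to the action in the Confidence Thresholds table.
function decide(confidence: number): ProofDecision {
  if (confidence >= 95) return 'auto-approve'; // In Review -> Done
  if (confidence >= 80) return 'manual-review';
  return 'iterate'; // validation failed
}
```

Applied to the worked examples later in this skill, a score of 100 auto-approves and a score of 40 returns `'iterate'`.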
## Proof Summary Template
```markdown
# Proof of Work
**Task**: {issue_id}
**Type**: {task_type}
**Confidence**: {score}%
## Test Results
- Total: {total}
- Passed: {passed}
- Failed: {failed}
- Coverage: {coverage}%
## Build
- Status: {status}
- Duration: {duration}
## Screenshots
- Desktop: proof/desktop.png
- Mobile: proof/mobile.png
- Tablet: proof/tablet.png
## Artifacts
- test-results.txt
- coverage.json
- build-output.txt
```
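The `{placeholder}` fields in the template can be filled with a small substitution helper. This is a sketch: `renderProofSummary` is a hypothetical name, and unknown placeholders are deliberately left in place so gaps stay visible in the Linear comment instead of silently disappearing.

```typescript
// Replaces {name} placeholders with values; names without a value are left untouched.
function renderProofSummary(template: string, values: Record<string, string | number>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match,
  );
}
```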
## Examples
### Example 1: Feature Proof Generation
```typescript
const proof = {
  testResults: { passed: 15, total: 15 },
  buildSuccessful: true,
  lintErrors: 0,
  screenshots: ['desktop.png', 'mobile.png', 'tablet.png'],
  testCoverage: 85,
};
const confidence = calculateConfidence(proof);
// 40 (tests) + 20 (build) + 20 (coverage) + 10 (screenshots) + 10 (lint) = 100%
```
### Example 2: Partial Proof
```typescript
const proof = {
  testResults: { passed: 12, total: 15 }, // Some failing
  buildSuccessful: true,
  lintErrors: 2,
  screenshots: ['desktop.png'],
  testCoverage: 65,
};
const confidence = calculateConfidence(proof);
// 0 (tests fail) + 20 (build) + 15 (coverage) + 5 (1 screenshot) + 0 (lint errors) = 40%
// Result: Validation failed, must iterate
```
## Best Practices
- Always capture screenshots for UI work
- Run full test suite, not just affected tests
- Include coverage report for features
- Build must pass before any proof is valid
- Store proofs in session directory for debugging
- Generate the proof summary in Markdown for Linear comments