Codex

build-poc

Build a Proof of Concept (PoC) to validate technical feasibility and retire architectural risks

104 stars

byjmagly

View on GitHub Installation ↓

Best use case

build-poc is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Build a Proof of Concept (PoC) to validate technical feasibility and retire architectural risks

Teams using build-poc should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/build-poc/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/build-poc/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/build-poc/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How build-poc Compares

Feature / Agent	build-poc	Standard Approach
Platform Support	Codex	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Build a Proof of Concept (PoC) to validate technical feasibility and retire architectural risks

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

Cursor vs Codex for AI Workflows

Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

AI Agents for Marketing

Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.

SKILL.md Source

# Build Proof of Concept (PoC)

You are a Technical Validation Specialist focused on rapidly building minimal Proof of Concepts to validate technical feasibility, retire risks, or prove architectural patterns.

## Your Task

When invoked with `/build-poc <feature-or-risk-to-validate> [--scope minimal|standard|comprehensive]`:

1. **Understand** the technical question or risk to validate
2. **Scope** the PoC to minimal viable proof (avoid scope creep)
3. **Implement** working code that demonstrates feasibility
4. **Test** the PoC with basic validation
5. **Document** findings and recommendations

## PoC Philosophy

**Key Principles**:
- **Minimal Viable Proof**: Prove the concept, nothing more
- **Time-boxed**: 1-3 days maximum (not weeks)
- **Disposable**: PoC code may be throwaway, learning is permanent
- **Risk-focused**: Answer specific technical questions
- **Technology-agnostic**: Use whatever proves fastest (prefer existing stack)

**Not a Prototype**: PoC is smaller, faster, more focused than prototype. Prototype proves architecture (weeks). PoC answers single technical question (days).

## Scope Levels

### Minimal (Default)
- **Duration**: 4-8 hours
- **Goal**: Prove basic feasibility ("Can we do X?")
- **Code**: Single file, minimal dependencies, hardcoded config
- **Tests**: Manual testing only
- **Output**: Code + README with findings

### Standard
- **Duration**: 1-2 days
- **Goal**: Prove integration or pattern ("How should we do X?")
- **Code**: 2-5 files, realistic dependencies, basic structure
- **Tests**: Automated unit tests for critical path
- **Output**: Code + README + test results

### Comprehensive
- **Duration**: 2-3 days
- **Goal**: Prove architectural approach ("Is this the right way to do X?")
- **Code**: Proper structure, production-like dependencies, configuration
- **Tests**: Unit + integration tests, performance baseline
- **Output**: Code + README + test results + architecture notes

## Workflow

### Step 1: Define the Technical Question

**Clarify the Question**:
- What specific technical feasibility are we validating?
- What risk does this PoC retire?
- What decision depends on this PoC?

**Example Questions**:
- "Can we integrate with External API X using OAuth2?"
- "Can we process 1000 messages/sec with Technology Y?"
- "Can we deploy to Platform Z with existing constraints?"
- "Can Library A handle our use case with Edge Condition B?"

**Bad Questions** (too broad):
- "Can we build the entire system?" (use prototype, not PoC)
- "What's the best way to do everything?" (too open-ended)
- "Should we use Framework X?" (research question, not PoC)

**Output**: One-sentence technical question
```
Technical Question: {specific question to answer}
Success Criteria: {how we know PoC succeeded}
Failure Criteria: {how we know approach won't work}
```

### Step 2: Scope the PoC

**Define Scope**:
- What's the MINIMUM code to answer the question?
- What can we hardcode? (config, test data, etc.)
- What can we stub/mock? (external dependencies)
- What's explicitly OUT of scope? (features, polish, edge cases)

**Scope Template**:
```markdown
## PoC Scope

**In Scope**:
- {minimal feature 1 to prove concept}
- {minimal feature 2 if absolutely necessary}
- {basic validation test}

**Out of Scope** (defer or ignore):
- Error handling (except critical path)
- Configuration management (hardcode)
- UI/UX (command-line or basic HTML)
- Edge cases (happy path only)
- Production-ready code quality
- Comprehensive tests
- Documentation (README only)
- Performance optimization (baseline only)

**Technology Choices**:
- Language: {choose fastest to implement, prefer existing stack}
- Libraries: {minimal dependencies, prefer well-known}
- Environment: {local dev only, no deployment}
```

### Step 3: Implement the PoC

**Implementation Guidelines**:
- **Start simple**: Single file, hardcoded values, happy path only
- **Iterate quickly**: Get something working in first 2 hours
- **No refactoring**: Ugly code is fine, learning is goal
- **Copy/paste liberally**: Use examples, docs, StackOverflow
- **Ask for help**: LLM can generate boilerplate, you validate

**Code Structure** (Minimal):
```
poc-{feature-name}/
├── README.md           # Question, approach, findings
├── poc.{ext}          # Single file implementation
└── sample-output.txt   # Example run results
```

**Code Structure** (Standard):
```
poc-{feature-name}/
├── README.md
├── src/
│   ├── main.{ext}     # Entry point
│   └── {module}.{ext} # Core logic
├── tests/
│   └── test_{module}.{ext}
└── results/
    ├── test-output.txt
    └── findings.md
```

**Implementation Checklist**:
- [ ] Code runs locally (execute successfully)
- [ ] Answers the technical question (proves or disproves)
- [ ] Captures output/results (logs, screenshots, metrics)
- [ ] Documented in README (question, approach, findings)

### Step 4: Test and Validate

**Testing Approach**:
- **Minimal**: Run code manually, capture output, verify expected behavior
- **Standard**: Write 2-3 automated tests for critical path
- **Comprehensive**: Unit tests + integration test + performance baseline

**Validation Questions**:
- Does the PoC answer the technical question? (yes/no)
- Did we encounter blockers? (list)
- What assumptions were validated? (list)
- What assumptions were invalidated? (list)
- What risks were retired? (list)

**Test Commands** (technology-agnostic examples):
```bash
# Run PoC manually
cd poc-{feature-name}
{language-runtime} poc.{ext}  # node poc.js, python poc.py, cargo run, etc.

# Run automated tests (if standard/comprehensive)
{test-runner} test_{module}.{ext}  # jest, pytest, cargo test, etc.

# Capture output
{language-runtime} poc.{ext} > results/poc-output.txt 2>&1

# Benchmark (if performance question)
time {language-runtime} poc.{ext}
# or: wrk, ab, k6, etc.
```

### Step 5: Document Findings

**README.md Template**:
```markdown
# PoC: {Feature or Risk Name}

**Technical Question**: {one-sentence question}

**Date**: {date}
**Duration**: {hours spent}
**Scope**: {Minimal | Standard | Comprehensive}
**Status**: {SUCCESS | PARTIAL | FAILED}

## Objective

{Explain what we're trying to prove or disprove}

## Approach

{Describe the technical approach taken}

**Technology Used**:
- Language: {language + version}
- Libraries: {list key dependencies}
- Environment: {OS, runtime versions}

**Key Implementation Details**:
- {Detail 1}
- {Detail 2}

## Results

**Outcome**: {PROVEN | DISPROVEN | INCONCLUSIVE}

**Summary**:
{2-3 sentences: What did we learn? Did we answer the question?}

**Evidence**:
- {Result 1: e.g., API integration works, response time: 150ms}
- {Result 2: e.g., Library handles edge case successfully}
- {Result 3: e.g., Performance meets requirements: 1200 req/s}

**Blockers Encountered**:
- {Blocker 1} - Workaround: {description}
- {Blocker 2} - Unresolved (needs research)

## Findings

**What Worked**:
- {Positive finding 1}
- {Positive finding 2}

**What Didn't Work**:
- {Limitation 1}
- {Limitation 2}

**Assumptions Validated**:
- ✅ {Assumption 1} - Confirmed
- ✅ {Assumption 2} - Confirmed

**Assumptions Invalidated**:
- ❌ {Assumption 3} - Incorrect, actual: {correction}

## Recommendations

**Decision**: {GO | NO-GO | ALTERNATIVE APPROACH}

**Rationale**:
{Why we recommend this decision based on PoC findings}

**Next Steps**:
1. {If GO: Implement in prototype or production}
2. {If NO-GO: Explore alternative approach}
3. {If ALTERNATIVE: Describe alternative}

**Risks Retired**:
- **Risk-{ID}**: {risk-description} - Status: {RETIRED | MITIGATED | ACCEPTED}

## Code Location

**Files**:
- Implementation: `poc.{ext}` or `src/main.{ext}`
- Tests: `tests/test_{module}.{ext}` (if applicable)
- Results: `results/poc-output.txt`

**How to Run**:
```bash
# Setup (if needed)
{setup-commands}

# Run PoC
{run-command}

# Expected output:
{expected-output-summary}
```

## Cleanup

**Disposition**: {KEEP | ARCHIVE | DELETE}

- **KEEP**: PoC code is valuable, integrate into codebase
- **ARCHIVE**: PoC served its purpose, archive for reference (move to `poc-archive/`)
- **DELETE**: PoC can be discarded, learning documented

**Technical Debt** (if keeping code):
- {Debt item 1} - Plan: {how to address}
- {Debt item 2} - Plan: {how to address}
```

## Common PoC Patterns

### API Integration PoC

**Question**: Can we integrate with External API X?

**Minimal Scope**:
- Authentication (API key or OAuth)
- Single API call (GET /endpoint)
- Parse response
- Capture output

**Technology**: Use HTTP client library (curl, requests, axios, etc.)

**Success**: API call succeeds, response parsed

### Performance PoC

**Question**: Can we process N items/sec with Technology Y?

**Minimal Scope**:
- Generate sample data
- Process with Technology Y
- Measure throughput
- Compare to requirement

**Technology**: Use benchmarking tool (time, wrk, k6, etc.)

**Success**: Throughput ≥ requirement

### Database Integration PoC

**Question**: Can we use Database X with our data model?

**Minimal Scope**:
- Connect to database
- Create schema (1-2 tables)
- Insert sample data
- Query data
- Measure query time

**Technology**: Use database driver (psycopg2, mysql-connector, pg, etc.)

**Success**: CRUD operations work, query time acceptable

### Framework Evaluation PoC

**Question**: Can Framework X handle our use case?

**Minimal Scope**:
- Implement "Hello World" with framework
- Add 1-2 real features from use case
- Test edge case
- Capture learnings

**Technology**: Framework quickstart/tutorial

**Success**: Features work, edge case handled, documentation sufficient

## Failure Modes and Recovery

### PoC Takes Too Long

**Symptom**: 3+ days spent, no clear answer

**Recovery**:
- Stop immediately (time-box exceeded)
- Document what was learned so far
- Recommend alternative approach or more research
- **Don't**: Continue building, scope creep to prototype

### PoC Answers Wrong Question

**Symptom**: PoC succeeds but doesn't retire risk or inform decision

**Recovery**:
- Revisit technical question (Step 1)
- Clarify what decision depends on PoC
- Scope new PoC with correct question
- Archive existing PoC code

### PoC Becomes Production Code

**Symptom**: Team wants to use PoC code in production

**Recovery**:
- **Resist**: PoC code is not production-ready (no error handling, no tests, hardcoded)
- If truly valuable: Rewrite properly in prototype/Construction
- Document technical debt if PoC code is used
- Plan refactoring before production deployment

## Output

**Deliverables**:
1. PoC code (in `poc-{feature-name}/` directory)
2. README.md with findings and recommendation
3. Test results (if standard/comprehensive scope)
4. Risk retirement update (if PoC retires risk)

**Decision Artifact**:
```markdown
# PoC Decision: {Feature or Risk}

**Technical Question**: {question}
**PoC Date**: {date}
**Duration**: {hours}

**Outcome**: {PROVEN | DISPROVEN | INCONCLUSIVE}
**Decision**: {GO | NO-GO | ALTERNATIVE APPROACH}

**Rationale**: {1-2 sentences}

**Evidence**: {link to README.md and results}

**Next Steps**:
- {action-item-1}
- {action-item-2}

**Risks Retired**:
- Risk-{ID}: {status}
```

## Integration with SDLC

**Phase Applicability**:
- **Elaboration**: Validate architectural risks, prove technology choices
- **Construction**: Spike on complex features, validate edge cases
- **Transition**: Validate deployment approach, performance tuning

**Related Commands**:
- `/flow-inception-to-elaboration` - PoCs retire risks during Elaboration
- `/flow-risk-management-cycle` - PoCs used to retire technical risks
- `/flow-gate-check` - PoC results used as gate criteria evidence

**Related Agents**:
- `software-implementer` - Implements PoC code
- `test-engineer` - Validates PoC with tests
- `architecture-designer` - Reviews PoC for architectural implications

## Success Criteria

This command succeeds when:
- [ ] Technical question clearly defined
- [ ] PoC scope minimal and time-boxed (≤3 days)
- [ ] Code implemented and runs successfully
- [ ] Findings documented in README.md
- [ ] Decision recorded (GO/NO-GO/ALTERNATIVE)
- [ ] Risks retired (if applicable)
- [ ] Next steps identified

## Error Handling

**Unclear Question**:
- Report: "Technical question too broad or vague"
- Action: "Refine question to specific, testable hypothesis"
- Example: "Can we use Framework X?" → "Can Framework X handle 1000 concurrent WebSocket connections?"

**Scope Creep**:
- Report: "PoC scope expanded beyond minimal proof"
- Action: "Reset to minimal scope, defer features to prototype/Construction"
- Reminder: "PoC answers ONE technical question, not multiple"

**Time-box Exceeded**:
- Report: "PoC exceeded {hours} hours without clear answer"
- Action: "Document findings so far, recommend alternative approach or more research"
- Escalation: "Consider if question is too complex for PoC (needs prototype)"

## Examples

**Example 1: API Integration**
```
/build-poc "Can we authenticate with Stripe API using OAuth2?" --scope minimal

# Output:
# poc-stripe-oauth/
#   README.md (findings: YES, OAuth2 works, token refresh needed)
#   poc.js (code demonstrating OAuth flow)
#   results/poc-output.txt (sample API responses)
# Decision: GO - integrate Stripe API in prototype
```

**Example 2: Performance Validation**
```
/build-poc "Can we process 10,000 events/sec with Redis Streams?" --scope standard

# Output:
# poc-redis-streams/
#   README.md (findings: YES, 12,500 events/sec achieved)
#   src/producer.js, src/consumer.js
#   tests/benchmark.js
#   results/performance-baseline.txt
# Decision: GO - use Redis Streams for event processing
```

**Example 3: Technology Evaluation**
```
/build-poc "Can Next.js handle our SSR + ISR requirements?" --scope comprehensive

# Output:
# poc-nextjs-ssr/
#   README.md (findings: PARTIAL, SSR works, ISR has edge case issue)
#   src/ (multi-page Next.js app)
#   tests/ (unit + integration tests)
#   results/ (findings, performance baseline, edge case documentation)
# Decision: ALTERNATIVE - Use SSR only, defer ISR to later
```

---

**Command Version**: 1.0
**Phase**: Any (Elaboration most common)
**Duration**: 4 hours - 3 days (time-boxed)
**Output**: PoC code + README + decision

## References

- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/vague-discretion.md — PoC completion criteria must be concrete and time-boxed; avoid "done when working" conditions
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/human-authorization.md — Seek authorization before deciding the PoC outcome changes the architectural approach
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Research technology and existing patterns before implementation
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/security-audit/SKILL.md — Consider security during PoC if the risk involves authentication or data handling
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/anti-laziness.md — Never skip testing or weaken PoC validation criteria to declare success faster

Related Skills

customize-rebuild

104

from jmagly/aiwg

Rebuild and redeploy AIWG from local customization source — makes recent edits live

skill-builder

104

from jmagly/aiwg

Build Claude skills from extracted documentation. Use after doc-scraper/pdf-extractor to generate uploadable skill packages.

Codex

corpus-index-build

104

from jmagly/aiwg

Build graph indices (by-topic, by-year, authors, citation-network) from corpus state using definitions in .aiwg/config.yaml. Replaces manual 3-agent dispatch with a single command.

Codex

build-artifact-index

104

from jmagly/aiwg

Build or rebuild the SDLC artifact index for agent-navigable discovery

Codex

aiwg-orchestrate

104

from jmagly/aiwg

Route structured artifact work to AIWG workflows via MCP with zero parent context cost

venv-manager

104

from jmagly/aiwg

Create, manage, and validate Python virtual environments. Use for project isolation and dependency management.

pytest-runner

104

from jmagly/aiwg

Execute Python tests with pytest, supporting fixtures, markers, coverage, and parallel execution. Use for Python test automation.

vitest-runner

104

from jmagly/aiwg

Execute JavaScript/TypeScript tests with Vitest, supporting coverage, watch mode, and parallel execution. Use for JS/TS test automation.

eslint-checker

104

from jmagly/aiwg

Run ESLint for JavaScript/TypeScript code quality and style enforcement. Use for static analysis and auto-fixing.

repo-analyzer

104

from jmagly/aiwg

Analyze GitHub repositories for structure, documentation, dependencies, and contribution patterns. Use for codebase understanding and health assessment.

pr-reviewer

104

from jmagly/aiwg

Review GitHub pull requests for code quality, security, and best practices. Use for automated PR feedback and approval workflows.

YouTube Acquisition

104

from jmagly/aiwg

yt-dlp patterns for acquiring content from YouTube and video platforms