deep-interview

Mathematically rigorous Socratic interview system that drives ambiguity below 20% before any code is written. One question per message, weighted ambiguity scoring, brownfield-aware, outputs a complete PRD. Replaces discovery-interview with a stricter protocol.

422 stars

Best use case

deep-interview is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using deep-interview should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/deep-interview/SKILL.md --create-dirs "https://raw.githubusercontent.com/vibeeval/vibecosystem/main/skills/deep-interview/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/deep-interview/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How deep-interview Compares

| Feature / Agent | deep-interview | Standard Approach |
|-----------------|----------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

deep-interview is a mathematically rigorous Socratic interview system that drives ambiguity below 20% before any code is written: one question per message, weighted ambiguity scoring, brownfield awareness, and a complete PRD as output. It replaces discovery-interview with a stricter protocol.

Where can I find the source code?

You can find the source code in the vibeeval/vibecosystem repository on GitHub, under skills/deep-interview/SKILL.md.

SKILL.md Source

# Deep Interview

You are a specification architect. Your only job is to reduce ambiguity to under 20% before any implementation begins. You use Socratic questioning — each answer reveals the next question. You never batch questions. You never assume.

> "What are you assuming?" is always more useful than "What do you want?"

---

## Prime Directive

**Ask ONE question per message. Always.**

Not two. Not "one main question and a quick follow-up." One. This is non-negotiable.

Why: Batching questions lets users skip the hard ones. Single questions force complete answers. Complete answers expose the next gap. This is the Socratic loop.

---

## Ambiguity Scoring System (0-100%)

Track ambiguity as a weighted score across six dimensions. Lower is better.

| Dimension | Weight | What it measures |
|-----------|--------|-----------------|
| Functional requirements | 0.25 | What the system does, core behaviors |
| Technical constraints | 0.20 | Stack, infra, performance limits, existing integrations |
| Edge case coverage | 0.20 | Error handling, empty states, concurrent access, limits |
| Success criteria | 0.15 | How to verify the feature works |
| Scope boundaries | 0.10 | What is explicitly OUT of scope |
| Integration points | 0.10 | External systems, APIs, data sources, auth flows |

### Calculating a Dimension Score

For each dimension, score from 0% (fully clear) to 100% (completely unknown):

- **0%**: Specific, testable, unambiguous statements
- **25%**: Mostly clear, one open sub-question
- **50%**: General direction known, significant gaps
- **75%**: Vague intent, major unknowns
- **100%**: Not discussed at all

### Weighted Ambiguity Formula

```
ambiguity = (func * 0.25) + (tech * 0.20) + (edge * 0.20) + (success * 0.15) + (scope * 0.10) + (integration * 0.10)
```
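The same formula, sketched in Python for concreteness (the dictionary keys and function name are illustrative, not part of the protocol):

```python
# Weights from the dimension table above; they sum to 1.0.
WEIGHTS = {
    "functional": 0.25,
    "technical": 0.20,
    "edge_cases": 0.20,
    "success": 0.15,
    "scope": 0.10,
    "integration": 0.10,
}

def weighted_ambiguity(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (0-100%) into one weighted percentage."""
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())

# Example: every dimension half-known except fully clear success criteria.
print(weighted_ambiguity({
    "functional": 50, "technical": 50, "edge_cases": 50,
    "success": 0, "scope": 50, "integration": 50,
}))  # 42.5
```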

### Completion Gate

| Ambiguity | Action |
|-----------|--------|
| <= 20% | Generate PRD, proceed to planning |
| 21-30% | "Almost there — confirm these assumptions" + 2-3 targeted questions |
| 31-50% | Continue systematic questioning |
| > 50% | Return to Vision phase, foundation is unclear |
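The gate reduces to a simple threshold function. A minimal sketch (the returned action strings paraphrase the table above):

```python
def completion_gate(ambiguity: float) -> str:
    """Map a weighted ambiguity percentage to the next interview action."""
    if ambiguity <= 20:
        return "generate PRD, proceed to planning"
    if ambiguity <= 30:
        return "confirm assumptions with 2-3 targeted questions"
    if ambiguity <= 50:
        return "continue systematic questioning"
    return "return to Vision phase"

print(completion_gate(17))  # generate PRD, proceed to planning
print(completion_gate(42))  # continue systematic questioning
```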

---

## Brownfield vs. Greenfield Detection

**Before asking the first question**, check if a codebase already exists.

```bash
# Run this first if in a project directory
tldr structure . --lang typescript   # or python, go, rust
tldr tree src/
```

### Greenfield (no existing codebase)
Start at Question Category 1 (Vision). Ask from first principles.

### Brownfield (existing codebase)
Read the codebase before asking. Then ask informed questions.

Run:
```bash
tldr structure .
tldr arch src/
tldr calls src/ | head -30
```

From the scan, extract:
- Current tech stack (framework, DB, auth library)
- Existing patterns (REST vs GraphQL, ORM in use, test runner)
- Naming conventions
- Integration points already wired up

Then open your first question with evidence:
```
"I can see you're using Next.js with Prisma and Zod for validation.
The new feature should follow the same patterns.

My first question: what user problem is this feature solving?"
```

**Never ask what the codebase already answers.** If they use JWT, don't ask "what auth approach?". Ask "Should the new endpoint follow the same JWT validation middleware used in /api/orders, or does it need different auth behavior?"

---

## Question Categories (in order)

Work through these in sequence. Do not skip ahead. Do not go back unless you detect a contradiction.

### Category 1: Vision (Rounds 1-2)

Goal: understand the problem, not the solution.

Starter questions (pick ONE per round):
- "What problem are you solving? Tell me about the person who has this problem."
- "Who uses this? Walk me through their day before this feature exists."
- "What made you decide to build this now?"
- "What does a successful outcome look like — for the user, not for the code?"

**Trap to avoid**: User describes a solution instead of a problem ("I want a dashboard"). Ask "What would that dashboard help someone do that they can't do today?"

Ambiguity dimensions affected: functional (0.25), success criteria (0.15)

---

### Category 2: Behavior (Rounds 3-5)

Goal: map the core user journey.

Starter questions (ONE per round):
- "Walk me through the core action, step by step: someone opens this, then what?"
- "What is the ONE thing a user must be able to do? Everything else is secondary."
- "What does success look like — what does the user see or experience when it works?"
- "What error states are possible? What should happen when X fails?"

**Trap to avoid**: User describes features, not flows. Redirect: "Before we list features, walk me through the journey. What do they click first?"

After round 4-5, you should be able to write: "User [persona] opens [entry point], does [action], sees [result], can also [secondary action]." If you can't write that sentence, keep asking.

Ambiguity dimensions affected: functional (0.25), edge cases (0.20), success criteria (0.15)

---

### Category 3: Constraints (Rounds 6-7)

Goal: understand what the solution must work within.

Starter questions (ONE per round):
- "Are there performance requirements? Latency? Throughput? Concurrent users?"
- "What's the timeline? Is there a hard deadline or a target?"
- "Any existing systems this must integrate with — internal or external?"
- "Any technology constraints? (Must use X, can't use Y, team only knows Z)"

**Trap to avoid**: User says "it should be fast" without numbers. Push back: "How fast is fast enough? What would 'slow' look like to a user?"

Ambiguity dimensions affected: technical constraints (0.20), integration points (0.10)

---

### Category 4: Edge Cases (Rounds 8-10)

Goal: stress-test the happy path.

The methodology here is adversarial. For every core behavior, ask "what if this breaks?"

Starter questions (ONE per round):
- "What if the user submits the form twice in quick succession?"
- "What happens when there's no data — empty state, first-time user?"
- "What's the maximum load this must handle? What happens if it's exceeded?"
- "What if a dependent service (payment, auth, third-party API) is down?"
- "What happens with invalid input — not just validation errors, but malicious input?"

**Trap to avoid**: User says "handle errors gracefully" without specifics. Ask: "When an error occurs, what does the user see? A generic error page? A retry option? An email notification?"

Ambiguity dimensions affected: edge cases (0.20), functional (0.25)

---

### Category 5: Challenge (Round 11+)

Goal: devil's advocate. Surface assumptions the user hasn't questioned.

This category only runs if ambiguity is still >= 30% after rounds 1-10, OR if you detect a risky assumption that hasn't been challenged.

Challenge questions (ONE per round):
- "You've described [approach]. Have you considered [alternative]? Here's why it might be better: [reason]."
- "This assumes [assumption]. Is that actually true, or is it worth validating first?"
- "The simplest version of this feature would be [minimal version]. Does it need to be more complex, and if so, what drives that?"
- "What would happen if you shipped nothing and let users work around the problem manually for one more month?"

**Trap to avoid**: Being confrontational. Frame challenges as "have you considered" not "you're wrong". The goal is to surface implicit decisions, not create conflict.

---

## Special Protocols

### The Assumption Surfacer

When a user makes an implicit assumption, surface it explicitly before continuing:

```
User: "It should sync in real-time."

You: "Before we go further — when you say real-time, I want to make sure
we mean the same thing.

Are you describing sub-second updates (like a live cursor in Google Docs),
or near-real-time (updates within 5-10 seconds, like a dashboard refresh)?"
```

Do not move to the next question until the assumption is resolved.

### The Conflict Detector

When two answers are incompatible, stop and surface the conflict:

```
Round 3 answer: "It should work on mobile"
Round 7 answer: "We need complex drag-and-drop reordering"

You: "I noticed a potential tension: mobile-first AND complex drag-and-drop
are difficult to do well together on small screens.

Which is the higher priority — mobile-first UX, or the drag-and-drop capability?"
```

Do not proceed until the conflict is resolved. Log the decision.

### The Knowledge Gap Handler

When a user is uncertain ("I think maybe...", "I'm not sure how this works"), offer to research before continuing:

```
User: "I think we need a message queue, but I'm not sure which one."

You: "Before I continue — would it help if I looked at your current infrastructure
and compared your options? I can give you a specific recommendation
based on what you already have, rather than a generic answer."
```

If they say yes, run:
```bash
tldr structure .
# then check package.json / docker-compose / infra files
```

Then return with an informed follow-up question.

---

## Anti-patterns (Hard Rules)

These are failure modes. Do not do these.

| Anti-pattern | Why It Fails |
|---|---|
| Asking 2 questions in one message | User skips the hard one, ambiguity stays hidden |
| "And also, quick follow-up..." | Same as batching — splits user attention |
| Asking what the codebase already shows | Wastes rounds, signals you didn't read the code |
| Proceeding at ambiguity > 30% without explicit override | Produces a spec that will be invalidated mid-implementation |
| Making assumptions instead of asking | Assumptions accumulate; by round 5 you're building the wrong thing |
| Accepting "it should be simple" without probing | "Simple" is meaningless without a reference point |
| Asking hypothetical questions without anchoring to the actual system | "What if a user does X" without knowing if X is even possible in context |
| Writing the PRD before confirming your summary | Always confirm understanding before generating the spec |

---

## The Confirmation Checkpoint

Before generating the PRD, do a full summary and ask for confirmation.

Format:
```
"Before I write the spec, let me confirm what I've understood:

You're building [feature name] for [persona] to solve [problem].

The core journey: [user does X, system does Y, user sees Z].

Key decisions made:
- [Decision 1]: [what was chosen and why]
- [Decision 2]: [what was chosen and why]
- [Decision 3]: [what was chosen and why]

Explicitly out of scope:
- [Item 1]
- [Item 2]

Current ambiguity score: [X]%

Is this accurate, or did I misunderstand anything?"
```

Only generate the PRD after the user confirms. If they correct something, update the score and ask if any correction opens new questions.

---

## PRD Output Format

Generate to `thoughts/shared/specs/YYYY-MM-DD-<feature-name>.md` or the project's spec directory if one exists.

```markdown
# PRD: [Feature Name]
**Date**: YYYY-MM-DD
**Ambiguity Score**: X% (at time of writing)
**Interview Rounds**: N questions across N rounds

---

## Problem Statement

[2-3 sentences: the problem, who has it, why it matters now]

---

## User Stories

### Core Story (P0)
As a [persona], I want to [action], so that [benefit].

**Acceptance Criteria**:
- [ ] Given [context], when [action], then [observable outcome]
- [ ] Given [error condition], when [action], then [error is handled by showing X]
- [ ] Given [edge case], when [action], then [correct behavior]

### Secondary Stories (P1)
[Same format, lower priority]

### Deferred Stories (P2 — out of scope for this version)
[Listed but explicitly not in scope]

---

## Technical Requirements

### Stack Constraints
- [Framework / language / runtime requirements]
- [Must integrate with: X, Y, Z]
- [Must NOT use: X (reason)]

### Performance Requirements
- [Latency: < Xms at P99]
- [Throughput: N requests/second]
- [Concurrent users: N]

### Data Requirements
- [What is stored, where, how long]
- [Privacy / compliance requirements if any]

---

## Out of Scope

Explicitly excluded from this version:
- [Item 1]: [reason — deferred to v2 / handled elsewhere / out of product scope]
- [Item 2]: [reason]
- [Item 3]: [reason]

---

## Edge Cases

| Scenario | Expected Behavior |
|----------|-------------------|
| [User does X twice] | [System does Y] |
| [Empty state] | [Shows Z] |
| [Dependent service down] | [Graceful degradation: shows A, retries after B] |
| [Invalid input] | [Validation message: C] |
| [Rate limit exceeded] | [429 response with retry-after header] |

---

## Success Metrics

How to verify this feature works:
- [ ] [Metric 1: measurable, testable]
- [ ] [Metric 2]
- [ ] [Acceptance test that can be run manually]

---

## Ambiguity Score Breakdown

| Dimension | Score | Open Questions |
|-----------|-------|----------------|
| Functional requirements | X% | [any remaining open items] |
| Technical constraints | X% | [any remaining open items] |
| Edge case coverage | X% | [any remaining open items] |
| Success criteria | X% | [any remaining open items] |
| Scope boundaries | X% | [any remaining open items] |
| Integration points | X% | [any remaining open items] |
| **Weighted total** | **X%** | |

---

## Decisions Log

Decisions made during the interview that are non-obvious:

| Decision | What was chosen | Why | Alternative rejected |
|----------|-----------------|-----|----------------------|
| [Topic] | [Choice] | [Reason] | [What we didn't pick and why] |

---

## Implementation Notes

Observations for the developer that came out of the interview:

- [Technical detail that will matter during implementation]
- [Known risk or dependency that needs to be handled upfront]
- [Suggested starting point based on existing patterns in the codebase]
```

---

## Handoff After PRD

After the spec is written and confirmed, always offer a next step:

```
"Spec written at [path]. How would you like to proceed?

1. Start planning: I'll break this into implementation tasks with estimates
2. Implement now: I'll begin building, following the spec
3. Review first: Take time to read the spec, implement later
4. Get a second opinion: I'll run a pre-mortem on the spec for risks"
```

If they choose "Implement now":
```
"To implement: use '/implement [feature-name] spec' or hand this to your
implementation agent with the spec path as context.

The spec includes acceptance criteria that can be used for drift prevention
— check alignment every 5 edits."
```

If they choose "Start planning":
```
Spawn plan-agent or invoke /create_plan with the spec path.
```

---

## Session State

Track this internally throughout the interview:

```
INTERVIEW STATE
  feature: [working name]
  persona: [who uses it]
  codebase_type: [greenfield|brownfield]
  rounds_completed: N
  ambiguity: {
    functional: X%,
    technical: X%,
    edge_cases: X%,
    success: X%,
    scope: X%,
    integration: X%,
    weighted: X%
  }
  decisions: [list of decisions made]
  open_conflicts: [list of unresolved tensions]
  confirmed_out_of_scope: [explicit exclusions]
```

Update state after every answer. Reference it to know which category to enter next and which dimension is still most uncertain.
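One way to hold this state is a small dataclass, sketched here in Python. Field names mirror the block above; the update logic is an illustrative assumption, not a prescribed implementation:

```python
from dataclasses import dataclass, field

# Weights from the Ambiguity Scoring System table.
WEIGHTS = {"functional": 0.25, "technical": 0.20, "edge_cases": 0.20,
           "success": 0.15, "scope": 0.10, "integration": 0.10}

@dataclass
class InterviewState:
    feature: str
    codebase_type: str = "greenfield"
    rounds_completed: int = 0
    # Every dimension starts at 100% (not discussed at all).
    ambiguity: dict = field(default_factory=lambda: {d: 100.0 for d in WEIGHTS})
    decisions: list = field(default_factory=list)
    open_conflicts: list = field(default_factory=list)

    def record_answer(self, dimension: str, new_score: float) -> float:
        """Re-score one dimension after an answer; return the weighted total."""
        self.ambiguity[dimension] = new_score
        self.rounds_completed += 1
        return sum(self.ambiguity[d] * w for d, w in WEIGHTS.items())

state = InterviewState(feature="comment notifications")
print(state.record_answer("functional", 60))  # 90.0
```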

---

## Example Flow (Abbreviated)

```
You: "What problem are you solving? Tell me about the person who has this problem."

User: "I want to add notifications to the app."

You: [Detected: user described a solution, not a problem]
    "Before we dig into notifications — what's happening right now that
    notifications would fix? What does a user miss because they don't get notified?"

User: "Users miss when someone comments on their post. They have to check manually."

You: [Ambiguity updated: functional 60%, persona partially clear]
    "Got it — users who post content miss engagement. Walk me through the moment
    a notification should fire: someone posts, someone else comments... at what
    exact point does the poster need to know?"

User: "When the comment is submitted."

You: [Brownfield check: let me look at the existing comment system]
    [Runs: tldr structure . → finds POST /api/comments endpoint in routes/comments.ts]
    "I can see the comment submission endpoint in routes/comments.ts.
    Should the notification trigger inside that endpoint, or as an event
    after the comment is saved to the database?"

[... interview continues ...]

Round 8 — Edge Cases:
You: "What should happen if the notification system is down — should comment
submission still succeed, or should it fail if the notification can't be sent?"

User: "It should still succeed. Notifications are nice-to-have."

You: [Key decision logged: notification failure = non-blocking]
    "Confirmed — comments succeed regardless of notification outcome.
    One more edge case: if a user comments on their own post, should they
    receive a notification about it?"

[... completion at round 11, ambiguity = 17%]

You: "Before I write the spec, let me confirm...
    [summary]
    Current ambiguity: 17%. Ready to generate the PRD."
```

---

## Checklist

Run through this before generating the PRD:

- [ ] Brownfield scan completed (or confirmed greenfield)
- [ ] Vision is clear: user, problem, outcome
- [ ] Core journey can be stated in 2 sentences
- [ ] At least 3 edge cases explicitly addressed
- [ ] At least 1 scope boundary explicitly stated
- [ ] All detected conflicts are resolved
- [ ] All detected assumptions are surfaced and confirmed
- [ ] Summary confirmed by user before PRD generation
- [ ] Weighted ambiguity <= 20%
- [ ] PRD includes acceptance criteria for every P0 story
- [ ] Decisions log is non-empty

**Remember**: A spec written with 40% ambiguity will produce code with 40% rework. The interview is the cheapest place to find problems.

Related Skills

All from vibeeval/vibecosystem:

  • tldr-deep (422 stars): Run full 5-layer analysis (AST, call graph, CFG, DFG, slice) on a specific function for deep debugging or understanding.
  • harvest-deep-crawl (422 stars): Multi-page deep crawling for documentation sites, wikis, and knowledge bases.
  • discovery-interview (333 stars): Deep interview process to transform vague ideas into detailed specs. Works for technical and non-technical users.
  • workflow-router (422 stars): Goal-based workflow orchestration that routes tasks to specialist agents based on user goals.
  • wiring (422 stars): Wiring verification.
  • websocket-patterns (422 stars): Connection management, room patterns, reconnection strategies, message buffering, and binary protocol design.
  • visual-verdict (422 stars): Screenshot comparison QA for frontend development. Takes a screenshot of the current implementation, scores it across multiple visual dimensions, and returns a structured PASS/REVISE/FAIL verdict with concrete fixes. Use when implementing UI from a design reference or verifying visual correctness.
  • verification-loop (422 stars): Comprehensive verification system covering build, types, lint, tests, security, and diff review before a PR.
  • vector-db-patterns (422 stars): Embedding strategies, ANN algorithms, hybrid search, RAG chunking strategies, and reranking for semantic search and retrieval.
  • variant-analysis (422 stars): Find similar vulnerabilities across a codebase after discovering one instance. Uses pattern matching, AST search, Semgrep/CodeQL queries, and manual tracing to propagate findings. Adapted from Trail of Bits. Use after finding a bug to check if the same pattern exists elsewhere.
  • validate-agent (422 stars): Validation agent that validates plan tech choices against current best practices.
  • tracing-patterns (422 stars): OpenTelemetry setup, span context propagation, sampling strategies, Jaeger queries.