Exploitability Validation Skill

A multi-stage pipeline for validating that vulnerability findings are real, reachable, and exploitable.


Best use case

The Exploitability Validation Skill is best used when you need a repeatable AI agent workflow rather than a one-off prompt.

Teams using it should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/exploitability-validation/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/gadievron/raptor/exploitability-validation/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/exploitability-validation/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How Exploitability Validation Skill Compares

| Feature / Agent | Exploitability Validation Skill | Standard Approach |
|-----------------|---------------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

A multi-stage pipeline for validating that vulnerability findings are real, reachable, and exploitable.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Exploitability Validation Skill

A multi-stage pipeline for validating that vulnerability findings are real, reachable, and exploitable.

## Purpose

Prevents wasted effort on:
- Hallucinated findings (file doesn't exist, code doesn't match)
- Unreachable code paths (dead code, test-only)
- Findings with unrealistic preconditions

## When to Use

After scanning produces findings, BEFORE exploit development:
1. Scanner finds potential vulnerability
2. **This skill validates it's real and reachable**
3. Exploit Feasibility checks binary constraints
4. Exploit development proceeds

---

## [CONFIG] Configuration

```yaml
models:
  native: true
  additional: false  # Set true to also run GPT, Gemini

output_when_additional:
  display: "agreement: 2/3"
  threshold: "1/3 is enough to proceed"
```

---

## [EXEC] Execution Rules

1. Run the full pipeline end-to-end.
2. Solve and fix any issues you encounter, unless you have failed five times in a row or need clarification.
3. Run on the latest thinking/reasoning model available (verify the model name).
4. The pipeline must be deterministic: if run again, it must produce the same results.
5. After writing each JSON output file, validate it against the schema: `from packages.exploitability_validation.schemas import validate_checklist, validate_findings, validate_attack_tree, validate_attack_paths, validate_attack_surface, validate_disproven`.
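A minimal sketch of rule 5 in practice, assuming the validators raise on schema violations (the real signatures live in `packages.exploitability_validation.schemas`):

```python
import json

# Assumption: validate_findings raises an exception on schema violations;
# see packages.exploitability_validation.schemas for the actual behavior.
from packages.exploitability_validation.schemas import validate_findings

with open("findings.json") as f:
    findings = json.load(f)

validate_findings(findings)  # fail fast instead of carrying bad data forward
```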

---

## [GATES] MUST-GATEs

Rationale: Without these gates, models sample instead of checking all code, hedge with "if" and "maybe" instead of verifying, and miss exploitable findings.

**GATE-1 [ASSUME-EXPLOIT]:** Your goal is to discover real, exploitable vulnerabilities. If you think something is not exploitable, do not assume; first investigate under the assumption that it is.

**GATE-2 [STRICT-SEQUENCE]:** Strictly follow the instructions. If you think of or try something else, or a new idea comes up, present the results of that analysis separately at the end. Always display the results of the strict criteria first, and only then the results of any additional methods.

**GATE-3 [CHECKLIST]:** Follow the pipeline, update the checklist, and collect evidence of compliance so you can demonstrate at the end that every action passed through these gates.

**GATE-4 [NO-HEDGING]:** If your Chain-of-Thought or results include "if", "maybe", "uncertain", "unclear", "could potentially", "may be possible", "depending on", "in theory", "in certain circumstances", or similar, immediately verify the claim. Do not leave it unverified.

**GATE-5 [FULL-COVERAGE]:** Test all of the provided code (file(s) or codebase) against checklist.json, ensuring you have checked every function and every line. Do not sample, estimate, or guess.

**GATE-6 [PROOF]:** Always provide proof and show the vulnerable code.

**GATE-7 [CONSISTENCY]:** Before finalizing each finding, verify that `vuln_type`, `severity`, and `status` are consistent with the `description` and `proof` text. A description that explains why a bug is benign must not carry high severity.
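
As one illustration of GATE-3 and GATE-5 compliance, a pipeline could verify that every checklist entry was actually examined before finalizing. This is a hypothetical sketch; the `id` and `checked` field names are assumptions, not the skill's actual schema:

```python
import json

with open("checklist.json") as f:
    checklist = json.load(f)

# Hypothetical shape: a list of entries, each recording what was
# inspected and whether it was actually checked.
unchecked = [item["id"] for item in checklist if not item.get("checked")]
if unchecked:
    raise SystemExit(f"GATE-5 violated: {len(unchecked)} checklist items never examined")
```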

---

## [STYLE] Output Formatting

**Status values in JSON must be snake_case:**
- `exploitable` not `EXPLOITABLE` or `Exploitable`
- `confirmed` not `CONFIRMED` or `Confirmed`
- `ruled_out` not `RULED_OUT` or `Ruled Out`
- `disproven` not `DISPROVEN` or `Disproven`

Title Case is for human-readable display (validation-report.md, terminal output) only. The orchestrator's `STATUS_DISPLAY` dict handles the conversion.
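A sketch of what such a mapping could look like; the actual `STATUS_DISPLAY` dict belongs to the orchestrator, and these entries only mirror the statuses listed above:

```python
# Hypothetical mirror of the orchestrator's STATUS_DISPLAY:
# JSON keeps snake_case, human-readable output gets Title Case.
STATUS_DISPLAY = {
    "exploitable": "Exploitable",
    "confirmed": "Confirmed",
    "ruled_out": "Ruled Out",
    "disproven": "Disproven",
}

print(STATUS_DISPLAY.get("ruled_out", "ruled_out"))  # -> "Ruled Out"
```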

**No colored circles or emojis:**
- Do not use 🔴/🟡/🟢 - they are perspective-dependent (red = bad for defenders, good for researchers)
- Use plain text headers: `### Exploitable (7 findings)` not `### 🔴 EXPLOITABLE`

**Hypothesis status:**
- `Proven` - hypothesis confirmed by evidence
- `Disproven` - hypothesis refuted by evidence
- `Partial` - some predictions confirmed, others refuted

---

## [REMIND] Critical Reminders

- Do not skip, sample, or guess - check all code against checklist.json.
- Provide proof for every claim.
- Actually read files - do not rely on memory.
- Update docs after every action.

---

## Stages

**All stages execute in sequence. No stage may be skipped.**

| Stage | Purpose | Gates | Output |
|-------|---------|-------|--------|
| **0: Inventory** | Build ground truth checklist | - | checklist.json |
| **A: One-Shot** | Quick exploitability + PoC | 1, 4, 6 | findings.json |
| **B: Process** | Systematic analysis, attack trees | All (1-7) | 5 working docs |
| **C: Sanity** | Validate against actual code | 3, 5, 6 | validated findings.json |
| **D: Ruling** | Filter preconditions/hedging | 3, 5, 6, 7 | confirmed findings.json |
| **E: Feasibility** | Binary constraint analysis | 6 | final findings.json |
| **F: Review** | Self-review before finalizing | 7 | updated outputs |

**Note:** Stage E only applies to memory corruption vulnerabilities (buffer overflow, format string, UAF, etc.). For web/injection vulnerabilities, skip Stage E.
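
A hedged sketch of that branch, assuming each finding carries a `vuln_type` string; the type names below are illustrative, not the skill's schema:

```python
# Illustrative vuln_type values; the memory-corruption list mirrors the note above.
MEMORY_CORRUPTION = {"buffer_overflow", "format_string", "use_after_free", "heap_overflow"}

def needs_stage_e(vuln_type: str) -> bool:
    """Stage E (binary feasibility) applies only to memory corruption findings."""
    return vuln_type in MEMORY_CORRUPTION
```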

See stage-specific files for detailed instructions.

---

## Working Documents (Stage B)

| Doc | Purpose |
|-----|---------|
| attack-tree.json | Knowledge graph. Source of truth. |
| hypotheses.json | Active hypotheses. Status: testing, confirmed, disproven. |
| disproven.json | Failed hypotheses. What was tried, why it failed. |
| attack-paths.json | Paths attempted. PoC results. PROXIMITY. Blockers. |
| attack-surface.json | Sources, sinks, trust boundaries. |
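
For orientation, a hypothetical `hypotheses.json` entry consistent with the statuses above (field names are illustrative; the real schema is enforced by `packages.exploitability_validation.schemas`):

```python
# Illustrative shape only, written here as a Python dict.
hypothesis = {
    "id": "H-3",
    "claim": "user-controlled length reaches the copy in parse_header()",
    "status": "testing",   # testing | confirmed | disproven
    "evidence": [],        # populated as tests run
}
```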

---

## Flow

```
STAGE 0: Inventory
         │
         ▼ checklist.json
         │
STAGE A: One-Shot Analysis
         │
         ▼ findings.json (status: pending/not_disproven)
         │
STAGE B: Process
         │
         ├─► attack-surface.json (sources, sinks, boundaries)
         ├─► attack-tree.json (knowledge graph)
         ├─► hypotheses.json (testable predictions)
         ├─► disproven.json (failed approaches)
         └─► attack-paths.json (PROXIMITY scores)
         │
         ▼
STAGE C: Sanity Check
         │ (file exists? code verbatim? flow real?)
         │
         ▼ findings.json (sanity_check added)
         │
STAGE D: Ruling
         │ (apply Stage B evidence, make final status)
         │
         ▼ findings.json (ruling, final_status added)
         │
    ┌────┴────┐
    │         │
    ▼         ▼
 Memory    Web/Injection
 Corruption    │
    │          │
    ▼          │
STAGE E:       │
Feasibility    │
    │          │
    └────┬─────┘
         │
         ▼
STAGE F: Self-Review
         │ (what did I get wrong?)
         │
         ▼
    FINAL OUTPUT
    + validation-report.md
```

---

## Integration with Exploit Feasibility

Stage E automatically bridges to the `exploit_feasibility` package for memory corruption vulnerabilities.

**Automatic (via Stage E):**
```python
# Stage E handles this automatically for applicable vuln types
# See stage-e-feasibility.md for details
```

**Manual (if needed):**
```python
from packages.exploit_feasibility import analyze_binary, format_analysis_summary

result = analyze_binary(binary_path, vuln_type='format_string')
print(format_analysis_summary(result, verbose=True))
```

**Final Status After Stage E:**

| Source Status | Feasibility | Final Status |
|--------------|-------------|--------------|
| Confirmed | Likely | **Exploitable** |
| Confirmed | Difficult | **Confirmed (Constrained)** |
| Confirmed | Unlikely | **Confirmed (Blocked)** |
| Confirmed | N/A (web vuln) | **Confirmed** |
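
The table reduces to a simple lookup. A sketch under the assumption that both statuses arrive as snake_case strings, per the [STYLE] rules:

```python
# Keys are (source status, feasibility); values mirror the table above.
# None marks web vulns, which skip the feasibility step entirely.
FINAL_STATUS = {
    ("confirmed", "likely"): "exploitable",
    ("confirmed", "difficult"): "confirmed_constrained",
    ("confirmed", "unlikely"): "confirmed_blocked",
    ("confirmed", None): "confirmed",
}
```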

This ensures findings are:
1. **Real and reachable** (Stages A-D)
2. **Actually exploitable** (Stage E + exploit_feasibility)

---

## Notice

This analysis is performed for defensive purposes, in a lab environment. Full permission has been provided.

Related Skills

scanning-input-validation-practices

from ComeOnOliver/skillshub

This skill enables Claude to automatically scan source code for potential input validation vulnerabilities. It identifies areas where user-supplied data is not properly sanitized or validated before being used in operations, which could lead to security exploits like SQL injection, cross-site scripting (XSS), or command injection. Use this skill when the user asks to "scan for input validation issues", "check input sanitization", "find potential XSS vulnerabilities", or similar requests related to securing user input. It is particularly useful during code reviews, security audits, and when hardening applications against common web vulnerabilities. The skill leverages the input-validation-scanner plugin to perform the analysis.

input-validation-checker

from ComeOnOliver/skillshub

Input Validation Checker - Auto-activating skill for Security Fundamentals. Triggers on: "input validation checker". Part of the Security Fundamentals skill category.

cross-validation-setup

from ComeOnOliver/skillshub

Cross Validation Setup - Auto-activating skill for ML Training. Triggers on: "cross validation setup". Part of the ML Training skill category.

deployment-validation-config-validate

from ComeOnOliver/skillshub

You are a configuration management expert specializing in validating, testing, and ensuring the correctness of application configurations. Create comprehensive validation schemas, implement configurat

global-validation

from ComeOnOliver/skillshub

Implement server-side validation with allowlists, specific error messages, type checking, and sanitization to prevent security vulnerabilities and ensure data integrity. Use this skill when creating or editing form request classes, when validating API inputs, when implementing validation rules in controllers or services, when writing client-side validation for user experience, when sanitizing user input to prevent injection attacks, when validating business rules, when implementing error message display, or when ensuring consistent validation across all application entry points.

zod-validation-patterns

from ComeOnOliver/skillshub

This skill provides comprehensive patterns for using Zod validation library in TypeScript applications. It ensures input validation is done correctly, securely, and consistently across the codebase.

data-validation

from ComeOnOliver/skillshub

Use when implementing data validation for API payloads, form inputs, or database writes. Triggers for: Pydantic models, Zod schemas, input sanitization, type validation, field constraints, or request/response schemas. NOT for: business logic (use domain services) or authentication/authorization.

code-validation-sandbox

from ComeOnOliver/skillshub

Validate code examples across the 4-Layer Teaching Method with intelligent strategy selection. Use when validating Python/Node/Rust code in book chapters. NOT for production deployment testing.

Pandera — Data Validation for DataFrames

from ComeOnOliver/skillshub


Instructor — Structured LLM Output with Validation

from ComeOnOliver/skillshub

You are an expert in Instructor, the library for getting structured, validated output from LLMs. You help developers extract typed data from unstructured text using Pydantic models (Python) or Zod schemas (TypeScript), with automatic retries on validation failures, streaming partial objects, and support for OpenAI, Anthropic, Google, and local models — turning LLMs into reliable data extraction engines.

realphonevalidation-automation

from ComeOnOliver/skillshub

Automate Realphonevalidation tasks via Rube MCP (Composio). Always search tools first for current schemas.

google-address-validation-automation

from ComeOnOliver/skillshub

Automate Google Address Validation tasks via Rube MCP (Composio). Always search tools first for current schemas.