AI Agent Skill HUB

pentest-validation

Use when validating security findings from SAST/DAST scans, proving exploitability of reported vulnerabilities, eliminating false positives, or running the 4-phase pentest pipeline (recon, analysis, validation, report).

298 stars

by proffesor-for-testing

View on GitHub

Best use case

pentest-validation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using pentest-validation should expect more consistent output, faster repeated execution, less prompt rewriting, and better workflow continuity with their supporting tools.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.
You already have the supporting tools or dependencies needed by this skill.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/pentest-validation/SKILL.md --create-dirs "https://raw.githubusercontent.com/proffesor-for-testing/agentic-qe/main/.claude/skills/pentest-validation/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/pentest-validation/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How pentest-validation Compares

Feature / Agent	pentest-validation	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Use when validating security findings from SAST/DAST scans, proving exploitability of reported vulnerabilities, eliminating false positives, or running the 4-phase pentest pipeline (recon, analysis, validation, report).

Where can I find the source code?

The source is in the proffesor-for-testing/agentic-qe repository on GitHub, at .claude/skills/pentest-validation/SKILL.md (the same file the install command downloads).

SKILL.md Source

# Pentest Validation

<default_to_action>
When validating security findings:
1. REQUIRE explicit authorization for target URL
2. SCAN with qe-security-scanner (SAST + dependency + secrets)
3. ANALYZE with qe-security-reviewer + qe-security-auditor (parallel)
4. VALIDATE with qe-pentest-validator (graduated exploitation, parallel per vuln type)
5. REPORT only confirmed findings with PoC evidence ("No Exploit, No Report")
6. UPDATE exploit playbook with new patterns

**Quality Gates:**
- Authorization confirmed before ANY exploitation
- Target URL is staging/dev (NOT production)
- Budget cap enforced ($15 default)
- Time cap enforced (30 min default)
- All exploitation attempts logged
</default_to_action>

## Quick Reference Card

### The 4-Phase Pipeline

| Phase | Agent(s) | Purpose | Parallelism |
|-------|----------|---------|-------------|
| **1. Recon** | qe-security-scanner | SAST, DAST, dependency scan, secrets | Internal parallel |
| **2. Analysis** | qe-security-reviewer + qe-security-auditor | Code review + compliance check | Both in parallel |
| **3. Validation** | qe-pentest-validator | Graduated exploit validation | Per-vuln-type parallel |
| **4. Report** | qe-quality-gate | "No Exploit, No Report" filter | Sequential |

### Graduated Exploitation Tiers

| Tier | Handler | Cost | Latency | Use When |
|------|---------|------|---------|----------|
| **1** | Agent Booster (WASM) | $0 | <1ms | Code pattern is conclusive (eval, innerHTML, hardcoded creds) |
| **2** | Haiku | $0.0002 | ~500ms | Need payload test against live target |
| **3** | Sonnet/Opus | $0.003-$0.015 | 2-5s | Full exploit chain with data proof |
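The tier table implies an escalation policy: try the cheapest tier first and move up only when it cannot reach a verdict. Below is a minimal sketch of that policy. It assumes hypothetical synchronous handlers, each carrying a fixed per-attempt cost estimate. A real validator would call its models asynchronously. `TierHandler` and `validateGraduated` are illustrative names, not the qe-pentest-validator API.

```typescript
// Graduated exploitation sketch: cheapest tier first, escalate only on "inconclusive".
// Handler shape, verdict names, and cost estimates are hypothetical.
type Tier = 1 | 2 | 3;
type Verdict = "confirmed-exploitable" | "likely-exploitable" | "not-exploitable" | "inconclusive";

interface TierHandler {
  tier: Tier;
  estimatedCostUsd: number; // assumed upper-bound cost of one attempt at this tier
  run: (findingId: string) => Verdict;
}

interface EscalationResult {
  verdict: Verdict;
  tiersTried: Tier[];
  spentUsd: number;
}

function validateGraduated(
  findingId: string,
  handlers: TierHandler[],
  maxTier: Tier,
  budgetUsd: number,
): EscalationResult {
  const ordered = [...handlers].sort((a, b) => a.tier - b.tier);
  const tiersTried: Tier[] = [];
  let spentUsd = 0;
  let verdict: Verdict = "inconclusive";

  for (const handler of ordered) {
    if (handler.tier > maxTier) break; // respect the configured exploitation_tier
    if (spentUsd + handler.estimatedCostUsd > budgetUsd) break; // respect the budget cap
    verdict = handler.run(findingId);
    tiersTried.push(handler.tier);
    spentUsd += handler.estimatedCostUsd;
    if (verdict !== "inconclusive") break; // a definite answer ends escalation
  }
  return { verdict, tiersTried, spentUsd };
}
```

For example, if Tier 1 returns `"inconclusive"` for a finding and Tier 2 confirms it, the result records `tiersTried` as `[1, 2]` and Tier 3 is never called.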

### When to Use This Skill

| Scenario | Tier | Estimated Cost |
|----------|------|----------------|
| PR security review (source only) | 1 | $0 |
| Pre-release validation (staging) | 1-2 | $1-5 |
| Full pentest validation | 1-3 | $5-15 |
| Compliance audit evidence | 1-3 | $5-15 |

---

## Configuration

```yaml
pentest:
  target_url: https://staging.app.com    # REQUIRED for Tier 2-3
  source_repo: ./src                      # REQUIRED for Tier 1+
  exploitation_tier: 2                    # 1=pattern-only, 2=payload-test, 3=full-exploit
  vuln_types:                             # Which pipelines to run
    - injection                           # SQL, NoSQL, command injection
    - xss                                 # Reflected, stored, DOM XSS
    - auth                                # Auth bypass, session, JWT
    - ssrf                                # URL scheme abuse, metadata
  max_cost_usd: 15                        # Budget cap per run
  timeout_minutes: 30                     # Time cap per run
  require_authorization: true             # MUST confirm target ownership
  no_production: true                     # Block production URLs
  production_patterns:                    # URL patterns to block
    - "*.prod.*"
    - "api.*"
    - "www.*"
```
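The YAML above maps onto a typed config object that can be checked before a run starts. Here is a sketch of that mapping. It assumes the YAML has already been parsed; the parser is not shown. It encodes the rules stated in the YAML comments and treats the safeguard flags as non-negotiable, following the Quality Gates. `PentestConfig` and `validateConfig` are hypothetical names.

```typescript
// Typed view of the `pentest:` block; field names mirror the YAML keys.
interface PentestConfig {
  target_url?: string;
  source_repo: string;
  exploitation_tier: 1 | 2 | 3;
  vuln_types: Array<"injection" | "xss" | "auth" | "ssrf">;
  max_cost_usd: number;
  timeout_minutes: number;
  require_authorization: boolean;
  no_production: boolean;
  production_patterns: string[];
}

// Returns a list of problems; an empty list means the run may proceed.
function validateConfig(cfg: PentestConfig): string[] {
  const errors: string[] = [];
  if (!cfg.source_repo) errors.push("source_repo is required");
  if (cfg.exploitation_tier >= 2 && !cfg.target_url) {
    errors.push("target_url is required for exploitation_tier 2-3");
  }
  if (cfg.vuln_types.length === 0) errors.push("vuln_types must list at least one pipeline");
  if (!(cfg.max_cost_usd > 0)) errors.push("max_cost_usd must be positive");
  if (!(cfg.timeout_minutes > 0)) errors.push("timeout_minutes must be positive");
  // Assumption: safeguards cannot be switched off from config.
  if (!cfg.require_authorization) errors.push("require_authorization must stay true");
  if (!cfg.no_production) errors.push("no_production must stay true");
  return errors;
}
```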

---

## Safeguards (Mandatory)

### Authorization Gate
Every pentest validation run MUST:
1. Display target URL and exploitation tier to user
2. Require explicit confirmation: "I own this target or am authorized to test it"
3. Log authorization with timestamp
4. Block if target URL matches production patterns
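Steps 3 and 4 are mechanical enough to sketch in code. The sketch matches the target hostname against the `production_patterns` globs and then records a timestamped authorization. It makes three assumptions. A `*` in a pattern matches any run of characters. Patterns are compared case-insensitively against the URL hostname. The record shape is hypothetical. Displaying the prompt and collecting the user's answer are left to the caller.

```typescript
// Convert a glob such as "*.prod.*" into an anchored, case-insensitive RegExp.
function globToRegExp(glob: string): RegExp {
  // Escape regex metacharacters (except *), then turn each * into .*
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`, "i");
}

function isProductionTarget(targetUrl: string, patterns: string[]): boolean {
  const host = new URL(targetUrl).hostname;
  return patterns.some((p) => globToRegExp(p).test(host));
}

interface AuthorizationRecord {
  targetUrl: string;
  exploitationTier: number;
  confirmedBy: string;
  confirmedAt: string; // ISO-8601 timestamp
}

// Throws if the target looks like production or the user did not confirm;
// otherwise returns a record suitable for the audit log.
function authorize(
  targetUrl: string,
  exploitationTier: number,
  confirmedBy: string,
  userConfirmed: boolean,
  productionPatterns: string[],
): AuthorizationRecord {
  if (isProductionTarget(targetUrl, productionPatterns)) {
    throw new Error(`Blocked: ${targetUrl} matches a production pattern`);
  }
  if (!userConfirmed) {
    throw new Error("Blocked: explicit authorization was not confirmed");
  }
  return { targetUrl, exploitationTier, confirmedBy, confirmedAt: new Date().toISOString() };
}
```

With the default patterns, `https://staging.app.com` passes, while `https://api.app.com` and `https://shop.prod.example.com` are blocked.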

### What This Skill Does NOT Do
- Full autonomous reconnaissance (Nmap, Subfinder)
- Zero-day exploit development
- Attack targets without explicit authorization
- Test production systems
- Store actual exfiltrated data (only proof of access)
- Social engineering or phishing simulation
- Port scanning or service discovery

---

## Validation Pipelines

### Injection Pipeline
| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|-------------------|-------------------|----------------|
| SQL injection | String concat in query | `' OR '1'='1` response diff | UNION SELECT data extraction |
| NoSQL injection | `$where`, `$gt` in query | Operator injection test | Collection enumeration |
| Command injection | `exec()`, `system()` calls | Command delimiter test | Reverse shell proof |
| LDAP injection | String concat in filter | Wildcard injection | Directory enumeration |
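The Tier 1 "String concat in query" check can be approximated with a line-based heuristic. It flags lines where a SQL keyword sits next to `+` concatenation or `${...}` interpolation. This is a crude sketch and not the qe-security-scanner implementation. It misses queries that span several lines, and it produces false positives, which the later tiers exist to filter out. `findSqlConcatenation` is a hypothetical name.

```typescript
interface PatternHit {
  line: number; // 1-based line number in the scanned source
  text: string;
}

const SQL_KEYWORD = /\b(SELECT|INSERT|UPDATE|DELETE)\b/i;
// A quote followed or preceded by +, or a template-literal interpolation.
const CONCAT_OR_INTERP = /["'`]\s*\+|\+\s*["'`]|\$\{[^}]+\}/;

function findSqlConcatenation(source: string): PatternHit[] {
  const hits: PatternHit[] = [];
  source.split("\n").forEach((text, i) => {
    if (SQL_KEYWORD.test(text) && CONCAT_OR_INTERP.test(text)) {
      hits.push({ line: i + 1, text: text.trim() });
    }
  });
  return hits;
}
```

A parameterized call such as `db.query("SELECT ... WHERE id = $1", [id])` is not flagged, because it neither concatenates nor interpolates.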

### XSS Pipeline
| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|-------------------|-------------------|----------------|
| Reflected XSS | No output encoding | `<img onerror>` reflection | Browser JS execution via Playwright |
| Stored XSS | `innerHTML` assignment | Payload stored + retrieved | Cookie theft PoC |
| DOM XSS | `document.write(location)` | Fragment injection | DOM manipulation proof |
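After a Tier 2 reflected-XSS probe, the captured response has to be classified. The injected marker may come back raw, come back only in HTML-encoded form, or not appear at all. Only the raw case suggests missing output encoding; proving that script actually executes is Tier 3 work. The sketch below uses a harmless marker in place of a payload. It recognizes only the five common named or decimal entities. Other encodings, such as hex entities or JS escaping, come out as `"absent"`, so a real validator would need a broader check. The function names are hypothetical.

```typescript
type Reflection = "raw" | "encoded" | "absent";

// Minimal HTML entity encoding of the five significant characters.
function htmlEncode(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function classifyReflection(responseBody: string, probe: string): Reflection {
  if (responseBody.includes(probe)) return "raw"; // echoed verbatim: likely unencoded
  if (responseBody.includes(htmlEncode(probe))) return "encoded"; // output encoding applied
  return "absent";
}
```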

### Auth Pipeline
| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|-------------------|-------------------|----------------|
| JWT none | No algorithm validation | Modified JWT accepted | Admin access with forged token |
| Session fixation | No session rotation | Pre-set session reused | Cross-user session hijack |
| Credential stuffing | No rate limiting | 100 attempts unblocked | Valid credential discovery |
| IDOR | No authorization check | Access other user data | Full CRUD on foreign resources |
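The Tier 2 credential-stuffing evidence, "100 attempts unblocked", amounts to a check over recorded responses. The sketch takes the status codes returned for a series of failed logins against a test account and decides whether any throttling appeared. It treats 429 (Too Many Requests) and 423 (Locked) as "blocked", which is an assumption. Real applications may throttle through CAPTCHAs, delays, or 200 responses with an error body, and those would need extra signals. Names are hypothetical.

```typescript
interface RateLimitEvidence {
  attempts: number;
  firstBlockedAt: number | null; // 1-based attempt index, or null if never blocked
  unblocked: boolean;
}

const BLOCKING_STATUSES = new Set([423, 429]);

function evaluateRateLimiting(statuses: number[], threshold = 100): RateLimitEvidence {
  const idx = statuses.findIndex((s) => BLOCKING_STATUSES.has(s));
  const firstBlockedAt = idx === -1 ? null : idx + 1;
  return {
    attempts: statuses.length,
    firstBlockedAt,
    // Claim "unblocked" only when at least `threshold` attempts ran with no block.
    unblocked: firstBlockedAt === null && statuses.length >= threshold,
  };
}
```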

### SSRF Pipeline
| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|-------------------|-------------------|----------------|
| Internal URL | User-controlled URL fetch | `http://169.254.169.254` | Cloud metadata extraction |
| DNS rebinding | URL validation bypass | Rebind to internal IP | Internal service access |
| Protocol smuggling | URL scheme not restricted | `file:///etc/passwd` | File content in response |
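Several SSRF rows depend on one question: does a URL point at something internal? The sketch treats a URL as internal if it uses a non-HTTP(S) scheme (the protocol-smuggling row), names `localhost`, or has a literal IPv4 host in the loopback, link-local, RFC 1918, or "this network" ranges. Link-local includes `169.254.169.254`. The sketch does not resolve DNS, so it cannot detect DNS rebinding, and it does not handle IPv6. `isInternalTarget` is a hypothetical name.

```typescript
function isInternalTarget(rawUrl: string): boolean {
  const url = new URL(rawUrl); // WHATWG parsing normalizes IPv4 forms such as 0x7f.1
  if (url.protocol !== "http:" && url.protocol !== "https:") return true; // e.g. file:
  const host = url.hostname;
  if (host === "localhost") return true;
  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false; // DNS name or IPv6: not handled in this sketch
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 127 || // loopback
    a === 10 || // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) || // 192.168.0.0/16
    (a === 169 && b === 254) || // link-local, incl. cloud metadata
    a === 0 // 0.0.0.0/8 "this network"
  );
}
```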

---

## Agent Coordination

### Orchestration Pattern
```typescript
// Assumes Task(description, input, agentType) is supplied by the agent runtime
// and resolves to that agent's results.

// Phase 1: Recon (parallel scans)
const phase1Results = await Task("Security Scan", {
  target: "./src",
  layers: { sast: true, dast: true, dependencies: true, secrets: true }
}, "qe-security-scanner");

// Phase 2: Analysis (reviewer and auditor run in parallel)
const [reviewResults, auditResults] = await Promise.all([
  Task("Code Security Review", {
    findings: phase1Results,
    depth: "comprehensive"
  }, "qe-security-reviewer"),

  Task("Compliance Audit", {
    findings: phase1Results,
    frameworks: ["owasp-top-10"]
  }, "qe-security-auditor")
]);
const phase2Results = [...reviewResults, ...auditResults];

// Phase 3: Validation (graduated exploitation)
const phase3Results = await Task("Exploit Validation", {
  findings: [...phase1Results, ...phase2Results],
  target_url: "https://staging.app.com",
  exploitation_tier: 2,
  vuln_types: ["injection", "xss", "auth", "ssrf"],
  max_cost_usd: 15,
  timeout_minutes: 30
}, "qe-pentest-validator");

// Phase 4: Report ("No Exploit, No Report" gate)
await Task("Security Quality Gate", {
  findings: phase3Results.confirmedFindings,
  gate: "no-exploit-no-report",
  require_poc: true
}, "qe-quality-gate");
```

### Finding Classification
| Status | Meaning | Action |
|--------|---------|--------|
| `confirmed-exploitable` | Exploitation succeeded with PoC | Report with evidence |
| `likely-exploitable` | Partial exploitation, defenses detected | Report with caveats |
| `not-exploitable` | All exploitation attempts failed | Filter from report |
| `inconclusive` | WAF/defense blocked, unclear if vulnerable | Report for manual review |
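The classification table translates into a routing step for the "No Exploit, No Report" gate. In the sketch below, the `Finding` shape is an assumption and does not reflect qe-quality-gate's actual data model. When `require_poc` is on, a finding marked confirmed but lacking PoC evidence is sent to manual review rather than reported. That routing rule is also an assumption.

```typescript
type FindingStatus = "confirmed-exploitable" | "likely-exploitable" | "not-exploitable" | "inconclusive";

interface Finding {
  id: string;
  status: FindingStatus;
  poc?: string; // reference to stored PoC evidence (hypothetical field)
}

interface GateOutput {
  report: Finding[];
  reportWithCaveats: Finding[];
  manualReview: Finding[];
  filtered: Finding[];
}

function noExploitNoReport(findings: Finding[], requirePoc = true): GateOutput {
  const out: GateOutput = { report: [], reportWithCaveats: [], manualReview: [], filtered: [] };
  for (const f of findings) {
    switch (f.status) {
      case "confirmed-exploitable":
        // Without PoC evidence, a "confirmed" finding cannot pass the gate.
        (requirePoc && !f.poc ? out.manualReview : out.report).push(f);
        break;
      case "likely-exploitable":
        out.reportWithCaveats.push(f);
        break;
      case "inconclusive":
        out.manualReview.push(f);
        break;
      case "not-exploitable":
        out.filtered.push(f);
        break;
    }
  }
  return out;
}
```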

---

## Exploit Playbook Memory

### Namespace Structure
```
aqe/pentest/
 playbook/
  exploit/{vuln_type}/{tech_stack}/{technique}
  bypass/{defense_type}/{technique}
  payload/{vuln_type}/{variant}
 results/
  validation-{timestamp}
 poc/
  {finding_id}-poc
```
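Keys in this namespace can be built by small helpers, which keeps the path layout in one place. The segment names below come from the tree. Lower-casing each segment and replacing characters outside `[a-z0-9._-]` with `-` is an assumption made to keep keys path-safe. `playbookKeys` is a hypothetical helper.

```typescript
const ROOT = "aqe/pentest";

// Normalize one path segment: trim, lower-case, collapse unsafe runs to "-".
function segment(s: string): string {
  return s.trim().toLowerCase().replace(/[^a-z0-9._-]+/g, "-");
}

const playbookKeys = {
  exploit: (vulnType: string, techStack: string, technique: string) =>
    `${ROOT}/playbook/exploit/${segment(vulnType)}/${segment(techStack)}/${segment(technique)}`,
  bypass: (defenseType: string, technique: string) =>
    `${ROOT}/playbook/bypass/${segment(defenseType)}/${segment(technique)}`,
  payload: (vulnType: string, variant: string) =>
    `${ROOT}/playbook/payload/${segment(vulnType)}/${segment(variant)}`,
  result: (timestamp: Date) => `${ROOT}/results/validation-${timestamp.toISOString()}`,
  poc: (findingId: string) => `${ROOT}/poc/${segment(findingId)}-poc`,
};
```

For instance, `playbookKeys.exploit("injection", "Node Express", "union select")` yields `aqe/pentest/playbook/exploit/injection/node-express/union-select`.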

### Learning Loop
1. **Before validation**: Query playbook for known patterns matching findings
2. **During validation**: Try known payloads first (higher success rate)
3. **After validation**: Store new successful patterns with confidence scores
4. **Over time**: Agent converges on most effective payloads per tech stack
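The loop above needs a confidence score per playbook entry. The sketch keeps attempt and success counts for each entry, orders known entries by confidence before a validation (step 2), and updates the counts afterwards (step 3). The score is a Laplace-smoothed success rate, (successes + 1) / (attempts + 2), so a new pattern starts at 0.5 rather than 0 or 1. That formula is an assumption, not necessarily the scoring ReasoningBank uses.

```typescript
interface PlaybookEntry {
  key: string;
  attempts: number;
  successes: number;
}

// Laplace-smoothed success rate in (0, 1).
function confidence(e: PlaybookEntry): number {
  return (e.successes + 1) / (e.attempts + 2);
}

// Step 2: try the most reliable known patterns first (input is not mutated).
function orderForTrial(entries: PlaybookEntry[]): PlaybookEntry[] {
  return [...entries].sort((a, b) => confidence(b) - confidence(a));
}

// Step 3: record one attempt's outcome, returning an updated copy.
function recordOutcome(e: PlaybookEntry, succeeded: boolean): PlaybookEntry {
  return { ...e, attempts: e.attempts + 1, successes: e.successes + (succeeded ? 1 : 0) };
}
```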

---

## Cost Optimization

### Estimated Cost by Scenario
| Scenario | Tier Mix | Findings | Est. Cost | Est. Time |
|----------|----------|----------|-----------|-----------|
| PR check (source only) | 100% Tier 1 | 5 | $0 | <5s |
| Sprint validation | 70% T1, 30% T2 | 15 | $2-5 | 5-10 min |
| Release validation | 40% T1, 40% T2, 20% T3 | 25 | $8-15 | 15-30 min |
| Full pentest | 20% T1, 30% T2, 50% T3 | 40 | $15-30 | 30-60 min |
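The budget and time caps from the Quality Gates can be enforced with a small guard. Every model call asks the guard before it runs and reports its actual cost afterwards. In this sketch, `RunBudget` and its methods are hypothetical names. The clock is injectable so the guard can be tested deterministically.

```typescript
class RunBudget {
  private spentUsd = 0;
  private readonly startedAtMs: number;

  constructor(
    private readonly maxCostUsd: number,
    private readonly timeoutMinutes: number,
    private readonly now: () => number = Date.now,
  ) {
    this.startedAtMs = now();
  }

  /** True if a call with this estimated cost fits both the budget and the time cap. */
  canSpend(estimatedUsd: number): boolean {
    const elapsedMin = (this.now() - this.startedAtMs) / 60000;
    return elapsedMin < this.timeoutMinutes && this.spentUsd + estimatedUsd <= this.maxCostUsd;
  }

  /** Records the actual cost of a completed call. */
  record(actualUsd: number): void {
    this.spentUsd += actualUsd;
  }

  get spent(): number {
    return this.spentUsd;
  }
}
```

Under the defaults (`max_cost_usd: 15`, `timeout_minutes: 30`), the Full pentest row above ($15-30, 30-60 min) would run into the caps, so that scenario needs both limits raised.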

### Cost vs Shannon Comparison
| Metric | Shannon | AQE Pentest Validation |
|--------|---------|----------------------|
| Cost per run | ~$50 | $5-15 (graduated tiers) |
| Runtime | 60-90 min | 15-30 min (parallel pipelines) |
| False positive rate | Low (exploit-proven) | Low (same principle) |
| Learning | None (static prompts) | ReasoningBank playbook |

---

## Success Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| False positive reduction | >60% of findings eliminated | Pre/post validator comparison |
| Exploit confirmation rate | >80% of confirmed findings truly exploitable | Manual PoC verification |
| Cost per run | <$15 USD | Token tracking per pipeline |
| Time per run | <30 minutes | Execution time metrics |
| Playbook growth | 100+ patterns after 6 months | Memory namespace count |
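The first four metrics follow directly from counts collected during a run. The sketch restates the table's formulas. False-positive reduction is the share of scanner findings the validator eliminated. Confirmation rate is the share of validator-confirmed findings that manual PoC verification also confirmed. The `RunCounts` field names are hypothetical, and playbook growth is omitted because it is measured across runs.

```typescript
interface RunCounts {
  scannerFindings: number; // findings entering validation (pre)
  reportedFindings: number; // findings left after the validator (post)
  confirmedFindings: number; // findings marked confirmed-exploitable
  manuallyVerified: number; // confirmed findings whose PoC a human reproduced
  costUsd: number;
  minutes: number;
}

function successMetrics(c: RunCounts) {
  const ratio = (num: number, den: number) => (den === 0 ? 0 : num / den);
  const fpReduction = ratio(c.scannerFindings - c.reportedFindings, c.scannerFindings);
  const confirmationRate = ratio(c.manuallyVerified, c.confirmedFindings);
  return {
    fpReduction,
    confirmationRate,
    // Targets from the table: >60%, >80%, <$15, <30 minutes.
    meetsTargets: fpReduction > 0.6 && confirmationRate > 0.8 && c.costUsd < 15 && c.minutes < 30,
  };
}
```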

---

## Related Skills

- [security-testing](../security-testing/) - OWASP vulnerability scanning, SAST/DAST automation
- [compliance-testing](../compliance-testing/) - Regulatory compliance
- [api-testing-patterns](../api-testing-patterns/) - API security testing
- [chaos-engineering-resilience](../chaos-engineering-resilience/) - Security under chaos

---

## Remember

**"No Exploit, No Report."** A vulnerability scanner that can't prove exploitation delivers uncertain value. This skill transforms security findings from theoretical risks into proven vulnerabilities with evidence. Every confirmed finding comes with a reproducible proof-of-concept. Every false positive is eliminated before it reaches the report.

**Think proof, not prediction.** Don't report what MIGHT be vulnerable. Prove what IS vulnerable.
