## Best use case
AI Pentesting is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
## Overview
Teams using AI Pentesting should expect more consistent output, faster repeated execution, and less prompt rewriting.
## When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
## When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
## Installation

### Claude Code / Cursor / Codex

Manual installation:

- Download SKILL.md from GitHub
- Place it in `.claude/skills/ai-pentesting/SKILL.md` inside your project
- Restart your AI agent — it will auto-discover the skill
## How AI Pentesting Compares
| Feature / Agent | AI Pentesting | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
## Frequently Asked Questions
**What does this skill do?**

It uses AI agents to autonomously conduct penetration tests on web applications, combining LLM reasoning with security tools to find and prove vulnerabilities with minimal human intervention.
**Where can I find the source code?**
You can find the source code on GitHub using the link provided at the top of the page.
## SKILL.md Source
# AI Pentesting
## Overview
Use AI agents to autonomously conduct penetration tests on web applications. Combine LLM reasoning with security tools (nmap, subfinder, nuclei, sqlmap, browser automation) to find and prove vulnerabilities with minimal human intervention.
## Instructions
### Methodology
AI pentesting follows the same phases as human pentesting, but the AI orchestrates each phase autonomously:
```
Phase 1: RECONNAISSANCE
├── Subdomain enumeration (subfinder)
├── Technology fingerprinting (whatweb, wappalyzer)
├── Port scanning (nmap)
├── API schema discovery (crawling, OpenAPI/GraphQL introspection)
└── Source code analysis (if white-box)
AI decides: which tools to run, in what order, based on findings
Phase 2: VULNERABILITY ANALYSIS
├── Known CVE scanning (nuclei)
├── Web vulnerability scanning (OWASP ZAP, nikto)
├── API fuzzing (schemathesis)
├── Code-level vulnerability hunting (semgrep, CodeQL)
└── Data flow analysis (input → dangerous function)
AI decides: which findings are likely exploitable
Phase 3: EXPLOITATION
├── SQL injection (sqlmap, manual payloads)
├── XSS (reflected, stored, DOM)
├── SSRF (internal access, cloud metadata)
├── Authentication bypass (broken auth, privilege escalation)
├── Business logic flaws (price manipulation, race conditions)
└── Browser-based exploitation (Playwright/Puppeteer)
AI decides: exploitation order, payload selection, chaining
Phase 4: REPORTING
├── Proof-of-concept for each finding
├── Reproducible steps (curl commands, screenshots)
├── Severity rating (CVSS score)
├── Remediation guidance
└── Executive summary
AI generates: structured, evidence-based report
```
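As an illustration of the "AI decides" steps above, the tool-selection logic can be approximated with a simple rule-based stand-in. This is a hypothetical placeholder for the LLM's reasoning, not part of any real framework; the function name and rules are illustrative only.

```python
# Rule-based stand-in for the LLM's "which tool next?" decision.
# In a real pipeline, an LLM would make this call from much richer context.

def next_recon_steps(findings: dict) -> list[str]:
    """Return prioritized follow-up tools based on what recon has found so far."""
    steps = []
    if findings.get("subdomains"):
        # New hosts discovered: fingerprint and port-scan them first
        steps.append("whatweb")
        steps.append("nmap")
    if "graphql" in findings.get("technologies", []):
        # A GraphQL endpoint warrants schema introspection
        steps.append("graphql-introspection")
    if findings.get("open_ports"):
        # Known services can be matched against CVE templates
        steps.append("nuclei")
    return steps

# Example: subdomains found and GraphQL detected, no port data yet
plan = next_recon_steps({"subdomains": ["api.example.com"], "technologies": ["graphql"]})
print(plan)  # ['whatweb', 'nmap', 'graphql-introspection']
```

An LLM-driven orchestrator replaces these hard-coded rules with free-form reasoning, but the contract is the same: findings in, prioritized next actions out.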
### Setting Up Shannon
Shannon is an open-source AI pentester that automates the full lifecycle:
```bash
# Clone and set up Shannon
git clone https://github.com/KeygraphHQ/shannon.git
cd shannon
# Configure credentials
export ANTHROPIC_API_KEY="your-api-key"
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
# Run a pentest against your application
# Requires: Docker, target URL, source code repo
./shannon start URL=https://your-app.com REPO=./your-repo
# Monitor progress
./shannon logs
# View results in Temporal UI
open http://localhost:8233
```
Shannon's architecture:
- **Reconnaissance agent**: Maps attack surface using nmap, subfinder, whatweb
- **Vulnerability agents**: Specialized per OWASP category (injection, XSS, SSRF, auth bypass)
- **Exploitation agent**: Uses browser automation to prove vulnerabilities with real exploits
- **Reporting agent**: Generates findings with copy-paste PoC commands
### Building a Custom AI Pentest Pipeline
For cases where Shannon doesn't fit, build a custom pipeline:
```python
# ai_pentester.py
# Custom AI pentesting pipeline using LLM + security tools
import json
import subprocess
from urllib.parse import urlparse

from openai import OpenAI

client = OpenAI()


class AIPentester:
    """Autonomous AI penetration tester.

    Orchestrates security tools using LLM reasoning
    to find and prove vulnerabilities.
    """

    def __init__(self, target_url: str, scope: list[str] | None = None):
        self.target = target_url
        self.scope = scope or [target_url]
        self.findings = []
        self.recon_data = {}

    def _get_domain(self) -> str:
        """Extract the bare hostname from the target URL."""
        return urlparse(self.target).netloc

    async def run_pentest(self) -> dict:
        """Execute full penetration test lifecycle.

        Returns:
            Dict with findings, evidence, and recommendations
        """
        # Phase 1: Recon
        self.recon_data = await self._recon()

        # Phase 2: AI-guided vulnerability analysis
        targets = await self._analyze_attack_surface(self.recon_data)

        # Phase 3: AI-guided exploitation
        for target in targets:
            finding = await self._exploit(target)
            if finding:
                self.findings.append(finding)

        # Phase 4: Generate report
        return await self._generate_report()

    async def _recon(self) -> dict:
        """Run reconnaissance tools and aggregate results."""
        recon = {}

        # Subdomain enumeration
        result = subprocess.run(
            ['subfinder', '-d', self._get_domain(), '-silent'],
            capture_output=True, text=True, timeout=120
        )
        recon['subdomains'] = result.stdout.strip().split('\n')

        # Technology fingerprinting
        result = subprocess.run(
            ['whatweb', self.target, '--log-json=/dev/stdout', '-a', '3'],
            capture_output=True, text=True, timeout=60
        )
        recon['technologies'] = json.loads(result.stdout) if result.stdout else {}

        # Port scanning (nmap has no JSON output mode; emit XML to stdout)
        result = subprocess.run(
            ['nmap', '-sV', '--top-ports', '1000', '-oX', '-', self._get_domain()],
            capture_output=True, text=True, timeout=300
        )
        recon['ports'] = result.stdout

        # Nuclei scan for known CVEs (-jsonl in recent releases; older ones use -json)
        result = subprocess.run(
            ['nuclei', '-u', self.target, '-severity', 'critical,high',
             '-jsonl', '-silent'],
            capture_output=True, text=True, timeout=300
        )
        recon['known_vulns'] = [
            json.loads(line) for line in result.stdout.strip().split('\n')
            if line.strip()
        ]
        return recon

    async def _analyze_attack_surface(self, recon: dict) -> list:
        """Use AI to analyze recon data and prioritize attack targets."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content":
                    "You are an expert penetration tester. Analyze the "
                    "reconnaissance data and identify the most promising "
                    "attack vectors. Return JSON array of targets."},
                {"role": "user", "content":
                    f"Recon data:\n{json.dumps(recon, indent=2)}\n\n"
                    "Identify attack targets with: endpoint, vulnerability_type, "
                    "technique, priority (1-5), reasoning."}
            ],
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content).get("targets", [])

    async def _exploit(self, target: dict) -> dict | None:
        """Attempt to exploit an identified vulnerability."""
        vuln_type = target.get('vulnerability_type', '').lower()
        # Each handler implements one exploitation technique (definitions omitted here)
        handlers = {
            'injection': self._test_injection,
            'xss': self._test_xss,
            'ssrf': self._test_ssrf,
            'auth': self._test_auth_bypass,
        }
        for key, handler in handlers.items():
            if key in vuln_type:
                return await handler(target)
        return None

    async def _generate_report(self) -> dict:
        """Generate a structured penetration test report."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content":
                    "Generate a professional penetration test report with "
                    "executive summary, findings with CVSS scores, PoC steps, "
                    "and remediation recommendations."},
                {"role": "user", "content":
                    f"Target: {self.target}\n"
                    f"Findings: {json.dumps(self.findings, indent=2)}\n"
                    f"Recon data: {json.dumps(self.recon_data, indent=2)}"}
            ]
        )
        return {
            "target": self.target,
            "findings_count": len(self.findings),
            "findings": self.findings,
            "report": response.choices[0].message.content
        }
```
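The nuclei output handling in the recon phase can be exercised in isolation. The sketch below assumes nuclei's JSON-lines export format, with `template-id`, `matched-at`, and `info.severity` fields; the sample lines are fabricated for illustration and are not real scan output.

```python
import json

def parse_nuclei_output(raw: str, keep=frozenset({"critical", "high"})) -> list[dict]:
    """Parse nuclei JSON-lines output and keep only high-impact findings."""
    findings = []
    for line in raw.strip().split("\n"):
        if not line.strip():
            continue
        record = json.loads(line)
        severity = record.get("info", {}).get("severity", "unknown")
        if severity in keep:
            findings.append({
                "template": record.get("template-id"),
                "severity": severity,
                "matched_at": record.get("matched-at"),
            })
    return findings

# Illustrative sample lines (fabricated, not real scan output)
sample = (
    '{"template-id": "CVE-2021-44228", "info": {"severity": "critical"}, "matched-at": "https://app.test/login"}\n'
    '{"template-id": "tech-detect", "info": {"severity": "info"}, "matched-at": "https://app.test"}'
)
print(parse_nuclei_output(sample))  # keeps only the critical finding
```

Parsing line-by-line rather than as one JSON document matters here: nuclei streams one finding per line, so a partial scan still yields usable records.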
### CI/CD Integration
Run AI pentests on every deployment:
```yaml
# .github/workflows/pentest.yml
name: AI Penetration Test
on:
push:
branches: [main]
schedule:
- cron: '0 2 * * 1' # Weekly Monday 2 AM
jobs:
pentest:
runs-on: ubuntu-latest
services:
app:
image: your-app:${{ github.sha }}
ports:
- 8080:8080
steps:
- uses: actions/checkout@v4
- name: Run Shannon Pentest
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
git clone https://github.com/KeygraphHQ/shannon.git
cd shannon
./shannon start \
URL=http://localhost:8080 \
REPO=../ \
MAX_CONCURRENT=3
# Wait for completion and extract report
./shannon wait
cp workspace/report.md $GITHUB_WORKSPACE/pentest-report.md
- name: Upload Report
uses: actions/upload-artifact@v4
with:
name: pentest-report
path: pentest-report.md
- name: Fail on Critical Findings
run: |
if grep -q "CRITICAL" pentest-report.md; then
echo "::error::Critical vulnerabilities found!"
exit 1
fi
```
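If the report is post-processed in Python rather than grepped in the shell, the build gate above can be expressed as a small helper. This is an illustrative sketch; the severity markers follow the report convention used in this workflow, not a fixed Shannon output format.

```python
def has_blocking_findings(report_text: str, blocking=("CRITICAL", "HIGH")) -> bool:
    """Return True if the report mentions any blocking severity marker."""
    return any(level in report_text for level in blocking)

print(has_blocking_findings("Finding 1: CRITICAL SQL injection in /login"))  # True
print(has_blocking_findings("Only LOW severity issues were found."))         # False
```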
### Report Structure
A professional AI-generated pentest report should include:

- **Executive summary**: scope, duration, methodology, overall risk, and findings count by severity
- **Individual findings**: each with a CVSS score, affected endpoint/parameter, evidence with reproducible curl commands, impact description, and specific remediation guidance
- **Remediation priority list**: ordered by severity, with recommended fix timelines
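For the severity ratings in the findings section, CVSS v3.1 base scores map to standard qualitative bands. A small helper (illustrative, not from any specific library) makes that mapping explicit:

```python
def cvss_severity(score: float) -> str:
    """Map a CVSS v3.1 base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"

print(cvss_severity(9.8))  # Critical
print(cvss_severity(5.3))  # Medium
```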
## Examples
### Run an autonomous pentest on a web application
```prompt
Set up Shannon to run a full penetration test on our staging environment at https://staging.ourapp.com. The source code is in the current repository. Configure it to test for: SQL injection, XSS, SSRF, and broken authentication. Run with maximum concurrency and generate a report with reproducible proof-of-concept exploits for every finding. Flag any critical vulnerabilities that need immediate attention.
```
### Build a custom AI pentest pipeline
```prompt
Build a custom AI pentesting pipeline that combines subfinder (subdomain discovery), whatweb (tech fingerprinting), nuclei (CVE scanning), and schemathesis (API fuzzing) orchestrated by an LLM agent. The LLM should analyze results from each tool, decide what to test next, and generate exploitation payloads. Target: our API at api.example.com with the OpenAPI spec at /docs/openapi.json. Produce a structured findings report.
```
### Integrate AI pentesting into CI/CD
```prompt
Add automated penetration testing to our GitHub Actions pipeline. It should run on every push to main and weekly on a schedule. The app runs in Docker (docker-compose up), exposed at localhost:8080. Use Shannon for the pentest, upload the report as an artifact, and fail the build if any critical or high severity vulnerabilities are found. Include Slack notification for findings.
```
## Guidelines
- Only run penetration tests against systems you have explicit written authorization to test — unauthorized testing is illegal
- AI pentesters can cause real damage (data modification, service disruption) — always test against staging environments, never production
- Review AI-generated exploitation attempts before running them — LLMs can hallucinate or generate overly aggressive payloads
- Treat pentest reports as confidential — they contain vulnerability details and proof-of-concept exploits
- Set time limits and scope boundaries for autonomous testing to prevent runaway scans
- Validate AI findings manually — false positives in automated reports erode trust with stakeholders
- Store API keys and credentials used for pentesting securely — never hardcode them in CI configurations