# AI Pentesting

## Overview


### Best use case

AI Pentesting is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using AI Pentesting should expect more consistent output, faster repeated execution, and less prompt rewriting.

### When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

### When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

## Installation

### Claude Code / Cursor / Codex

curl -o ~/.claude/skills/ai-pentesting/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/TerminalSkills/skills/ai-pentesting/SKILL.md"

### Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/ai-pentesting/SKILL.md inside your project (see the shell sketch below)
  3. Restart your AI agent — it will auto-discover the skill
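
For example, a rough shell sketch of steps 1–2, run from your project root, using the same raw GitHub URL as the one-line install above:

```bash
# Manual install sketch: fetch SKILL.md and place it inside the project
mkdir -p .claude/skills/ai-pentesting
curl -o .claude/skills/ai-pentesting/SKILL.md \
  "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/TerminalSkills/skills/ai-pentesting/SKILL.md"
```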

## How AI Pentesting Compares

| Feature / Agent | AI Pentesting | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

## Frequently Asked Questions

### What does this skill do?

It uses AI agents to autonomously conduct penetration tests on web applications, combining LLM reasoning with security tools (nmap, subfinder, nuclei, sqlmap, browser automation) to find and prove vulnerabilities with minimal human intervention.

### Where can I find the source code?

You can find the source code on GitHub in the ComeOnOliver/skillshub repository (the installation URL above points to it).

## SKILL.md Source

# AI Pentesting

## Overview

Use AI agents to autonomously conduct penetration tests on web applications. Combine LLM reasoning with security tools (nmap, subfinder, nuclei, sqlmap, browser automation) to find and prove vulnerabilities with minimal human intervention.

## Instructions

### Methodology

AI pentesting follows the same phases as human pentesting, but the AI orchestrates each phase autonomously:

```
Phase 1: RECONNAISSANCE
├── Subdomain enumeration (subfinder)
├── Technology fingerprinting (whatweb, wappalyzer)
├── Port scanning (nmap)
├── API schema discovery (crawling, OpenAPI/GraphQL introspection)
└── Source code analysis (if white-box)
    AI decides: which tools to run, in what order, based on findings

Phase 2: VULNERABILITY ANALYSIS
├── Known CVE scanning (nuclei)
├── Web vulnerability scanning (OWASP ZAP, nikto)
├── API fuzzing (schemathesis)
├── Code-level vulnerability hunting (semgrep, CodeQL)
└── Data flow analysis (input → dangerous function)
    AI decides: which findings are likely exploitable

Phase 3: EXPLOITATION
├── SQL injection (sqlmap, manual payloads)
├── XSS (reflected, stored, DOM)
├── SSRF (internal access, cloud metadata)
├── Authentication bypass (broken auth, privilege escalation)
├── Business logic flaws (price manipulation, race conditions)
└── Browser-based exploitation (Playwright/Puppeteer)
    AI decides: exploitation order, payload selection, chaining

Phase 4: REPORTING
├── Proof-of-concept for each finding
├── Reproducible steps (curl commands, screenshots)
├── Severity rating (CVSS score)
├── Remediation guidance
└── Executive summary
    AI generates: structured, evidence-based report
```

### Setting Up Shannon

Shannon is an open-source AI pentester that automates the full lifecycle:

```bash
# Clone and set up Shannon
git clone https://github.com/KeygraphHQ/shannon.git
cd shannon

# Configure credentials
export ANTHROPIC_API_KEY="your-api-key"
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000

# Run a pentest against your application
# Requires: Docker, target URL, source code repo
./shannon start URL=https://your-app.com REPO=./your-repo

# Monitor progress
./shannon logs

# View results in Temporal UI
open http://localhost:8233
```

Shannon's architecture:
- **Reconnaissance agent**: Maps attack surface using nmap, subfinder, whatweb
- **Vulnerability agents**: Specialized per OWASP category (injection, XSS, SSRF, auth bypass)
- **Exploitation agent**: Uses browser automation to prove vulnerabilities with real exploits
- **Reporting agent**: Generates findings with copy-paste PoC commands

### Building a Custom AI Pentest Pipeline

For cases where Shannon doesn't fit, build a custom pipeline:

```python
# ai_pentester.py
# Custom AI pentesting pipeline using LLM + security tools

import subprocess
import json
from urllib.parse import urlparse

from openai import OpenAI

client = OpenAI()

class AIPentester:
    """Autonomous AI penetration tester.
    
    Orchestrates security tools using LLM reasoning
    to find and prove vulnerabilities.
    """
    
    def __init__(self, target_url: str, scope: list[str] = None):
        self.target = target_url
        self.scope = scope or [target_url]
        self.findings = []
        self.recon_data = {}
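
    def _get_domain(self) -> str:
        # Helper referenced by the recon commands below: a minimal sketch that
        # assumes self.target is a full URL like "https://app.example.com".
        return urlparse(self.target).netloc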
    
    async def run_pentest(self) -> dict:
        """Execute full penetration test lifecycle.
        
        Returns:
            Dict with findings, evidence, and recommendations
        """
        # Phase 1: Recon
        self.recon_data = await self._recon()
        
        # Phase 2: AI-guided vulnerability analysis
        targets = await self._analyze_attack_surface(self.recon_data)
        
        # Phase 3: AI-guided exploitation
        for target in targets:
            finding = await self._exploit(target)
            if finding:
                self.findings.append(finding)
        
        # Phase 4: Generate report
        report = await self._generate_report()
        return report
    
    async def _recon(self) -> dict:
        """Run reconnaissance tools and aggregate results."""
        recon = {}
        
        # Subdomain enumeration
        result = subprocess.run(
            ['subfinder', '-d', self._get_domain(), '-silent'],
            capture_output=True, text=True, timeout=120
        )
        recon['subdomains'] = result.stdout.strip().split('\n')
        
        # Technology fingerprinting
        result = subprocess.run(
            ['whatweb', self.target, '--log-json=/dev/stdout', '-a', '3'],
            capture_output=True, text=True, timeout=60
        )
        recon['technologies'] = json.loads(result.stdout) if result.stdout else {}
        
        # Port scanning (XML to stdout; nmap has no native JSON output format)
        result = subprocess.run(
            ['nmap', '-sV', '--top-ports', '1000', '-oX', '-', self._get_domain()],
            capture_output=True, text=True, timeout=300
        )
        recon['ports'] = result.stdout
        
        # Nuclei scan for known CVEs
        result = subprocess.run(
            ['nuclei', '-u', self.target, '-severity', 'critical,high',
             '-json', '-silent'],
            capture_output=True, text=True, timeout=300
        )
        recon['known_vulns'] = [
            json.loads(line) for line in result.stdout.strip().split('\n')
            if line.strip()
        ]
        
        return recon
    
    async def _analyze_attack_surface(self, recon: dict) -> list:
        """Use AI to analyze recon data and prioritize attack targets."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content":
                 "You are an expert penetration tester. Analyze the "
                 "reconnaissance data and identify the most promising "
                 "attack vectors. Return JSON array of targets."},
                {"role": "user", "content":
                 f"Recon data:\n{json.dumps(recon, indent=2)}\n\n"
                 "Identify attack targets with: endpoint, vulnerability_type, "
                 "technique, priority (1-5), reasoning."}
            ],
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content).get("targets", [])

    async def _exploit(self, target: dict) -> dict | None:
        """Attempt to exploit an identified vulnerability."""
        vuln_type = target.get('vulnerability_type', '').lower()
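        # Each _test_* handler is a per-category exploit routine (SQLi, XSS, SSRF,
        # auth bypass); their implementations are not included in this sketch.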
        handlers = {
            'injection': self._test_injection,
            'xss': self._test_xss,
            'ssrf': self._test_ssrf,
            'auth': self._test_auth_bypass,
        }
        for key, handler in handlers.items():
            if key in vuln_type:
                return await handler(target)
        return None

    async def _generate_report(self) -> dict:
        """Generate a structured penetration test report."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content":
                 "Generate a professional penetration test report with "
                 "executive summary, findings with CVSS scores, PoC steps, "
                 "and remediation recommendations."},
                {"role": "user", "content":
                 f"Target: {self.target}\n"
                 f"Findings: {json.dumps(self.findings, indent=2)}\n"
                 f"Recon data: {json.dumps(self.recon_data, indent=2)}"}
            ]
        )
        return {
            "target": self.target,
            "findings_count": len(self.findings),
            "findings": self.findings,
            "report": response.choices[0].message.content
        }
```
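
A minimal sketch of how this class might be driven, assuming the file above is saved as `ai_pentester.py`, the CLI tools are installed, and the target URL (illustrative here) is one you are authorized to test:

```python
# run_pentest.py -- hypothetical entry point for the AIPentester sketch above
import asyncio

from ai_pentester import AIPentester

async def main() -> None:
    # Target URL is illustrative; replace with an in-scope system you own.
    pentester = AIPentester("https://staging.example.com")
    report = await pentester.run_pentest()
    print(f"{report['findings_count']} finding(s) for {report['target']}")
    print(report["report"])

if __name__ == "__main__":
    asyncio.run(main())
```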

### CI/CD Integration

Run AI pentests on every deployment:

```yaml
# .github/workflows/pentest.yml
name: AI Penetration Test
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 2 * * 1'  # Weekly Monday 2 AM

jobs:
  pentest:
    runs-on: ubuntu-latest
    services:
      app:
        image: your-app:${{ github.sha }}
        ports:
          - 8080:8080
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Run Shannon Pentest
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          git clone https://github.com/KeygraphHQ/shannon.git
          cd shannon
          ./shannon start \
            URL=http://localhost:8080 \
            REPO=../ \
            MAX_CONCURRENT=3
          
          # Wait for completion and extract report
          ./shannon wait
          cp workspace/report.md $GITHUB_WORKSPACE/pentest-report.md
      
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: pentest-report
          path: pentest-report.md
      
      - name: Fail on Critical Findings
        run: |
          if grep -q "CRITICAL" pentest-report.md; then
            echo "::error::Critical vulnerabilities found!"
            exit 1
          fi
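
      # Optional Slack notification, as mentioned in the Examples section below.
      # A sketch that assumes a SLACK_WEBHOOK_URL repository secret (not part of
      # the original workflow above).
      - name: Notify Slack on Critical Findings
        if: failure()
        run: |
          curl -X POST -H 'Content-type: application/json' \
            --data '{"text":"AI pentest found critical findings in ${{ github.repository }}"}' \
            "${{ secrets.SLACK_WEBHOOK_URL }}"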
```

### Report Structure

A professional AI-generated pentest report should include:

- **Executive summary**: scope, duration, methodology, overall risk, and findings count by severity
- **Individual findings**: each with a CVSS score, the affected endpoint/parameter, evidence with reproducible curl commands, an impact description, and specific remediation guidance
- **Remediation priority list**: ordered by severity, with recommended fix timelines
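
As a rough illustration only, a single finding record emitted by the custom pipeline above might look like this (field names and values are hypothetical, not a schema defined by this skill):

```python
# Hypothetical finding record; every value here is illustrative
finding = {
    "title": "SQL injection in /api/search 'q' parameter",
    "severity": "Critical",
    "cvss_score": 9.8,
    "endpoint": "https://staging.example.com/api/search",
    "parameter": "q",
    "evidence": "curl \"https://staging.example.com/api/search?q=1' OR '1'='1\"",
    "impact": "Unauthenticated read access to application database contents.",
    "remediation": "Use parameterized queries and server-side input validation.",
}
```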

## Examples

### Run an autonomous pentest on a web application

```prompt
Set up Shannon to run a full penetration test on our staging environment at https://staging.ourapp.com. The source code is in the current repository. Configure it to test for: SQL injection, XSS, SSRF, and broken authentication. Run with maximum concurrency and generate a report with reproducible proof-of-concept exploits for every finding. Flag any critical vulnerabilities that need immediate attention.
```

### Build a custom AI pentest pipeline

```prompt
Build a custom AI pentesting pipeline that combines subfinder (subdomain discovery), whatweb (tech fingerprinting), nuclei (CVE scanning), and schemathesis (API fuzzing) orchestrated by an LLM agent. The LLM should analyze results from each tool, decide what to test next, and generate exploitation payloads. Target: our API at api.example.com with the OpenAPI spec at /docs/openapi.json. Produce a structured findings report.
```

### Integrate AI pentesting into CI/CD

```prompt
Add automated penetration testing to our GitHub Actions pipeline. It should run on every push to main and weekly on a schedule. The app runs in Docker (docker-compose up), exposed at localhost:8080. Use Shannon for the pentest, upload the report as an artifact, and fail the build if any critical or high severity vulnerabilities are found. Include Slack notification for findings.
```

## Guidelines

- Only run penetration tests against systems you have explicit written authorization to test — unauthorized testing is illegal
- AI pentesters can cause real damage (data modification, service disruption) — always test against staging environments, never production
- Review AI-generated exploitation attempts before running them — LLMs can hallucinate or generate overly aggressive payloads
- Treat pentest reports as confidential — they contain vulnerability details and proof-of-concept exploits
- Set time limits and scope boundaries for autonomous testing to prevent runaway scans
- Validate AI findings manually — false positives in automated reports erode trust with stakeholders
- Store API keys and credentials used for pentesting securely — never hardcode them in CI configurations

## Related Skills

- **sqlmap-database-pentesting** (from ComeOnOliver/skillshub): This skill should be used when the user asks to "automate SQL injection testing," "enumerate database structure," "extract database credentials using sqlmap," "dump tables and columns...
- **shodan-reconnaissance-and-pentesting** (from ComeOnOliver/skillshub): This skill should be used when the user asks to "search for exposed devices on the internet," "perform Shodan reconnaissance," "find vulnerable services using Shodan," "scan IP ranges with Shodan," or "discover IoT devices and open ports." It provides comprehensive guidance for using Shodan's search engine, CLI, and API for penetration testing reconnaissance.
