cairn-ai-pentest

AI-automated penetration testing and general problem-solving system that achieved the only AK (All Killed, i.e. full score) in the Tencent Cloud Hackathon Intelligent Penetration Challenge.

by Aradotso

Best use case

cairn-ai-pentest is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using cairn-ai-pentest should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/cairn-ai-pentest/SKILL.md --create-dirs "https://raw.githubusercontent.com/Aradotso/trending-skills/main/skills/cairn-ai-pentest/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/cairn-ai-pentest/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How cairn-ai-pentest Compares

Feature / Agent	cairn-ai-pentest	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

It documents Cairn, an AI-automated penetration testing and general problem-solving system that achieved the only AK (All Killed, i.e. full score) in the Tencent Cloud Hackathon Intelligent Penetration Challenge.

Where can I find the source code?

The Cairn project is hosted at https://github.com/oritera/Cairn, and the SKILL.md file is in the Aradotso/trending-skills repository on GitHub.

SKILL.md Source

# Cairn AI Automated Penetration Testing System

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

Cairn is an AI-driven automated penetration testing and general problem-solving framework developed by the Bytex@起零衍迹实验室 team. It was the only entry to achieve an "AK" (All Killed / full score) in the 2nd TCH Tencent Cloud Hackathon Intelligent Penetration Challenge, where it placed 4th online. The system uses LLM-based agents to autonomously reason about, plan, and execute multi-step security testing tasks.

---

## What Cairn Does

- **Autonomous AI Agent Loop**: Iteratively reasons about a target, selects tools, executes commands, and interprets results
- **Penetration Testing Automation**: Web vulnerability discovery, exploitation, CTF-style challenge solving
- **General Problem Solving**: Extensible to non-security tasks via tool/plugin architecture
- **Multi-step Planning**: Breaks complex objectives into subtasks with memory and context management
- **Tool Integration**: Wraps common pentest tools (nmap, sqlmap, curl, custom scripts) as callable agent actions

---

## Project Status

> ⚠️ Code is still being organized and is expected to be open-sourced soon. The examples below reflect the architecture described in the competition writeup and visible repository structure.

Follow the writeup for architecture details: https://mp.weixin.qq.com/s/DlpEH7bVr0xi0VawPJs3XA

---

## Installation

```bash
# Clone the repository
git clone https://github.com/oritera/Cairn.git
cd Cairn

# Install Python dependencies (expected)
pip install -r requirements.txt

# Or with uv (modern Python tooling)
uv sync
```

### Environment Configuration

Create a `.env` file in the project root:

```env
# LLM Provider (OpenAI-compatible endpoint)
OPENAI_API_KEY=your_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1
MODEL_NAME=gpt-4o

# Or use a local or alternative OpenAI-compatible provider
# OPENAI_BASE_URL=https://api.deepseek.com/v1
# MODEL_NAME=deepseek-chat

# Agent configuration
MAX_ITERATIONS=30
TIMEOUT_PER_STEP=60

# Target scope (safety guard)
TARGET_SCOPE=192.168.1.0/24

# Logging
LOG_LEVEL=INFO
LOG_FILE=./logs/cairn.log
```
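
The `TARGET_SCOPE` guard above could be enforced with a check along these lines. This is a minimal standard-library sketch, not Cairn's confirmed behavior: the helper name `in_scope` and the comma-separated format for multiple CIDR ranges are assumptions made for illustration.

```python
import ipaddress
from urllib.parse import urlparse


def in_scope(target: str, scope: str) -> bool:
    """Return True if the target's host is an IP literal inside one of the CIDRs.

    `scope` is a comma-separated CIDR list (assumed format). Hostnames are
    rejected rather than resolved, so the check never touches the network.
    """
    # Accept both bare hosts ("192.168.1.100") and URLs ("http://192.168.1.100/x").
    host = urlparse(target).hostname if "://" in target else target
    try:
        address = ipaddress.ip_address(host)
    except (ValueError, TypeError):
        return False
    networks = [ipaddress.ip_network(c.strip()) for c in scope.split(",") if c.strip()]
    return any(address in network for network in networks)


print(in_scope("http://192.168.1.100", "192.168.1.0/24"))
print(in_scope("http://10.0.0.1", "192.168.1.0/24"))
```

Rejecting hostnames outright is a conservative choice: resolving them first would let a DNS answer decide whether a target counts as in scope.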

---

## Core Architecture

Cairn follows a **ReAct (Reasoning + Acting)** agent pattern:

```
User Goal
    │
    ▼
┌─────────────────────────────┐
│         Agent Loop          │
│  ┌────────────────────────┐ │
│  │  Think (LLM Reasoning) │ │
│  └──────────┬─────────────┘ │
│             │               │
│  ┌──────────▼─────────────┐ │
│  │  Act (Tool Selection)  │ │
│  └──────────┬─────────────┘ │
│             │               │
│  ┌──────────▼─────────────┐ │
│  │  Observe (Parse Result)│ │
│  └──────────┬─────────────┘ │
│             │               │
│         (loop until done)   │
└─────────────────────────────┘
    │
    ▼
Final Answer / Exploit / Report
```
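
The Think / Act / Observe loop in the diagram can be sketched in a few lines. This is an illustrative skeleton only, not Cairn's code: `TOOLS`, `scripted_llm`, and `run_agent` are hypothetical names, and the "LLM" is a scripted stand-in so the example runs without an API key.

```python
from typing import Callable

# Hypothetical tool table: name -> callable taking keyword arguments.
TOOLS: dict[str, Callable[..., str]] = {
    "echo": lambda text: f"echo: {text}",
}


def scripted_llm(history: list[str]) -> dict:
    """Stand-in for a real LLM call: acts once, then finishes."""
    if not any(line.startswith("Observation:") for line in history):
        return {"thought": "Try the echo tool.", "action": "echo",
                "action_input": {"text": "hello"}}
    return {"thought": "Done.", "final_answer": history[-1]}


def run_agent(goal: str, llm=scripted_llm, max_iterations: int = 5) -> str:
    history = [f"Goal: {goal}"]
    for _ in range(max_iterations):
        step = llm(history)                              # Think
        history.append(f"Thought: {step['thought']}")
        if "final_answer" in step:
            return step["final_answer"]
        tool = TOOLS[step["action"]]                     # Act
        observation = tool(**step["action_input"])
        history.append(f"Observation: {observation}")    # Observe
    # Hard stop, analogous to the MAX_ITERATIONS setting.
    return "Stopped: iteration limit reached"


print(run_agent("demo"))
```

The iteration cap is what keeps a confused model from looping forever, which is why it appears as a first-class setting in the configuration.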

---

## Key Usage Patterns

### 1. Basic Agent Invocation (Expected CLI)

```bash
# Run against a CTF challenge or target
python cairn.py --target "http://192.168.1.100" --goal "Find and exploit SQL injection to retrieve admin credentials"

# With custom model
python cairn.py --target "http://challenge.example.com" \
  --goal "Solve this web CTF challenge and get the flag" \
  --model gpt-4o \
  --max-iterations 25

# Dry run (plan only, no execution)
python cairn.py --target "http://192.168.1.100" \
  --goal "Enumerate all open services" \
  --dry-run
```

### 2. Python API Usage (Expected)

```python
import os

from cairn import CairnAgent
from cairn.tools import ToolRegistry
from cairn.config import CairnConfig

# Initialize configuration
config = CairnConfig(
    model_name="gpt-4o",
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
    max_iterations=30,
    target_scope=["192.168.1.0/24"],
)

# Build tool registry
tools = ToolRegistry()
tools.register_defaults()  # nmap, curl, sqlmap, ffuf, etc.

# Create and run agent
agent = CairnAgent(config=config, tools=tools)

result = agent.run(
    target="http://192.168.1.100",
    goal="Find all web vulnerabilities and attempt exploitation",
)

print(result.summary)
print(result.findings)
```

### 3. Custom Tool Registration

```python
import subprocess

from cairn.tools import Tool, ToolResult

class CustomExploitTool(Tool):
    name = "custom_exploit"
    description = "Exploits a specific vulnerability in target application"

    def execute(self, target: str, payload: str, **kwargs) -> ToolResult:
        # Pass arguments as a list (no shell) so target/payload cannot inject commands
        cmd = ["python", "exploit.py", "--target", target, "--payload", payload]
        output = subprocess.run(cmd, capture_output=True, text=True)
        return ToolResult(
            success=output.returncode == 0,
            output=output.stdout,
            error=output.stderr,
        )

# Register with agent
tools.register(CustomExploitTool())
agent = CairnAgent(config=config, tools=tools)
```

### 4. Multi-Phase Penetration Test

```python
from cairn import CairnAgent, Phase
from cairn.pipeline import PentestPipeline

pipeline = PentestPipeline(agent=agent)

# Define phases
pipeline.add_phase(Phase(
    name="reconnaissance",
    goal="Enumerate all open ports and services on {target}",
))
pipeline.add_phase(Phase(
    name="vulnerability_scan",
    goal="Based on discovered services, identify exploitable vulnerabilities",
    depends_on=["reconnaissance"],
))
pipeline.add_phase(Phase(
    name="exploitation",
    goal="Exploit identified vulnerabilities and achieve {objective}",
    depends_on=["vulnerability_scan"],
))

# Run full pipeline
report = pipeline.run(
    target="192.168.1.100",
    objective="obtain root shell or read /flag",
)
report.save("./reports/pentest_report.json")
```
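
The `depends_on` fields above imply an execution order. One way such an order could be derived, and the `{target}` / `{objective}` placeholders filled in, is with the standard library's `graphlib`. This is a sketch under assumptions: how the real `PentestPipeline` schedules phases is not documented, and `execution_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Phase name -> names of the phases it depends on (mirrors the example above).
dependencies = {
    "reconnaissance": [],
    "vulnerability_scan": ["reconnaissance"],
    "exploitation": ["vulnerability_scan"],
}

goals = {
    "reconnaissance": "Enumerate all open ports and services on {target}",
    "exploitation": "Exploit identified vulnerabilities and achieve {objective}",
}


def execution_order(deps: dict[str, list[str]]) -> list[str]:
    """Return phase names so that every phase comes after its dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)

# str.format ignores unused keyword arguments, so one call serves every goal.
recon_goal = goals["reconnaissance"].format(target="192.168.1.100", objective="read /flag")
print(recon_goal)
```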

---

## Tool Integration Examples

### Built-in Tool Wrappers (Expected)

```python
# nmap integration
from cairn.tools.network import NmapTool

nmap = NmapTool()
result = nmap.execute(target="192.168.1.100", flags="-sV -sC -p-")
# Returns structured service enumeration data

# HTTP request tool
from cairn.tools.web import HTTPTool

http = HTTPTool()
result = http.execute(
    url="http://target.com/login",
    method="POST",
    data={"username": "admin' OR '1'='1", "password": "x"},
    follow_redirects=True,
)

# Command execution tool (sandboxed)
from cairn.tools.shell import ShellTool

shell = ShellTool(allowed_commands=["curl", "nmap", "sqlmap", "ffuf"])
result = shell.execute(command="sqlmap -u 'http://target.com/?id=1' --dbs --batch")
```
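
An `allowed_commands` list like the one passed to `ShellTool` could be enforced by parsing the command line and checking its executable. The sketch below is an assumption about how such a guard might work, not Cairn's implementation; `parse_allowed` is a hypothetical helper.

```python
import shlex


def parse_allowed(command: str, allowed_commands: set[str]) -> list[str]:
    """Split a command line and reject it unless its executable is allowlisted.

    The returned argv is meant for subprocess.run(argv) without shell=True,
    so shell metacharacters such as ';' or '|' are passed as literal
    arguments instead of chaining a second command.
    """
    argv = shlex.split(command)  # raises ValueError on unbalanced quotes
    if not argv or argv[0] not in allowed_commands:
        raise PermissionError(f"command not allowed: {command!r}")
    return argv


allowed = {"curl", "nmap", "sqlmap", "ffuf"}
argv = parse_allowed("sqlmap -u 'http://target.com/?id=1' --dbs --batch", allowed)
print(argv)
```

An allowlist limits which binaries run, not what they do. It complements the scope guard rather than replacing it.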

---

## Agent Memory and Context

```python
from cairn.memory import AgentMemory

# Memory persists findings across agent steps
memory = AgentMemory(
    short_term_limit=20,    # Recent observations in context
    long_term_enabled=True, # Summarize older context
    facts_store=True,       # Extract and index key facts
)

agent = CairnAgent(config=config, tools=tools, memory=memory)

# Access collected facts after run
for finding in agent.memory.findings:
    print(f"[{finding.severity}] {finding.description}")
    print(f"  Evidence: {finding.evidence}")
    print(f"  Recommendation: {finding.remediation}")
```
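
The effect of `short_term_limit` can be illustrated with a bounded buffer. This is a minimal sketch of the general idea, assuming the oldest observations are simply dropped and counted; the class `ShortTermMemory` is hypothetical, and the real `AgentMemory` may summarize evicted entries instead.

```python
from collections import deque


class ShortTermMemory:
    """Keep only the most recent observations for the LLM context."""

    def __init__(self, limit: int = 20):
        self.recent = deque(maxlen=limit)  # the oldest item is dropped when full
        self.evicted_count = 0

    def add(self, observation: str) -> None:
        if len(self.recent) == self.recent.maxlen:
            self.evicted_count += 1
        self.recent.append(observation)

    def context(self) -> str:
        """Render the retained observations, noting how many were dropped."""
        header = (f"[{self.evicted_count} older observations omitted]\n"
                  if self.evicted_count else "")
        return header + "\n".join(self.recent)


memory = ShortTermMemory(limit=3)
for i in range(5):
    memory.add(f"obs {i}")
print(memory.context())
```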

---

## Configuration Reference

```python
# cairn/config.py (expected structure)
import os
from dataclasses import dataclass, field

@dataclass
class CairnConfig:
    # LLM settings
    model_name: str = "gpt-4o"
    api_key: str = field(default_factory=lambda: os.environ["OPENAI_API_KEY"])
    base_url: str = "https://api.openai.com/v1"
    temperature: float = 0.1       # Low temp for consistent tool use
    max_tokens: int = 4096
    
    # Agent behavior
    max_iterations: int = 30       # Hard stop on runaway loops
    timeout_per_step: int = 60     # Seconds per tool execution
    verbose: bool = False
    
    # Safety
    target_scope: list[str] = field(default_factory=list)
    dry_run: bool = False          # Plan without executing
    require_confirmation: bool = False  # Interactive approval per step
    
    # Output
    report_format: str = "json"    # json | markdown | html
    report_path: str = "./reports"
```
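
A per-step limit such as `timeout_per_step` maps naturally onto the `timeout` argument of `subprocess.run`. The following sketch shows one way a tool runner might apply it. The helper `run_with_timeout` and the shape of its return value are assumptions for illustration.

```python
import subprocess
import sys


def run_with_timeout(argv: list[str], timeout_per_step: float) -> dict:
    """Run a command, reporting a failed step instead of hanging on slow tools."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_per_step
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return {"success": False, "output": "",
                "error": f"timed out after {timeout_per_step}s"}
    return {"success": proc.returncode == 0,
            "output": proc.stdout, "error": proc.stderr}


# Demonstrated with the current Python interpreter rather than a real scanner.
fast = run_with_timeout([sys.executable, "-c", "print('hi')"], 10)
slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5)
print(fast["success"], slow["error"])
```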

---

## Prompt Engineering Patterns

Cairn uses structured system prompts for reliable tool invocation:

```python
# Example system prompt structure (inferred from competition writeup)
SYSTEM_PROMPT = """You are an expert penetration tester AI agent.

## Objective
{goal}

## Target
{target}

## Available Tools
{tool_descriptions}

## Rules
1. Always reason step-by-step before acting
2. Stay within scope: {scope}
3. Prefer non-destructive enumeration before exploitation
4. Document every finding with evidence

## Response Format
Thought: <your reasoning>
Action: <tool_name>
Action Input: <tool parameters as JSON>

After receiving Observation, continue until you reach a Final Answer.
"""
```
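
A reply in the `Thought` / `Action` / `Action Input` format above has to be parsed before a tool can be called. A minimal parser might look like this. It is a sketch, not Cairn's parser: the names `STEP_RE` and `parse_step` are hypothetical, and it assumes the action input is a single JSON object.

```python
import json
import re

STEP_RE = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*"
    r"Action:\s*(?P<action>\S+)\s*"
    r"Action Input:\s*(?P<input>\{.*\})",
    re.DOTALL,
)


def parse_step(reply: str) -> dict:
    """Extract the thought, tool name, and JSON parameters from one reply."""
    match = STEP_RE.search(reply)
    if match is None:
        raise ValueError("reply does not follow the Thought/Action/Action Input format")
    return {
        "thought": match["thought"],
        "action": match["action"],
        # json.loads raises json.JSONDecodeError on malformed parameters.
        "action_input": json.loads(match["input"]),
    }


reply = (
    "Thought: The login form may be injectable.\n"
    "Action: http_request\n"
    'Action Input: {"url": "http://target.com/login", "method": "POST"}'
)
step = parse_step(reply)
print(step["action"], step["action_input"])
```

Raising on malformed replies lets the loop feed the error back to the model as an observation instead of guessing at its intent.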

---

## CTF / Challenge Mode

```bash
# Optimized for CTF flag capture
python cairn.py \
  --mode ctf \
  --target "http://ctf-challenge.com:8080" \
  --goal "Find the hidden flag in format FLAG{...}" \
  --model gpt-4o \
  --max-iterations 50 \
  --verbose

# With flag pattern matching
python cairn.py \
  --mode ctf \
  --target "http://target.com" \
  --flag-pattern "CTF\{[a-zA-Z0-9_]+\}" \
  --auto-submit
```
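
The `--flag-pattern` option takes a regular expression. Scanning tool output for such a pattern could be as simple as the sketch below; `find_flags` is a hypothetical helper, and how Cairn actually matches or submits flags is not documented.

```python
import re


def find_flags(text: str, pattern: str = r"CTF\{[a-zA-Z0-9_]+\}") -> list[str]:
    """Return unique flag-like strings in first-seen order."""
    matches = (m.group(0) for m in re.finditer(pattern, text))
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(matches))


page = "<p>CTF{sql_1nj3ction}</p> noise CTF{sql_1nj3ction} and CTF{second_flag}"
flags = find_flags(page)
print(flags)
```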

---

## Logging and Debugging

```python
import logging
from cairn import CairnAgent

# Enable detailed agent trace logging
logging.basicConfig(level=logging.DEBUG)

agent = CairnAgent(config=config, tools=tools, verbose=True)

# Each step is logged:
# [THINK] Analyzing login form for injection points...
# [ACT]   Calling tool: http_request
# [INPUT] {"url": "...", "method": "POST", "data": {...}}
# [OBS]   Response 200, contains "Invalid credentials"
# [THINK] Response suggests valid injection point, trying UNION...
```

---

## Troubleshooting

| Issue | Cause | Fix |
|-------|-------|-----|
| Agent loops without progress | Goal too vague or tools failing silently | Add `--max-iterations 15`, use `--verbose` to inspect loop |
| Tool execution timeout | Slow network or heavy scan | Increase `TIMEOUT_PER_STEP` in config |
| LLM refuses tool call | Safety filter on model provider | Use a less restrictive model endpoint or rephrase goal |
| Out of context window | Long agent history | Reduce `short_term_limit` or set `long_term_enabled=True` to summarize older context |
| Scope violation error | Target not in allowed scope | Add target CIDR to `TARGET_SCOPE` in `.env` |
| Empty findings report | Agent completed but found nothing | Check target accessibility, increase iterations |

---

## Responsible Use

Cairn is licensed under **AGPL-3.0**. Usage must comply with:

- ✅ Authorized penetration tests with written permission
- ✅ CTF competitions and intentionally vulnerable lab environments
- ✅ Personal security research on systems you own
- ❌ Unauthorized access to systems you don't own
- ❌ Commercial use without a separate commercial license

For commercial licensing inquiries, contact the maintainers through the repository.

---

## Resources

- **Repository**: https://github.com/oritera/Cairn
- **Competition Writeup**: https://mp.weixin.qq.com/s/DlpEH7bVr0xi0VawPJs3XA
- **License**: AGPL-3.0
- **Team**: Bytex @ 起零衍迹实验室

Related Skills

shannon-ai-pentester

from Aradotso/trending-skills

Autonomous white-box AI pentester for web applications and APIs using source code analysis and live exploit execution

metatron-pentest-assistant

from Aradotso/trending-skills

AI-powered penetration testing assistant using local LLM (metatron-qwen via Ollama) on Parrot OS Linux

zeroboot-vm-sandbox

from Aradotso/trending-skills

Sub-millisecond VM sandboxes for AI agents using copy-on-write KVM forking via Zeroboot

yourvpndead-vpn-detection

from Aradotso/trending-skills

Android app that detects VPN/proxy servers (VLESS/xray/sing-box) via local SOCKS5 vulnerability, exposing exit IPs and server configs without root

xata-postgres-platform

from Aradotso/trending-skills

Expert skill for Xata open-source cloud-native Postgres platform with copy-on-write branching, scale-to-zero, and Kubernetes deployment

x-mentor-skill-nuwa

from Aradotso/trending-skills

AI-powered X (Twitter) content strategy skill that distills methodologies from 6 top creators + open-source algorithm data into actionable writing, growth, and monetization guidance.

wx-favorites-report

from Aradotso/trending-skills

End-to-end pipeline to extract, decrypt, and visualize WeChat Mac favorites from encrypted SQLite DB into an interactive HTML report.

wterm-web-terminal

from Aradotso/trending-skills

Web terminal emulator with Zig/WASM core, DOM rendering, and React/vanilla JS bindings

worldmonitor-intelligence-dashboard

from Aradotso/trending-skills

Real-time global intelligence dashboard with AI-powered news aggregation, geopolitical monitoring, and infrastructure tracking

witr-process-inspector

from Aradotso/trending-skills

CLI and TUI tool that explains why processes, services, and ports are running by tracing causality chains across supervisors, containers, and shells.

wildworld-dataset

from Aradotso/trending-skills

WildWorld large-scale action-conditioned world modeling dataset with 108M+ frames from a photorealistic ARPG game, featuring per-frame annotations, 450+ actions, and explicit state information for generative world modeling research.