performing-ai-driven-osint-correlation
Use AI and LLM-based reasoning to correlate findings across multiple OSINT sources—username enumeration, email lookups, social media profiles, domain records, breach databases, and dark-web mentions—into unified intelligence profiles with confidence scoring and link analysis.
Best use case
performing-ai-driven-osint-correlation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using performing-ai-driven-osint-correlation should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/performing-ai-driven-osint-correlation/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How performing-ai-driven-osint-correlation Compares
| Feature / Agent | performing-ai-driven-osint-correlation | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
# Performing AI-Driven OSINT Correlation
## When to Use
- You have collected raw OSINT data from multiple tools and sources but need to identify connections, contradictions, and patterns across them.
- You need to build a unified intelligence profile for a target entity (person, organization, or infrastructure) from fragmented data.
- Traditional manual correlation is too slow or error-prone for the volume of data collected.
- You want confidence-scored assessments of identity linkage across platforms rather than simple keyword matching.
## Prerequisites
- Python 3.10+ with the `requests` and `openai` packages (`pip install requests openai`); `json` and `csv` ship with the standard library
- [Sherlock](https://github.com/sherlock-project/sherlock) installed (`pip install sherlock-project`)
- [theHarvester](https://github.com/laramies/theHarvester) installed (`pip install theHarvester`)
- [SpiderFoot](https://github.com/smicallef/spiderfoot) 4.0+ running on localhost:5001
- Access to an LLM API (OpenAI, Anthropic, or local model via Ollama)
- Optional: Maltego CE for graph visualization of correlation results
- Optional: API keys for Shodan, VirusTotal, HaveIBeenPwned, Hunter.io
## Workflow
### Legal & Ethical Requirements
- Obtain documented written authorization before any investigation
- Establish lawful basis for data processing (law enforcement, corporate policy, etc.)
- Define PII retention limits and data handling procedures
- Comply with local privacy regulations (GDPR, CCPA, etc.)
### Phase 1 — Multi-Source OSINT Collection
0. **Create the working directory for all OSINT outputs:**
```bash
mkdir -p /tmp/osint
```
1. **Enumerate usernames across platforms with Sherlock:**
```bash
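# --csv writes <username>.csv into the output folder; rename it to the path the normalizer reads
sherlock "targetusername" --csv --folderoutput /tmp/osint \
  && mv /tmp/osint/targetusername.csv /tmp/osint/sherlock-results.txt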
```
2. **Harvest emails, subdomains, and hosts with theHarvester:**
```bash
theHarvester -d targetdomain.com -b all -f /tmp/osint/harvester-results.json
```
3. **Run a SpiderFoot passive scan via REST API:**
```bash
curl -s http://localhost:5001/api/scan/start \
-d "scanname=target-recon&scantarget=targetdomain.com&usecase=passive" \
| jq '.scanid'
```
4. **Export SpiderFoot results when scan completes:**
```bash
SCAN_ID="<scanid_from_step_3>"
curl -s "http://localhost:5001/api/scan/${SCAN_ID}/results?type=all" \
-o /tmp/osint/spiderfoot-results.json
```
5. **Query breach databases for email exposure (example with HIBP API):**
```bash
curl -s -H "hibp-api-key: ${HIBP_KEY}" \
-H "User-Agent: OSINT-Correlation-Skill" \
"https://haveibeenpwned.com/api/v3/breachedaccount/target@example.com" \
-o /tmp/osint/breach-results.json
```
### Phase 2 — Data Normalization
6. **Normalize all collected data into a common schema.** Create a unified JSON structure that tags each finding with its source, timestamp, and data type:
```bash
cat > /tmp/osint/normalize.py << 'EOF'
import csv
import json
import os
from datetime import datetime, timezone

findings = []


def now_iso():
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


# Normalize Sherlock CSV results.
# Assumes this file holds Sherlock's CSV export (header row with
# username, name, url_user, exists, ...), not the plain-text URL list.
sherlock_path = "/tmp/osint/sherlock-results.txt"
if os.path.exists(sherlock_path):
    with open(sherlock_path, newline="") as f:
        for row in csv.DictReader(f):
            findings.append({
                "source": "sherlock",
                "type": "social_profile",
                "platform": row.get("name", ""),
                "url": row.get("url_user", ""),
                "username": row.get("username", ""),
                "status": row.get("exists", ""),
                "collected_at": now_iso()
            })

# Normalize theHarvester JSON results
harvester_path = "/tmp/osint/harvester-results.json"
if os.path.exists(harvester_path):
    with open(harvester_path) as f:
        data = json.load(f)
    for email in data.get("emails", []):
        findings.append({
            "source": "theHarvester",
            "type": "email",
            "value": email,
            "collected_at": now_iso()
        })
    for host in data.get("hosts", []):
        findings.append({
            "source": "theHarvester",
            "type": "hostname",
            "value": host,
            "collected_at": now_iso()
        })

# Normalize SpiderFoot results (expects a JSON array of event objects)
sf_path = "/tmp/osint/spiderfoot-results.json"
if os.path.exists(sf_path):
    with open(sf_path) as f:
        for item in json.load(f):
            findings.append({
                "source": "spiderfoot",
                "type": item.get("type", "unknown"),
                "value": item.get("data", ""),
                "module": item.get("module", ""),
                "collected_at": now_iso()
            })

with open("/tmp/osint/normalized-findings.json", "w") as f:
    json.dump(findings, f, indent=2)

sources = {finding["source"] for finding in findings}
print(f"Normalized {len(findings)} findings from {len(sources)} sources")
EOF
python3 /tmp/osint/normalize.py
```
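The normalizer in step 6 does not read `/tmp/osint/breach-results.json`, even though the breach lookup in step 5 writes it. The following is a minimal sketch of one way to fold breach data into the same schema. It assumes the default (truncated) HIBP v3 response, a JSON array of objects carrying a `Name` field; the function name `normalize_hibp`, the `breach` type label, and the `breach_name` key are illustrative and not part of the original skill.

```python
import json
from datetime import datetime, timezone


def normalize_hibp(raw_text: str, account: str) -> list[dict]:
    """Convert an HIBP breachedaccount response body into normalized findings.

    Assumption: the body is a JSON array such as [{"Name": "Adobe"}].
    An empty body (for example, when no breach is recorded) yields no findings.
    """
    if not raw_text.strip():
        return []
    collected_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            "source": "hibp",
            "type": "breach",          # hypothetical type label
            "value": account,
            "breach_name": breach.get("Name", ""),
            "collected_at": collected_at,
        }
        for breach in json.loads(raw_text)
    ]


# Example with an inline sample instead of the real results file
sample = '[{"Name": "Adobe"}, {"Name": "LinkedIn"}]'
findings = normalize_hibp(sample, "target@example.com")
print([finding["breach_name"] for finding in findings])
```

In practice the function would be fed the contents of `breach-results.json` and its output appended to the findings list before it is written to disk.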
### Phase 3 — AI-Driven Correlation
7. **Send normalized findings to an LLM for cross-source correlation analysis:**
```bash
cat > /tmp/osint/correlate.py << 'PYEOF'
import json
import os

from openai import OpenAI  # or anthropic, ollama, etc.

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

with open("/tmp/osint/normalized-findings.json") as f:
    findings = json.load(f)

# Only the first 500 findings are sent, to keep the prompt within context limits.
correlation_prompt = f"""You are an OSINT analyst. Analyze these findings collected
from multiple sources and produce a correlation report.

For each identity or entity you detect:
1. List all linked accounts/profiles with the evidence connecting them.
2. Assign a confidence score (0.0-1.0) for each linkage based on:
   - Exact username match across platforms (high)
   - Similar usernames with shared metadata (medium)
   - Same email in breach data and registration (high)
   - Co-occurring infrastructure (IP, domain) (medium)
   - Temporal correlation of account creation dates (low-medium)
3. Identify contradictions or potential false positives.
4. Flag high-risk exposures (breached credentials, PII leaks, infrastructure overlaps).
5. Produce a structured JSON report: a single object with the top-level keys
   "meta", "entities", "contradictions", and "recommendations". Each item in
   "entities" must have "identifier", "confidence", "risk_level", "flags", and
   "linked_accounts"; each linked account must have "source", "platform",
   "value", "evidence", and "confidence".

Raw findings:
{json.dumps(findings[:500], indent=2)}
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are an expert OSINT analyst specializing in identity correlation and link analysis."},
        {"role": "user", "content": correlation_prompt}
    ],
    temperature=0.1,
    response_format={"type": "json_object"}
)

report = json.loads(response.choices[0].message.content)

with open("/tmp/osint/correlation-report.json", "w") as f:
    json.dump(report, f, indent=2)

print(json.dumps(report, indent=2))
PYEOF
python3 /tmp/osint/correlate.py
```
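The correlation script sends only `findings[:500]`, so anything past the 500th finding never reaches the model. One possible workaround, sketched here as an assumption rather than as part of the original skill, is to split the findings into fixed-size batches and run the correlation once per batch. The helper name `batch_findings` is hypothetical.

```python
from typing import Iterator


def batch_findings(findings: list[dict], batch_size: int = 500) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size findings."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(findings), batch_size):
        yield findings[start:start + batch_size]


# Example: 1,200 synthetic findings split into batches of 500
sample = [{"source": "sherlock", "value": f"user{i}"} for i in range(1200)]
sizes = [len(batch) for batch in batch_findings(sample)]
print(sizes)  # [500, 500, 200]
```

Batching has a cost: links between findings that land in different batches are invisible to the model, so the per-batch reports still need a merge pass afterwards.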
8. **Perform entity resolution — deduplicate and merge related identities:**
```bash
cat > /tmp/osint/resolve.py << 'PYEOF'
import json

with open("/tmp/osint/correlation-report.json") as f:
    report = json.load(f)

# Extract the entities the LLM identified and summarize each one
entities = report.get("entities", [])
print(f"Identified {len(entities)} distinct entities")

for entity in entities:
    name = entity.get("identifier", "unknown")
    confidence = entity.get("confidence", 0)
    links = entity.get("linked_accounts", [])
    risk = entity.get("risk_level", "unknown")
    # One summary line per entity: confidence, identifier, link count, risk
    print(f"  [{confidence:.0%}] {name} | {len(links)} linked accounts | risk: {risk}")
PYEOF
python3 /tmp/osint/resolve.py
```
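The script in step 8 summarizes the entities but does not merge any of them. A minimal sketch of one possible merge pass follows: entities that share at least one linked-account value (compared case-insensitively) are grouped with a union-find structure. The function name `merge_entities` and the "shared value" rule are assumptions chosen for illustration; the original skill does not prescribe a merge strategy.

```python
def merge_entities(entities: list[dict]) -> list[list[str]]:
    """Group entity identifiers that share at least one linked-account value."""
    parent = list(range(len(entities)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, compressing the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen: dict[str, int] = {}  # account value -> index of first entity using it
    for idx, entity in enumerate(entities):
        for link in entity.get("linked_accounts", []):
            key = str(link.get("value", "")).strip().lower()
            if not key:
                continue
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    groups: dict[int, list[str]] = {}
    for idx, entity in enumerate(entities):
        groups.setdefault(find(idx), []).append(entity.get("identifier", "unknown"))
    return list(groups.values())


entities = [
    {"identifier": "john.target",
     "linked_accounts": [{"value": "john.target"}, {"value": "jt@targetdomain.com"}]},
    {"identifier": "jtarget",
     "linked_accounts": [{"value": "JT@targetdomain.com"}]},
    {"identifier": "other.user",
     "linked_accounts": [{"value": "other.user"}]},
]
merged = merge_entities(entities)
print(merged)  # [['john.target', 'jtarget'], ['other.user']]
```

Merging on a single shared value is deliberately crude. A shared mailbox or a generic alias would join unrelated people, so merged groups are candidates for analyst review rather than confirmed identities.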
### Phase 4 — Reporting and Visualization
9. **Generate a final intelligence profile in Markdown:**
```bash
cat > /tmp/osint/report.py << 'PYEOF'
import json
from datetime import datetime, timezone

with open("/tmp/osint/correlation-report.json") as f:
    report = json.load(f)

# Timezone-aware UTC timestamp, formatted with a trailing Z
generated = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

md = "# OSINT Correlation Report\n\n"
md += f"**Generated:** {generated}\n\n"
md += "## Entity Profiles\n\n"

for entity in report.get("entities", []):
    eid = entity.get("identifier", "Unknown")
    conf = entity.get("confidence", 0)
    md += f"### {eid} (Confidence: {conf:.0%})\n\n"
    md += "| Source | Platform | Evidence |\n|--------|----------|----------|\n"
    # One table row per linked account
    for link in entity.get("linked_accounts", []):
        md += f"| {link.get('source','')} | {link.get('platform','')} | {link.get('evidence','')} |\n"
    md += f"\n**Risk Level:** {entity.get('risk_level', 'N/A')}\n\n"
    for flag in entity.get("flags", []):
        md += f"- ⚠️ {flag}\n"
    md += "\n"

with open("/tmp/osint/intelligence-profile.md", "w", encoding="utf-8") as f:
    f.write(md)

print("Report written to /tmp/osint/intelligence-profile.md")
PYEOF
python3 /tmp/osint/report.py
```
10. **Optional — Import correlation graph into Maltego for visualization:**
```bash
# Export entities as Maltego-compatible CSV for manual import
cat > /tmp/osint/maltego_export.py << 'PYEOF'
import csv
import json

with open("/tmp/osint/correlation-report.json") as f:
    report = json.load(f)

with open("/tmp/osint/maltego-import.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Entity Type", "Value", "Linked To", "Link Label", "Confidence"])
    for entity in report.get("entities", []):
        for link in entity.get("linked_accounts", []):
            writer.writerow([
                link.get("type", "Alias"),
                link.get("value", ""),
                entity.get("identifier", ""),
                link.get("evidence", ""),
                link.get("confidence", "")
            ])

print("Maltego CSV exported to /tmp/osint/maltego-import.csv")
PYEOF
python3 /tmp/osint/maltego_export.py
```
## Key Concepts
| Concept | Description |
|---------|-------------|
| Cross-Source Correlation | Matching identifiers (usernames, emails, IPs) across independent OSINT sources to establish entity linkage |
| Confidence Scoring | Assigning probabilistic confidence (0.0–1.0) to each linkage based on evidence strength and corroboration |
| Entity Resolution | Deduplicating and merging records that refer to the same real-world entity across fragmented datasets |
| False Positive Detection | Using AI reasoning to identify coincidental matches versus genuine identity links |
| Multi-Vector Intelligence | Combining findings from social media, DNS, breach data, and infrastructure into a single threat picture |
| Link Analysis | Graph-based examination of relationships between entities, accounts, and infrastructure |
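
The table describes confidence scoring as depending on evidence strength and corroboration, but the workflow leaves the arithmetic to the LLM. As an illustration only, one common way to combine several pieces of evidence is a noisy-OR rule, which assumes the pieces are independent of each other. This is a swapped-in technique, not something the skill specifies, and the function name `combine_confidence` is hypothetical.

```python
def combine_confidence(evidence_scores: list[float]) -> float:
    """Combine per-evidence confidences with a noisy-OR rule.

    Assumption: each piece of evidence is independent. The combined score is
    1 minus the probability that every piece of evidence is wrong.
    """
    remaining_doubt = 1.0
    for score in evidence_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score {score!r} is outside [0, 1]")
        remaining_doubt *= 1.0 - score
    return 1.0 - remaining_doubt


# A similar username (0.6) corroborated by co-occurring infrastructure (0.5)
combined = combine_confidence([0.6, 0.5])
print(round(combined, 2))  # 0.8
```

Corroboration raises the score above either single signal, which matches the intent of the table. When the signals are not independent (for example, two tools scraping the same profile), this rule overstates confidence.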
## Tools & Systems
| Tool | Role in Workflow |
|------|-----------------|
| Sherlock | Username enumeration across 400+ social platforms |
| theHarvester | Email, subdomain, and host discovery from public sources |
| SpiderFoot | Automated OSINT collection across 200+ modules |
| Maltego | Graph-based visualization of entity relationships |
| LLM API (GPT-4, Claude, Ollama) | Cross-source reasoning, pattern detection, and confidence scoring |
| HaveIBeenPwned | Breach exposure and credential leak detection |
## Common Scenarios
- **Threat Actor Attribution:** Correlate a suspicious username found in a phishing campaign with social media profiles, domain registrations, and breach data to build an attribution profile.
- **Attack Surface Mapping:** Link discovered subdomains, emails, and employee social accounts to understand an organization's full external exposure.
- **Insider Threat Investigation:** Cross-reference an employee's known accounts with dark web marketplace activity and breach databases.
- **Brand Impersonation Detection:** Identify accounts across platforms mimicking a target brand by correlating registration patterns, naming conventions, and temporal signals.
## Output Format
The final output is a structured JSON correlation report and a Markdown intelligence profile containing:
```json
{
"meta": {
"target": "targetdomain.com",
"sources_used": ["sherlock", "theHarvester", "spiderfoot", "hibp"],
"total_findings": 247,
"generated_at": "2025-01-15T14:30:00Z"
},
"entities": [
{
"identifier": "john.target",
"confidence": 0.92,
"linked_accounts": [
{
"source": "sherlock",
"platform": "GitHub",
"value": "john.target",
"evidence": "Exact username match, bio references targetdomain.com",
"confidence": 0.95
}
],
"risk_level": "high",
"flags": [
"Credentials exposed in 2 breaches (2022, 2023)",
"Admin email for targetdomain.com found in public WHOIS"
]
}
],
"contradictions": [],
"recommendations": []
}
```
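
Because the report comes from an LLM, its structure is not guaranteed to match the format above. A minimal sketch of a structural check that could run before the downstream scripts follows. The function name `validate_report` and the choice of required keys are assumptions based on the fields the later scripts read.

```python
REQUIRED_ENTITY_KEYS = {"identifier", "confidence", "linked_accounts", "risk_level"}


def validate_report(report: dict) -> list[str]:
    """Return a list of structural problems; an empty list means the report looks usable."""
    entities = report.get("entities")
    if not isinstance(entities, list):
        return ["'entities' is missing or not a list"]

    problems = []
    for i, entity in enumerate(entities):
        if not isinstance(entity, dict):
            problems.append(f"entity {i}: not an object")
            continue
        missing = REQUIRED_ENTITY_KEYS - entity.keys()
        if missing:
            problems.append(f"entity {i}: missing keys {sorted(missing)}")
        conf = entity.get("confidence")
        is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
        if not is_number or not 0.0 <= conf <= 1.0:
            problems.append(f"entity {i}: confidence {conf!r} is not a number in [0, 1]")
    return problems


good = {"entities": [{"identifier": "john.target", "confidence": 0.92,
                      "linked_accounts": [], "risk_level": "high"}]}
bad = {"entities": [{"identifier": "x", "confidence": "high"}]}
print(validate_report(good))  # []
print(len(validate_report(bad)))  # 2
```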
## Verification
- Confirm that each linked account has been independently verified against at least two sources before assigning confidence > 0.8.
- Cross-check AI-generated correlations manually for a random sample (10–20%) to validate accuracy.
- Verify that no false positives from common usernames (e.g., "admin", "test") inflated entity profiles.
- Ensure breach data timestamps are current and from reputable aggregators.
- Validate that the final report does not include stale or retracted OSINT data.
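
Parts of the Verification checklist can be pre-screened in code before the manual pass. The sketch below flags entities whose confidence exceeds 0.8 with fewer than two distinct sources, flags common usernames, and draws a random sample for manual review. It treats the `source` field of each linked account as a proxy for "independently verified", which is an assumption; the names `review_reasons` and `pick_review_sample` are hypothetical.

```python
import math
import random

# Example values taken from the checklist; extend for your own investigations.
COMMON_USERNAMES = {"admin", "test"}


def review_reasons(entity: dict, threshold: float = 0.8, min_sources: int = 2) -> list[str]:
    """Return the reasons an entity should be re-checked by a human analyst."""
    sources = {
        link.get("source")
        for link in entity.get("linked_accounts", [])
        if link.get("source")
    }
    reasons = []
    if entity.get("confidence", 0) > threshold and len(sources) < min_sources:
        reasons.append(f"confidence above {threshold} with only {len(sources)} source(s)")
    if str(entity.get("identifier", "")).lower() in COMMON_USERNAMES:
        reasons.append("identifier is a common username")
    return reasons


def pick_review_sample(entities: list[dict], fraction: float = 0.15, seed=None) -> list[dict]:
    """Pick a random subset of entities (at least one) for manual checking."""
    if not entities:
        return []
    count = min(len(entities), max(1, math.ceil(len(entities) * fraction)))
    return random.Random(seed).sample(entities, count)


entity = {
    "identifier": "john.target",
    "confidence": 0.92,
    "linked_accounts": [{"source": "sherlock", "value": "john.target"}],
}
print(review_reasons(entity))
```

A fixed `seed` makes the review sample reproducible, which helps when the same report is audited by more than one analyst.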