AI Agent Skill HUB

clickhouse-github-forensics

Query GitHub event data via ClickHouse for supply chain investigations, actor profiling, and anomaly detection. Use when investigating GitHub-based attacks, tracking repository activity, analyzing actor behavior patterns, detecting tag/release tampering, or reconstructing incident timelines from public GitHub data. Triggers on GitHub supply chain attacks, repo compromise investigations, actor attribution, tag poisoning, or "query github events".

3,891 stars

Complexity: easy

View on GitHub Installation ↓

About this skill

This AI agent skill provides a powerful interface for querying a massive public dataset of over 10 billion GitHub events through a ClickHouse endpoint. It equips AI agents with the ability to perform detailed security investigations, leveraging a structured approach to access and analyze GitHub activity. The skill is designed for various use cases, including investigating GitHub supply chain compromises, tracking suspicious repository activity, analyzing actor behavior patterns, detecting tag or release tampering, and reconstructing incident timelines from public GitHub data. It was specifically developed during the Trivy supply chain compromise investigation, demonstrating its practical utility in real-world security incidents. By utilizing this skill, users can conduct rapid, data-driven forensics on public GitHub activity without the need for complex data setup. The skill leverages a performant, read-only public ClickHouse endpoint, making it an accessible and efficient tool for AI agents to gather critical intelligence from the vast stream of GitHub events.

Best use case

The primary use case is GitHub security forensics and incident response. It is invaluable for security analysts, incident responders, and researchers who need to quickly investigate suspicious activities, compromises, or supply chain attacks originating from or affecting GitHub repositories. It empowers them to leverage public event data for attribution, timeline reconstruction, and anomaly detection with the assistance of an AI agent.

Query GitHub event data via ClickHouse for supply chain investigations, actor profiling, and anomaly detection. Use when investigating GitHub-based attacks, tracking repository activity, analyzing actor behavior patterns, detecting tag/release tampering, or reconstructing incident timelines from public GitHub data. Triggers on GitHub supply chain attacks, repo compromise investigations, actor attribution, tag poisoning, or "query github events".

Users can expect actionable SQL query results and summarized insights derived from public GitHub event data, assisting in security investigations and incident response.

Practical example

Example input

Investigate the GitHub activity timeline for the actor 'octocat' for the last month to check for any suspicious repository changes.

Example output

Here is a summary of 'octocat's GitHub activity for the last month: On 2026-03-15 at 10:30 UTC, 'PushEvent' to 'octocat/repo-x' on branch 'main'. On 2026-03-18 at 14:00 UTC, 'CreateEvent' (tag 'v1.0.1') on 'octocat/repo-y'. Further details available upon request.

When to use this skill

Investigating GitHub supply chain attacks or compromises.
Tracking suspicious repository activity or changes by specific actors.
Analyzing actor behavior patterns on GitHub for profiling or attribution.
Detecting tag or release tampering on public repositories.

When not to use this skill

When investigating private GitHub repository data (this skill only uses public data).
For real-time alerting that requires instant notification (data is near real-time, ~minutes behind).
When direct write access or modification of GitHub data is required.
For tasks unrelated to GitHub event analysis or security investigations.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/clickhouse-github-forensics/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/1an0rmus/clickhouse-github-forensics/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/clickhouse-github-forensics/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How clickhouse-github-forensics Compares

Feature / Agent	clickhouse-github-forensics	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	easy	N/A

Frequently Asked Questions

What does this skill do?

Query GitHub event data via ClickHouse for supply chain investigations, actor profiling, and anomaly detection. Use when investigating GitHub-based attacks, tracking repository activity, analyzing actor behavior patterns, detecting tag/release tampering, or reconstructing incident timelines from public GitHub data. Triggers on GitHub supply chain attacks, repo compromise investigations, actor attribution, tag poisoning, or "query github events".

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

Cursor vs Codex for AI Workflows

Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.

SKILL.md Source

# ClickHouse GitHub Forensics

Query 10+ billion GitHub events for security investigations.

**Author:** Rufio @ [Permiso Security](https://permiso.io)  
**Use Case:** Built during the [Trivy supply chain compromise investigation](https://socket.dev/blog/trivy-under-attack-again-github-actions-compromise) (March 2026)

## Quick Start

```bash
curl -s "https://play.clickhouse.com/?user=play" \
  --data "SELECT ... FROM github_events WHERE ... FORMAT PrettyCompact"
```

- **Endpoint:** `https://play.clickhouse.com/?user=play`
- **Table:** `github_events`
- **Auth:** None required (public read-only)
- **Freshness:** Near real-time (~minutes behind)
- **Volume:** 10+ billion events

## Key Columns

| Column | Type | Use |
|--------|------|-----|
| `created_at` | DateTime | Event timestamp |
| `event_type` | Enum | PushEvent, CreateEvent, DeleteEvent, ReleaseEvent, etc. |
| `actor_login` | String | GitHub username |
| `repo_name` | String | `owner/repo` format |
| `ref` | String | Branch/tag name (e.g., `refs/heads/main`, `0.33.0`) |
| `ref_type` | Enum | `branch`, `tag`, `repository`, `none` |
| `action` | Enum | `published`, `created`, `opened`, `closed`, etc. |

For full schema (29 columns): see [references/schema.md](references/schema.md)

## Common Investigation Patterns

### 1. Actor Timeline (Who did what, when?)

```sql
SELECT created_at, event_type, repo_name, ref, action
FROM github_events 
WHERE actor_login = 'TARGET_ACCOUNT'
AND created_at >= '2026-03-01'
ORDER BY created_at
```

### 2. Repo Activity Window (What happened during incident?)

```sql
SELECT created_at, event_type, actor_login, ref, ref_type, action
FROM github_events 
WHERE repo_name = 'owner/repo'
AND created_at >= 'START_TIME'
AND created_at <= 'END_TIME'
ORDER BY created_at
```

### 3. Anomaly Detection (First-time repo access)

```sql
SELECT repo_name,
       countIf(created_at < 'ATTACK_DATE') as before,
       countIf(created_at >= 'ATTACK_DATE') as during
FROM github_events 
WHERE actor_login = 'SUSPECT_ACCOUNT'
AND created_at >= 'LOOKBACK_START'
GROUP BY repo_name
ORDER BY during DESC
```

### 4. Tag/Release Tampering

```sql
SELECT created_at, event_type, actor_login, ref, ref_type
FROM github_events 
WHERE repo_name = 'owner/repo'
AND event_type IN ('CreateEvent', 'DeleteEvent', 'ReleaseEvent')
AND ref_type = 'tag'
ORDER BY created_at
```

### 5. Actor Profile (Is this account legitimate?)

```sql
SELECT toStartOfMonth(created_at) as month,
       count() as events,
       uniqExact(repo_name) as unique_repos
FROM github_events 
WHERE actor_login = 'TARGET_ACCOUNT'
GROUP BY month
ORDER BY month
```

### 6. Org-Wide Activity (All repos in an org)

```sql
SELECT created_at, event_type, actor_login, repo_name, ref
FROM github_events 
WHERE repo_name LIKE 'orgname/%'
AND created_at >= 'START_TIME'
ORDER BY created_at
```

### 7. New Accounts During Incident (Potential attacker alts)

```sql
SELECT actor_login, min(created_at) as first_ever, count() as events
FROM github_events 
WHERE repo_name LIKE 'orgname/%'
GROUP BY actor_login
HAVING first_ever >= 'INCIDENT_START' AND first_ever <= 'INCIDENT_END'
ORDER BY first_ever
```

### 8. Hourly Breakdown (Attack timeline)

```sql
SELECT toStartOfHour(created_at) as hour,
       actor_login,
       count() as events,
       groupArray(distinct repo_name) as repos,
       groupArray(distinct event_type) as types
FROM github_events 
WHERE repo_name LIKE 'orgname/%'
AND created_at >= 'START_TIME'
GROUP BY hour, actor_login
ORDER BY hour
```

## Event Types Reference

| Event | Significance |
|-------|--------------|
| `PushEvent` | Code pushed to branch |
| `CreateEvent` | Branch/tag/repo created |
| `DeleteEvent` | Branch/tag deleted |
| `ReleaseEvent` | Release published/edited |
| `PullRequestEvent` | PR opened/closed/merged |
| `IssueCommentEvent` | Comment on issue |
| `ForkEvent` | Repo forked |
| `WatchEvent` | Repo starred |

## Tips

- **Output formats:** `FORMAT PrettyCompact` for tables, `FORMAT TabSeparated` for parsing
- **macOS curl:** Use `--data` not `-d` for multi-line queries
- **Timestamps:** Use UTC, format `YYYY-MM-DD HH:MM:SS`
- **No payload JSON:** Raw event payloads aren't available; use structured columns
- **Bot accounts:** Filter with `actor_login NOT IN ('github-actions[bot]', 'dependabot[bot]')`

## Security & Privacy

- Uses ClickHouse's **public playground** — all queries sent to `play.clickhouse.com`
- Data queried is GitHub's **public event stream only**
- No private repo data, credentials, or sensitive information is accessible
- Use responsibly: GitHub ToS prohibits scraping for spam or harassment

Related Skills

HIPAA Compliance for AI Agents

from openclaw/skills

Generate HIPAA compliance checklists, risk assessments, and audit frameworks for healthcare organizations deploying AI agents.

Data Governance Framework

from openclaw/skills

Assess, score, and remediate your organization's data governance posture across 6 domains.

Cybersecurity Risk Assessment

from openclaw/skills

You are a cybersecurity risk assessment specialist. When the user needs a security audit, threat assessment, or compliance review, follow this framework.

afrexai-cybersecurity-engine

from openclaw/skills

Complete cybersecurity assessment, threat modeling, and hardening system. Use when conducting security audits, threat modeling, penetration testing, incident response, or building security programs from scratch. Works with any stack — zero external dependencies.

Compliance & Audit Readiness Engine

from openclaw/skills

Your AI compliance officer. Guides startups and scale-ups through SOC 2, ISO 27001, GDPR, HIPAA, and PCI DSS — from zero to audit-ready. No consultants needed.

Compliance Audit Generator

from openclaw/skills

Run internal compliance audits against major frameworks without hiring a consultant.

AI Safety Audit

from openclaw/skills

Comprehensive AI safety and alignment audit framework for businesses deploying AI agents. Built around the UK AI Security Institute Alignment Project standards (2026), EU AI Act requirements, and NIST AI RMF.

security-guardian

from openclaw/skills

Automated security auditing for OpenClaw projects. Scans for hardcoded secrets (API keys, tokens) and container vulnerabilities (CVEs) using Trivy. Provides structured reports to help maintain a clean and secure codebase.

mema-vault

from openclaw/skills

Secure credential manager using AES-256 (Fernet) encryption. Stores, retrieves, and rotates secrets using a mandatory Master Key. Use for managing API keys, database credentials, and other sensitive tokens.

guardian-wall

from openclaw/skills

Mitigate prompt injection attacks, especially indirect ones from external web content or files. Use this skill when processing untrusted text from the internet, user-uploaded files, or any external source to sanitize content and detect malicious instructions (e.g., "ignore previous instructions", "system override").

SX-security-audit

from openclaw/skills

全方位安全审计技能。检查文件权限、环境变量、依赖漏洞、配置文件、网络端口、Git 安全、Shell 安全、macOS 安全、密钥检测等。支持 CLI 参数、JSON 输出、配置文件。当用户要求"安全检查"、"漏洞扫描"、"权限检查"、"安全审计"时使用此技能。

skill-safe-install-l0-strict

from openclaw/skills

Strict secure-install workflow for ClawHub/OpenClaw skills. Use when asked to install a skill safely, inspect skill permissions, review third-party skill risk, or run a pre-install security audit. Enforce full review + sandbox + explicit consent gates, with no author-based trust bypass.