dataclaw

Export Claude Code, Codex, and other coding-agent conversation history to Hugging Face. Use when the user asks about exporting conversations, uploading to Hugging Face, configuring DataClaw, reviewing PII/secrets in exports, or managing their dataset.

by peteromallet

Best use case

dataclaw is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using dataclaw should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/dataclaw/SKILL.md --create-dirs "https://raw.githubusercontent.com/peteromallet/dataclaw/main/.claude/skills/dataclaw/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/dataclaw/SKILL.md inside your project.
3. Restart your AI agent. It will auto-discover the skill.

How dataclaw Compares

Feature / Agent	dataclaw	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

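What does this skill do?

It exports conversation history from Claude Code, Codex, and other coding agents to Hugging Face. It also guides the agent through configuring DataClaw, reviewing exports for PII and secrets, and managing the resulting dataset.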

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# DataClaw Skill

## THE RULE

**Every `dataclaw` command outputs `next_steps`. FOLLOW THEM.**

Do not memorize the flow. Do not skip steps. Do not improvise.
Run the command -> read the output -> follow `next_steps`. That's it.

Runtime guidance follows this checklist:

1. Install
2. Install skill
3. Prep
   - 3A. Choose source scope
   - 3B. Choose project scope
   - 3C. Set redacted strings
4. Export locally
5. Review and confirm
6. Publish

The CLI tracks your stage as 1-4: auth -> configure -> review -> done.
`dataclaw export` (push) is **gated** - you must run `dataclaw confirm` first or it will refuse.

## Getting Started

Run `dataclaw status` (or `dataclaw prep` for full details) and follow the `next_steps`.
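
The run, read, follow loop can be scripted. The following is a minimal sketch, assuming the command prints exactly one JSON object to stdout; the helper name `run_json_command` is hypothetical and not part of dataclaw, and the demo uses a stand-in command with an illustrative payload so the sketch runs without dataclaw installed.

```python
import json
import subprocess
import sys


def run_json_command(argv):
    """Run a CLI command and parse its stdout as a single JSON document.

    Hypothetical helper: assumes the command writes only JSON to stdout.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Stand-in command with an illustrative payload (not real dataclaw output).
    # In practice argv would be ["dataclaw", "status"].
    demo = [sys.executable, "-c", "print('{\"stage\": \"auth\", \"next_steps\": []}')"]
    payload = run_json_command(demo)
    print(payload["stage"], payload["next_steps"])
```

Passing `check=True` makes a non-zero exit raise `subprocess.CalledProcessError` instead of silently parsing partial output.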

## Output Format

- `dataclaw prep`, `dataclaw config`, `dataclaw status`, and `dataclaw confirm` output pure JSON
- `dataclaw export` outputs human-readable text followed by `---DATACLAW_JSON---` and a JSON block
- Always parse the JSON and act on `next_steps`

Key fields:
- `stage` / `stage_number` / `total_stages` - where you are
- `next_steps` - follow these in order
- `next_command` - the single most important command to run next (null if user input needed first)
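
As a rough illustration of handling the mixed `dataclaw export` output, the sketch below splits the text at the `---DATACLAW_JSON---` delimiter and reads the key fields. The function names and the sample text are assumptions made for illustration; the value types (a list for `next_steps`, a string or null for `next_command`) are also assumed rather than confirmed by the source.

```python
import json

DELIMITER = "---DATACLAW_JSON---"


def parse_export_output(stdout):
    """Split export output into (human_text, json_payload)."""
    human_text, sep, json_text = stdout.partition(DELIMITER)
    if not sep:
        raise ValueError("delimiter not found in export output")
    return human_text.strip(), json.loads(json_text)


def next_action(payload):
    """Return the command to run next, or None if user input is needed first."""
    return payload.get("next_command")


# Illustrative sample only; real CLI output will differ.
sample = (
    "Exported sessions to a local file.\n"
    "---DATACLAW_JSON---\n"
    '{"stage": "review", "stage_number": 3, "total_stages": 4, '
    '"next_steps": ["Review the export with the user"], "next_command": null}'
)

text, payload = parse_export_output(sample)
print(f"stage {payload['stage_number']} of {payload['total_stages']}")
for step in payload["next_steps"]:
    print("next step:", step)
if next_action(payload) is None:
    print("user input needed before running another command")
```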

## PII Audit (Checklist Step 5)

After `dataclaw export --no-push`, follow the `next_steps` in the JSON output. The flow is:

1. **Ask the user their full name** - then grep the export for it
2. **Run the pii_commands** from the JSON output and review results with the user
3. **Ask the user what else to look for** - company names, client names, private URLs, other people's names, custom domains
4. **Deep manual scan** - sample ~20 sessions (beginning, middle, end) and look for anything sensitive the regex missed
5. **Fix and re-export** if anything is found: `dataclaw config --redact "string"` then `dataclaw export --no-push`
6. **Run `dataclaw confirm` with text attestations** - pass `--full-name`, `--attest-full-name`, `--attest-sensitive`, and `--attest-manual-scan`. It runs a PII scan, verifies the attestations, shows a project breakdown, and unlocks pushing.
7. **Push only after explicit user confirmation**: `dataclaw export --publish-attestation "User explicitly approved publishing to Hugging Face."`
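
Steps 1 and 4 of the flow above can be supported with small helpers. This is a sketch under stated assumptions: the export is treated as a JSONL file with one session per line (the `.jsonl` extension comes from the command examples, the one-session-per-line layout is assumed), and both function names are hypothetical. It complements the `pii_commands` from the JSON output and does not replace them.

```python
def find_matches(lines, needle):
    """Return 1-based line numbers whose text contains needle (case-insensitive)."""
    needle = needle.casefold()
    return [i for i, line in enumerate(lines, start=1) if needle in line.casefold()]


def sample_indices(total, per_region=7):
    """Pick session indices from the beginning, middle, and end.

    With the default per_region of 7 this yields about 20 sessions.
    """
    if total <= 3 * per_region:
        return list(range(total))
    mid_start = (total - per_region) // 2
    picks = (
        list(range(per_region))
        + list(range(mid_start, mid_start + per_region))
        + list(range(total - per_region, total))
    )
    return sorted(set(picks))


# Illustrative records, not real export data.
lines = [
    '{"session": 1, "text": "refactor the parser"}',
    '{"session": 2, "text": "email from Jane Doe about deploy"}',
]
print(find_matches(lines, "jane doe"))
print(sample_indices(100))
```

A plain substring search only catches exact spellings of the name, so the manual scan in step 4 remains necessary.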

## Commands Reference

```bash
dataclaw status # Show current stage and next steps
dataclaw prep # Discover projects, check HF auth
dataclaw prep --source all # Prep with all sources explicitly selected
dataclaw prep --source claude # Prep using one supported source key (for example claude)
dataclaw confirm --full-name "NAME" --attest-full-name "..." --attest-sensitive "..." --attest-manual-scan "..." # Scan PII, verify attestations, unlock pushing
dataclaw confirm --file /path/to/file.jsonl --full-name "NAME" --attest-full-name "..." --attest-sensitive "..." --attest-manual-scan "..." # Confirm a specific export file
dataclaw list # List all projects with exclusion status
dataclaw list --source all # List all sources
dataclaw list --source claude # List projects for one supported source key
dataclaw config # Show current config
dataclaw config --repo user/my-dataset # Set HF repo
dataclaw config --source all # REQUIRED source scope selection (<source|all>; for example claude or codex)
dataclaw config --exclude "a,b" # Add excluded projects (appends)
dataclaw config --redact "str1,str2" # Add strings to redact (appends)
dataclaw config --redact-usernames "u1,u2" # Add usernames to anonymize (appends)
dataclaw config --confirm-projects # Mark project selection as confirmed
dataclaw export --publish-attestation "..." # Export and push (requires dataclaw confirm first)
dataclaw export --no-push # Export locally only
dataclaw export --source all --no-push # Export all configured sources locally
dataclaw export --source claude --no-push # Export one supported source scope locally
dataclaw export --all-projects # Include everything (ignore exclusions)
dataclaw export --no-thinking # Exclude extended thinking blocks
dataclaw export -o /path/to/file.jsonl # Custom output path
dataclaw update-skill claude # Install/update the dataclaw skill for Claude Code
```
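
When an agent assembles the `dataclaw confirm` call programmatically, building an argument list avoids shell-quoting mistakes in the free-text attestations. The sketch below uses only the flags listed above; the function name `build_confirm_argv` is hypothetical, and the attestation strings are placeholders, since the required wording is whatever the CLI's `next_steps` ask for.

```python
import shlex


def build_confirm_argv(full_name, attest_full_name, attest_sensitive,
                       attest_manual_scan, export_file=None):
    """Build the argv list for `dataclaw confirm` (hypothetical helper)."""
    argv = ["dataclaw", "confirm"]
    if export_file is not None:
        # Optional: confirm a specific export file.
        argv += ["--file", export_file]
    argv += [
        "--full-name", full_name,
        "--attest-full-name", attest_full_name,
        "--attest-sensitive", attest_sensitive,
        "--attest-manual-scan", attest_manual_scan,
    ]
    return argv


# Placeholder values only; replace with the user's real name and attestations.
argv = build_confirm_argv(
    "NAME",
    "<full-name attestation>",
    "<sensitive-data attestation>",
    "<manual-scan attestation>",
)
print(shlex.join(argv))  # shell-safe rendering for logging or display
```

The list can be handed directly to `subprocess.run(argv)`, which passes each value as a single argument regardless of spaces or quotes.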

## Gotchas

- **Never run bare `hf auth login`** - it's interactive and will hang. Always use `--token`.
- **`--exclude`, `--redact`, `--redact-usernames` APPEND** - they never overwrite. Safe to call repeatedly.
- **Source selection is REQUIRED before export** - explicitly set `dataclaw config --source <source|all>` (for example `claude` or `codex`), or pass `--source ...` on export.
- **`dataclaw prep` outputs pure JSON** - parse it directly.
- **Always export with `--no-push` first** - review before publishing.
- **`dataclaw export` (push) reuses the exact file reviewed by `dataclaw confirm`** - if that file is missing, re-export locally and re-confirm before pushing.
- **`dataclaw export` (push) requires `dataclaw confirm` first** - it will refuse otherwise. Re-exporting with `--no-push` resets this.
- **PII audit is critical** - automated redaction is not foolproof.
- **Large exports take time** - 500+ sessions may take 1-3 minutes. Use a generous timeout.
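
For the timeout gotcha, one possible approach is to wrap the call with an explicit timeout. This is a minimal sketch: the helper name `run_with_timeout` and the 300-second default are assumptions chosen to sit comfortably above the 1-3 minute figure, and the demo uses stand-in commands so it runs without dataclaw installed.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds=300):
    """Run a command; return (finished, stdout). finished is False on timeout."""
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return False, ""
    return True, result.stdout


if __name__ == "__main__":
    # Stand-in for ["dataclaw", "export", "--no-push"].
    finished, out = run_with_timeout([sys.executable, "-c", "print('ok')"])
    print(finished, out.strip())
```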

## Prerequisite

`command -v dataclaw >/dev/null 2>&1 && echo "dataclaw: installed" || echo "NOT INSTALLED - run: pip install dataclaw"`
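
The same prerequisite check can be expressed in Python for agents that script the workflow. This is a sketch using the standard library's `shutil.which`; the function name `install_status` is hypothetical.

```python
import shutil


def install_status(command="dataclaw"):
    """Mirror the shell prerequisite check: report whether command is on PATH."""
    if shutil.which(command) is not None:
        return f"{command}: installed"
    return f"NOT INSTALLED - run: pip install {command}"


print(install_status())
```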
