dataclaw
Export Claude Code, Codex, and other coding-agent conversation history to Hugging Face. Use when the user asks about exporting conversations, uploading to Hugging Face, configuring DataClaw, reviewing PII/secrets in exports, or managing their dataset.
Best use case
dataclaw is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using dataclaw should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/dataclaw/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
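The manual steps above amount to copying one file into a fixed path inside the project. Below is a minimal sketch, assuming you have already downloaded SKILL.md. `install_dataclaw_skill` is a hypothetical helper name, not something dataclaw ships.

```bash
#!/usr/bin/env bash
# Sketch of the manual installation steps: copy a downloaded SKILL.md to
# <project>/.claude/skills/dataclaw/SKILL.md. The helper name is hypothetical.
set -euo pipefail

install_dataclaw_skill() {
  local src="$1"            # path to the SKILL.md you downloaded from GitHub
  local project="${2:-.}"   # project root; defaults to the current directory
  local dest="$project/.claude/skills/dataclaw"

  if [ ! -f "$src" ]; then
    echo "SKILL.md not found: $src" >&2
    return 1
  fi

  mkdir -p "$dest"              # create the skills directory if needed
  cp "$src" "$dest/SKILL.md"    # overwrite any older copy
  echo "Installed $dest/SKILL.md; restart your AI agent so it discovers the skill."
}
```

For example, `install_dataclaw_skill ~/Downloads/SKILL.md ~/code/my-project`. For Claude Code, the SKILL.md source also lists `dataclaw update-skill claude` as the CLI's own way to install or update the skill.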
How dataclaw Compares
| Feature / Agent | dataclaw | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Export Claude Code, Codex, and other coding-agent conversation history to Hugging Face. Use when the user asks about exporting conversations, uploading to Hugging Face, configuring DataClaw, reviewing PII/secrets in exports, or managing their dataset.
Where can I find the source code?
The source code is hosted on GitHub.
Related Guides
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
Cursor vs Codex for AI Workflows
Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
SKILL.md Source
<!-- dataclaw-begin -->
# DataClaw Skill

## THE RULE

**Every `dataclaw` command outputs `next_steps`. FOLLOW THEM.**

Do not memorize the flow. Do not skip steps. Do not improvise.

Run the command -> read the output -> follow `next_steps`. That's it.

Runtime guidance follows this checklist:

1. Install
2. Install skill
3. Prep
   - 3A. Choose source scope
   - 3B. Choose project scope
   - 3C. Set redacted strings
4. Export locally
5. Review and confirm
6. Publish

The CLI tracks your stage as 1-4: auth -> configure -> review -> done.

`dataclaw export` (push) is **gated** - you must run `dataclaw confirm` first or it will refuse.

## Getting Started

Run `dataclaw status` (or `dataclaw prep` for full details) and follow the `next_steps`.

## Output Format

- `dataclaw prep`, `dataclaw config`, `dataclaw status`, and `dataclaw confirm` output pure JSON
- `dataclaw export` outputs human-readable text followed by `---DATACLAW_JSON---` and a JSON block
- Always parse the JSON and act on `next_steps`

Key fields:

- `stage` / `stage_number` / `total_stages` - where you are
- `next_steps` - follow these in order
- `next_command` - the single most important command to run next (null if user input needed first)

## PII Audit (Stage 5)

After `dataclaw export --no-push`, follow the `next_steps` in the JSON output. The flow is:

1. **Ask the user their full name** - then grep the export for it
2. **Run the `pii_commands`** from the JSON output and review results with the user
3. **Ask the user what else to look for** - company names, client names, private URLs, other people's names, custom domains
4. **Deep manual scan** - sample ~20 sessions (beginning, middle, end) and look for anything sensitive the regex missed
5. **Fix and re-export** if anything found: `dataclaw config --redact "string"` then `dataclaw export --no-push`
6. **Run `dataclaw confirm` with text attestations** - pass `--full-name`, `--attest-full-name`, `--attest-sensitive`, and `--attest-manual-scan`. It runs PII scan, verifies attestations, shows project breakdown, and unlocks pushing.
7. **Push only after explicit user confirmation**: `dataclaw export --publish-attestation "User explicitly approved publishing to Hugging Face."`

## Commands Reference

```bash
dataclaw status # Show current stage and next steps
dataclaw prep # Discover projects, check HF auth
dataclaw prep --source all # Prep with all sources explicitly selected
dataclaw prep --source claude # Prep using one supported source key (for example claude)
dataclaw confirm --full-name "NAME" --attest-full-name "..." --attest-sensitive "..." --attest-manual-scan "..." # Scan PII, verify attestations, unlock pushing
dataclaw confirm --file /path/to/file.jsonl --full-name "NAME" --attest-full-name "..." --attest-sensitive "..." --attest-manual-scan "..." # Confirm a specific export file
dataclaw list # List all projects with exclusion status
dataclaw list --source all # List all sources
dataclaw list --source claude # List projects for one supported source key
dataclaw config # Show current config
dataclaw config --repo user/my-dataset # Set HF repo
dataclaw config --source all # REQUIRED source scope selection (<source|all>; for example claude or codex)
dataclaw config --exclude "a,b" # Add excluded projects (appends)
dataclaw config --redact "str1,str2" # Add strings to redact (appends)
dataclaw config --redact-usernames "u1,u2" # Add usernames to anonymize (appends)
dataclaw config --confirm-projects # Mark project selection as confirmed
dataclaw export --publish-attestation "..." # Export and push (requires dataclaw confirm first)
dataclaw export --no-push # Export locally only
dataclaw export --source all --no-push # Export all configured sources locally
dataclaw export --source claude --no-push # Export one supported source scope locally
dataclaw export --all-projects # Include everything (ignore exclusions)
dataclaw export --no-thinking # Exclude extended thinking blocks
dataclaw export -o /path/to/file.jsonl # Custom output path
dataclaw update-skill claude # Install/update the dataclaw skill for Claude Code
```

## Gotchas

- **Never run bare `hf auth login`** - it's interactive and will hang. Always use `--token`.
- **`--exclude`, `--redact`, `--redact-usernames` APPEND** - they never overwrite. Safe to call repeatedly.
- **Source selection is REQUIRED before export** - explicitly set `dataclaw config --source <source|all>` (for example `claude` or `codex`), or pass `--source ...` on export.
- **`dataclaw prep` outputs pure JSON** - parse it directly.
- **Always export with `--no-push` first** - review before publishing.
- **`dataclaw export` (push) reuses the exact file reviewed by `dataclaw confirm`** - if that file is missing, re-export locally and re-confirm before pushing.
- **`dataclaw export` (push) requires `dataclaw confirm` first** - it will refuse otherwise. Re-exporting with `--no-push` resets this.
- **PII audit is critical** - automated redaction is not foolproof.
- **Large exports take time** - 500+ sessions may take 1-3 minutes. Use a generous timeout.

## Prerequisite

`command -v dataclaw >/dev/null 2>&1 && echo "dataclaw: installed" || echo "NOT INSTALLED - run: pip install dataclaw"`

<!-- dataclaw-end -->
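The output rules in the SKILL.md source (pure JSON from most commands, a `---DATACLAW_JSON---` marker before the JSON block in `dataclaw export` output) and step 1 of the PII audit lend themselves to small shell helpers. Below is a minimal sketch under several assumptions. The marker is assumed to sit on its own line. `python3` is assumed to be available for JSON parsing, since dataclaw installs via pip. The helper names `extract_dataclaw_json`, `json_field`, and `count_name_hits` are hypothetical. The export path is whatever you exported to, for example the file passed to `-o`.

```bash
#!/usr/bin/env bash
# Sketch of helpers for the "run the command -> read the output -> follow
# next_steps" loop. Helper names are hypothetical, not part of dataclaw.
set -euo pipefail

# Print only the lines after the ---DATACLAW_JSON--- marker that
# `dataclaw export` emits (assumes the marker is on its own line).
# Commands such as `dataclaw status` already print pure JSON and need no extraction.
extract_dataclaw_json() {
  awk 'found { print } $0 == "---DATACLAW_JSON---" { found = 1 }'
}

# Print a top-level field from JSON on stdin, e.g. next_command or next_steps.
# Lists print one item per line; null or missing fields print nothing
# (a null next_command means user input is needed first).
json_field() {
  python3 -c '
import json, sys

value = json.load(sys.stdin).get(sys.argv[1])
items = value if isinstance(value, list) else ([] if value is None else [value])
for item in items:
    print(item if isinstance(item, str) else json.dumps(item))
' "$1"
}

# PII audit step 1: count lines of a local JSONL export that contain the
# user's full name (case-insensitive, literal match, not a regex).
count_name_hits() {
  local name="$1" file="$2"
  if [ ! -f "$file" ]; then
    echo "export file not found: $file" >&2
    return 2
  fi
  # grep -c prints 0 but exits 1 when nothing matches; don't treat that as an error.
  grep -ciF -- "$name" "$file" || true
}
```

For example, `dataclaw status | json_field next_command` prints the next command (or nothing when the user must answer first), and `dataclaw export --no-push | extract_dataclaw_json | json_field next_steps` lists the steps to follow in order. `count_name_hits "Jane Doe" /path/to/file.jsonl` reports how many export lines mention that name. A zero count does not replace the manual scan or `dataclaw confirm`, because automated matching misses variants such as nicknames or usernames.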
Related Skills
compose-multiplatform-patterns
Compose Multiplatform and Jetpack Compose patterns for KMP projects: state management, navigation, theming, performance optimization, and platform-specific UI.
java-coding-standards
Java coding standards for Spring Boot services: naming, immutability, Optional usage, streams, exceptions, generics, and project layout.
continuous-learning
Automatically extracts reusable patterns from Claude Code sessions and saves them as learned skills for future use.
nextjs-best-practices
Next.js App Router principles. Server Components, data fetching, routing patterns.
network-101
Configure and test common network services (HTTP, HTTPS, SNMP, SMB) for penetration testing lab environments. Enable hands-on practice with service enumeration, log analysis, and security testing against properly configured target systems.
neon-postgres
Expert patterns for Neon serverless Postgres, branching, connection pooling, and Prisma/Drizzle integration
nanobanana-ppt-skills
AI-powered PPT generation with document analysis and styled images
multi-agent-patterns
This skill should be used when the user asks to "design multi-agent system", "implement supervisor pattern", "create swarm architecture", "coordinate multiple agents", or mentions multi-agent patterns, context isolation, agent handoffs, sub-agents, or parallel agent execution.
monorepo-management
Build efficient, scalable monorepos that enable code sharing, consistent tooling, and atomic changes across multiple packages and applications.
monetization
Monetization strategy and implementation for digital products: Stripe, subscriptions, pricing experiments, freemium, upgrade flows, churn prevention, revenue optimization, and SaaS business models.
modern-javascript-patterns
Comprehensive guide for mastering modern JavaScript (ES6+) features, functional programming patterns, and best practices for writing clean, maintainable, and performant code.
microservices-patterns
Master microservices architecture patterns including service boundaries, inter-service communication, data management, and resilience patterns for building distributed systems.