dataset-intake-auditor

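Checks fields, units, missing-value rates, outliers, and usability before a new dataset is onboarded. Use for data, dataset, and audit workflows. Do not use it to fabricate statistical results or to replace a formal data governance platform.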

3,891 stars

Best use case

dataset-intake-auditor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using dataset-intake-auditor should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/dataset-intake-auditor/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/52yuanchangxing/dataset-intake-auditor/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/dataset-intake-auditor/SKILL.md inside your project
  3. Restart your AI agent so that it auto-discovers the skill

How dataset-intake-auditor Compares

| Feature / Agent | dataset-intake-auditor | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

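It checks fields, units, missing-value rates, outliers, and usability before a new dataset is onboarded. It is meant for data, dataset, and audit workflows, and it should not be used to fabricate statistical results or to replace a formal data governance platform.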

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

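# Dataset Intake Auditor

## What you are
You are the standalone Skill "Dataset Intake Auditor". Your job is to check fields, units, missing-value rates, outliers, and usability before a new dataset is onboarded.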
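
To make the checks named above concrete, here is a minimal sketch of a missing-value rate per field and a simple outlier screen. It is an illustration under assumptions, not the skill's own implementation: the function names `missing_rates` and `iqr_outliers`, the `MISSING_TOKENS` set, and the choice of Tukey's 1.5 × IQR rule are all hypothetical.

```python
import csv
import io
import statistics

# Tokens treated as "missing". This set is an assumption, not taken from the skill.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none"}


def missing_rates(rows, fieldnames):
    """Return {field: share of rows whose value is missing} for a list of dict rows."""
    total = len(rows)
    rates = {}
    for name in fieldnames:
        missing = sum(
            1 for row in rows
            if (row.get(name) or "").strip().lower() in MISSING_TOKENS
        )
        rates[name] = missing / total if total else 0.0
    return rates


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    if len(values) < 4:
        return []  # too few points for quartiles to be meaningful
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Tiny in-memory CSV so the example does not depend on any file.
sample = io.StringIO("id,weight_kg\n1,70.5\n2,\n3,68.0\n4,NA\n")
reader = csv.DictReader(sample)
rows = list(reader)
print(missing_rates(rows, reader.fieldnames))
```

Any threshold for an acceptable missing-value rate is a policy decision for the team onboarding the dataset and is deliberately left out of the sketch.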

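## Routing
### When to use
- Check whether this dataset can be onboarded
- Produce an audit of fields and missing-value rates
- Typical input: a CSV/TSV file or a directory
- Priority outputs: dataset overview, field summary, next steps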
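
Since the input may be a single CSV/TSV file or a directory, an auditor first has to resolve that path into a list of files. The sketch below shows one way to do it. The helper `collect_inputs`, the `DELIMITERS` map, and the non-recursive directory scan are assumptions for illustration; the skill's actual script may behave differently.

```python
import tempfile
from pathlib import Path

# Suffix-to-delimiter map; only CSV and TSV are named in the routing notes.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def collect_inputs(path):
    """Return (file, delimiter) pairs for one CSV/TSV file or a directory of them."""
    root = Path(path)
    if root.is_file():
        candidates = [root]
    elif root.is_dir():
        # Non-recursive on purpose; whether the real skill recurses is unknown.
        candidates = sorted(p for p in root.iterdir() if p.is_file())
    else:
        raise FileNotFoundError(f"input path does not exist: {root}")
    return [
        (p, DELIMITERS[p.suffix.lower()])
        for p in candidates
        if p.suffix.lower() in DELIMITERS
    ]


# Demo on a throwaway directory so nothing outside it is touched.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("orders.csv", "events.tsv", "notes.txt"):
        (Path(tmp) / name).write_text("id\n1\n", encoding="utf-8")
    for file, delimiter in collect_inputs(tmp):
        print(file.name, repr(delimiter))
```

Files with other suffixes are skipped silently here; a real audit would probably list them under the items to confirm.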

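### When not to use
- Do not fabricate statistical results
- Do not use it as a replacement for a formal data governance platform
- If the user wants to write to, send to, delete from, publish to, or reconfigure an external system directly, state the boundaries first, then provide only review-ready content or a dry-run plan.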

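## Working rules
1. First reorganize the information the user provides into a task brief, then output structured results.
2. When information is missing, explicitly list "items to confirm" rather than making things up.
3. By default, give a "reviewable draft" first, then an "executable checklist".
4. For high-risk, privacy, permission, or compliance issues, always add a boundary statement.
5. If the runtime allows shell / exec, you may use:
   - `python3 "{baseDir}/scripts/run.py" --input <input-file> --output <output-file>`
6. If the current environment cannot run scripts, still produce the text directly, following the structure of `{baseDir}/resources/template.md` and `{baseDir}/resources/spec.json`.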
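
Rules 5 and 6 describe a run-or-fall-back decision, which can be sketched as follows. Only the script path and the `--input` / `--output` flags come from the rules above. The helper names `build_command` and `run_audit`, and the convention of returning `False` to signal the template-based fallback, are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(base_dir, input_path, output_path):
    """Build the argument list for scripts/run.py under the skill's base directory."""
    script = Path(base_dir) / "scripts" / "run.py"
    return [
        "python3", str(script),
        "--input", str(input_path),
        "--output", str(output_path),
    ]


def run_audit(base_dir, input_path, output_path):
    """Run the script if it exists and report success.

    Returns False when the script is missing or exits non-zero, so the caller
    can fall back to writing the report from template.md and spec.json.
    """
    command = build_command(base_dir, input_path, output_path)
    if not Path(command[1]).is_file():
        return False
    # A list of arguments (no shell=True) avoids quoting problems in paths.
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0


print(build_command("skill-base", "data.csv", "report.md"))
```

Passing the command as a list keeps paths with spaces intact without manual quoting, which is the role the double quotes play in the shell form of the command.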

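## Standard output structure
Organize the results in the following structure where possible:
- Dataset overview
- Field summary
- Missing values and anomalies
- Unit and definition risks
- Onboarding recommendations
- Next steps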
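
The six-part structure above can be rendered mechanically once findings are collected. The following is a minimal sketch that uses English renderings of the section names. The `render_report` helper and the "To be confirmed" placeholder are assumptions, and the real layout is defined by `template.md`, whose contents are not shown here.

```python
# English renderings of the six sections listed in the standard output structure.
SECTIONS = [
    "Dataset overview",
    "Field summary",
    "Missing values and anomalies",
    "Unit and definition risks",
    "Onboarding recommendations",
    "Next steps",
]


def render_report(findings):
    """Render findings ({section: [bullet, ...]}) as Markdown in the fixed order.

    Sections without findings get an explicit placeholder instead of invented
    content, in line with the rule about listing items to confirm.
    """
    lines = []
    for title in SECTIONS:
        lines.append(f"## {title}")
        items = findings.get(title) or ["To be confirmed"]
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


print(render_report({"Dataset overview": ["2 files, 4 rows in total"]}))
```

Keeping the section order in one list makes every report comparable across runs, which is the main point of a fixed output structure.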

## Local resources
- Specification: `{baseDir}/resources/spec.json`
- Output template: `{baseDir}/resources/template.md`
- Example inputs and outputs: `{baseDir}/examples/`
- Smoke test: `{baseDir}/tests/smoke-test.md`

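## Safety boundaries
- Perform read-only analysis on local files.
- Default to read-only, auditable, and reversible operation.
- Do not run high-risk commands, hide dependencies, or fabricate facts or results.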
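
One way to make the read-only and auditable boundary checkable is to hash the input before and after analysis and record the digest. The sketch below illustrates that idea only; `sha256_of` and `analyze_read_only` are hypothetical helpers, and nothing in the skill's description confirms that it performs such a check.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def analyze_read_only(path, analyze):
    """Run analyze(handle) on a file opened for reading and verify it is unchanged."""
    before = sha256_of(path)
    with open(path, "rb") as handle:  # "rb" mode: the handle cannot write
        result = analyze(handle)
    if sha256_of(path) != before:
        raise RuntimeError(f"input changed during analysis: {path}")
    # The digest doubles as an audit record of exactly which bytes were analyzed.
    return {"sha256": before, "result": result}


# Demo on a temporary file: count data rows (total lines minus the header).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes(b"id,value\n1,10\n2,20\n")
    record = analyze_read_only(sample, lambda handle: sum(1 for _ in handle) - 1)
    print(record["result"], record["sha256"][:12])
```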
