data-orchestrator

Coordinates data pipeline tasks (ETL, analytics, feature engineering). Use when implementing data ingestion, transformations, quality checks, or analytics. Applies data-quality-standard.md (95% minimum).

242 stars

Best use case

data-orchestrator is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams coordinating data pipeline tasks (ETL, analytics, feature engineering): implementing data ingestion, transformations, quality checks, or analytics while applying data-quality-standard.md (95% minimum).

Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "data-orchestrator" skill to help with this workflow task. Context: Coordinates data pipeline tasks (ETL, analytics, feature engineering). Use when implementing data ingestion, transformations, quality checks, or analytics. Applies data-quality-standard.md (95% minimum).

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

  • Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

  • Do not use this when you only need a one-off answer and do not need a reusable workflow.
  • Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/data-orchestrator/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/brownbull/data-orchestrator/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/data-orchestrator/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How data-orchestrator Compares

| Feature / Agent | data-orchestrator | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Coordinates data pipeline tasks (ETL, analytics, feature engineering). Use when implementing data ingestion, transformations, quality checks, or analytics. Applies data-quality-standard.md (95% minimum).

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Data Orchestrator Skill

## Role
Acts as CTO-Data, managing all data processing, analytics, and pipeline tasks.

## Responsibilities

1. **Data Pipeline Management**
   - ETL/ELT processes
   - Data validation
   - Quality assurance
   - Pipeline monitoring

2. **Analytics Coordination**
   - Feature engineering
   - Model integration
   - Report generation
   - Metric calculation

3. **Data Governance**
   - Schema management
   - Data lineage tracking
   - Privacy compliance
   - Access control

4. **Context Maintenance**
   ```
   ai-state/active/data/
   ├── pipelines.json    # Pipeline definitions
   ├── features.json     # Feature registry
   ├── quality.json      # Data quality metrics
   └── tasks/           # Active data tasks
   ```
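As an illustration of maintaining these context files (the file names come from the tree above; the record schema is an assumption), the state can be read and written with the standard library alone:

```python
import json
from pathlib import Path

STATE_DIR = Path("ai-state/active/data")

def load_state(name):
    """Read one context file (e.g. pipelines.json); empty dict if absent."""
    path = STATE_DIR / f"{name}.json"
    return json.loads(path.read_text()) if path.exists() else {}

def save_state(name, data):
    """Write a context file via a temp file so a crash never leaves it half-written."""
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    tmp = STATE_DIR / f"{name}.json.tmp"
    tmp.write_text(json.dumps(data, indent=2))
    tmp.replace(STATE_DIR / f"{name}.json")

save_state("quality", {"daily_aggregation": {"completeness": 0.99}})
```

The temp-file-then-replace step matters here because the orchestrator and its skills may read these files while a task is mid-flight.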

## Skill Coordination

### Available Data Skills
- `etl-skill` - Extract, transform, load operations
- `feature-engineering-skill` - Feature creation
- `analytics-skill` - Analysis and reporting
- `quality-skill` - Data quality checks
- `pipeline-skill` - Pipeline orchestration

### Context Package to Skills
```yaml
context:
  task_id: "task-003-pipeline"
  pipelines:
    existing: ["daily_aggregation", "customer_segmentation"]
    schedule: "0 2 * * *"
  features:
    current: ["revenue_30d", "churn_risk"]
    dependencies: ["transactions", "customers"]
  standards:
    - "data-quality-standard.md"
    - "feature-engineering.md"
  test_requirements:
    quality: ["completeness", "accuracy", "timeliness"]
```

## Task Processing Flow

1. **Receive Task**
   - Identify data sources
   - Check dependencies
   - Validate requirements

2. **Prepare Context**
   - Current pipeline state
   - Feature definitions
   - Quality metrics

3. **Assign to Skill**
   - Choose data skill
   - Set parameters
   - Define outputs

4. **Monitor Execution**
   - Track pipeline progress
   - Monitor resource usage
   - Check quality gates

5. **Validate Results**
   - Data quality checks
   - Output validation
   - Performance metrics
   - Lineage tracking
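The five stages above can be sketched as a single function. This is a minimal illustration, not the skill's actual API: the task shape, skill registry, and 95-point quality gate are assumptions drawn from the surrounding text.

```python
def process_task(task, skills):
    """Walk a data task through the five stages: receive, prepare,
    assign, execute, validate."""
    # 1. Receive: check the task names its data sources
    if not task.get("sources"):
        raise ValueError("task has no data sources")
    # 2. Prepare context for the chosen skill
    context = {"task_id": task["id"], "sources": task["sources"]}
    # 3. Assign: pick a skill by task type
    skill = skills[task["type"]]
    # 4. Execute (monitoring is elided in this sketch)
    result = skill(context)
    # 5. Validate results against the quality gate
    if result.get("quality_score", 0) < 95:
        raise RuntimeError("quality gate failed")
    return result

skills = {"etl": lambda ctx: {"quality_score": 98.5, "task_id": ctx["task_id"]}}
out = process_task({"id": "task-003", "type": "etl",
                    "sources": ["transactions"]}, skills)
```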

## Data-Specific Standards

### Pipeline Checklist
- [ ] Input validation
- [ ] Error handling
- [ ] Checkpoint/recovery
- [ ] Monitoring enabled
- [ ] Documentation updated
- [ ] Performance optimized

### Quality Checklist
- [ ] Completeness checks
- [ ] Accuracy validation
- [ ] Consistency rules
- [ ] Timeliness metrics
- [ ] Uniqueness constraints
- [ ] Validity ranges

### Feature Engineering Checklist
- [ ] Business logic documented
- [ ] Dependencies tracked
- [ ] Version controlled
- [ ] Performance tested
- [ ] Edge cases handled
- [ ] Monitoring added

## Integration Points

### With Backend Orchestrator
- Data model alignment
- API data contracts
- Database optimization
- Cache strategies

### With Frontend Orchestrator
- Dashboard data requirements
- Real-time vs batch
- Data freshness SLAs
- Visualization formats

### With Human-Docs
Updates documentation with:
- Pipeline changes
- Feature definitions
- Data dictionaries
- Quality reports

## Event Communication

### Listening For
```json
{
  "event": "data.source.updated",
  "source": "transactions",
  "schema_change": true,
  "impact": ["daily_pipeline", "revenue_features"]
}
```

### Broadcasting
```json
{
  "event": "data.pipeline.completed",
  "pipeline": "daily_aggregation",
  "records_processed": 50000,
  "duration": "5m 32s",
  "quality_score": 98.5
}
```
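The two messages above imply a publish/subscribe exchange. A hedged sketch follows; the in-memory bus is illustrative only, since the real transport is not specified by the skill.

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process bus for the event payloads shown above."""
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event, handler):
        self.handlers[event].append(handler)

    def broadcast(self, payload):
        # Route by the "event" field, as in the JSON examples
        for handler in self.handlers[payload["event"]]:
            handler(payload)

bus = EventBus()
impacted = []
bus.subscribe("data.source.updated",
              lambda msg: impacted.extend(msg["impact"]))
bus.broadcast({"event": "data.source.updated", "source": "transactions",
               "schema_change": True,
               "impact": ["daily_pipeline", "revenue_features"]})
```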

## Test Requirements

### Every Data Task Must Include
1. **Unit Tests** - Transformation logic
2. **Integration Tests** - Pipeline flow
3. **Data Quality Tests** - Accuracy, completeness
4. **Performance Tests** - Processing speed
5. **Edge Case Tests** - Null, empty, invalid data
6. **Regression Tests** - Output consistency
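For instance, an edge-case and quality test pair might look like the following (plain assertions; the `clean_nulls` transform is a hypothetical example, not part of the skill):

```python
def clean_nulls(records, required):
    """Drop records missing any required field (illustrative transform)."""
    return [r for r in records
            if all(r.get(f) is not None for f in required)]

rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}, {"id": 3}]
cleaned = clean_nulls(rows, required=["id", "amount"])

# Edge cases: null and missing fields are removed, valid rows survive
assert cleaned == [{"id": 1, "amount": 10.0}]
# Quality: completeness of the cleaned output is 100%
assert all(r["amount"] is not None for r in cleaned)
```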

## Success Metrics

- Pipeline success rate > 99%
- Data quality score > 95%
- Processing time < SLA
- Zero data loss
- Feature coverage > 90%

## Common Patterns

### ETL Pattern
```python
class ETLOrchestrator:
    def run_pipeline(self, task):
        data = self.extract(task.sources)        # 1. Extract from sources
        self.validate_input(data)                # 2. Validate input data
        transformed = self.transform(data)       # 3. Transform data
        self.quality_checks(transformed)         # 4. Quality checks
        self.load(transformed, task.target)      # 5. Load to destination
        self.update_lineage(task, transformed)   # 6. Update lineage
```

### Feature Pattern
```python
class FeatureOrchestrator:
    def create_feature(self, task):
        spec = self.define_logic(task)           # 1. Define feature logic
        deps = self.resolve_dependencies(spec)   # 2. Identify dependencies
        feature = self.calculate(spec, deps)     # 3. Implement calculation
        self.register(feature)                   # 4. Add to feature store
        self.add_monitoring(feature)             # 5. Create monitoring
```

## Data Processing Guidelines

### Batch Processing
- Use for large volumes
- Schedule during off-peak
- Implement checkpointing
- Monitor resource usage

### Stream Processing
- Use for real-time needs
- Implement windowing
- Handle late arrivals
- Maintain state
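The windowing and late-arrival points can be sketched with a tumbling window; the 60-second window and 30-second lateness threshold below are illustrative choices, not values from the skill.

```python
from collections import defaultdict

WINDOW = 60  # seconds per tumbling window

def window_events(events, allowed_lateness=30):
    """Group (timestamp, value) events into tumbling windows, dropping
    events further than allowed_lateness behind the watermark."""
    windows = defaultdict(list)
    watermark = 0
    for ts, value in events:
        watermark = max(watermark, ts)
        if ts < watermark - allowed_lateness:
            continue  # too late: would re-open an already-closed window
        windows[ts // WINDOW * WINDOW].append(value)
    return dict(windows)

# Event at t=10 arrives after the watermark has advanced to 70, so it
# is more than 30s late and gets dropped.
out = window_events([(5, "a"), (70, "b"), (10, "c"), (130, "d")])
```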

### Data Quality Rules
1. **Completeness** - No missing required fields
2. **Accuracy** - Values within expected ranges
3. **Consistency** - Cross-dataset alignment
4. **Timeliness** - Data freshness requirements
5. **Uniqueness** - No unwanted duplicates
6. **Validity** - Format and type correctness
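Three of the six rules above can be sketched as scores over a record batch (field names, the `amount >= 0` range, and the report shape are assumptions for illustration):

```python
def quality_report(records):
    """Score a batch for completeness, validity, and uniqueness."""
    total = len(records)
    complete = sum(1 for r in records
                   if r.get("id") is not None and r.get("amount") is not None)
    valid = sum(1 for r in records
                if isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0)
    unique_ids = len({r.get("id") for r in records})
    return {
        "completeness": complete / total,
        "validity": valid / total,
        "uniqueness": unique_ids / total,
    }

report = quality_report([
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": -5.0},   # fails the validity range
    {"id": 2, "amount": 3.0},    # duplicate id
])
```

Consistency, timeliness, and accuracy checks need reference data (a second dataset, arrival timestamps, expected values), so they do not fit a self-contained sketch.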

## Anti-Patterns to Avoid

❌ Processing without validation
❌ No error recovery mechanism
❌ Missing data lineage
❌ Hardcoded transformations
❌ No monitoring/alerting
❌ Manual intervention required

Related Skills

master-orchestrator

242
from aiskillstore/marketplace

Fully automated master orchestrator: an end-to-end skill that chains trending-topic capture, content generation, and viral-hit validation.

bmad-orchestrator

242
from aiskillstore/marketplace

Orchestrates BMAD workflows for structured AI-driven development. Routes work across Analysis, Planning, Solutioning, and Implementation phases.

vector-database-engineer

242
from aiskillstore/marketplace

Expert in vector databases, embedding strategies, and semantic search implementation. Masters Pinecone, Weaviate, Qdrant, Milvus, and pgvector for RAG applications, recommendation systems, and similar

tdd-orchestrator

242
from aiskillstore/marketplace

Master TDD orchestrator specializing in red-green-refactor discipline, multi-agent workflow coordination, and comprehensive test-driven development practices. Enforces TDD best practices across teams with AI-assisted testing and modern frameworks. Use PROACTIVELY for TDD implementation and governance.

sqlmap-database-pentesting

242
from aiskillstore/marketplace

This skill should be used when the user asks to "automate SQL injection testing," "enumerate database structure," "extract database credentials using sqlmap," "dump tables and columns...

sqlmap-database-penetration-testing

242
from aiskillstore/marketplace

This skill should be used when the user asks to "automate SQL injection testing," "enumerate database structure," "extract database credentials using sqlmap," "dump tables and columns from a vulnerable database," or "perform automated database penetration testing." It provides comprehensive guidance for using SQLMap to detect and exploit SQL injection vulnerabilities.

gdpr-data-handling

242
from aiskillstore/marketplace

Implement GDPR-compliant data handling with consent management, data subject rights, and privacy by design. Use when building systems that process EU personal data, implementing privacy controls, or conducting GDPR compliance reviews.

datadog-automation

242
from aiskillstore/marketplace

Automate Datadog tasks via Rube MCP (Composio): query metrics, search logs, manage monitors/dashboards, create events and downtimes. Always search tools first for current schemas.

database-optimizer

242
from aiskillstore/marketplace

Expert database optimizer specializing in modern performance tuning, query optimization, and scalable architectures. Masters advanced indexing, N+1 resolution, multi-tier caching, partitioning strategies, and cloud database optimization. Handles complex query analysis, migration strategies, and performance monitoring. Use PROACTIVELY for database optimization, performance issues, or scalability challenges.

database-migrations-sql-migrations

242
from aiskillstore/marketplace

SQL database migrations with zero-downtime strategies for PostgreSQL, MySQL, SQL Server

database-migrations-migration-observability

242
from aiskillstore/marketplace

Migration monitoring, CDC, and observability infrastructure

database-design

242
from aiskillstore/marketplace

Database design principles and decision-making. Schema design, indexing strategy, ORM selection, serverless databases.