etl-pipeline-builder

Build and manage ETL pipelines for data migration with transformation, CDC, and monitoring

509 stars

Best use case

etl-pipeline-builder is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using etl-pipeline-builder should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/etl-pipeline-builder/SKILL.md --create-dirs "https://raw.githubusercontent.com/a5c-ai/babysitter/main/library/specializations/code-migration-modernization/skills/etl-pipeline-builder/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/etl-pipeline-builder/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How etl-pipeline-builder Compares

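| Feature / Agent | etl-pipeline-builder | Standard Approach |
|-----------------|----------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |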

Frequently Asked Questions

What does this skill do?

It builds and manages ETL pipelines for data migration, with transformation, CDC (change data capture), and monitoring.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# ETL Pipeline Builder Skill

Builds and manages ETL (Extract, Transform, Load) pipelines for data migration, supporting incremental loads, CDC, and comprehensive monitoring.

## Purpose

Enable data pipeline creation for:
- Source-to-target mapping
- Transformation definition
- Incremental load setup
- CDC configuration
- Error handling
- Pipeline monitoring

## Capabilities

### 1. Source-to-Target Mapping
- Define column mappings
- Handle schema differences
- Configure data type conversions
- Manage derived columns
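
As an illustration of the mapping concerns listed above, here is a minimal Python sketch. The `ColumnMapping` record, the `apply_mappings` helper, and all column names are hypothetical; the skill does not document a concrete mapping format, so treat this as one possible shape rather than the skill's actual output.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


def _identity(value: Any) -> Any:
    return value


@dataclass(frozen=True)
class ColumnMapping:
    """Hypothetical mapping record (not a format defined by the skill)."""
    target: str                                      # column name in the target schema
    source: Optional[str] = None                     # source column; None for derived columns
    convert: Callable[[Any], Any] = _identity        # data type conversion
    derive: Optional[Callable[[dict], Any]] = None   # builds a derived column from the whole row


def apply_mappings(row: dict, mappings: list[ColumnMapping]) -> dict:
    """Produce a target-shaped row from a source row."""
    out = {}
    for m in mappings:
        if m.derive is not None:
            out[m.target] = m.derive(row)
        else:
            value = row.get(m.source)
            # Keep missing values as None instead of passing them to the converter.
            out[m.target] = None if value is None else m.convert(value)
    return out


# Assumed legacy column names, for illustration only.
MAPPINGS = [
    ColumnMapping(target="customer_id", source="CUST_ID", convert=int),
    ColumnMapping(target="email", source="EMAIL_ADDR", convert=str.lower),
    ColumnMapping(target="full_name", derive=lambda r: f"{r['FIRST']} {r['LAST']}"),
]

legacy_row = {"CUST_ID": "42", "EMAIL_ADDR": "Ada@Example.com", "FIRST": "Ada", "LAST": "Lovelace"}
mapped = apply_mappings(legacy_row, MAPPINGS)
```

Keeping the mappings as data rather than inline code makes it easier to review schema differences and to reuse the same definitions for validation.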

### 2. Transformation Definition
- Data type transformations
- Value mappings
- Aggregations
- Lookups and enrichments
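
The four transformation kinds above can be sketched in plain Python as follows. The status codes, the region lookup table, and the field names are assumptions made up for this example; in practice such logic may live in dbt models or SQL instead.

```python
from collections import defaultdict

# Hypothetical value mapping and lookup table, for illustration only.
STATUS_MAP = {"A": "active", "I": "inactive"}
REGION_LOOKUP = {"DE": "EMEA", "US": "AMER"}


def transform(row: dict) -> dict:
    """Apply type conversion, value mapping, and lookup enrichment to one row."""
    return {
        "order_id": int(row["order_id"]),                     # data type transformation
        "amount": float(row["amount"]),                       # data type transformation
        "status": STATUS_MAP.get(row["status"], "unknown"),   # value mapping
        "region": REGION_LOOKUP.get(row["country"], "OTHER"), # lookup / enrichment
    }


def total_by_region(rows: list[dict]) -> dict:
    """Aggregate the transformed rows: sum of amount per region."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)


raw = [
    {"order_id": "1", "amount": "10.50", "status": "A", "country": "DE"},
    {"order_id": "2", "amount": "4.50", "status": "I", "country": "US"},
    {"order_id": "3", "amount": "5.00", "status": "X", "country": "DE"},
]
clean = [transform(r) for r in raw]
totals = total_by_region(clean)
```

Unknown codes fall back to explicit defaults (`"unknown"`, `"OTHER"`) so that unmapped values stay visible downstream instead of failing silently.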

### 3. Incremental Load Setup
- Define watermarks
- Configure incremental columns
- Handle deletes
- Manage merge logic
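
A watermark-based incremental load can be sketched as below. This is a simplified in-memory model, assuming the source exposes an `updated_at` column and marks deletions with an `is_deleted` soft-delete flag; both names are hypothetical, and a real pipeline would typically run the merge as SQL against the target database.

```python
def incremental_load(source_rows: list[dict], target: dict, watermark: str) -> str:
    """Merge rows changed after `watermark` into `target` (a dict keyed by id).

    Timestamps are assumed to be UTC ISO 8601 strings in one fixed format,
    so plain string comparison orders them correctly.
    Returns the new watermark to persist for the next run.
    """
    new_watermark = watermark
    for row in source_rows:
        if row["updated_at"] <= watermark:
            continue  # already processed in an earlier run
        if row.get("is_deleted"):
            target.pop(row["id"], None)  # handle deletes via the soft-delete flag
        else:
            # Merge logic: insert new keys, overwrite existing ones (upsert).
            target[row["id"]] = {"id": row["id"], "name": row["name"]}
        new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark


target = {
    1: {"id": 1, "name": "alpha"},
    2: {"id": 2, "name": "beta"},
    3: {"id": 3, "name": "gamma"},
}
source = [
    {"id": 1, "name": "alpha-v2", "updated_at": "2024-01-03T00:00:00Z", "is_deleted": False},
    {"id": 2, "name": "beta", "updated_at": "2024-01-01T00:00:00Z", "is_deleted": False},
    {"id": 3, "name": "gamma", "updated_at": "2024-01-04T00:00:00Z", "is_deleted": True},
    {"id": 4, "name": "delta", "updated_at": "2024-01-02T12:00:00Z", "is_deleted": False},
]
new_watermark = incremental_load(source, target, watermark="2024-01-02T00:00:00Z")
```

A strict greater-than comparison can skip rows that share the watermark timestamp but commit later; some pipelines re-read a small overlap window to compensate, which is safe when the merge is idempotent as it is here.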

### 4. CDC Configuration
- Log-based CDC
- Trigger-based CDC
- Timestamp-based CDC
- Full load comparison
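
Of the four approaches, full load comparison is the easiest to sketch without external infrastructure: two snapshots of the same table are compared by primary key and the differences are emitted as change events. The event shape below (`op`, `key`, `before`, `after`) is an assumption for this example, not a format defined by the skill or by any specific CDC tool.

```python
def diff_snapshots(previous: list[dict], current: list[dict], key: str = "id") -> list[dict]:
    """Compare two full snapshots and return insert/update/delete events."""
    prev = {row[key]: row for row in previous}
    curr = {row[key]: row for row in current}
    events = []
    for k, row in curr.items():
        if k not in prev:
            events.append({"op": "insert", "key": k, "after": row})
        elif row != prev[k]:
            events.append({"op": "update", "key": k, "before": prev[k], "after": row})
    for k, row in prev.items():
        if k not in curr:
            events.append({"op": "delete", "key": k, "before": row})
    return events


yesterday = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
today = [{"id": 1, "name": "alpha-v2"}, {"id": 3, "name": "gamma"}]
events = diff_snapshots(yesterday, today)
```

This approach needs no support from the source system but reads the whole table on every run, which is why log-based CDC is usually preferred for large or frequently changing tables.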

### 5. Error Handling
- Define retry policies
- Configure dead letter queues
- Handle data quality issues
- Implement alerting
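
One way to combine a retry policy with a dead letter queue is sketched below. The `load_with_retry` helper, its parameters, and the choice of which exceptions count as transient are all assumptions for illustration; the dead letter queue is modeled as a plain list rather than a real queue service.

```python
import time


def load_with_retry(records, load_fn, max_attempts=3, base_delay=0.0,
                    retryable=(ConnectionError, TimeoutError)):
    """Load records one by one, retrying transient errors with exponential backoff.

    Records that still fail, or that fail with a non-retryable error
    (for example a data quality problem), go to the dead letter list.
    """
    loaded, dead_letters = 0, []
    for record in records:
        attempt = 0
        while True:
            attempt += 1
            try:
                load_fn(record)
                loaded += 1
                break
            except retryable as exc:
                if attempt >= max_attempts:
                    dead_letters.append({"record": record, "error": str(exc), "attempts": attempt})
                    break
                time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
            except Exception as exc:
                # Retrying will not fix bad data, so dead-letter it immediately.
                dead_letters.append({"record": record, "error": str(exc), "attempts": attempt})
                break
    return loaded, dead_letters


calls = {}


def flaky_load(record):
    """Hypothetical loader: fails once for id 2, always rejects a missing amount."""
    calls[record["id"]] = calls.get(record["id"], 0) + 1
    if record["amount"] is None:
        raise ValueError("amount is required")
    if record["id"] == 2 and calls[2] == 1:
        raise ConnectionError("transient failure")


records = [{"id": 1, "amount": 5}, {"id": 2, "amount": 7}, {"id": 3, "amount": None}]
loaded, dead_letters = load_with_retry(records, flaky_load)
```

Separating transient errors from data quality errors keeps retries from wasting time on records that can never succeed, and the dead letter entries carry enough context to drive alerting or manual replay.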

### 6. Pipeline Monitoring
- Track pipeline metrics
- Monitor data volumes
- Alert on failures
- Generate SLA reports
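
A minimal sketch of per-run metrics and threshold checks follows. The `RunMetrics` fields, the alert messages, and the thresholds are hypothetical; the skill does not specify which metrics or SLA definitions it uses.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Hypothetical metrics collected for one pipeline run."""
    rows_read: int
    rows_written: int
    rows_rejected: int
    duration_seconds: float


def evaluate_run(m: RunMetrics, max_duration_seconds: float, max_reject_ratio: float) -> list[str]:
    """Return a list of alert messages; an empty list means the run met its targets."""
    alerts = []
    if m.duration_seconds > max_duration_seconds:
        alerts.append("duration SLA breached")
    reject_ratio = m.rows_rejected / m.rows_read if m.rows_read else 0.0
    if reject_ratio > max_reject_ratio:
        alerts.append("reject ratio above threshold")
    # Data volume check: every row read should be either written or rejected.
    if m.rows_written + m.rows_rejected != m.rows_read:
        alerts.append("row counts do not reconcile")
    return alerts


run = RunMetrics(rows_read=1000, rows_written=980, rows_rejected=20, duration_seconds=95.0)
alerts = evaluate_run(run, max_duration_seconds=120.0, max_reject_ratio=0.01)
```

Collecting the alerts of each run over time gives the raw material for an SLA report, for example the share of runs that finished with an empty alert list.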

## Tool Integrations

| Tool | Type | Integration Method |
|------|------|-------------------|
| Apache Airflow | Orchestration | Python |
| dbt | Transformation | CLI |
| Airbyte | Data integration | API |
| Fivetran | SaaS ETL | API |
| AWS DMS | Cloud migration | CLI |
| Debezium | CDC | Config |

## Output Schema

```json
{
  "pipelineId": "string",
  "timestamp": "ISO8601",
  "pipeline": {
    "name": "string",
    "source": {},
    "target": {},
    "mappings": [],
    "transformations": [],
    "schedule": "string"
  },
  "artifacts": {
    "dagFile": "string",
    "configFile": "string",
    "sqlFiles": []
  },
  "deployment": {
    "status": "string",
    "url": "string"
  }
}
```
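
A consumer of this output might check that a result document has the expected fields before acting on it. The sketch below mirrors the schema above as a nested Python structure and validates presence and basic types only; the `validate` helper and all example values (pipeline name, file paths, URL) are hypothetical.

```python
import json

# Expected fields and basic types, mirroring the JSON schema above.
EXPECTED = {
    "pipelineId": str,
    "timestamp": str,
    "pipeline": {
        "name": str, "source": dict, "target": dict,
        "mappings": list, "transformations": list, "schedule": str,
    },
    "artifacts": {"dagFile": str, "configFile": str, "sqlFiles": list},
    "deployment": {"status": str, "url": str},
}


def validate(doc: dict, expected: dict = EXPECTED, path: str = "") -> list[str]:
    """Return a list of problems; an empty list means the document matches."""
    errors = []
    for key, kind in expected.items():
        where = f"{path}{key}"
        if key not in doc:
            errors.append(f"missing: {where}")
        elif isinstance(kind, dict):
            if isinstance(doc[key], dict):
                errors.extend(validate(doc[key], kind, where + "."))
            else:
                errors.append(f"wrong type: {where}")
        elif not isinstance(doc[key], kind):
            errors.append(f"wrong type: {where}")
    return errors


# Example document with made-up values, for illustration only.
result = json.loads("""
{
  "pipelineId": "example-001",
  "timestamp": "2024-01-04T00:00:00Z",
  "pipeline": {
    "name": "customers_migration",
    "source": {}, "target": {}, "mappings": [], "transformations": [],
    "schedule": "0 2 * * *"
  },
  "artifacts": {"dagFile": "dags/customers.py", "configFile": "config.yaml", "sqlFiles": []},
  "deployment": {"status": "deployed", "url": "https://example.invalid/pipelines/example-001"}
}
""")
problems = validate(result)
```

The check is intentionally shallow: it does not parse the timestamp or inspect the contents of `source`, `target`, `mappings`, or `transformations`, because the schema leaves their structure open.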

## Integration with Migration Processes

- **database-schema-migration**: Data movement
- **cloud-migration**: Cloud data pipelines
- **data-format-migration**: Format transformation

## Related Skills

- `data-migration-validator`: Validation
- `schema-comparator`: Schema mapping

## Related Agents

- `database-migration-orchestrator`: Pipeline orchestration
- `data-architect-agent`: Pipeline design