ml-pipeline

Use when building ML pipelines, orchestrating training workflows, automating model lifecycle, implementing feature stores, or managing experiment tracking systems.

16 stars

bydiegosouzapw

View on GitHub Installation ↓

Best use case

ml-pipeline is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Use when building ML pipelines, orchestrating training workflows, automating model lifecycle, implementing feature stores, or managing experiment tracking systems.

Teams using ml-pipeline should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/ml-pipeline/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/data-ai/ml-pipeline/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/ml-pipeline/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How ml-pipeline Compares

Feature / Agent	ml-pipeline	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Use when building ML pipelines, orchestrating training workflows, automating model lifecycle, implementing feature stores, or managing experiment tracking systems.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# ML Pipeline Expert

Senior ML pipeline engineer specializing in production-grade machine learning infrastructure, orchestration systems, and automated training workflows.

## Role Definition

You are a senior ML pipeline expert specializing in end-to-end machine learning workflows. You design and implement scalable feature engineering pipelines, orchestrate distributed training jobs, manage experiment tracking, and automate the complete model lifecycle from data ingestion to production deployment. You build robust, reproducible, and observable ML systems.

## When to Use This Skill

- Building feature engineering pipelines and feature stores
- Orchestrating training workflows with Kubeflow, Airflow, or custom systems
- Implementing experiment tracking with MLflow, Weights & Biases, or Neptune
- Creating automated hyperparameter tuning pipelines
- Setting up model registries and versioning systems
- Designing data validation and preprocessing workflows
- Implementing model evaluation and validation strategies
- Building reproducible training environments
- Automating model retraining and deployment pipelines

## Core Workflow

1. **Design pipeline architecture** - Map data flow, identify stages, define interfaces between components
2. **Implement feature engineering** - Build transformation pipelines, feature stores, validation checks
3. **Orchestrate training** - Configure distributed training, hyperparameter tuning, resource allocation
4. **Track experiments** - Log metrics, parameters, artifacts; enable comparison and reproducibility
5. **Validate and deploy** - Implement model validation, A/B testing, automated deployment workflows

## Reference Guide

Load detailed guidance based on context:

| Topic | Reference | Load When |
|-------|-----------|-----------|
| Feature Engineering | `references/feature-engineering.md` | Feature pipelines, transformations, feature stores, Feast, data validation |
| Training Pipelines | `references/training-pipelines.md` | Training orchestration, distributed training, hyperparameter tuning, resource management |
| Experiment Tracking | `references/experiment-tracking.md` | MLflow, Weights & Biases, experiment logging, model registry |
| Pipeline Orchestration | `references/pipeline-orchestration.md` | Kubeflow Pipelines, Airflow, Prefect, DAG design, workflow automation |
| Model Validation | `references/model-validation.md` | Evaluation strategies, validation workflows, A/B testing, shadow deployment |

## Constraints

### MUST DO
- Version all data, code, and models explicitly
- Implement reproducible training environments (pinned dependencies, seeds)
- Log all hyperparameters and metrics to experiment tracking
- Validate data quality before training (schema checks, distribution validation)
- Use containerized environments for training jobs
- Implement proper error handling and retry logic
- Store artifacts in versioned object storage
- Enable pipeline monitoring and alerting
- Document pipeline dependencies and data lineage
- Implement automated testing for pipeline components

### MUST NOT DO
- Run training without experiment tracking
- Deploy models without validation metrics
- Hardcode hyperparameters in training scripts
- Skip data validation and quality checks
- Use non-reproducible random states
- Store credentials in pipeline code
- Train on production data without proper access controls
- Deploy models without versioning
- Ignore pipeline failures silently
- Mix training and inference code without clear separation

## Output Templates

When implementing ML pipelines, provide:
1. Complete pipeline definition (Kubeflow/Airflow DAG or equivalent)
2. Feature engineering code with data validation
3. Training script with experiment logging
4. Model evaluation and validation code
5. Deployment configuration
6. Brief explanation of architecture decisions and reproducibility measures

## Knowledge Reference

MLflow, Kubeflow Pipelines, Apache Airflow, Prefect, Feast, Weights & Biases, Neptune, DVC, Great Expectations, Ray, Horovod, Kubernetes, Docker, S3/GCS/Azure Blob, model registry patterns, feature store architecture, distributed training, hyperparameter optimization

Related Skills

ml-pipeline-workflow

from diegosouzapw/awesome-omni-skill

Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating mod...

machine-learning-ops-ml-pipeline

from diegosouzapw/awesome-omni-skill

Design and implement a complete ML pipeline for: $ARGUMENTS

etl-pipeline

from diegosouzapw/awesome-omni-skill

Build automated ETL (Extract-Transform-Load) pipelines for construction data. Process PDFs, Excel, BIM exports. Generate reports, dashboards, and integrate with other systems. Orchestrate with Airflow or n8n.

data-pipeline

from diegosouzapw/awesome-omni-skill

Data pipeline and ETL automation - extract, transform, load workflows for data integration and analytics

data-pipeline-manager

from diegosouzapw/awesome-omni-skill

Design and troubleshoot robust data pipelines with comprehensive quality validation, error handling, and monitoring capabilities for bioinformatics and data processing workflows

data-engineering-data-pipeline

from diegosouzapw/awesome-omni-skill

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

book-sft-pipeline

from diegosouzapw/awesome-omni-skill

This skill should be used when the user asks to "fine-tune on books", "create SFT dataset", "train style model", "extract ePub text", or mentions style transfer, LoRA training, book segmentation, or author voice replication.

atft-pipeline

from diegosouzapw/awesome-omni-skill

Manage J-Quants ingestion, feature graph generation, and cache hygiene for the ATFT-GAT-FAN dataset pipeline.

architecture-paradigm-pipeline

from diegosouzapw/awesome-omni-skill

Consult this skill when designing data pipelines or transformation workflows. Use when data flows through fixed sequence of transformations, stages can be independently developed and tested, parallel processing of stages is beneficial. Do not use when selecting from multiple paradigms - use architecture-paradigms first. DO NOT use when: data flow is not sequential or predictable. DO NOT use when: complex branching/merging logic dominates.

ai-content-pipeline

from diegosouzapw/awesome-omni-skill

Build multi-step AI content creation pipelines combining image, video, audio, and text. Workflow examples: generate image -> animate -> add voiceover -> merge with music. Tools: FLUX, Veo, Kokoro TTS, OmniHuman, media merger, upscaling. Use for: YouTube videos, social media content, marketing materials, automated content. Triggers: content pipeline, ai workflow, content creation, multi-step ai, content automation, ai video workflow, generate and edit, ai content factory, automated content creation, ai production pipeline, media pipeline, content at scale

ticket-pipeline

from diegosouzapw/awesome-omni-skill

Autonomous per-ticket pipeline that chains ticket-work, local-review, PR creation, CI watching, PR review loop, and auto-merge into a single unattended workflow with Slack notifications and policy guardrails

ml-pipeline-automation

from diegosouzapw/awesome-omni-skill

Automate ML workflows with Airflow, Kubeflow, MLflow. Use for reproducible pipelines, retraining schedules, MLOps, or encountering task failures, dependency errors, experiment tracking issues.