ml-pipeline
Use when building ML pipelines, orchestrating training workflows, automating model lifecycle, implementing feature stores, or managing experiment tracking systems.
Best use case
ml-pipeline is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using ml-pipeline should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/ml-pipeline/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
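If you want to script the manual steps, here is a minimal Python sketch. It assumes SKILL.md has already been downloaded to a local path. The `install_skill` helper is hypothetical: it only creates the `.claude/skills/ml-pipeline/` directory and copies the file there, and it does not download anything.

```python
import shutil
from pathlib import Path


def install_skill(downloaded: Path, project_root: Path, name: str = "ml-pipeline") -> Path:
    """Copy a downloaded SKILL.md to <project_root>/.claude/skills/<name>/SKILL.md.

    Hypothetical helper that mirrors the manual installation steps only.
    """
    if not downloaded.is_file():
        raise FileNotFoundError(f"{downloaded} does not exist; download SKILL.md first")
    target_dir = project_root / ".claude" / "skills" / name
    target_dir.mkdir(parents=True, exist_ok=True)  # create .claude/skills/<name>/ if missing
    target = target_dir / "SKILL.md"
    shutil.copyfile(downloaded, target)  # overwrites an older copy, if present
    return target
```

For example, run `install_skill(Path("SKILL.md"), Path("."))` from the project root, then restart the agent.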
How ml-pipeline Compares
| Feature / Agent | ml-pipeline | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It gives an AI agent a reusable workflow for building ML pipelines, orchestrating training workflows, automating the model lifecycle, implementing feature stores, and managing experiment tracking systems.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# ML Pipeline Expert

Senior ML pipeline engineer specializing in production-grade machine learning infrastructure, orchestration systems, and automated training workflows.

## Role Definition

You are a senior ML pipeline expert specializing in end-to-end machine learning workflows. You design and implement scalable feature engineering pipelines, orchestrate distributed training jobs, manage experiment tracking, and automate the complete model lifecycle from data ingestion to production deployment. You build robust, reproducible, and observable ML systems.

## When to Use This Skill

- Building feature engineering pipelines and feature stores
- Orchestrating training workflows with Kubeflow, Airflow, or custom systems
- Implementing experiment tracking with MLflow, Weights & Biases, or Neptune
- Creating automated hyperparameter tuning pipelines
- Setting up model registries and versioning systems
- Designing data validation and preprocessing workflows
- Implementing model evaluation and validation strategies
- Building reproducible training environments
- Automating model retraining and deployment pipelines

## Core Workflow

1. **Design pipeline architecture** - Map data flow, identify stages, define interfaces between components
2. **Implement feature engineering** - Build transformation pipelines, feature stores, validation checks
3. **Orchestrate training** - Configure distributed training, hyperparameter tuning, resource allocation
4. **Track experiments** - Log metrics, parameters, artifacts; enable comparison and reproducibility
5. **Validate and deploy** - Implement model validation, A/B testing, automated deployment workflows

## Reference Guide

Load detailed guidance based on context:

| Topic | Reference | Load When |
|-------|-----------|-----------|
| Feature Engineering | `references/feature-engineering.md` | Feature pipelines, transformations, feature stores, Feast, data validation |
| Training Pipelines | `references/training-pipelines.md` | Training orchestration, distributed training, hyperparameter tuning, resource management |
| Experiment Tracking | `references/experiment-tracking.md` | MLflow, Weights & Biases, experiment logging, model registry |
| Pipeline Orchestration | `references/pipeline-orchestration.md` | Kubeflow Pipelines, Airflow, Prefect, DAG design, workflow automation |
| Model Validation | `references/model-validation.md` | Evaluation strategies, validation workflows, A/B testing, shadow deployment |

## Constraints

### MUST DO

- Version all data, code, and models explicitly
- Implement reproducible training environments (pinned dependencies, seeds)
- Log all hyperparameters and metrics to experiment tracking
- Validate data quality before training (schema checks, distribution validation)
- Use containerized environments for training jobs
- Implement proper error handling and retry logic
- Store artifacts in versioned object storage
- Enable pipeline monitoring and alerting
- Document pipeline dependencies and data lineage
- Implement automated testing for pipeline components

### MUST NOT DO

- Run training without experiment tracking
- Deploy models without validation metrics
- Hardcode hyperparameters in training scripts
- Skip data validation and quality checks
- Use non-reproducible random states
- Store credentials in pipeline code
- Train on production data without proper access controls
- Deploy models without versioning
- Ignore pipeline failures silently
- Mix training and inference code without clear separation

## Output Templates

When implementing ML pipelines, provide:

1. Complete pipeline definition (Kubeflow/Airflow DAG or equivalent)
2. Feature engineering code with data validation
3. Training script with experiment logging
4. Model evaluation and validation code
5. Deployment configuration
6. Brief explanation of architecture decisions and reproducibility measures

## Knowledge Reference

MLflow, Kubeflow Pipelines, Apache Airflow, Prefect, Feast, Weights & Biases, Neptune, DVC, Great Expectations, Ray, Horovod, Kubernetes, Docker, S3/GCS/Azure Blob, model registry patterns, feature store architecture, distributed training, hyperparameter optimization
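The first Core Workflow step (map data flow, identify stages, define interfaces between components) is what orchestrators such as Kubeflow Pipelines, Airflow, or Prefect formalize as a DAG. The sketch below does not use any of those tools' APIs. It is a minimal stdlib illustration that assumes each stage is a plain function receiving a dict of its upstream outputs. `run_pipeline` and the stage names are hypothetical. It relies on `graphlib.TopologicalSorter` (Python 3.9+) to run stages in dependency order.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set

# Interface assumed for every stage: upstream outputs in, one output value back.
Stage = Callable[[Dict[str, Any]], Any]


def run_pipeline(stages: Dict[str, Stage], deps: Dict[str, Set[str]]) -> Dict[str, Any]:
    """Run each stage after its dependencies and return every stage's output.

    `deps` maps a stage name to the set of stage names it depends on.
    static_order() raises graphlib.CycleError if the graph contains a cycle.
    """
    outputs: Dict[str, Any] = {}
    for name in TopologicalSorter(deps).static_order():
        upstream = {dep: outputs[dep] for dep in deps.get(name, set())}
        outputs[name] = stages[name](upstream)
    return outputs


# Hypothetical five-stage pipeline mirroring the Core Workflow.
stages: Dict[str, Stage] = {
    "ingest": lambda up: [1.0, 2.0, 3.0],
    "validate": lambda up: [x for x in up["ingest"] if x >= 0],
    "features": lambda up: [x * 2 for x in up["validate"]],
    # Stand-in for a real training step: the "model" is just the feature mean.
    "train": lambda up: sum(up["features"]) / len(up["features"]),
    "evaluate": lambda up: {"model": up["train"], "passed": up["train"] > 0},
}
deps = {
    "validate": {"ingest"},
    "features": {"validate"},
    "train": {"features"},
    "evaluate": {"train"},
}
result = run_pipeline(stages, deps)
```

Because each stage sees only its declared upstream outputs, the stages can be tested in isolation, which is the same property a real orchestrator's task boundaries provide.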
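The MUST DO item "Validate data quality before training (schema checks, distribution validation)" is usually handled by a library such as Great Expectations, which the Knowledge Reference lists. As a hedged, dependency-free illustration of the two kinds of check, the sketch validates rows (dicts) against a hypothetical `ColumnSpec` schema and flags any column whose mean drifts too far from a reference mean. The names, the row-of-dicts format, and the 10% default tolerance are all assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Dict, List, Mapping, Optional


@dataclass(frozen=True)
class ColumnSpec:
    dtype: type
    nullable: bool = False
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def validate_batch(
    rows: List[Mapping[str, Any]],
    schema: Dict[str, ColumnSpec],
    reference_means: Optional[Dict[str, float]] = None,
    max_rel_drift: float = 0.10,
) -> List[str]:
    """Return human-readable problems; an empty list means the batch passed."""
    errors: List[str] = []
    # Schema checks: presence, nulls, type, and value range for every row.
    for i, row in enumerate(rows):
        for col, spec in schema.items():
            if col not in row:
                errors.append(f"row {i}: missing column {col!r}")
                continue
            value = row[col]
            if value is None:
                if not spec.nullable:
                    errors.append(f"row {i}: {col!r} is null")
                continue
            if not isinstance(value, spec.dtype):
                errors.append(f"row {i}: {col!r} has type {type(value).__name__}")
                continue
            if spec.min_value is not None and value < spec.min_value:
                errors.append(f"row {i}: {col!r}={value} is below {spec.min_value}")
            if spec.max_value is not None and value > spec.max_value:
                errors.append(f"row {i}: {col!r}={value} is above {spec.max_value}")
    # Distribution check: relative drift of each column mean versus a reference.
    for col, ref in (reference_means or {}).items():
        values = [row[col] for row in rows if isinstance(row.get(col), (int, float))]
        if values and ref != 0:
            drift = abs(mean(values) - ref) / abs(ref)
            if drift > max_rel_drift:
                errors.append(f"{col!r}: mean drifted {drift:.0%} from reference")
    return errors
```

A pipeline would call this before the training stage and stop the run when the returned list is non-empty, rather than training on a batch that failed validation.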
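For the reproducibility constraints ("pinned dependencies, seeds" under MUST DO and "Use non-reproducible random states" under MUST NOT DO), one common pattern is to seed every RNG at process start and to pass explicit, seeded generators to anything that shuffles or samples. This sketch assumes only Python's `random` and, optionally, NumPy are in use. Frameworks such as PyTorch need their own seeding calls, which are not shown, and dependency pinning belongs in a lock file or container image. `seed_everything` and `split_dataset` are hypothetical names.

```python
import os
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def seed_everything(seed: int) -> None:
    """Seed the global RNGs this sketch knows about."""
    # PYTHONHASHSEED only affects interpreters started after this point (for example,
    # worker subprocesses); the current process's hash seed was fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np  # optional dependency

        np.random.seed(seed)
    except ImportError:
        pass


def split_dataset(items: Sequence[T], seed: int, train_frac: float = 0.8) -> Tuple[List[T], List[T]]:
    """Deterministic train/validation split driven by an explicit generator."""
    rng = random.Random(seed)  # local generator: unaffected by other code using `random`
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

Using a local `random.Random(seed)` for the split, instead of the global generator, keeps the split identical even if other code draws random numbers earlier in the run.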
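The constraints "Log all hyperparameters and metrics to experiment tracking" and "Hardcode hyperparameters in training scripts" (MUST NOT DO) pair naturally: read hyperparameters from a config file, then log exactly that config with each run. In practice the tracker would be MLflow, Weights & Biases, or Neptune; in MLflow, `mlflow.log_params` and `mlflow.log_metric` fill these roles. To stay dependency-free, this sketch uses a hypothetical `LocalTracker` that appends JSON lines to one file per run. `TrainConfig`, its fields and defaults, and `start_run` are likewise illustrative assumptions.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float = 1e-3
    batch_size: int = 32
    epochs: int = 10
    seed: int = 42

    @classmethod
    def from_json_file(cls, path: Path) -> "TrainConfig":
        # Unknown keys raise TypeError, so typos in the config fail loudly.
        return cls(**json.loads(path.read_text()))


class LocalTracker:
    """Hypothetical stand-in for an experiment tracker: one JSONL file per run."""

    def __init__(self, root: Path) -> None:
        root.mkdir(parents=True, exist_ok=True)
        self.run_id = uuid.uuid4().hex
        self.path = root / f"{self.run_id}.jsonl"

    def _write(self, record: Dict[str, Any]) -> None:
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def log_params(self, params: Dict[str, Any]) -> None:
        self._write({"type": "params", "params": params})

    def log_metric(self, key: str, value: float, step: int) -> None:
        self._write({"type": "metric", "key": key, "value": value, "step": step})

    def records(self) -> List[Dict[str, Any]]:
        return [json.loads(line) for line in self.path.read_text(encoding="utf-8").splitlines()]


def start_run(config_path: Path, runs_dir: Path) -> Tuple[TrainConfig, LocalTracker]:
    """Load hyperparameters from disk and log them before any training happens."""
    cfg = TrainConfig.from_json_file(config_path)
    tracker = LocalTracker(runs_dir)
    tracker.log_params(asdict(cfg))
    return cfg, tracker
```

A training script would call `start_run` once, then `tracker.log_metric(...)` per epoch, so every logged run carries the full configuration (including defaults) that produced it.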
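"Implement proper error handling and retry logic" and "Ignore pipeline failures silently" (MUST NOT DO) can both be honored with a retry wrapper that backs off on transient errors, logs each attempt, and re-raises once attempts run out. Orchestrators such as Airflow also offer task-level retries (for example, the `retries` argument on operators). The decorator below is a hypothetical, framework-free sketch for steps such as object-storage uploads, assuming `ConnectionError` and `TimeoutError` are the transient cases worth retrying.

```python
import functools
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

logger = logging.getLogger("pipeline")
F = TypeVar("F", bound=Callable)


def retry(
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[F], F]:
    """Retry with exponential backoff (base_delay, 2*base_delay, ...), then re-raise."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(fn: F) -> F:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except retry_on as exc:
                    if attempt == max_attempts:
                        # Surface the failure instead of swallowing it.
                        logger.error("%s failed after %d attempts: %s", fn.__name__, attempt, exc)
                        raise
                    delay = base_delay * 2 ** (attempt - 1)
                    logger.warning(
                        "%s attempt %d failed (%s); retrying in %.1fs", fn.__name__, attempt, exc, delay
                    )
                    sleep(delay)

        return wrapper  # type: ignore[return-value]

    return decorator
```

Errors outside `retry_on` (a `ValueError` from bad data, say) propagate immediately, which keeps genuine bugs from being retried into a longer outage. The injectable `sleep` parameter exists so tests can skip the real delays.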
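For "Version all data, code, and models explicitly" and "Store artifacts in versioned object storage", one option is content addressing: derive the storage key from a hash of the file, so a changed dataset or model never overwrites an earlier version. Tools such as DVC track data by content hash in a similar spirit. This sketch only computes keys. The `<name>/<hash prefix>/<filename>` layout and the 12-character prefix are assumptions, and the upload to S3, GCS, or Azure Blob is left out.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifact_key(name: str, path: Path, prefix_len: int = 12) -> str:
    """Build a content-addressed object key such as 'train-data/<hash prefix>/data.csv'."""
    return f"{name}/{file_sha256(path)[:prefix_len]}/{path.name}"
```

Recording the resulting key alongside the run's logged parameters links each model back to the exact data it was trained on, which also serves the data-lineage item in the MUST DO list.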
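The final workflow step and the MUST NOT DO item "Deploy models without validation metrics" imply a gate between evaluation and deployment. Here is a minimal sketch, assuming higher-is-better metrics and a stored baseline for the current production model. A candidate is promoted only if each gated metric clears an optional absolute floor and does not regress past a tolerance. `Gate` and `should_promote` are hypothetical names. A/B tests or shadow deployments, which the skill also mentions, would follow this offline check.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Gate:
    metric: str
    min_value: Optional[float] = None  # absolute floor, if any
    max_regression: float = 0.0        # allowed drop versus the production baseline


def should_promote(
    candidate: Dict[str, float],
    baseline: Dict[str, float],
    gates: Sequence[Gate],
) -> Tuple[bool, List[str]]:
    """Return (promote?, reasons for rejection). Assumes higher is better for every metric."""
    reasons: List[str] = []
    for gate in gates:
        if gate.metric not in candidate:
            reasons.append(f"{gate.metric}: missing from candidate metrics")
            continue
        value = candidate[gate.metric]
        if gate.min_value is not None and value < gate.min_value:
            reasons.append(f"{gate.metric}={value:.4f} is below the floor {gate.min_value:.4f}")
        if gate.metric in baseline and value < baseline[gate.metric] - gate.max_regression:
            reasons.append(
                f"{gate.metric}={value:.4f} regresses more than {gate.max_regression} "
                f"from baseline {baseline[gate.metric]:.4f}"
            )
    return not reasons, reasons
```

Treating a missing metric as a rejection, rather than skipping it, keeps a broken evaluation step from silently letting a model through.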
Related Skills
ml-pipeline-workflow
Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating mod...
machine-learning-ops-ml-pipeline
Design and implement a complete ML pipeline for: $ARGUMENTS
etl-pipeline
Build automated ETL (Extract-Transform-Load) pipelines for construction data. Process PDFs, Excel, BIM exports. Generate reports, dashboards, and integrate with other systems. Orchestrate with Airflow or n8n.
data-pipeline
Data pipeline and ETL automation - extract, transform, load workflows for data integration and analytics
data-pipeline-manager
Design and troubleshoot robust data pipelines with comprehensive quality validation, error handling, and monitoring capabilities for bioinformatics and data processing workflows
data-engineering-data-pipeline
You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.
book-sft-pipeline
This skill should be used when the user asks to "fine-tune on books", "create SFT dataset", "train style model", "extract ePub text", or mentions style transfer, LoRA training, book segmentation, or author voice replication.
atft-pipeline
Manage J-Quants ingestion, feature graph generation, and cache hygiene for the ATFT-GAT-FAN dataset pipeline.
architecture-paradigm-pipeline
Consult this skill when designing data pipelines or transformation workflows. Use when data flows through a fixed sequence of transformations, stages can be independently developed and tested, and parallel processing of stages is beneficial. Do not use when selecting from multiple paradigms; use architecture-paradigms first. Do not use when data flow is not sequential or predictable, or when complex branching/merging logic dominates.
ai-content-pipeline
Build multi-step AI content creation pipelines combining image, video, audio, and text. Workflow examples: generate image -> animate -> add voiceover -> merge with music. Tools: FLUX, Veo, Kokoro TTS, OmniHuman, media merger, upscaling. Use for: YouTube videos, social media content, marketing materials, automated content. Triggers: content pipeline, ai workflow, content creation, multi-step ai, content automation, ai video workflow, generate and edit, ai content factory, automated content creation, ai production pipeline, media pipeline, content at scale
ticket-pipeline
Autonomous per-ticket pipeline that chains ticket-work, local-review, PR creation, CI watching, PR review loop, and auto-merge into a single unattended workflow with Slack notifications and policy guardrails
ml-pipeline-automation
Automate ML workflows with Airflow, Kubeflow, and MLflow. Use for reproducible pipelines, retraining schedules, and MLOps, or when encountering task failures, dependency errors, or experiment tracking issues.