Finalize Your LLMOps Skill

Consolidate all Part 8 skills into a production-ready llmops-fine-tuner skill

181 stars

Best use case

Finalize Your LLMOps Skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Consolidate all Part 8 skills into a production-ready llmops-fine-tuner skill

Teams using Finalize Your LLMOps Skill should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/72-capstone-end-to-end-llmops/SKILL.md --create-dirs "https://raw.githubusercontent.com/majiayu000/claude-skill-registry/main/skills/data/72-capstone-end-to-end-llmops/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/72-capstone-end-to-end-llmops/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How Finalize Your LLMOps Skill Compares

Feature / AgentFinalize Your LLMOps SkillStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Consolidate all Part 8 skills into a production-ready llmops-fine-tuner skill

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Finalize Your LLMOps Skill

Throughout Part 8, you built specialized skills for each LLMOps capability:

| Chapter | Skill | Capability |
|---------|-------|------------|
| 61 | `llmops-decision-framework` | When to fine-tune |
| 62 | `llmops-compute-planner` | VRAM budgeting |
| 63 | `llmops-data-engineer` | Dataset creation |
| 64 | `llmops-fine-tuner` | Training workflows |
| 67 | `model-merging` | Adapter combination |
| 68 | `model-alignment` | DPO safety |
| 69 | `model-evaluation` | Quality gates |
| 70 | `model-serving` | Deployment |
| 71 | `agent-integration` | Framework connection |

Now you will consolidate these into one production-ready `llmops-fine-tuner` skill that composes all capabilities. This is Layer 3 work: transforming specialized knowledge into reusable intelligence.

## Why Consolidation Matters

**The Problem with 9 Separate Skills:**

When you need to fine-tune a model for production, you currently must:

1. Invoke `llmops-decision-framework` to determine if fine-tuning is appropriate
2. Invoke `llmops-compute-planner` to budget VRAM
3. Invoke `llmops-data-engineer` to create datasets
4. Invoke `llmops-fine-tuner` for training
5. Invoke `model-evaluation` for quality gates
6. Invoke `model-serving` for deployment
7. Invoke `agent-integration` for framework connection

That is 7 separate skill invocations with manual handoffs between each. Errors compound. Context is lost.

**The Solution: Unified Orchestration**

A consolidated skill:

- Accepts a single specification
- Orchestrates all pipeline stages internally
- Maintains context across stages
- Produces a deployable Digital FTE

This transforms your skills from **components** into a **product**.

## Skill Consolidation Architecture

### The Composition Pattern

Your consolidated skill follows this architecture:

```
llmops-fine-tuner (Unified)
├── Decision Layer
│   └── When to fine-tune? (from Ch.61)
│
├── Planning Layer
│   └── Compute requirements (from Ch.62)
│
├── Data Layer
│   └── Dataset creation (from Ch.63)
│
├── Training Layer
│   ├── Fine-tuning (from Ch.64)
│   ├── Merging (from Ch.67)
│   └── Alignment (from Ch.68)
│
├── Validation Layer
│   └── Quality gates (from Ch.69)
│
└── Deployment Layer
    ├── Serving (from Ch.70)
    └── Integration (from Ch.71)
```

### Cohesion Analysis

Not all skills should merge. Ask:

| Question | Keep Separate If... | Merge If... |
|----------|---------------------|-------------|
| **Usage frequency** | Used independently often | Always used together |
| **Domain coupling** | Different expertise needed | Same expertise |
| **Execution context** | Different environments | Same environment |
| **Failure isolation** | Need to retry independently | Fail together, succeed together |

**Analysis for LLMOps Skills:**

| Skill | Verdict | Reasoning |
|-------|---------|-----------|
| `llmops-decision-framework` | **Merge** | Always first step, gates everything |
| `llmops-compute-planner` | **Merge** | Informs all training decisions |
| `llmops-data-engineer` | **Merge** | Pipeline stage, not standalone |
| `llmops-fine-tuner` | **Core** | Central skill, others compose around it |
| `model-merging` | **Merge** | Optional but integrated |
| `model-alignment` | **Merge** | Optional but integrated |
| `model-evaluation` | **Merge** | Always required, gates deployment |
| `model-serving` | **Merge** | Pipeline terminus |
| `agent-integration` | **Merge** | Final step, produces Digital FTE |

**Result:** All skills merge into unified `llmops-fine-tuner`.

## Creating Your Consolidated Skill

### Step 1: Define the Unified SKILL.md

Create your consolidated skill:

```bash
mkdir -p .claude/skills/llmops-fine-tuner
```

**SKILL.md Structure:**

```markdown
---
name: llmops-fine-tuner
description: "End-to-end LLMOps pipeline for fine-tuning, evaluating, and deploying custom models as Digital FTEs. Use when building production model deployments."
---

# LLMOps Fine-Tuner Skill

## Purpose

Complete pipeline from specification to deployed Digital FTE.

## When to Use

- Building custom models for specific domains
- Creating proprietary AI products
- Deploying fine-tuned models to production

## Pipeline Stages

### Stage 1: Decision (from Ch.61)
Evaluate whether fine-tuning is appropriate for this use case.

[Include decision framework here]

### Stage 2: Planning (from Ch.62)
Calculate VRAM budget and hardware requirements.

[Include compute planning here]

### Stage 3: Data (from Ch.63)
Create and validate training dataset.

[Include data engineering here]

### Stage 4: Training (from Ch.64, 67, 68)
Execute fine-tuning with optional merging and alignment.

[Include training workflows here]

### Stage 5: Evaluation (from Ch.69)
Run quality gates before deployment.

[Include evaluation criteria here]

### Stage 6: Deployment (from Ch.70, 71)
Deploy model and integrate with agent frameworks.

[Include deployment and integration here]

## Specification Template

When invoking this skill, provide:

- **Domain**: What expertise is this model encoding?
- **Base Model**: Which model to fine-tune
- **Dataset Source**: Where does training data come from
- **Quality Target**: What accuracy threshold for deployment
- **Deployment Target**: Where model will run (Ollama, API, etc.)

## Constraints

- Colab Free Tier compatible (T4 GPU)
- Total cost < $1
- Completion time < 4 hours
```

### Step 2: Merge Skill Content

For each specialized skill, extract the core patterns and integrate:

**From `llmops-decision-framework`:**
```python
# Decision criteria
def should_fine_tune(use_case):
    """
    Returns True if fine-tuning is appropriate.

    Fine-tune when:
    - Prompting consistently fails (>30% error rate)
    - Domain requires specialized vocabulary
    - Latency requirements mandate smaller model
    - Cost optimization requires local deployment

    Don't fine-tune when:
    - Prompting achieves 90%+ accuracy
    - Task is one-off or experimental
    - No clear evaluation criteria exist
    """
    pass
```

**From `llmops-compute-planner`:**
```python
# VRAM budget calculation
def calculate_vram_budget(model_size_b, batch_size=1, lora_rank=16):
    """
    Estimate VRAM requirements for training.

    Formula: VRAM = model_params * 4 bytes + optimizer_state * 8 bytes + activations
    With LoRA: Reduce by ~80% (only training adapter weights)

    T4 GPU (16GB):
    - 1B params: Works with batch_size=4
    - 3B params: Works with batch_size=2
    - 7B params: Works with batch_size=1, gradient_checkpointing
    - 8B params: Works with 4-bit quantization
    """
    pass
```

**From `llmops-data-engineer`:**
```python
# Dataset creation patterns
def create_training_dataset(examples, format="chat"):
    """
    Create JSONL dataset for fine-tuning.

    Format options:
    - chat: {"messages": [{"role": "user", "content": ...}, ...]}
    - instruct: {"instruction": ..., "input": ..., "output": ...}
    - completion: {"text": "Input: ... Output: ..."}
    """
    pass
```

### Step 3: Define Handoff Protocols

Each stage produces artifacts consumed by the next:

```
Stage 1 (Decision) → decision_report.json
    └── Inputs: use_case_description
    └── Outputs: should_proceed, reasoning, constraints

Stage 2 (Planning) → compute_plan.json
    └── Inputs: decision_report, model_choice
    └── Outputs: vram_budget, batch_size, hardware_requirements

Stage 3 (Data) → dataset.jsonl + data_report.json
    └── Inputs: raw_examples, format_spec
    └── Outputs: training_data, validation_data, quality_metrics

Stage 4 (Training) → adapter_weights/ + training_report.json
    └── Inputs: base_model, dataset, compute_plan
    └── Outputs: trained_adapter, loss_curves, checkpoint_path

Stage 5 (Evaluation) → eval_report.json
    └── Inputs: trained_model, test_dataset, quality_thresholds
    └── Outputs: accuracy, pass_fail, regression_analysis

Stage 6 (Deployment) → deployed_model + integration_config.json
    └── Inputs: trained_model, deployment_target
    └── Outputs: model_endpoint, ollama_modelfile, agent_config
```

### Step 4: Validate Consolidation

Test your consolidated skill end-to-end:

```bash
# Test with minimal example
claude --skill llmops-fine-tuner "
Create a Task API model:
- Domain: Task management tool-calling
- Base: Qwen 2.5 3B Instruct
- Data: 200 synthetic examples
- Target: 90% tool-calling accuracy
- Deploy: Local Ollama
"
```

**Expected Output:**
```
Stage 1: Decision PASS - Fine-tuning appropriate for tool-calling specialization
Stage 2: Planning COMPLETE - T4 compatible, batch_size=2, LoRA rank=16
Stage 3: Data COMPLETE - 200 examples generated, 80/20 train/val split
Stage 4: Training COMPLETE - Loss converged: 0.12, Epoch 3/3
Stage 5: Evaluation PASS - Tool accuracy: 94%, threshold: 90%
Stage 6: Deployment COMPLETE - Ollama model registered: task-api-qwen3b

Digital FTE Ready: task-api-qwen3b
```

## Capability Preservation Checklist

Verify your consolidated skill preserves all original capabilities:

| Original Skill | Key Capability | Preserved? | Test |
|----------------|---------------|------------|------|
| `llmops-decision-framework` | Decision tree logic | [ ] | "Should I fine-tune for X?" |
| `llmops-compute-planner` | VRAM calculation | [ ] | "What hardware for 8B model?" |
| `llmops-data-engineer` | JSONL generation | [ ] | "Create dataset from examples" |
| `llmops-fine-tuner` | Training workflow | [ ] | "Fine-tune with these params" |
| `model-merging` | Adapter combination | [ ] | "Merge these LoRA adapters" |
| `model-alignment` | DPO training | [ ] | "Align model for safety" |
| `model-evaluation` | Quality gates | [ ] | "Evaluate against benchmarks" |
| `model-serving` | Deployment | [ ] | "Deploy to Ollama" |
| `agent-integration` | Framework connection | [ ] | "Connect to OpenAI SDK" |

## Production Readiness Criteria

Your consolidated skill is production-ready when:

1. **Single Invocation**: Full pipeline from one specification
2. **Stage Visibility**: Clear progress reporting for each stage
3. **Error Recovery**: Graceful handling with stage-specific retry
4. **Artifact Persistence**: All intermediate outputs saved for debugging
5. **Constraint Compliance**: Stays within Colab Free Tier limits
6. **Consistent Quality**: Produces equivalent results to individual skills

## What You Built

Your consolidated `llmops-fine-tuner` skill:

| Capability | Source | Integration |
|------------|--------|-------------|
| Decision logic | Ch.61 | Stage 1 gate |
| VRAM budgeting | Ch.62 | Planning layer |
| Data creation | Ch.63 | Automatic dataset prep |
| Training workflows | Ch.64, 67, 68 | Core execution |
| Quality gates | Ch.69 | Pre-deployment validation |
| Deployment | Ch.70, 71 | Final stage automation |

This is **intelligence accumulation**: 11 chapters of knowledge compressed into one reusable skill.

## Try With AI

### Prompt 1: Design Your Skill Structure

```
I'm consolidating these LLMOps skills into one unified skill:
- Decision framework (when to fine-tune)
- Compute planning (VRAM budgeting)
- Data engineering (dataset creation)
- Training (fine-tuning, merging, alignment)
- Evaluation (quality gates)
- Deployment (serving, integration)

Help me design the SKILL.md structure. What sections should it have?
How should stages hand off to each other? What specification template
should users provide when invoking this skill?
```

**What you're learning**: Skill architecture design—how to structure reusable intelligence for production use.

### Prompt 2: Test Capability Preservation

```
I've consolidated my LLMOps skills. Help me create a test plan to verify
all capabilities are preserved. For each original skill, give me:
1. A test prompt that exercises that capability
2. Expected output that proves it works
3. A failure mode to watch for
```

**What you're learning**: Validation methodology—ensuring consolidation doesn't lose specialized capabilities.

### Prompt 3: Connect to Your Domain

```
I'm building a consolidated LLMOps skill for [your domain: legal, medical,
finance, etc.]. My original skills handle [list your specialized skills].
Help me analyze: Which skills should merge? Which should stay separate?
Ask me questions about how I use each skill to determine the right
composition.
```

**What you're learning**: Cohesion analysis—understanding when to merge vs. separate based on usage patterns.

### Safety Note

When consolidating skills, preserve all safety mechanisms from individual skills. A unified skill must maintain the same guardrails as its components—never optimize away safety checks for convenience.

Related Skills

Build Your LLMOps Decision Skill

181
from majiayu000/claude-skill-registry

No description provided.

Finalize Your Dapr Skill

181
from majiayu000/claude-skill-registry

Complete your dapr-deployment skill with actor and workflow patterns, validate code generation, and document your Digital FTE component

Finalize Your Evals Skill

181
from majiayu000/claude-skill-registry

Validate your agent-evals skill by testing it on a completely different agent. A skill proves its worth when it transfers beyond the context where you learned it.

thor-skills

159
from majiayu000/claude-skill-registry

An entry point and router for AI agents to manage various THOR-related cybersecurity tasks, including running scans, analyzing logs, troubleshooting, and maintenance.

SecurityClaude

vly-money

159
from majiayu000/claude-skill-registry

Generate crypto payment links for supported tokens and networks, manage access to X402 payment-protected content, and provide direct access to the vly.money wallet interface.

Fintech & CryptoClaude

tech-blog

159
from majiayu000/claude-skill-registry

Generates comprehensive technical blog posts, offering detailed explanations of system internals, architecture, and implementation, either through source code analysis or document-driven research.

Content & DocumentationClaude

whisper-transcribe

159
from majiayu000/claude-skill-registry

Transcribes audio and video files to text using OpenAI's Whisper CLI, enhanced with contextual grounding from local markdown files for improved accuracy.

Media Processing

astro

159
from majiayu000/claude-skill-registry

This skill provides essential Astro framework patterns, focusing on server-side rendering (SSR), static site generation (SSG), middleware, and TypeScript best practices. It helps AI agents implement secure authentication, manage API routes, and debug rendering behaviors within Astro projects.

Coding & Development

lets-go-rss

159
from majiayu000/claude-skill-registry

A lightweight, full-platform RSS subscription manager that aggregates content from YouTube, Vimeo, Behance, Twitter/X, and Chinese platforms like Bilibili, Weibo, and Douyin, featuring deduplication and AI smart classification.

Content & Documentation

ux

159
from majiayu000/claude-skill-registry

This AI agent skill provides comprehensive guidance for creating professional and insightful User Experience (UX) designs, covering user research, information architecture, interaction design, visual guidance, and usability evaluation. It aims to produce actionable, user-centered solutions that avoid generic AI aesthetics.

UX Design & StrategyClaude

modal-deployment

159
from majiayu000/claude-skill-registry

Run Python code in the cloud with serverless containers, GPUs, and autoscaling using Modal. This skill enables agents to generate code for deploying ML models, running batch jobs, serving APIs, and scaling compute-intensive workloads.

DevOps & Infrastructure

ontopo

159
from majiayu000/claude-skill-registry

An AI agent skill to search for Israeli restaurants, check table availability, view menus, and retrieve booking links via the Ontopo platform, acting as an unofficial interface to its data.

General Utilities