Build Your Fine-Tuning Skill

Create your llmops-fine-tuner skill from Unsloth documentation before learning fine-tuning theory

Best use case

Build Your Fine-Tuning Skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Build Your Fine-Tuning Skill should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/64-supervised-fine-tuning/SKILL.md --create-dirs "https://raw.githubusercontent.com/majiayu000/claude-skill-registry/main/skills/data/64-supervised-fine-tuning/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/64-supervised-fine-tuning/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How Build Your Fine-Tuning Skill Compares

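| Feature / Agent | Build Your Fine-Tuning Skill | Standard Approach |
|-----------------|------------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |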

Frequently Asked Questions

What does this skill do?

Create your llmops-fine-tuner skill from Unsloth documentation before learning fine-tuning theory

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Build Your Fine-Tuning Skill

You are about to learn supervised fine-tuning. But here is the pattern that separates effective AI-native developers from those who struggle: **build your skill first, then learn the technology**.

In traditional learning, you study a topic, maybe take notes, and hope you remember it later. In Skill-First learning, you create a reusable intelligence asset before you even understand the technology deeply. This asset grows with you as you learn, and by the end of the chapter, you own a production-ready skill you can sell or deploy.

This lesson follows the same pattern you used in Part 6 and Part 7. Clone a fresh skills-lab, fetch official documentation, and build your `llmops-fine-tuner` skill from authoritative sources rather than memory.

## Why Skill-First for Fine-Tuning?

Fine-tuning has dozens of hyperparameters, multiple training strategies, and evolving best practices. Trying to memorize all of this is futile. But encoding it into a skill that you can invoke, test, and improve makes this knowledge permanently accessible and actionable.

| Traditional Approach | Skill-First Approach |
|---------------------|---------------------|
| Read tutorials, forget details | Build skill, query anytime |
| Notes scattered across files | Single authoritative source |
| Knowledge decays over time | Skill improves with use |
| Cannot delegate to AI | AI executes your skill |

By the end of this chapter, your `llmops-fine-tuner` skill will guide you through:
- Dataset preparation and validation
- Model and adapter configuration
- Training loop setup and monitoring
- Checkpoint management and export

## Step 1: Clone a Fresh Skills-Lab

Start with a clean environment. This prevents state from previous experiments from affecting your work.

```bash
# Clone the skills-lab repository
git clone https://github.com/panaversity/skills-lab.git ~/skills-lab-ch64

# Navigate to the directory
cd ~/skills-lab-ch64

# Create the skill directory structure
mkdir -p .claude/skills/llmops-fine-tuner
```

**Output:**
```
Cloning into '/Users/you/skills-lab-ch64'...
```

## Step 2: Write Your LEARNING-SPEC.md

Before creating the skill, define what you are trying to accomplish. This specification guides both your learning and the skill you create.

```markdown
# LEARNING-SPEC.md

## What I Want to Learn
Supervised fine-tuning of large language models using LoRA/QLoRA with Unsloth
on Colab Free Tier (T4 GPU, 15GB VRAM).

## Why This Matters
I want to create proprietary models that encode my domain expertise,
enabling Digital FTEs that are differentiated from generic foundation models.

## Success Criteria
1. I can explain why LoRA reduces the number of trainable parameters by about 99%
2. I can configure hyperparameters (rank, alpha, learning rate) with reasoning
3. I can run a training loop on Colab T4 without OOM errors
4. I can export a fine-tuned model to GGUF for Ollama deployment
5. My skill accurately reflects official Unsloth documentation

## Constraints
- Must work on Colab Free Tier (T4, 15GB VRAM, 12GB RAM)
- Must use Unsloth for 2x speed and 60% memory reduction
- Must produce GGUF output for local deployment

## Running Example
Fine-tune a model to become a Task API Assistant (from Chapter 40)
that understands task management domain language.
```

Save this file:

```bash
# Create the learning spec
cat > LEARNING-SPEC.md << 'EOF'
# LEARNING-SPEC.md

## What I Want to Learn
Supervised fine-tuning of large language models using LoRA/QLoRA with Unsloth
on Colab Free Tier (T4 GPU, 15GB VRAM).

## Why This Matters
I want to create proprietary models that encode my domain expertise,
enabling Digital FTEs that are differentiated from generic foundation models.

## Success Criteria
1. I can explain why LoRA reduces the number of trainable parameters by about 99%
2. I can configure hyperparameters (rank, alpha, learning rate) with reasoning
3. I can run a training loop on Colab T4 without OOM errors
4. I can export a fine-tuned model to GGUF for Ollama deployment
5. My skill accurately reflects official Unsloth documentation

## Constraints
- Must work on Colab Free Tier (T4, 15GB VRAM, 12GB RAM)
- Must use Unsloth for 2x speed and 60% memory reduction
- Must produce GGUF output for local deployment

## Running Example
Fine-tune a model to become a Task API Assistant (from Chapter 40)
that understands task management domain language.
EOF
```
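Success criterion 1 can be checked with simple arithmetic. The sketch below counts the parameters LoRA adds when every attention and MLP projection is adapted: each adapted layer gets two small matrices, so it contributes `rank * (d_in + d_out)` trainable values. The layer shapes and the base-model size are illustrative assumptions (chosen to resemble a 3B Llama-style model), not values taken from the Unsloth documentation.

```python
# Illustrative layer shapes (ASSUMED, resembling a 3B Llama-style model):
# hidden size 3072, key/value projection width 1024, MLP width 8192, 28 blocks.
HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS = 3072, 1024, 8192, 28

# (input_dim, output_dim) of each adapted linear layer in one transformer block
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_params(rank, shapes=MODULE_SHAPES, num_layers=NUM_LAYERS):
    """Count trainable LoRA parameters.

    Each adapted layer gains A (rank x d_in) and B (d_out x rank),
    i.e. rank * (d_in + d_out) new parameters.
    """
    per_block = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_block * num_layers


BASE_PARAMS = 3.2e9  # ASSUMED approximate size of the frozen base model

for rank in (8, 16, 64):
    trainable = lora_params(rank)
    share = trainable / BASE_PARAMS
    print(f"r={rank:>2}: {trainable:,} trainable ({share:.2%} of base)")
```

Under these assumptions, rank 16 leaves well under 1% of the weights trainable, while rank 64 pushes the share to a few percent, so the "99%" figure depends on the rank and target modules you choose.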

## Step 3: Fetch Official Documentation

The skill must be grounded in official documentation, not in AI memory, which may be outdated or hallucinated.

Use Claude Code or your AI assistant:

```
/fetching-library-docs unsloth fine-tuning

Fetch the official Unsloth documentation covering:
1. LoRA hyperparameters (rank, alpha, learning rate)
2. QLoRA 4-bit configuration
3. SFTTrainer setup
4. Model saving and export options
```

Key sources to reference:
- [Unsloth LoRA Hyperparameters Guide](https://docs.unsloth.ai/get-started/fine-tuning-llms-guide/lora-hyperparameters-guide)
- [Unsloth Fine-tuning LLMs Guide](https://docs.unsloth.ai/get-started/fine-tuning-llms-guide)
- [Unsloth GitHub Repository](https://github.com/unslothai/unsloth)

## Step 4: Create Your llmops-fine-tuner Skill

Based on the documentation, create your skill file:

```bash
cat > .claude/skills/llmops-fine-tuner/SKILL.md << 'EOF'
---
name: llmops-fine-tuner
description: This skill should be used when fine-tuning language models with LoRA/QLoRA using Unsloth. Use when preparing datasets, configuring training, running SFT, and exporting models.
---

# LLMOps Fine-Tuner Skill

## Purpose

Guide supervised fine-tuning of LLMs using Parameter-Efficient Fine-Tuning (PEFT)
with LoRA/QLoRA and Unsloth optimization on consumer GPUs.

## When to Use This Skill

Invoke this skill when:
- Preparing datasets for supervised fine-tuning
- Selecting base models appropriate for task and hardware
- Configuring LoRA/QLoRA hyperparameters
- Running training loops with SFTTrainer
- Monitoring training progress and detecting issues
- Exporting models (adapter, merged, GGUF)

## Prerequisite Knowledge

- Understanding of transformer architecture (attention layers, MLP)
- Familiarity with Hugging Face ecosystem (transformers, datasets, PEFT)
- Python proficiency for debugging training issues

## Hardware Context

**Colab Free Tier (T4 GPU):**
- 15GB VRAM
- 12GB RAM
- Use QLoRA (4-bit) for models up to 8B parameters
- Gradient accumulation required for effective batch sizes

## LoRA Configuration Guide

### Core Parameters

| Parameter | Recommended | Range | Purpose |
|-----------|-------------|-------|---------|
| `r` (rank) | 16 | 8-64 | Controls adapter capacity |
| `lora_alpha` | 16 (= r) | r to 2*r | Scaling factor for updates |
| `lora_dropout` | 0 | 0-0.1 | Regularization (0 for most cases) |
| `target_modules` | All attention + MLP | Varies | Which layers to adapt |

### Target Modules (Recommended)

```python
target_modules = [
    "q_proj", "k_proj", "v_proj", "o_proj",  # Attention
    "gate_proj", "up_proj", "down_proj",      # MLP
]
```

### Rank Selection Framework

| Use Case | Recommended Rank | Rationale |
|----------|-----------------|-----------|
| Simple style transfer | 8 | Few new patterns needed |
| Domain adaptation | 16 | Moderate new knowledge |
| Complex tasks | 32-64 | Significant new capabilities |
| If unsure | 16 | Safe default, adjust based on results |

## QLoRA Configuration

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 (optimal)
    bnb_4bit_compute_dtype=torch.bfloat16,  # Use torch.float16 on GPUs without native bf16 (e.g., T4)
    bnb_4bit_use_double_quant=True,         # Reduces memory further
)
```

**Why NF4?** The NormalFloat4 data type is information-theoretically optimal
for normally distributed weights, and pretrained LLM weights are approximately normally distributed.

## Training Arguments Guide

### Starting Configuration

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./outputs",
    per_device_train_batch_size=4,       # Limited by VRAM
    gradient_accumulation_steps=4,        # Effective batch = 16
    num_train_epochs=3,                   # Start small
    learning_rate=2e-4,                   # Higher for LoRA
    warmup_ratio=0.03,                    # 3% warmup
    weight_decay=0.01,                    # Light regularization
    fp16=True,                            # Mixed precision
    logging_steps=10,
    save_strategy="epoch",
    optim="adamw_8bit",                   # Memory efficient
)
```

### Learning Rate Guidelines

| Scenario | Learning Rate | Reasoning |
|----------|--------------|-----------|
| Standard SFT | 2e-4 | Default for LoRA fine-tuning |
| Continued pre-training | 5e-5 | Gentler for knowledge preservation |
| Small datasets (<1K) | 1e-4 | Avoid overfitting |
| Large datasets (>10K) | 2e-4 to 5e-4 | Can be more aggressive |

## Unsloth Optimization

### Why Unsloth?

- **2x faster training** through optimized kernels
- **60% less VRAM** via memory-efficient attention
- **No accuracy loss** compared to standard training

### Setup Pattern

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,           # QLoRA mode
    dtype=None,                   # Auto-detect
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)
```

## Export Options

| Method | Size | Use Case |
|--------|------|----------|
| Adapter only | ~30MB | When base model available |
| Merged (FP16) | ~16GB | Cloud deployment with full precision |
| GGUF q4_k_m | ~4GB | Local Ollama deployment |

### GGUF Export

```python
# Save as GGUF for Ollama
model.save_pretrained_gguf(
    "model-gguf",
    tokenizer,
    quantization_method="q4_k_m",
)
```

## Troubleshooting Decision Tree

### Out of Memory (OOM)

1. Reduce `per_device_train_batch_size` (try 2, then 1)
2. Increase `gradient_accumulation_steps` proportionally
3. Reduce `max_seq_length`
4. Enable `use_gradient_checkpointing="unsloth"`
5. If still OOM: use smaller base model

### Training Loss Not Decreasing

1. Check learning rate (try 2e-4 first)
2. Verify data format matches expected template
3. Ensure tokenizer has proper chat template
4. Check for data quality issues

### Overfitting (Val loss increases while train loss decreases)

1. Reduce epochs (try 1-2)
2. Increase weight_decay
3. Add more training data
4. Reduce rank if dataset is small

## Validation Checks

Before declaring training complete:

- [ ] Training loss decreased consistently
- [ ] No OOM errors during training
- [ ] Model generates coherent responses
- [ ] Response format matches training data
- [ ] Export completed without errors
EOF
```
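The skill's Hardware Context and Export Options sections quote sizes such as "~16GB" for merged FP16 and "~4GB" for a 4-bit export. A rough way to sanity-check figures like these is to multiply the parameter count by the bits stored per weight. The sketch below does only that; it is an approximation that ignores quantization metadata, activations, gradients, and optimizer state, and the model sizes are illustrative assumptions.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


T4_VRAM_GB = 15  # VRAM figure from the skill's Hardware Context section

# Model sizes are ASSUMED round numbers for illustration.
for label, params in [("3B", 3e9), ("8B", 8e9)]:
    fp16 = weight_memory_gb(params, 16)
    four_bit = weight_memory_gb(params, 4)
    print(
        f"{label}: fp16 weights ~{fp16:.1f} GB, 4-bit weights ~{four_bit:.1f} GB "
        f"(T4 headroom after 4-bit load: ~{T4_VRAM_GB - four_bit:.1f} GB)"
    )
```

On this estimate an 8B model in FP16 already exceeds the T4's 15GB before any training state is allocated, which is consistent with the skill recommending QLoRA (4-bit) on the free tier.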

## Step 5: Verify Your Skill

Test that your skill was created correctly:

```bash
# Check the skill exists
ls -la .claude/skills/llmops-fine-tuner/

# View the skill content
head -50 .claude/skills/llmops-fine-tuner/SKILL.md
```

**Output:**
```
total 8
drwxr-xr-x  3 you  staff   96 Jan  1 10:00 .
drwxr-xr-x  3 you  staff   96 Jan  1 10:00 ..
-rw-r--r--  1 you  staff  4521 Jan  1 10:00 SKILL.md
```
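Listing the file confirms it exists but not that its frontmatter is well formed. As a complementary check, the sketch below parses the `---` delimited header and reports any missing fields. It assumes flat, single-line `key: value` entries like the ones written in Step 4; it is not a full YAML parser, and the helper names are hypothetical.

```python
from pathlib import Path

# Fields present in the frontmatter written in Step 4
REQUIRED_KEYS = ("name", "description")


def parse_frontmatter(text):
    """Parse flat `key: value` frontmatter delimited by `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file does not start with a '---' delimiter")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        # Split on the first colon only, so values may contain colons
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing '---' delimiter")


def missing_keys(text):
    """Return required frontmatter keys that are absent or empty."""
    fields = parse_frontmatter(text)
    return [key for key in REQUIRED_KEYS if not fields.get(key)]


if __name__ == "__main__":
    skill_path = Path(".claude/skills/llmops-fine-tuner/SKILL.md")
    if skill_path.exists():
        missing = missing_keys(skill_path.read_text(encoding="utf-8"))
        print("missing keys:", missing or "none")
    else:
        print(f"{skill_path} not found; run Step 4 first")
```

Run it from the root of the skills-lab directory so the relative path resolves.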

## What Happens Next

You now have an `llmops-fine-tuner` skill that is grounded in official documentation. As you progress through this chapter:

| Lesson | How Your Skill Improves |
|--------|------------------------|
| L01: SFT Fundamentals | Add mental model explanations |
| L02: PEFT and LoRA | Refine rank/alpha selection logic |
| L03: QLoRA | Add quantization troubleshooting |
| L04: Unsloth | Add platform-specific optimizations |
| L05+: Training workflow | Add monitoring patterns |
| Capstone | Validate skill produces working model |

Each lesson will include a "Reflect on Your Skill" section where you update and improve this skill based on what you learned.

## Try With AI

Use your AI companion (Claude, ChatGPT, Gemini, or similar).

### Prompt 1: Verify Skill Structure

```
I just created my llmops-fine-tuner skill. Review the structure and tell me:
1. Does it follow the SKILL.md format correctly?
2. Is the content grounded in documentation (not hallucinated)?
3. What sections should I add as I learn more about fine-tuning?

Here is my skill:
[paste your SKILL.md content]
```

**What you are learning**: Critical evaluation of your own skill structure. Your AI partner helps identify gaps before you invest time in an incomplete skill.

### Prompt 2: Connect to Your Domain

```
I want to fine-tune a model for [your domain: legal documents, medical records,
customer support, code review, etc.]. Looking at my llmops-fine-tuner skill,
what domain-specific sections should I add? What unique challenges does my
domain present for fine-tuning?
```

**What you are learning**: Domain adaptation. Fine-tuning is not one-size-fits-all. Your AI partner helps you anticipate domain-specific challenges.

### Prompt 3: Validate Against Official Docs

```
Compare my skill's LoRA configuration recommendations against the official
Unsloth documentation. Are there any discrepancies? Any best practices
I should add?

Specifically check:
1. Recommended rank values
2. Target modules list
3. Learning rate guidance
```

**What you are learning**: Documentation verification. You are building the habit of validating AI-generated content against authoritative sources.

### Safety Note

As you create skills from documentation, remember that AI tools may not have the most current information. Always verify critical configuration values against the official source. The Unsloth documentation is updated regularly as best practices evolve.
