Build Your Model Merging Skill

181 stars

Best use case

Build Your Model Merging Skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Build Your Model Merging Skill should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/67-model-merging-optimization/SKILL.md --create-dirs "https://raw.githubusercontent.com/majiayu000/claude-skill-registry/main/skills/data/67-model-merging-optimization/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/67-model-merging-optimization/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How Build Your Model Merging Skill Compares

Feature / AgentBuild Your Model Merging SkillStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

This skill provides specific capabilities for your AI agent. See the About section for full details.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Build Your Model Merging Skill

You've trained two specialized adapters: a TaskMaster persona adapter (Chapter 65) and an agentic tool-calling adapter (Chapter 66). Now you need to combine them. But before you learn the theory of model merging, you'll build a **model-merging skill** that will guide your decisions throughout this chapter and beyond.

This is the **Skill-First Learning Pattern**. Instead of learning a technology and then maybe creating a skill, you create the skill first—grounded in official documentation—and then refine it as you learn. By the end of this lesson, you'll have reusable intelligence that captures MergeKit's merging strategies, YAML configuration patterns, and RAM optimization techniques.

## Why Skill-First for Model Merging?

Model merging has a deceptively simple surface: "combine two models into one." But beneath that simplicity lies crucial decisions:

| Decision | Wrong Choice Impact |
|----------|-------------------|
| **Merge strategy** | Wrong strategy = capability interference, degraded performance |
| **Layer handling** | Incorrect ranges = corrupted model weights |
| **RAM management** | No sharding = out-of-memory crashes on consumer hardware |
| **Weight ratios** | Imbalanced mixing = one capability dominates |

A skill captures this decision framework. When you encounter a merging scenario six months from now, you won't rely on memory—you'll invoke your skill.

## Step 1: Clone Your Skills Lab Fresh

Every chapter starts clean. No assumptions about prior state.

```bash
# Navigate to your workspace
cd ~/workspace

# Clone or reset skills-lab (use your preferred approach)
git clone https://github.com/your-org/skills-lab.git ch67-skills-lab
cd ch67-skills-lab
```

**Output:**
```
Cloning into 'ch67-skills-lab'...
remote: Enumerating objects: 245, done.
remote: Counting objects: 100% (245/245), done.
Receiving objects: 100% (245/245), 89.42 KiB | 1.29 MiB/s, done.
```

Alternatively, if you're working in an existing repo:

```bash
# Create a fresh directory for Chapter 67
mkdir -p ~/workspace/ch67-model-merging
cd ~/workspace/ch67-model-merging
```

## Step 2: Write Your LEARNING-SPEC

Before fetching documentation, articulate what you want to learn and how you'll know you've succeeded. This prevents aimless reading.

Create `LEARNING-SPEC.md`:

```markdown
# LEARNING-SPEC: Model Merging

## What I Want to Learn
- How to combine multiple LoRA adapters into a single model
- Which merging strategies exist (TIES, SLERP, DARE) and when to use each
- How to handle RAM constraints on consumer hardware (12GB limit)
- MergeKit YAML configuration patterns

## Why This Matters
I have two trained adapters:
1. TaskMaster persona adapter (distinctive voice)
2. Agentic tool-calling adapter (reliable JSON output)

I need to combine them without losing either capability. Wrong strategy choice
means one capability dominates or both degrade.

## Success Criteria
1. [ ] Skill can recommend appropriate merge strategy for a given scenario
2. [ ] Skill includes YAML configuration templates for common cases
3. [ ] Skill explains RAM optimization techniques for 12GB constraint
4. [ ] Skill distinguishes when to merge vs. when to retrain combined

## Source Documents
- MergeKit GitHub repository documentation
- Arcee AI blog posts on merging techniques
- HuggingFace model merging guides
```

**Why write this first?** Without a spec, you'll read documentation passively. With a spec, you read actively—hunting for answers to YOUR questions.

## Step 3: Fetch MergeKit Documentation

Now invoke your documentation-fetching skill. In Claude Code:

```
/fetching-library-docs MergeKit model merging
```

Or manually gather from the official repository:

```bash
# MergeKit GitHub: https://github.com/arcee-ai/mergekit
# Key documentation:
# - README.md for installation and basic usage
# - docs/ for strategy explanations
# - examples/ for YAML configuration templates
```

### Key MergeKit Concepts

From the official documentation, extract these core patterns:

**Supported Merge Methods:**

| Method | Best For | How It Works |
|--------|----------|--------------|
| **linear** | Simple averaging | Weighted average of model parameters |
| **slerp** | Two similar models | Spherical linear interpolation preserving geometric properties |
| **ties** | Multiple distinct capabilities | Trim-Elect-Sign: removes redundant params, resolves sign conflicts |
| **dare_ties** | Complementary skills | Drop and rescale + TIES conflict resolution |
| **passthrough** | Layer extraction | Copy specific layers without modification |

**YAML Configuration Structure:**

```yaml
merge_method: ties
slices:
  - sources:
      - model: ./adapter_1
        layer_range: [0, 32]
      - model: ./adapter_2
        layer_range: [0, 32]
parameters:
  weight: 0.5  # per-source weight
  density: 0.5  # for TIES/DARE methods
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16
```

**RAM Optimization:**

```yaml
# Sharded merging for limited RAM
merge_method: ties
slices:
  # Process layers in batches
  - sources:
      - model: ./adapter_1
        layer_range: [0, 8]
      - model: ./adapter_2
        layer_range: [0, 8]
  - sources:
      - model: ./adapter_1
        layer_range: [8, 16]
      - model: ./adapter_2
        layer_range: [8, 16]
# ... continue for remaining layers
```

## Step 4: Create Your Model-Merging Skill

Now synthesize what you've learned into a reusable skill.

Create `.claude/skills/model-merging/SKILL.md`:

```markdown
---
name: model-merging
description: "This skill should be used when combining multiple LoRA adapters or fine-tuned models into a single unified model. Use when students have trained separate adapters for different capabilities (persona, tool-calling, domain knowledge) and need to merge them without losing functionality."
---

# Model Merging Skill

## When to Use This Skill

Invoke when you need to:
- Combine multiple LoRA adapters into one model
- Merge fine-tuned models with complementary capabilities
- Optimize merged model for RAM-constrained environments
- Decide between merging strategies (TIES, SLERP, DARE)

## Decision Framework: Merge vs. Retrain

Before merging, consider:

| Question | If Yes | If No |
|----------|--------|-------|
| Are adapters trained on overlapping data? | Retrain combined | Safe to merge |
| Do capabilities interfere (conflicting outputs)? | Retrain with multi-task | Merge is viable |
| Is one adapter significantly larger/dominant? | Consider weighted merge | Standard merge |
| Do you have compute budget for retraining? | Consider both options | Merge is only option |

## Strategy Selection Guide

### Use SLERP When:
- Merging exactly 2 models
- Models are similar (same base, similar data)
- You want smooth interpolation between behaviors

### Use TIES When:
- Merging 2+ models with distinct capabilities
- Models may have parameter conflicts
- You want to preserve strongest signals from each

### Use DARE-TIES When:
- Merging complementary skills
- Adapter parameters are mostly redundant
- You want aggressive compression (drop 90%+ parameters)

### Use Linear When:
- Simple weighted average is sufficient
- Quick baseline before trying advanced methods

## YAML Configuration Templates

### Two-Adapter Merge (TIES)

```yaml
merge_method: ties
slices:
  - sources:
      - model: ./persona_adapter
        layer_range: [0, 32]
      - model: ./agentic_adapter
        layer_range: [0, 32]
parameters:
  weight: 0.5
  density: 0.5
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16
```

### RAM-Optimized Sharded Merge

```yaml
merge_method: ties
# Process 8 layers at a time for 12GB RAM
slices:
  - sources:
      - model: ./adapter_1
        layer_range: [0, 8]
      - model: ./adapter_2
        layer_range: [0, 8]
  - sources:
      - model: ./adapter_1
        layer_range: [8, 16]
      - model: ./adapter_2
        layer_range: [8, 16]
  - sources:
      - model: ./adapter_1
        layer_range: [16, 24]
      - model: ./adapter_2
        layer_range: [16, 24]
  - sources:
      - model: ./adapter_1
        layer_range: [24, 32]
      - model: ./adapter_2
        layer_range: [24, 32]
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16
```

### Weighted Merge (Favor One Adapter)

```yaml
merge_method: ties
slices:
  - sources:
      - model: ./persona_adapter
        layer_range: [0, 32]
        parameters:
          weight: 0.7  # Favor persona
      - model: ./agentic_adapter
        layer_range: [0, 32]
        parameters:
          weight: 0.3
parameters:
  density: 0.5
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16
```

## Evaluation Checklist

After merging, verify:

- [ ] Persona trait consistency (compare to persona-only model)
- [ ] Tool-calling accuracy (compare to agentic-only model)
- [ ] No capability regression (both should work)
- [ ] RAM usage within constraints
- [ ] Inference latency acceptable

## Common Pitfalls

| Pitfall | Solution |
|---------|----------|
| OOM during merge | Use sharded layer processing |
| Capability loss | Try TIES with higher density |
| One adapter dominates | Adjust per-source weights |
| Inconsistent outputs | Evaluate base model compatibility |

## Resources

- MergeKit: https://github.com/arcee-ai/mergekit
- TIES Paper: Yadav et al., 2023
- DARE Paper: Yu et al., 2023
```

## Step 5: Verify Your Skill

Test the skill by invoking it with a real scenario.

**Test prompt:**
```
I have two LoRA adapters:
1. Persona adapter (200 examples, casual voice)
2. Tool-calling adapter (500 examples, strict JSON output)

Both trained on Llama-3-8B. I have 12GB RAM available.
Which merge strategy should I use and why?
```

**Expected skill-informed response:**

The skill should recommend:
1. **Strategy**: TIES (distinct capabilities, potential parameter conflicts)
2. **Weight balance**: Consider 0.4/0.6 favoring tool-calling (more training data)
3. **RAM handling**: Sharded merge processing 8 layers at a time
4. **Verification**: Test both persona consistency and JSON accuracy post-merge

If your skill provides this guidance, you've succeeded.

## Step 6: Update Your LEARNING-SPEC

Return to your specification and check off what you've accomplished:

```markdown
## Success Criteria
1. [x] Skill can recommend appropriate merge strategy for a given scenario
2. [x] Skill includes YAML configuration templates for common cases
3. [x] Skill explains RAM optimization techniques for 12GB constraint
4. [x] Skill distinguishes when to merge vs. when to retrain combined
```

## What You've Built

Your `model-merging` skill now captures:

| Component | Content |
|-----------|---------|
| **Decision framework** | Merge vs. retrain criteria |
| **Strategy selection** | When to use TIES/SLERP/DARE |
| **Configuration templates** | Ready-to-use YAML |
| **RAM optimization** | Sharded merging patterns |
| **Evaluation checklist** | Post-merge verification |

This skill will guide your work throughout the remaining lessons and serve you in future projects.

## Try With AI

Use your AI companion to extend and validate your skill.

### Prompt 1: Challenge Your Decision Framework

```
Review the "Merge vs. Retrain" decision framework in my model-merging skill.
I'm concerned it might be too simplistic. Ask me challenging questions:

1. What if adapters have partial data overlap (30%)?
2. What if capabilities are complementary but use conflicting base models?
3. What if I need to merge 3+ adapters, not just 2?

Help me identify gaps in my decision framework and suggest improvements.
```

**What you're learning**: Critical evaluation of your own skill—developing the meta-skill of improving reusable intelligence through adversarial questioning.

### Prompt 2: Generate Edge Case Templates

```
My model-merging skill has templates for common cases, but I'm worried about
edge cases. Help me create YAML templates for:

1. Merging adapters with different LoRA ranks (r=16 vs r=32)
2. Merging an adapter with the base model itself (not two adapters)
3. Merging when one adapter is much larger (10x parameters)

For each, explain what could go wrong and how the template addresses it.
```

**What you're learning**: Template generalization—expanding your skill to handle scenarios beyond the happy path.

### Prompt 3: Prepare for the Chapter

```
I'm about to learn model merging in depth (Lessons 1-6). Looking at my
current model-merging skill, what concepts am I missing that I'll likely
need to add? Consider:

- Mathematical foundations I haven't captured
- Failure modes I haven't documented
- Evaluation metrics beyond my checklist

Don't explain these now—just list what I should watch for as I learn,
so I can update my skill as I go.
```

**What you're learning**: Proactive learning—identifying knowledge gaps before encountering them, turning passive learning into active skill refinement.

### Safety Note

Your skill is grounded in official MergeKit documentation, but merging techniques evolve. Before using configurations from your skill in production, verify against the current MergeKit repository. Model merging can produce unexpected behaviors—always evaluate merged models thoroughly before deployment.

Related Skills

admin-panel-builder

181
from majiayu000/claude-skill-registry

Expert assistant for creating and maintaining admin panel pages in the KR92 Bible Voice project. Use when creating admin pages, building admin components, integrating with admin navigation, or adding admin features.

adk-agent-builder

181
from majiayu000/claude-skill-registry

Build production-ready AI agents using Google's Agent Development Kit with AI assistant integration, React patterns, multi-agent orchestration, and comprehensive tool libraries. Use when appropriate context detected. Trigger with relevant phrases based on skill purpose.

adding-models

181
from majiayu000/claude-skill-registry

Guide for adding new LLM models to Letta Code. Use when the user wants to add support for a new model, needs to know valid model handles, or wants to update the model configuration. Covers models.json configuration, CI test matrix, and handle validation.

add-openrouter-model

181
from majiayu000/claude-skill-registry

Fetch OpenRouter model details and provide guidance for adding models to acai-ts provider configuration.

add-opencode-model

181
from majiayu000/claude-skill-registry

Fetch OpenCode Zen model details and provide guidance for adding models to acai-ts provider configuration.

add-odoo-model

181
from majiayu000/claude-skill-registry

Add integration for an additional Odoo Studio model to an existing Odoo PWA project. Use when user wants to add support for another model, mentions "add new model", "integrate another Odoo model", or similar.

Add Model Property

181
from majiayu000/claude-skill-registry

Add a new property to an existing data model and propagate changes through model generation to client and server. Use when adding fields to entities, extending models, or modifying data structures. Handles source model editing, regeneration, ViewModel updates, and server-side changes.

adb-builder

181
from majiayu000/claude-skill-registry

No description provided.

adapting-transfer-learning-models

181
from majiayu000/claude-skill-registry

Build this skill automates the adaptation of pre-trained machine learning models using transfer learning techniques. it is triggered when the user requests assistance with fine-tuning a model, adapting a pre-trained model to a new dataset, or performing... Use when appropriate context detected. Trigger with relevant phrases based on skill purpose.

action-builder-skill

181
from majiayu000/claude-skill-registry

Use when creating or refactoring Nango integration actions to be thin API wrappers - provides patterns for minimal transformation logic, direct proxy calls, and standardized structure

accessibility-object-model-integration

181
from majiayu000/claude-skill-registry

Programmatic manipulation of the accessibility tree to support complex custom controls in React.

acc-create-test-builder

181
from majiayu000/claude-skill-registry

Generates Test Data Builder and Object Mother patterns for PHP 8.5. Creates fluent builders with sensible defaults and factory methods for test data creation.