Best use case
Build Your Model Merging Skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using Build Your Model Merging Skill should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/build-your-model-merging-skill/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How Build Your Model Merging Skill Compares
| Feature / Agent | Build Your Model Merging Skill | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
# Build Your Model Merging Skill
You've trained two specialized adapters: a TaskMaster persona adapter (Chapter 65) and an agentic tool-calling adapter (Chapter 66). Now you need to combine them. But before you learn the theory of model merging, you'll build a **model-merging skill** that will guide your decisions throughout this chapter and beyond.
This is the **Skill-First Learning Pattern**. Instead of learning a technology and then maybe creating a skill, you create the skill first—grounded in official documentation—and then refine it as you learn. By the end of this lesson, you'll have reusable intelligence that captures MergeKit's merging strategies, YAML configuration patterns, and RAM optimization techniques.
## Why Skill-First for Model Merging?
Model merging has a deceptively simple surface: "combine two models into one." But beneath that simplicity lie crucial decisions:
| Decision | Wrong Choice Impact |
|----------|-------------------|
| **Merge strategy** | Wrong strategy = capability interference, degraded performance |
| **Layer handling** | Incorrect ranges = corrupted model weights |
| **RAM management** | No sharding = out-of-memory crashes on consumer hardware |
| **Weight ratios** | Imbalanced mixing = one capability dominates |
A skill captures this decision framework. When you encounter a merging scenario six months from now, you won't rely on memory—you'll invoke your skill.
## Step 1: Clone Your Skills Lab Fresh
Every chapter starts clean. No assumptions about prior state.
```bash
# Navigate to your workspace
cd ~/workspace
# Clone or reset skills-lab (use your preferred approach)
git clone https://github.com/your-org/skills-lab.git ch67-skills-lab
cd ch67-skills-lab
```
**Output:**
```
Cloning into 'ch67-skills-lab'...
remote: Enumerating objects: 245, done.
remote: Counting objects: 100% (245/245), done.
Receiving objects: 100% (245/245), 89.42 KiB | 1.29 MiB/s, done.
```
Alternatively, if you're working in an existing repo:
```bash
# Create a fresh directory for Chapter 67
mkdir -p ~/workspace/ch67-model-merging
cd ~/workspace/ch67-model-merging
```
## Step 2: Write Your LEARNING-SPEC
Before fetching documentation, articulate what you want to learn and how you'll know you've succeeded. This prevents aimless reading.
Create `LEARNING-SPEC.md`:
```markdown
# LEARNING-SPEC: Model Merging
## What I Want to Learn
- How to combine multiple LoRA adapters into a single model
- Which merging strategies exist (TIES, SLERP, DARE) and when to use each
- How to handle RAM constraints on consumer hardware (12GB limit)
- MergeKit YAML configuration patterns
## Why This Matters
I have two trained adapters:
1. TaskMaster persona adapter (distinctive voice)
2. Agentic tool-calling adapter (reliable JSON output)
I need to combine them without losing either capability. Wrong strategy choice
means one capability dominates or both degrade.
## Success Criteria
1. [ ] Skill can recommend appropriate merge strategy for a given scenario
2. [ ] Skill includes YAML configuration templates for common cases
3. [ ] Skill explains RAM optimization techniques for 12GB constraint
4. [ ] Skill distinguishes when to merge vs. when to retrain combined
## Source Documents
- MergeKit GitHub repository documentation
- Arcee AI blog posts on merging techniques
- HuggingFace model merging guides
```
**Why write this first?** Without a spec, you'll read documentation passively. With a spec, you read actively—hunting for answers to YOUR questions.
## Step 3: Fetch MergeKit Documentation
Now invoke your documentation-fetching skill. In Claude Code:
```
/fetching-library-docs MergeKit model merging
```
Or manually gather from the official repository:
```bash
# MergeKit GitHub: https://github.com/arcee-ai/mergekit
# Key documentation:
# - README.md for installation and basic usage
# - docs/ for strategy explanations
# - examples/ for YAML configuration templates
```
### Key MergeKit Concepts
From the official documentation, extract these core patterns:
**Supported Merge Methods:**
| Method | Best For | How It Works |
|--------|----------|--------------|
| **linear** | Simple averaging | Weighted average of model parameters |
| **slerp** | Two similar models | Spherical linear interpolation preserving geometric properties |
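| **ties** | Multiple distinct capabilities | Trim, Elect Sign, and Merge: trims low-magnitude parameter changes, then resolves sign conflicts before merging |
| **dare_ties** | Complementary skills | Randomly drops parameter changes and rescales the rest (DARE), then applies TIES sign-conflict resolution |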
| **passthrough** | Layer extraction | Copy specific layers without modification |
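To make the difference between `linear` and `slerp` concrete, here is a minimal sketch on toy two-element vectors. It uses only the standard library, assumes non-zero input vectors, and is an illustration of the math rather than MergeKit's actual implementation; the function names are hypothetical.

```python
import math


def linear(a, b, t):
    """Weighted average: (1 - t) * a + t * b, element by element."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation between two non-zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Cosine of the angle between the vectors, clamped so acos stays in range.
    cos_theta = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    if math.sin(theta) < eps:
        # Nearly parallel vectors: slerp is ill-conditioned, fall back to linear.
        return linear(a, b, t)
    weight_a = math.sin((1 - t) * theta) / math.sin(theta)
    weight_b = math.sin(t * theta) / math.sin(theta)
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


a = [1.0, 0.0]
b = [0.0, 1.0]
mid_linear = linear(a, b, 0.5)
mid_slerp = slerp(a, b, 0.5)
print(math.hypot(*mid_linear), math.hypot(*mid_slerp))
```

For these two unit vectors, the linear midpoint is shorter than either input, while the slerp midpoint keeps unit length. That is the "geometric properties" point in the table.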
**YAML Configuration Structure:**
```yaml
merge_method: ties
slices:
  - sources:
      - model: ./adapter_1
        layer_range: [0, 32]
      - model: ./adapter_2
        layer_range: [0, 32]
parameters:
  weight: 0.5   # per-source weight
  density: 0.5  # for TIES/DARE methods
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16
```
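The `weight` and `density` parameters are easier to reason about with a toy example. The sketch below is a simplified version of the TIES idea on plain Python lists, assuming positive weights: `density` is the fraction of each adapter's changes kept after trimming, and `weight` controls each adapter's share in the final average. It is not MergeKit's code, and the function names and numbers are made up for illustration.

```python
def trim(delta, density):
    """Keep the largest-magnitude `density` fraction of entries; zero the rest."""
    keep_count = max(1, round(len(delta) * density))
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(ranked[:keep_count])
    return [value if i in keep else 0.0 for i, value in enumerate(delta)]


def ties_merge(base, finetuned, weights, density):
    """Simplified TIES: trim task vectors, elect a sign, average agreeing values."""
    # Task vector = fine-tuned weights minus base weights.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    trimmed = [trim(delta, density) for delta in deltas]
    merged = []
    for i, base_value in enumerate(base):
        column = [(w, t[i]) for w, t in zip(weights, trimmed)]
        # Elect the sign of the weighted sum for this parameter.
        total = sum(w * v for w, v in column)
        sign = 1.0 if total >= 0 else -1.0
        # Only values that agree with the elected sign contribute.
        agreeing = [(w, v) for w, v in column if v * sign > 0]
        if agreeing:
            weight_sum = sum(w for w, _ in agreeing)
            update = sum(w * v for w, v in agreeing) / weight_sum
        else:
            update = 0.0
        merged.append(base_value + update)
    return merged


# Hypothetical four-parameter "models" for illustration only.
base = [0.0, 0.0, 0.0, 0.0]
persona = [0.9, 0.1, -0.6, 0.0]
agentic = [-0.2, 0.8, 0.7, 0.05]
merged = ties_merge(base, [persona, agentic], weights=[0.5, 0.5], density=0.5)
print(merged)
```

At index 2 the two adapters disagree in sign. Instead of averaging the values toward zero, the sketch keeps only the value that matches the elected sign, which is the conflict-resolution behavior the table describes.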
**RAM Optimization:**
```yaml
# Sharded merging for limited RAM
merge_method: ties
slices:
  # Process layers in batches
  - sources:
      - model: ./adapter_1
        layer_range: [0, 8]
      - model: ./adapter_2
        layer_range: [0, 8]
  - sources:
      - model: ./adapter_1
        layer_range: [8, 16]
      - model: ./adapter_2
        layer_range: [8, 16]
  # ... continue for remaining layers
```
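Writing the remaining slices by hand is error-prone, so the batches can be generated instead. This is a small sketch with hypothetical helper names; it treats each `layer_range` as a half-open interval, which matches how the ranges in the template above line up, and it only produces text in the same shape as that template. Check the layer count of your actual base model before using the output.

```python
def layer_batches(num_layers, batch_size):
    """Return half-open (start, end) ranges that cover every layer exactly once."""
    if num_layers <= 0 or batch_size <= 0:
        raise ValueError("num_layers and batch_size must be positive")
    return [
        (start, min(start + batch_size, num_layers))
        for start in range(0, num_layers, batch_size)
    ]


def render_slices(models, num_layers, batch_size):
    """Render a `slices:` section shaped like the sharded template."""
    lines = ["slices:"]
    for start, end in layer_batches(num_layers, batch_size):
        lines.append("  - sources:")
        for model in models:
            lines.append(f"      - model: {model}")
            lines.append(f"        layer_range: [{start}, {end}]")
    return "\n".join(lines)


batches = layer_batches(32, 8)
slices_yaml = render_slices(["./adapter_1", "./adapter_2"], 32, 8)
print(slices_yaml)
```

If the layer count is not a multiple of the batch size, the last range is simply shorter, so no layer is skipped or repeated.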
## Step 4: Create Your Model-Merging Skill
Now synthesize what you've learned into a reusable skill.
Create `.claude/skills/model-merging/SKILL.md`:
````markdown
---
name: model-merging
description: "This skill should be used when combining multiple LoRA adapters or fine-tuned models into a single unified model. Use when students have trained separate adapters for different capabilities (persona, tool-calling, domain knowledge) and need to merge them without losing functionality."
---
# Model Merging Skill
## When to Use This Skill
Invoke when you need to:
- Combine multiple LoRA adapters into one model
- Merge fine-tuned models with complementary capabilities
- Optimize merged model for RAM-constrained environments
- Decide between merging strategies (TIES, SLERP, DARE)
## Decision Framework: Merge vs. Retrain
Before merging, consider:
| Question | If Yes | If No |
|----------|--------|-------|
| Are adapters trained on overlapping data? | Retrain combined | Safe to merge |
| Do capabilities interfere (conflicting outputs)? | Retrain with multi-task | Merge is viable |
| Is one adapter significantly larger/dominant? | Consider weighted merge | Standard merge |
| Do you have compute budget for retraining? | Consider both options | Merge is only option |
## Strategy Selection Guide
### Use SLERP When:
- Merging exactly 2 models
- Models are similar (same base, similar data)
- You want smooth interpolation between behaviors
### Use TIES When:
- Merging 2+ models with distinct capabilities
- Models may have parameter conflicts
- You want to preserve strongest signals from each
### Use DARE-TIES When:
- Merging complementary skills
- Adapter parameters are mostly redundant
- You want aggressive compression (drop 90%+ parameters)
### Use Linear When:
- Simple weighted average is sufficient
- Quick baseline before trying advanced methods
## YAML Configuration Templates
### Two-Adapter Merge (TIES)
```yaml
merge_method: ties
slices:
  - sources:
      - model: ./persona_adapter
        layer_range: [0, 32]
      - model: ./agentic_adapter
        layer_range: [0, 32]
parameters:
  weight: 0.5
  density: 0.5
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16
```
### RAM-Optimized Sharded Merge
```yaml
merge_method: ties
# Process 8 layers at a time for 12GB RAM
slices:
  - sources:
      - model: ./adapter_1
        layer_range: [0, 8]
      - model: ./adapter_2
        layer_range: [0, 8]
  - sources:
      - model: ./adapter_1
        layer_range: [8, 16]
      - model: ./adapter_2
        layer_range: [8, 16]
  - sources:
      - model: ./adapter_1
        layer_range: [16, 24]
      - model: ./adapter_2
        layer_range: [16, 24]
  - sources:
      - model: ./adapter_1
        layer_range: [24, 32]
      - model: ./adapter_2
        layer_range: [24, 32]
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16
```
### Weighted Merge (Favor One Adapter)
```yaml
merge_method: ties
slices:
  - sources:
      - model: ./persona_adapter
        layer_range: [0, 32]
        parameters:
          weight: 0.7  # Favor persona
      - model: ./agentic_adapter
        layer_range: [0, 32]
        parameters:
          weight: 0.3
parameters:
  density: 0.5
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16
```
## Evaluation Checklist
After merging, verify:
- [ ] Persona trait consistency (compare to persona-only model)
- [ ] Tool-calling accuracy (compare to agentic-only model)
- [ ] No capability regression (both should work)
- [ ] RAM usage within constraints
- [ ] Inference latency acceptable
## Common Pitfalls
| Pitfall | Solution |
|---------|----------|
| OOM during merge | Use sharded layer processing |
| Capability loss | Try TIES with higher density |
| One adapter dominates | Adjust per-source weights |
| Inconsistent outputs | Evaluate base model compatibility |
## Resources
- MergeKit: https://github.com/arcee-ai/mergekit
- TIES Paper: Yadav et al., 2023
- DARE Paper: Yu et al., 2023
````
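The Strategy Selection Guide in the skill can also be read as a small decision function. The sketch below is one possible encoding, with hypothetical parameter names; the order of the checks is an assumption, since the guide itself does not rank the strategies against each other.

```python
def recommend_strategy(num_models, similar=False, redundant_deltas=False,
                       baseline_only=False):
    """Map the skill's selection guide onto a MergeKit method name.

    similar: the models share a base and were trained on similar data.
    redundant_deltas: adapter parameters are believed to be mostly redundant.
    baseline_only: a quick weighted-average baseline is all that is needed.
    """
    if num_models < 2:
        raise ValueError("merging needs at least two models")
    if baseline_only:
        return "linear"
    if num_models == 2 and similar:
        return "slerp"
    if redundant_deltas:
        return "dare_ties"
    # Default for distinct capabilities with possible parameter conflicts.
    return "ties"


# Persona adapter plus tool-calling adapter: two models, distinct capabilities.
choice = recommend_strategy(num_models=2, similar=False)
print(choice)
```

Keeping the rules in one function makes it easy to see which questions the skill has to ask before it can recommend anything.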
## Step 5: Verify Your Skill
Test the skill by invoking it with a real scenario.
**Test prompt:**
```
I have two LoRA adapters:
1. Persona adapter (200 examples, casual voice)
2. Tool-calling adapter (500 examples, strict JSON output)
Both trained on Llama-3-8B. I have 12GB RAM available.
Which merge strategy should I use and why?
```
**Expected skill-informed response:**
The skill should recommend:
1. **Strategy**: TIES (distinct capabilities, potential parameter conflicts)
2. **Weight balance**: Consider 0.4/0.6 favoring tool-calling (more training data)
3. **RAM handling**: Sharded merge processing 8 layers at a time
4. **Verification**: Test both persona consistency and JSON accuracy post-merge
If your skill provides this guidance, you've succeeded.
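The JSON accuracy check in item 4 can be approximated with a very small script. The sketch below assumes you have already collected the merged model's raw text outputs for a set of tool-calling prompts; the tool names and sample strings are invented for illustration, and a real evaluation would also check that the right tool and arguments were chosen.

```python
import json


def json_validity_rate(outputs):
    """Fraction of outputs that parse as a JSON object with no extra text."""
    if not outputs:
        raise ValueError("outputs must not be empty")
    valid = 0
    for text in outputs:
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            continue  # chatty preamble or truncated JSON counts as a failure
        if isinstance(parsed, dict):
            valid += 1
    return valid / len(outputs)


# Hypothetical model outputs: two clean calls, one chatty, one truncated.
samples = [
    '{"tool": "create_task", "arguments": {"title": "Write report"}}',
    'Sure! Here is the call: {"tool": "create_task"}',
    '{"tool": "list_tasks", "arguments": {}}',
    '{"tool": "delete_task", "arguments": {"id": 3}',
]
rate = json_validity_rate(samples)
print(f"valid JSON rate: {rate:.0%}")
```

Running the same prompts through the agentic-only model gives a reference rate, so a drop after merging points to the persona adapter interfering with structured output.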
## Step 6: Update Your LEARNING-SPEC
Return to your specification and check off what you've accomplished:
```markdown
## Success Criteria
1. [x] Skill can recommend appropriate merge strategy for a given scenario
2. [x] Skill includes YAML configuration templates for common cases
3. [x] Skill explains RAM optimization techniques for 12GB constraint
4. [x] Skill distinguishes when to merge vs. when to retrain combined
```
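If you revisit the spec often, a short script can report which criteria are still open. This is a minimal sketch that assumes the checkbox format used in the spec above (`[x]` for done, `[ ]` for pending); the function name is hypothetical, and the sample text marks one item as pending purely for illustration.

```python
import re

# Matches numbered or bulleted checklist lines such as "1. [x] text" or "- [ ] text".
CHECKBOX = re.compile(r"^\s*(?:\d+\.|[-*])\s*\[( |x|X)\]\s*(.+)$")


def criteria_status(markdown_text):
    """Split checklist items into (done, pending) lists of their text."""
    done, pending = [], []
    for line in markdown_text.splitlines():
        match = CHECKBOX.match(line)
        if not match:
            continue
        target = done if match.group(1).lower() == "x" else pending
        target.append(match.group(2).strip())
    return done, pending


spec = """## Success Criteria
1. [x] Skill can recommend appropriate merge strategy for a given scenario
2. [x] Skill includes YAML configuration templates for common cases
3. [ ] Skill explains RAM optimization techniques for 12GB constraint
"""
done, pending = criteria_status(spec)
print(f"{len(done)} done, {len(pending)} pending")
```

In practice you would read `LEARNING-SPEC.md` from disk and pass its contents to `criteria_status`.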
## What You've Built
Your `model-merging` skill now captures:
| Component | Content |
|-----------|---------|
| **Decision framework** | Merge vs. retrain criteria |
| **Strategy selection** | When to use TIES/SLERP/DARE |
| **Configuration templates** | Ready-to-use YAML |
| **RAM optimization** | Sharded merging patterns |
| **Evaluation checklist** | Post-merge verification |
This skill will guide your work throughout the remaining lessons and serve you in future projects.
## Try With AI
Use your AI companion to extend and validate your skill.
### Prompt 1: Challenge Your Decision Framework
```
Review the "Merge vs. Retrain" decision framework in my model-merging skill.
I'm concerned it might be too simplistic. Ask me challenging questions:
1. What if adapters have partial data overlap (30%)?
2. What if capabilities are complementary but use conflicting base models?
3. What if I need to merge 3+ adapters, not just 2?
Help me identify gaps in my decision framework and suggest improvements.
```
**What you're learning**: Critical evaluation of your own skill—developing the meta-skill of improving reusable intelligence through adversarial questioning.
### Prompt 2: Generate Edge Case Templates
```
My model-merging skill has templates for common cases, but I'm worried about
edge cases. Help me create YAML templates for:
1. Merging adapters with different LoRA ranks (r=16 vs r=32)
2. Merging an adapter with the base model itself (not two adapters)
3. Merging when one adapter is much larger (10x parameters)
For each, explain what could go wrong and how the template addresses it.
```
**What you're learning**: Template generalization—expanding your skill to handle scenarios beyond the happy path.
### Prompt 3: Prepare for the Chapter
```
I'm about to learn model merging in depth (Lessons 1-6). Looking at my
current model-merging skill, what concepts am I missing that I'll likely
need to add? Consider:
- Mathematical foundations I haven't captured
- Failure modes I haven't documented
- Evaluation metrics beyond my checklist
Don't explain these now—just list what I should watch for as I learn,
so I can update my skill as I go.
```
**What you're learning**: Proactive learning—identifying knowledge gaps before encountering them, turning passive learning into active skill refinement.
### Safety Note
Your skill is grounded in official MergeKit documentation, but merging techniques evolve. Before using configurations from your skill in production, verify against the current MergeKit repository. Model merging can produce unexpected behaviors, so always evaluate merged models thoroughly before deployment.