Build Your Agentic Tuning Skill

181 stars

Best use case

Build Your Agentic Tuning Skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Build Your Agentic Tuning Skill should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/66-agentic-function-calling/SKILL.md --create-dirs "https://raw.githubusercontent.com/majiayu000/claude-skill-registry/main/skills/data/66-agentic-function-calling/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/66-agentic-function-calling/SKILL.md inside your project.
3. Restart your AI agent so that it auto-discovers the skill.

How Build Your Agentic Tuning Skill Compares

Feature / Agent	Build Your Agentic Tuning Skill	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

SKILL.md Source

# Build Your Agentic Tuning Skill

Before diving into tool-calling patterns, structured outputs, and JSON accuracy metrics, you'll build the skill that will guide your learning throughout this chapter. This isn't just preparation—it's the first step in creating a reusable asset you'll refine as you progress.

By the end of this lesson, you'll have a working `agentic-tuning` skill grounded in official documentation. Each subsequent lesson will test and improve this skill, transforming your learning into accumulated intelligence.

## Why Skill-First for Agentic Tuning?

Agentic fine-tuning involves specialized knowledge that's rapidly evolving:

- **Function-calling formats**: OpenAI, Claude, and open-source models use different tool-call schemas
- **Structured output training**: Special tokens, JSON schemas, validation techniques
- **SDK integration patterns**: How to connect fine-tuned models to agent frameworks
- **Evaluation metrics**: Tool accuracy, argument validation, end-to-end success rates

Building a skill from official documentation ensures you're working with current, accurate patterns rather than relying on potentially outdated training data.

## Step 1: Clone Skills-Lab Fresh

Every chapter starts with a clean environment. No assumptions about prior state.

```bash
# Navigate to your projects directory
cd ~/projects

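# Clone fresh, or pull the latest changes if the directory already exists
if [ -d ch66-agentic-tuning/.git ]; then
  git -C ch66-agentic-tuning pull
else
  git clone https://github.com/panaversity/skills-lab.git ch66-agentic-tuning
fi
cd ch66-agentic-tuning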

# Verify structure
ls -la
```

**Output (abridged; your listing may differ):**
```
drwxr-xr-x  skills/
drwxr-xr-x  specs/
drwxr-xr-x  data/
-rw-r--r--  README.md
-rw-r--r--  requirements.txt
```

Create your chapter workspace:

```bash
mkdir -p skills/agentic-tuning
mkdir -p specs/task-agent-backend
mkdir -p data/tool-calling
```

## Step 2: Write Your LEARNING-SPEC.md

The LEARNING-SPEC defines what you're learning, why it matters, and how you'll know you've succeeded.

Create `specs/task-agent-backend/LEARNING-SPEC.md`:

```markdown
# LEARNING-SPEC: Agentic Fine-Tuning for Task API

## Intent

Fine-tune a language model to reliably call Task API tools (create_task,
update_task, complete_task, list_tasks) with 95%+ JSON accuracy, enabling
use as a drop-in replacement for GPT-4 in OpenAI Agents SDK workflows.

## Why This Matters

- **Cost reduction**: GPT-4 costs $10K+/month at high volume; custom model ~$300
- **Latency improvement**: <500ms vs ~1s for API calls
- **Full control**: Train on proprietary data, deploy on your infrastructure
- **Differentiation**: Competitors can't replicate your specialized agent

## Success Criteria

1. **Tool-calling accuracy > 95%**: Model selects correct tool for given intent
2. **Valid JSON output > 99%**: All outputs parse without errors
3. **Argument accuracy > 90%**: Parameters match expected schema
4. **SDK compatibility**: Works as drop-in replacement in OpenAI Agents SDK
5. **Latency < 500ms**: Acceptable response time on consumer hardware

## What I'll Learn

- [ ] Structured output fundamentals (why JSON matters for agents)
- [ ] Tool-calling data patterns (OpenAI format, multi-turn conversations)
- [ ] Dataset creation for tool-calling (synthetic generation, validation)
- [ ] Fine-tuning configuration (special tokens, loss masking)
- [ ] Evaluation framework (tool accuracy, end-to-end testing)
- [ ] SDK integration (LiteLLM, OpenAI Agents SDK compatibility)
- [ ] Error handling (graceful degradation, retry strategies)

## Non-Goals

- Building a general-purpose assistant (we focus on Task API only)
- Achieving human-level reasoning (we want reliable tool execution)
- Supporting arbitrary tool schemas (we optimize for our four tools)
- Production deployment (covered in Chapter 70)

## Prerequisites

- Chapter 65: Persona-tuned Task API model
- Chapter 64: Understanding of SFT workflow
- Part 6: OpenAI Agents SDK familiarity
```
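
The numeric success criteria in the spec can be turned into a small automated check. The following is a minimal sketch, assuming hypothetical metric names and a hypothetical `check_success_criteria` helper; the measured values are illustrative, not results from a real model. The SDK-compatibility criterion is not numeric and is left out.

```python
# Thresholds copied from the Success Criteria in LEARNING-SPEC.md.
# The metric names are hypothetical, chosen only for this sketch.
THRESHOLDS = {
    "tool_selection_accuracy": (">", 0.95),
    "json_validity": (">", 0.99),
    "argument_accuracy": (">", 0.90),
    "latency_ms": ("<", 500.0),
}


def check_success_criteria(measured: dict[str, float]) -> dict[str, bool]:
    """Return pass/fail per criterion; a missing metric counts as a failure."""
    results = {}
    for name, (op, limit) in THRESHOLDS.items():
        value = measured.get(name)
        if value is None:
            results[name] = False
        elif op == ">":
            results[name] = value > limit
        else:
            results[name] = value < limit
    return results


# Illustrative numbers only, not measurements from a real model.
example = {
    "tool_selection_accuracy": 0.97,
    "json_validity": 0.995,
    "argument_accuracy": 0.88,
    "latency_ms": 420.0,
}
report = check_success_criteria(example)
failed = [name for name, ok in report.items() if not ok]
print(failed)  # ['argument_accuracy']
```

Keeping the thresholds in one table makes it easy to update the check if the spec changes in later lessons.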

## Step 3: Fetch Official Documentation

Use your AI assistant with Context7 or web fetch to gather authoritative sources.

**Prompt to AI:**

```
Fetch the official OpenAI documentation on function calling format.
I need to understand:
1. The exact JSON schema for function definitions
2. How tool calls appear in chat completion responses
3. The format for tool_choice parameter
4. How multi-turn conversations with tool calls work

Focus on the format used for fine-tuning, not just API usage.
```

**Key patterns you'll discover:**

Function definition format:

```json
{
  "type": "function",
  "function": {
    "name": "create_task",
    "description": "Create a new task with title, due date, and priority",
    "parameters": {
      "type": "object",
      "properties": {
        "title": {"type": "string", "description": "Task title"},
        "due_date": {"type": "string", "description": "Due date in YYYY-MM-DD"},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]}
      },
      "required": ["title"]
    }
  }
}
```
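
A definition like this can double as a checklist for the arguments a model produces. The sketch below is a deliberately small, hand-rolled validator that covers only `required`, basic `type`, and `enum`. It is not a full JSON Schema implementation, and treating unknown parameters as errors is an assumption of this sketch rather than something the format requires. The `validate_arguments` helper is a hypothetical name.

```python
# The "parameters" object from the create_task definition (descriptions omitted).
CREATE_TASK_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "due_date": {"type": "string"},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["title"],
}

# Maps the JSON Schema type names used here to Python types.
PY_TYPES = {"string": str, "integer": int, "object": dict}


def validate_arguments(args: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments pass."""
    errors = []
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required parameter: {name}")
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            # Assumption of this sketch: parameters not in the schema are rejected.
            errors.append(f"unknown parameter: {name}")
            continue
        expected = PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            errors.append(f"{name}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: {value!r} not in {spec['enum']}")
    return errors


print(validate_arguments({"title": "Review budget", "priority": "high"}, CREATE_TASK_SCHEMA))
# []
print(validate_arguments({"priority": "urgent"}, CREATE_TASK_SCHEMA))
# ['missing required parameter: title', "priority: 'urgent' not in ['low', 'medium', 'high']"]
```

For anything beyond a quick check, a complete JSON Schema validator is the safer choice.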

Tool call in assistant response:

```json
{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "create_task",
        "arguments": "{\"title\": \"Review budget\", \"priority\": \"high\"}"
      }
    }
  ]
}
```
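
Note that `arguments` in the response above is a JSON-encoded string, not a nested object, so it has to be decoded a second time before use. A minimal sketch of that step, using a hypothetical `extract_tool_calls` helper and the same message as a Python dict:

```python
import json

# The assistant message from above, written as a Python dict.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "create_task",
                "arguments": '{"title": "Review budget", "priority": "high"}',
            },
        }
    ],
}


def extract_tool_calls(message: dict) -> list[tuple[str, dict]]:
    """Return (tool_name, parsed_arguments) pairs from an assistant message."""
    calls = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        # json.loads raises json.JSONDecodeError if the model emitted invalid JSON.
        calls.append((function["name"], json.loads(function["arguments"])))
    return calls


print(extract_tool_calls(assistant_message))
# [('create_task', {'title': 'Review budget', 'priority': 'high'})]
```

This double encoding is why JSON validity is tracked as its own metric: a model can pick the right tool and still emit an `arguments` string that does not parse.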

## Step 4: Create Your Agentic-Tuning Skill

Now build your initial skill based on what you've learned.

Create `skills/agentic-tuning/SKILL.md`:

````markdown
---
name: agentic-tuning
description: "This skill should be used when fine-tuning language models for reliable tool-calling and structured output generation. Use when creating agent backends, training for function calling, or building models that need 95%+ JSON accuracy."
---

# Agentic Tuning Skill

## Purpose

Guide fine-tuning workflows that produce models capable of reliable tool-calling
and structured output generation for use as agent backends.

## When to Use This Skill

Activate when:
- Training models for function calling / tool use
- Optimizing for structured JSON output
- Building custom backends for OpenAI Agents SDK
- Replacing expensive API calls with custom models
- Improving tool-calling accuracy in existing models

## Core Patterns

### 1. Tool Definition Format (OpenAI Compatible)

```json
{
  "type": "function",
  "function": {
    "name": "tool_name",
    "description": "Clear description of what tool does",
    "parameters": {
      "type": "object",
      "properties": {
        "param1": {"type": "string", "description": "..."},
        "param2": {"type": "integer", "description": "..."}
      },
      "required": ["param1"]
    }
  }
}
```

### 2. Training Data Format

Each example must include:
- System prompt with tool definitions
- User message with intent
- Assistant response with tool_call (not natural language)

```json
{
  "messages": [
    {"role": "system", "content": "You are a task assistant. Tools: [...]"},
    {"role": "user", "content": "Create a high priority task for budget review"},
    {"role": "assistant", "content": null, "tool_calls": [...]}
  ]
}
```

### 3. Evaluation Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| Tool Selection Accuracy | >95% | Correct tool for intent |
| JSON Validity | >99% | Parses without error |
| Argument Accuracy | >90% | Schema-compliant parameters |
| End-to-End Success | >85% | Task actually completed |

## Decision Framework

### When to Fine-Tune vs. Prompt Engineer

Fine-tune when:
- You have 500+ tool-calling examples
- Current accuracy is below 90%
- Latency requirements demand smaller models
- Cost savings justify training investment

Prompt engineer when:
- You have fewer than 200 examples
- Base model achieves >95% accuracy with good prompts
- Requirements are still evolving rapidly

## Common Mistakes to Avoid

1. **Natural language instead of tool calls**: Assistant should return
   `tool_calls`, not descriptions like "I'll create that task for you"

2. **Missing tool definitions in system prompt**: Every training example
   must include the full tool schema

3. **Inconsistent argument formats**: Use ISO dates (YYYY-MM-DD),
   consistent enum values, proper types

4. **Single-turn only**: Include multi-turn examples with tool results

## References

- OpenAI Function Calling Guide: https://platform.openai.com/docs/guides/function-calling
- Fine-tuning for Function Calling: https://platform.openai.com/docs/guides/fine-tuning
- OpenAI Agents SDK: https://github.com/openai/openai-agents-python
````
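
The first three rows of the skill's Evaluation Metrics table can be computed from a list of predictions. The sketch below is one possible reading, not the chapter's evaluation framework. The record keys and the `score_predictions` helper are hypothetical, and argument accuracy is approximated as an exact match against the expected arguments, which is stricter than the schema compliance the table describes. The sample records are invented for illustration.

```python
import json


def score_predictions(records: list[dict]) -> dict[str, float]:
    """Compute per-metric fractions over all records."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to score")
    tool_hits = json_hits = arg_hits = 0
    for record in records:
        if record["predicted_tool"] == record["expected_tool"]:
            tool_hits += 1
        try:
            parsed = json.loads(record["predicted_arguments"])
        except (json.JSONDecodeError, TypeError):
            continue  # unparseable output also counts as an argument miss
        json_hits += 1
        # Assumption of this sketch: exact match stands in for argument accuracy.
        if parsed == record["expected_args"]:
            arg_hits += 1
    return {
        "tool_selection_accuracy": tool_hits / total,
        "json_validity": json_hits / total,
        "argument_accuracy": arg_hits / total,
    }


# Invented records for illustration only.
records = [
    {"expected_tool": "create_task", "expected_args": {"title": "Call mom"},
     "predicted_tool": "create_task", "predicted_arguments": '{"title": "Call mom"}'},
    {"expected_tool": "create_task", "expected_args": {"title": "Pay rent", "priority": "high"},
     "predicted_tool": "create_task", "predicted_arguments": '{"title": "Pay rent", "priority": "low"}'},
    {"expected_tool": "create_task", "expected_args": {"title": "Book flight"},
     "predicted_tool": "list_tasks", "predicted_arguments": '{"title": "Book flight"'},
    {"expected_tool": "create_task", "expected_args": {"title": "Email Sam"},
     "predicted_tool": "create_task", "predicted_arguments": '{"title": "Email Sam"}'},
]
scores = score_predictions(records)
print(scores)
# {'tool_selection_accuracy': 0.75, 'json_validity': 0.75, 'argument_accuracy': 0.5}
```

End-to-end success is not covered here because it requires actually executing the tool calls against the Task API.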

## Step 5: Verify Your Skill

Test that your skill provides useful guidance:

**Prompt to AI:**

```
Read my agentic-tuning skill at skills/agentic-tuning/SKILL.md.
Now help me create a single training example for a user who says
"Add a task to call mom tomorrow, low priority."
Apply the patterns from the skill.
```

**Expected output structure:**

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are TaskMaster, a task management assistant. You have access to the following tools:\n\n[{\"type\": \"function\", \"function\": {\"name\": \"create_task\", ...}}]"
    },
    {
      "role": "user",
      "content": "Add a task to call mom tomorrow, low priority."
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_001",
          "type": "function",
          "function": {
            "name": "create_task",
            "arguments": "{\"title\": \"Call mom\", \"due_date\": \"2024-01-16\", \"priority\": \"low\"}"
          }
        }
      ]
    }
  ]
}
```

If your skill produces examples like this, you're ready to proceed.
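
Checking "examples like this" by eye does not scale to a full dataset. Below is a minimal structural check, assuming single-turn examples in the shape shown above. The `validate_training_example` helper is a hypothetical name, and the rules encode this lesson's patterns rather than any official validator.

```python
import json


def validate_training_example(example: dict) -> list[str]:
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    messages = example.get("messages", [])
    roles = [m.get("role") for m in messages]
    if not roles or roles[0] != "system":
        problems.append("first message should be the system prompt with tool definitions")
    if "user" not in roles:
        problems.append("no user message")
    if not roles or roles[-1] != "assistant":
        problems.append("last message should be the assistant turn")
        return problems
    tool_calls = messages[-1].get("tool_calls") or []
    if not tool_calls:
        problems.append("assistant turn has no tool_calls")
    for call in tool_calls:
        try:
            args = json.loads(call["function"]["arguments"])
        except (KeyError, TypeError, json.JSONDecodeError):
            problems.append(f"tool call {call.get('id')}: arguments are not valid JSON")
            continue
        if not isinstance(args, dict):
            problems.append(f"tool call {call.get('id')}: arguments should be a JSON object")
    return problems


good = {
    "messages": [
        {"role": "system", "content": "You are TaskMaster. Tools: [...]"},
        {"role": "user", "content": "Add a task to call mom tomorrow, low priority."},
        {"role": "assistant", "content": None, "tool_calls": [
            {"id": "call_001", "type": "function", "function": {
                "name": "create_task",
                "arguments": '{"title": "Call mom", "priority": "low"}'}}
        ]},
    ]
}
# The first "common mistake" from the skill: a natural-language reply.
bad = {
    "messages": [
        {"role": "system", "content": "You are TaskMaster. Tools: [...]"},
        {"role": "user", "content": "Add a task to call mom tomorrow, low priority."},
        {"role": "assistant", "content": "I'll create that task for you"},
    ]
}
print(validate_training_example(good))  # []
print(validate_training_example(bad))   # ['assistant turn has no tool_calls']
```

Multi-turn examples that end with a natural-language summary after a tool result would need looser rules than this sketch applies.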

## Reflect on Your Skill

Before moving to Lesson 1, consider:

1. **What patterns did you extract?** Which documentation sources proved most valuable?

2. **What's still unclear?** Note questions to answer in upcoming lessons (multi-turn conversations, tool results, error handling).

3. **What would improve the skill?** You'll refine it as you learn more about evaluation metrics and SDK integration.

## Looking Ahead

Your `agentic-tuning` skill is now grounded in official documentation. In the lessons ahead, you'll:

- **Lesson 1**: Understand why structured output matters for agents
- **Lesson 2**: Learn tool-calling patterns in depth
- **Lesson 3**: Train for consistent structured outputs
- **Lesson 4**: Create Task API tool-calling datasets
- **Lesson 5**: Run your first agentic fine-tuning job
- **Lesson 6**: Handle multi-turn conversations
- **Lesson 7**: Evaluate tool accuracy systematically
- **Lesson 8**: Build the complete Task Agent Backend

Each lesson will test and improve your skill. By Chapter 66's end, you'll have a production-ready skill for any agentic fine-tuning project.

## Try With AI

Use your AI companion to enhance your skill.

### Prompt 1: Expand Tool Definitions

```
I'm building an agentic-tuning skill for Task API. I have four tools:
- create_task: Create new tasks with title, due date, priority
- update_task: Modify existing tasks by ID
- complete_task: Mark a task as done
- list_tasks: Get tasks filtered by status/priority/date

Help me write the complete JSON schema for each tool following
OpenAI's function calling format. Ask me clarifying questions
about the exact parameters each tool should accept.
```

**What you're learning**: Schema design through dialogue—your AI partner helps you think through edge cases (optional vs required params, enum values, date formats).

### Prompt 2: Identify Skill Gaps

```
Review my LEARNING-SPEC.md for agentic fine-tuning. What topics
am I likely missing? What common challenges do people face when
fine-tuning for tool-calling that I should add to my skill's
"Common Mistakes to Avoid" section?
```

**What you're learning**: Gap analysis—anticipating challenges before encountering them, improving your skill proactively.

### Prompt 3: Connect to Your Domain

```
I'm learning agentic fine-tuning with Task API as the example.
But my real goal is [describe your domain—customer support,
sales automation, data analysis, etc.]. Help me understand
how the patterns from this chapter apply to my domain. What
tools would MY agent need?
```

**What you're learning**: Pattern transfer—translating Task API examples to your actual use case.

### Safety Note

As you build your skill from documentation, verify key claims. AI can help synthesize patterns, but official docs are the source of truth for format specifications. When in doubt, test with the actual API.
