case-studies

End-to-end case studies showing how to implement the full training pipeline for different skill types. Covers three complete worked examples — tool-calling training, essay-style training, and agentic search (RAG agent) training — demonstrating dataset design, synthetic generation, validation, fine-tuning, evaluation, and iteration. Use when onboarding to the project, understanding how all components fit together, explaining the pipeline to others, or planning a new training capability. This skill is about UNDERSTANDING the system holistically — reference the other skills for specific CLI commands.


by ProfSynapse



Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/case-studies/SKILL.md --create-dirs "https://raw.githubusercontent.com/ProfSynapse/Synaptic-Tuner/main/.agents/skills/case-studies/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/case-studies/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill


SKILL.md Source

# Case Studies: Implementing the Training Pipeline

Three end-to-end worked examples showing how to take a capability from concept to trained model.

## Why Case Studies?

The other skills teach you how to use individual tools:
- **synthetic-data-generation** — how to run SynthChat
- **fine-tuning** — how to run trainers
- **evaluation** — how to run evals
- **upload-deployment** — how to ship models

This skill shows you **how they all connect** — the decisions, the iteration, and the order of operations that turn an idea into a trained capability.

## The Three Case Studies

| Case Study | What It Teaches | Reference |
|-----------|----------------|-----------|
| **Tool Calling** | Structured output training — teaching a model to call APIs with correct syntax, context objects, and parameters | `reference/tool-calling-pipeline.md` |
| **Essay Style** | Creative output training — teaching a model to transform messy brainstorms into structured outlines with voice and personality | `reference/essay-style-pipeline.md` |
| **Agentic Search** | RAG agent training — teaching a model to search a corpus, select relevant documents, and answer grounded in sources | `reference/agentic-search-pipeline.md` |

## The Universal Pipeline

All three case studies follow the same high-level pipeline, but diverge in dataset design and validation:

```
┌──────────────────────────────────────────────────────────┐
│  1. DEFINE THE CAPABILITY                                 │
│     What should the model do? What does good look like?   │
│     → Rubrics, schemas, specifications                    │
└────────────────────┬─────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────────────┐
│  2. CREATE TRAINING DATA                                  │
│     How do we generate enough high-quality examples?      │
│     → SynthChat scenarios, handcrafted seeds, self-play   │
└────────────────────┬─────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────────────┐
│  3. VALIDATE & IMPROVE                                    │
│     How do we ensure quality before training?             │
│     → Schema validation, rubric scoring, manual review    │
└────────────────────┬─────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────────────┐
│  4. TRAIN                                                 │
│     SFT first (learn the format), then KTO (learn         │
│     preferences), optionally GRPO (optimize rewards)      │
│     → Trainers with YAML config                           │
└────────────────────┬─────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────────────┐
│  5. EVALUATE                                              │
│     Does the model do what we trained it to do?           │
│     → Evaluator with YAML scenarios                       │
└────────────────────┬─────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────────────┐
│  6. ITERATE                                               │
│     What failed? Generate more data targeting weaknesses. │
│     → Failure analysis → targeted generation → retrain    │
└──────────────────────────────────────────────────────────┘
```
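
The order of operations in the diagram can be sketched as a small driver loop. This is a minimal illustration, not the project's orchestration code: the stage names, the `state` dictionary, and the `failures` and `targets` keys are all hypothetical, and each stage is assumed to be a callable supplied by the reader.

```python
from typing import Callable, Dict, List

# Stages 2-5 from the diagram; they repeat on every iteration round.
# The names are hypothetical labels, not project identifiers.
LOOP_STAGES: List[str] = ["create_data", "validate", "train", "evaluate"]


def run_pipeline(
    stages: Dict[str, Callable[[dict], dict]],
    state: dict,
    max_rounds: int = 3,
) -> dict:
    """Define the capability once, then loop stages 2-5 until evaluation is clean."""
    state = stages["define"](state)  # stage 1 runs a single time
    for round_number in range(1, max_rounds + 1):
        state["round"] = round_number
        for name in LOOP_STAGES:
            state = stages[name](state)
        if not state.get("failures"):
            break  # evaluation reported no failures, so stop iterating
        # Stage 6: carry the failures forward so the next round targets them.
        state["targets"] = list(state["failures"])
    return state
```

The point of the sketch is that stage 6 is not a separate tool: it feeds evaluation failures back into stage 2 as generation targets.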

## Key Design Principles

### 1. Schema-First, Not Example-First
Define what "correct" looks like before writing any training data. For tools, this means JSON schemas. For essays, this means rubrics. The schema is the source of truth — everything validates against it.
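
As a rough illustration of schema-first validation for tool calls, the sketch below checks a raw model output against a hand-written schema using only the standard library. The tool name `search_notes`, its arguments, and the `name`/`arguments` call layout are invented for the example and are not taken from the project; a real setup would more likely use full JSON Schema.

```python
import json
from typing import List

# Hypothetical schema for one tool; the tool name and fields are illustrative.
SEARCH_TOOL_SCHEMA = {
    "name": "search_notes",
    "required": {"query": str, "limit": int},
    "optional": {"folder": str},
}


def validate_tool_call(raw: str, schema: dict) -> List[str]:
    """Return a list of problems; an empty list means the call matches the schema."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["tool call must be a JSON object"]

    problems: List[str] = []
    if call.get("name") != schema["name"]:
        problems.append(f"unexpected tool name: {call.get('name')!r}")

    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return problems + ["arguments must be a JSON object"]

    allowed = {**schema["required"], **schema["optional"]}
    for field in schema["required"]:
        if field not in args:
            problems.append(f"missing required argument: {field}")
    for field, value in args.items():
        if field not in allowed:
            problems.append(f"unknown argument: {field}")
        elif not isinstance(value, allowed[field]):
            problems.append(
                f"wrong type for {field}: expected {allowed[field].__name__}"
            )
    return problems
```

Because the schema is written before any examples exist, the same function can check both synthetic training data and model outputs at evaluation time.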

### 2. SFT Teaches Format, KTO Teaches Judgment
SFT (Supervised Fine-Tuning) teaches the model WHAT to do — tool call syntax, output structure, response format. KTO (Kahneman-Tversky Optimization) teaches the model WHICH responses are better — preferring clarification over reckless action, preferring concise outlines over bloated ones. Never try to teach both at once.

### 3. Paired Contrastive Examples
For KTO, every good example needs a realistic bad counterpart using the SAME user request. The bad example should be a plausible mistake, not garbage — wrong tool selected, missing context fields, overly verbose outline. This is what teaches the model judgment.
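
One way to enforce the pairing rule is a small check that flags any prompt missing either its good or its bad completion. The field names `prompt`, `completion`, and `label` are an assumption made for illustration (consult the `fine-tuning` skill for the format the trainers actually expect), and the `delete_folder` tool is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Field names and example content are assumptions for illustration only.
examples: List[dict] = [
    {
        "prompt": "Delete the old drafts folder.",
        "completion": "Which folder do you mean? I see 'drafts-2023' and 'drafts-old'.",
        "label": True,  # desirable: asks for clarification
    },
    {
        "prompt": "Delete the old drafts folder.",
        "completion": '{"name": "delete_folder", "arguments": {"path": "drafts-2023"}}',
        "label": False,  # undesirable but plausible: acts on a guess
    },
    {
        "prompt": "Outline my notes on sleep.",
        "completion": "1. Why sleep matters\n2. What disrupts it\n3. Fixes that work",
        "label": True,  # desirable, but no bad counterpart exists yet
    },
]


def unpaired_prompts(rows: List[dict]) -> List[str]:
    """Return prompts that lack either a desirable or an undesirable completion."""
    labels_by_prompt: Dict[str, Set[bool]] = defaultdict(set)
    for row in rows:
        labels_by_prompt[row["prompt"]].add(row["label"])
    return [p for p, labels in labels_by_prompt.items() if labels != {True, False}]


print(unpaired_prompts(examples))
```

Here the sleep-outline prompt is reported because it has a good example but no realistic bad counterpart.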

### 4. Validate Before You Train
Training on bad data is worse than not training at all. Every dataset goes through structural validation (schema checks) and quality validation (rubric scoring) before it touches a trainer.
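
The two validation layers can be combined into a single gate in front of the trainer. The sketch below is one possible shape for such a gate: `structural_check` and `quality_score` are placeholder callables standing in for schema validation and rubric scoring, and the `0.8` threshold is an arbitrary assumption rather than a project default.

```python
from typing import Callable, List, Tuple


def gate_dataset(
    rows: List[dict],
    structural_check: Callable[[dict], bool],
    quality_score: Callable[[dict], float],
    min_score: float = 0.8,  # arbitrary illustrative threshold
) -> Tuple[List[dict], List[dict]]:
    """Split rows into (accepted, rejected); only accepted rows should reach a trainer."""
    accepted: List[dict] = []
    rejected: List[dict] = []
    for row in rows:
        # The structural check runs first; scoring is skipped for malformed rows.
        if structural_check(row) and quality_score(row) >= min_score:
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected


# Toy stand-ins for schema validation and rubric scoring.
def has_required_fields(row: dict) -> bool:
    return {"prompt", "completion"} <= row.keys()


def toy_score(row: dict) -> float:
    return 1.0 if len(row["completion"]) < 200 else 0.5
```

Keeping the rejected rows, rather than discarding them, makes it possible to review why they failed and to fix the generator instead of only the data.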

### 5. Iterate on Failures
After evaluation, the failure analysis tells you exactly what to generate next. If the model keeps producing empty `memory` fields, make more examples that demonstrate rich session memory. If outlines are too long, add negative examples of bloated outlines.
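
A failure analysis can be as simple as counting failure types and generating data for the most frequent ones first. The result records and the failure tags below (`empty_memory`, `outline_too_long`) are hypothetical and only mirror the two examples in the paragraph above; the real evaluator output may look different.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical evaluation results; the "failure" tags are illustrative labels.
results: List[dict] = [
    {"scenario": "s1", "passed": False, "failure": "empty_memory"},
    {"scenario": "s2", "passed": True, "failure": None},
    {"scenario": "s3", "passed": False, "failure": "empty_memory"},
    {"scenario": "s4", "passed": False, "failure": "outline_too_long"},
]


def generation_targets(rows: List[dict], top_n: int = 2) -> List[Tuple[str, int]]:
    """Rank failure types by frequency so the next data batch targets the worst ones."""
    counts = Counter(r["failure"] for r in rows if not r["passed"])
    return counts.most_common(top_n)


print(generation_targets(results))
```

With these toy results, empty `memory` fields come out as the first generation target, followed by overlong outlines.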

## Progressive Reference

| Reference | When to Load | File |
|-----------|-------------|------|
| **Tool Calling Pipeline** | Understanding the full tools training journey — from schema to trained model | `reference/tool-calling-pipeline.md` |
| **Essay Style Pipeline** | Understanding the full essay training journey — from brainstorm to outline model | `reference/essay-style-pipeline.md` |
| **Agentic Search Pipeline** | Understanding the full RAG agent training journey — from corpus to grounded-answer model | `reference/agentic-search-pipeline.md` |
| **Pipeline Comparison** | Side-by-side comparison of how the pipelines differ at each stage | `reference/pipeline-comparison.md` |

## Cross-References to Other Skills

At each stage of the pipeline, you'll use tools documented in the other skills:

| Pipeline Stage | Skill to Reference |
|---------------|-------------------|
| Generate data | `synthetic-data-generation` |
| Validate data | `synthetic-data-generation` (rubrics, validate command) |
| SFT / KTO / GRPO training | `fine-tuning` |
| Evaluate model | `evaluation` |
| Upload & deploy | `upload-deployment` |

## Tips

- Read the tool-calling case study first; it is the simplest and most mechanical of the three pipelines
- The essay case study shows how to adapt the pipeline for creative/subjective outputs
- The agentic search case study shows how to train multi-step reasoning where tools are means to an end
- All three pipelines use the same trainers, evaluator, and upload tools; only the dataset design and validation differ
- When planning a new capability, map it to whichever case study is closest, then adapt

Related Skills

upload-deployment

from ProfSynapse/Synaptic-Tuner

Complete reference for model upload and deployment. Covers HuggingFace upload, save strategies (LoRA, merged 16-bit, merged 4-bit), GGUF conversion, model merging, model cards, and the full upload workflow. Use when uploading models, creating GGUF files, merging LoRA adapters, or deploying to HuggingFace. This skill is about USING the upload/deployment tools via CLI — never modifying source code.

synthetic-data-generation

from ProfSynapse/Synaptic-Tuner

Complete reference for the SynthChat synthetic dataset generation system. Covers CLI commands (generate, improve, validate), scenario YAML authoring, rubric YAML authoring, settings configuration, evaluation, and full workflow. Use when generating datasets, writing rubrics/scenarios, configuring models/workers, improving dataset quality, or running evaluations. This skill is about USING the system via CLI and YAML — never modifying source code.

research-reporting

from ProfSynapse/Synaptic-Tuner

Create structured research notes from experiment runs and analysis artifacts. Use when creating a note at run launch, updating it as training/evaluation/loss stages finish, summarizing a finished run, comparing experiment outcomes, extracting hypotheses from eval/loss artifacts, or proposing next-run actions grounded in `.tracking/experiments/<id>/analysis/` outputs. This skill is about turning repo-native experiment evidence into stable, machine-readable markdown.

fine-tuning

from ProfSynapse/Synaptic-Tuner

Complete reference for the fine-tuning pipeline (SFT, KTO, GRPO), cloud HF Jobs workflows, autonomous experiment search, checkpoint evaluation, and LoRA surgery. Covers training CLI flags, YAML configuration, model presets, dataset requirements, LoRA settings, training monitoring, hyperparameter search, and post-training optimization. Use when training models, configuring training runs, choosing hyperparameters, running cloud experiments, inspecting HF jobs, or troubleshooting training issues. This skill is about USING the training system via CLI and YAML — never modifying source code.

evaluation

from ProfSynapse/Synaptic-Tuner

Complete reference for the config-first model evaluation system. Covers the Evaluator CLI, assertion-driven YAML scenarios, response views, backend configuration, presets, scoring, LLM-as-judge, model comparison, and HuggingFace integration. Use when evaluating models, writing test prompts, comparing training runs, or interpreting eval results. This skill is about USING the evaluation system via CLI and YAML.

dataset-publishing

from ProfSynapse/Synaptic-Tuner

Publish local dataset artifacts to a Hugging Face dataset repo. Use when uploading a JSONL dataset, pushing a filtered dataset variant, syncing a matching .metadata.json sidecar, or renaming a dataset file in the target repo. This skill is about USING the checked-in dataset publish script via CLI — never ad hoc Python.
