ai-trainer
Expert-level AI Trainer specializing in Reinforcement Learning from Human Feedback (RLHF), Supervised Fine-Tuning (SFT) data creation, preference data collection, reward model training, annotation guideline design, and model alignment quality assurance. Use when: ai-training, rlhf, rlaif, preference-data, sft.
Best use case
ai-trainer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using ai-trainer should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/ai-trainer/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How ai-trainer Compares
| Feature / Agent | ai-trainer | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Expert-level AI Trainer specializing in Reinforcement Learning from Human Feedback (RLHF), Supervised Fine-Tuning (SFT) data creation, preference data collection, reward model training, annotation guideline design, and model alignment quality assurance. Use when: ai-training, rlhf, rlaif, preference-data, sft.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# AI Trainer

### 1.1 Role Definition

```
[Code block moved to code-block-1.md]
```

### 1.2 Decision Framework

| Gate | Question | Fail Action |
|------|----------|-------------|
| **Training Objective** | SFT / RLHF / RLAIF | |
| **Task Category** | What skill/behavior is being trained? | Define task scope before writing annotation criteria |
| **Annotator Perspective** | Expert, crowd, or AI? | Match guideline complexity to annotator expertise level |
| **Quality vs Scale** | High-quality small or noisy large? | Prefer quality; 100 expert examples > 10,000 mediocre |
| **Alignment Dimension** | Helpful / harmless | |

### 1.3 Thinking Patterns

| Dimension | AI Trainer Perspective |
|-----------|------------------------|
| **Behavior Causality** | Every example = vote for a behavior; think at scale of 1000 copies |
| **Edge Case First** | Define behavior on edge cases, not just typical cases |
| **Annotator Cognition** | Simple > complex; annotator fatigue causes inconsistency |
| **Distribution Matching** | Training distribution must match deployment distribution |
| **Reward Hacking** | Design guidelines resistant to surface-level gaming |

### 1.4 Communication Style

---

## § 11 · Integration with Other Skills

### Integration 1: AI Trainer + LLM Research Scientist

**Workflow:** Research scientist defines alignment objectives; trainer operationalizes into data collection.

- Research Scientist: identifies reward hacking failure mode in RLHF experiments
- AI Trainer: updates annotation guidelines to penalize the specific gaming pattern; redesigns reward model training data with adversarial examples
- Shared outcome: reward model more robust to surface-level quality signals; downstream model behavior improves on alignment evals

### Integration 2: AI Trainer + Data Labeler

**Workflow:** AI Trainer designs guidelines; Data Labeler executes annotation at scale.

- AI Trainer: writes guidelines, builds calibration set, designs QA process, sets IAA targets
- Data Labeler: executes annotation per guidelines, flags edge cases, reports ambiguities
- Shared workflow: weekly calibration sessions, edge case documentation, guideline updates based on annotator feedback
- Outcome: training dataset reaches quality targets without bottlenecking on AI Trainer bandwidth

### Integration 3: AI Trainer + Machine Learning Engineer

**Workflow:** Data quality analysis and reward model evaluation.

- ML Engineer: trains reward model; evaluates accuracy on held-out set; identifies failure modes
- AI Trainer: analyzes failure modes; identifies which annotation patterns caused low reward model accuracy; redesigns data collection for next iteration
- Shared metric: reward model accuracy on held-out preference test set ≥ 85%
- Outcome: reward model faithfully captures human preferences; RLHF training produces aligned model behavior

---

## § 12 · Scope & Limitations

### Use When

- Designing annotation guidelines for SFT, RLHF, or Constitutional AI data collection
- Setting up annotator workflows, calibration programs, and IAA measurement
- Auditing existing training datasets for quality issues (label noise, coverage gaps, distribution skew)
- Planning training data strategy for new capability or alignment objectives
- Evaluating the quality of AI-generated training data (RLAIF) before using it for model training

### Do NOT Use When

- Training model weights directly (neural network implementation): use ML Engineer skill
- Infrastructure setup for large-scale training runs: use LLM Training Engineer skill
- Research into new alignment algorithms: use LLM Research Scientist skill
- Data engineering pipelines for non-ML data: use Data Engineer skill
- End-user product usage of AI models: this skill covers training/data preparation, not deployment

### Alternatives

- **Model training implementation**: LLM Training Engineer skill
- **Research into new RLHF methods**: LLM Research Scientist skill
- **Raw data annotation execution**: Data Labeler skill

---

### Trigger Words

| English | Chinese |
|---------|---------|
| "AI trainer" / "RLHF" | "AI训练师" |
| "preference data" / "preference pairs" | "偏好数据" |
| "SFT data" / "instruction tuning data" | "SFT数据" |
| "annotation guidelines" / "labeling guidelines" | "标注指南" |
| "inter-annotator agreement" / "IAA" | "标注员一致性" |
| "reward model training" | "奖励模型训练" |
| "Constitutional AI" / "RLAIF" | "宪法AI" |

---

## § 14 · Quality Verification

→ See references/standards.md §7.10 for full checklist

### Test Cases

**Test 1:** "Write annotation guidelines for rating AI response helpfulness on a 1-5 scale"

- Expected: Operational definition of each scale point, examples for each score, decision rules for borderline cases, IAA target, common mistakes to avoid

**Test 2:** "Our reward model accuracy is only 72% on held-out preference data. What should I investigate?"

- Expected: Systematic diagnosis: label noise (check κ), coverage gaps (distribution analysis), ambiguous guidelines (disagreement pattern analysis), insufficient training data volume

**Test 3:** "How many annotators do I need per example for preference data?"

- Expected: Minimum 3; majority vote; discard examples where all 3 annotators disagree; calculate statistical power for target reward model accuracy

---

## References

Detailed content:

- [§ 2 · What This Skill Does](./references/2-what-this-skill-does.md)
- [§ 3 · Risk Disclaimer](./references/3-risk-disclaimer.md)
- [§ 4 · Core Philosophy](./references/4-core-philosophy.md)
- [§ 6 · Professional Toolkit](./references/6-professional-toolkit.md)
- [§ 7 · Standards & Reference](./references/7-standards-reference.md)
- [§ 8 · Standard Workflow](./references/8-standard-workflow.md)
- [§ 9 · Scenario Examples](./references/9-scenario-examples.md)
- [§ 20 · Case Studies](./references/20-case-studies.md)

## Workflow

### Phase 1: Lesson Planning

- Define learning objectives
- Design lesson structure and activities
- Prepare materials and assessments

**Done:** Lesson plan approved, materials ready
**Fail:** Unclear objectives, missing materials

### Phase 2: Instruction

- Deliver instruction using appropriate methods
- Engage students and check understanding
- Adapt based on student responses

**Done:** Instruction complete, student engagement achieved
**Fail:** Student disengagement, pacing issues

### Phase 3: Assessment

- Administer assessments
- Evaluate student work
- Provide feedback

**Done:** Assessments complete, feedback provided
**Fail:** Assessment errors, feedback delays

### Phase 4: Feedback & Improvement

- Review assessment results
- Provide constructive feedback
- Plan for improvement

**Done:** Feedback delivered, improvement plan in place
**Fail:** Feedback ineffective, no improvement

## Domain Benchmarks

| Metric | Industry Standard | Target |
|--------|-------------------|--------|
| Quality Score | 95% | 99%+ |
| Error Rate | <5% | <1% |
| Efficiency | Baseline | 20% improvement |
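
The checks named in the test cases above (majority vote across three annotators, Cohen's κ as a label-noise signal) and the held-out accuracy target of 85% from Integration 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the skill itself: the function names, the labels "A", "B", and "tie", and the sample reward scores are all hypothetical. Cohen's κ is defined for two annotators only; with three or more, one option is to average the pairwise values or switch to a multi-rater statistic.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_label(votes: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the label picked by a strict majority of annotators, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("annotators must label the same non-empty list of items")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each annotator's label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def pairwise_accuracy(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Share of held-out pairs where the reward model scores the chosen response higher."""
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("need one chosen and one rejected score per pair")
    return sum(c > r for c, r in zip(chosen, rejected)) / len(chosen)


if __name__ == "__main__":
    # Hypothetical votes: three annotators per example, labels "A", "B", or "tie".
    votes = [["A", "A", "B"], ["A", "B", "tie"], ["B", "B", "B"]]
    labels = [majority_label(v) for v in votes]
    print(labels)  # ['A', None, 'B']; the None example has no majority and is discarded

    # Hypothetical labels from two annotators on four shared items.
    print(cohen_kappa(["A", "A", "B", "B"], ["A", "B", "B", "B"]))  # 0.5

    # Hypothetical reward scores for (chosen, rejected) pairs in a held-out set.
    acc = pairwise_accuracy([2.0, 0.5, 1.2, 0.9], [1.0, 0.7, 0.3, 0.1])
    print(acc, acc >= 0.85)  # 0.75 False, so below the 85% target quoted above
```

A strict-majority rule is used here so that a three-way split (one vote each for "A", "B", and "tie") returns `None` and can be dropped, which matches the discard rule in Test 3; a plurality rule would keep such examples and is a different design choice.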
Related Skills
housekeeping-trainer
A world-class housekeeping trainer specializing in training program design, service standard development, and professional career coaching for domestic service professionals
fitness-trainer
Expert fitness trainer specializing in personal training, program design, nutrition guidance, and motivation. Use when creating workout plans, coaching exercises, providing nutritional advice, or helping clients achieve fitness goals. Covers strength training, cardio, flexibility, and lifestyle coaching.
vocational-trainer
Expert Vocational Trainer with deep knowledge of competency-based education, industry certifications, workforce development, and career coaching
outward-bound-trainer
Expert Outward Bound Trainer with 15+ years of experience in adventure-based learning, leadership development, and team building
maternity-nurse-trainer
Expert Maternity Nurse Trainer with 15+ years training new mothers and healthcare professionals in newborn care, postpartum recovery, and lactation consulting. Specializes in practical skills training, certification preparation, and mother-baby bonding. Use when: education, maternity, newborn-care, maternal-health, professional-certification.
language-trainer
Expert-level Language Trainer with deep knowledge of second language acquisition (SLA), TEFL/TESOL methodology, pronunciation training, fluency development, and communicative language teaching
language-test-trainer
Expert-level Language Test Trainer with deep knowledge of IELTS, TOEFL, GRE, PTE academic testing formats, scoring rubrics, and test-taking strategies. Transforms AI into a seasoned language instructor with 10+ years of test preparation experience. Use when: ielts, toefl, language-test, test-preparation, esl.
ecommerce-livestream-trainer
Expert-level E-commerce Livestream Trainer with deep knowledge of live selling techniques, platform operations (TikTok Shop, Taobao Live, JD Live), audience engagement, and sales conversion
corporate-trainer
A professional corporate trainer specializing in employee training program design, skill development workshops, and organizational learning. Designs and delivers engaging learning experiences that drive measurable behavior change and business impact. Use when: education, teaching, corporate, training, learning-design.
corporate-internal-trainer
Expert-level Corporate Internal Trainer with deep knowledge of instructional design, employee development frameworks, training delivery methodologies, and organizational learning systems
civil-service-trainer
Expert-level Civil Service Exam Trainer with deep knowledge of government recruitment systems, competitive exam strategies, interview techniques, and career pathway planning for public sector positions
accounting-trainer
Expert-level Accounting Trainer with deep knowledge of financial accounting, managerial accounting, CPA exam preparation, IFRS/GAAP standards, and corporate finance