training-hub

Fine-tune LLMs using the Red Hat training-hub library with the SFT, LoRA, and OSFT algorithms. Use when preparing JSONL datasets, running training jobs, configuring hardware, scaling to clusters, evaluating models, or deploying with vLLM.


by diegosouzapw


Best use case

training-hub is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using training-hub should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/training-hub/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/machine-learning/training-hub/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/training-hub/SKILL.md inside your project.
3. Restart your AI agent so that it auto-discovers the skill.

How training-hub Compares

Feature / Agent	training-hub	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

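What does this skill do?

It guides an AI agent through fine-tuning LLMs with the Red Hat training-hub library using SFT, LoRA, or OSFT, covering JSONL dataset preparation, training jobs, hardware configuration, cluster scaling, evaluation, and deployment with vLLM.

Where can I find the source code?

The source is on GitHub in the diegosouzapw/awesome-omni-skill repository, under skills/machine-learning/training-hub.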

SKILL.md Source

# Training Hub

Red Hat's unified library for LLM post-training: SFT, LoRA, and OSFT (continual learning).

## Quick Reference

| Task | Command |
|------|---------|
| Recommend config | `python scripts/recommend_config.py --model <model> --hardware <hw>` |
| Estimate memory | `python scripts/estimate_memory.py --model <model> --method sft --hardware h100` |
| Validate dataset | `python scripts/validate_dataset.py data.jsonl` |
| Full fine-tuning | `from training_hub import sft` |
| LoRA training | `from training_hub import lora_sft` |
| OSFT (continual) | `from training_hub import osft` |

## Installation

```bash
pip install training-hub                # Basic
pip install "training-hub[lora]"        # LoRA with Unsloth (up to 2x faster)
pip install "training-hub[cuda]" --no-build-isolation  # CUDA support
# The extras are quoted so that shells such as zsh do not expand the brackets.
```

## Get Started Fast

```bash
# Get optimal config for your hardware
python scripts/recommend_config.py \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --hardware rtx-5090
```

## Data Format

Training data must be JSONL: one JSON object per line, each holding a `messages` list of role/content turns:

```json
{"messages": [{"role": "user", "content": "Q"}, {"role": "assistant", "content": "A"}]}
```
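
As a rough illustration of what the format requires, the sketch below checks a JSONL file line by line. It is not the implementation of `validate_dataset.py`; the function names and the set of allowed roles are assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path

# Assumption: the roles a chat-style trainer is likely to accept.
ALLOWED_ROLES = {"system", "user", "assistant"}


def check_record(record: dict) -> list[str]:
    """Return a list of problems found in one parsed JSONL record."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} has non-string content")
    return problems


def check_file(path: Path) -> dict[int, list[str]]:
    """Map 1-based line numbers to problems; an empty dict means the file passed."""
    report = {}
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                report[lineno] = [f"invalid JSON: {exc.msg}"]
                continue
            if not isinstance(record, dict):
                report[lineno] = ["record is not a JSON object"]
                continue
            problems = check_record(record)
            if problems:
                report[lineno] = problems
    return report


def demo() -> dict[int, list[str]]:
    """Write a two-line sample file (one valid record, one invalid) and check it."""
    good = {"messages": [{"role": "user", "content": "Q"},
                         {"role": "assistant", "content": "A"}]}
    bad = {"messages": [{"role": "user"}]}  # content is missing
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.jsonl"
        lines = "\n".join(json.dumps(r) for r in (good, bad)) + "\n"
        path.write_text(lines, encoding="utf-8")
        return check_file(path)
```

Calling `demo()` reports only the second line, because its single message has no `content` string.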

**Validate before training:**
```bash
python scripts/validate_dataset.py ./training_data.jsonl
```

For data preparation details, see [DATA-FORMATS.md](DATA-FORMATS.md).

## Training Methods

### Supervised Fine-Tuning (SFT)

Full-parameter fine-tuning. Every weight is updated, so VRAM must hold the weights, their gradients, and the optimizer state.

```python
from training_hub import sft

result = sft(
    model_path="Qwen/Qwen2.5-7B-Instruct",
    data_path="./training_data.jsonl",
    ckpt_output_dir="./checkpoints",
    num_epochs=3,
    effective_batch_size=8,
    learning_rate=2e-5,
    max_seq_len=2048,
    max_tokens_per_gpu=45000,
)
```

### LoRA Fine-Tuning

Memory-efficient adaptation that trains small low-rank adapters instead of the full weights (up to 2x faster and 70% less VRAM than full fine-tuning):

```python
from training_hub import lora_sft

result = lora_sft(
    model_path="Qwen/Qwen2.5-7B-Instruct",
    data_path="./training_data.jsonl",
    ckpt_output_dir="./outputs",
    lora_r=16,
    lora_alpha=32,
    num_epochs=3,
    learning_rate=2e-4,
)
```

**QLoRA (4-bit):** Pass `load_in_4bit=True` to `lora_sft` to load the base model in 4-bit, which helps large models fit in limited VRAM.
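
To give a feel for why LoRA is cheap, the arithmetic below counts the trainable parameters of a single adapter and the scaling factor implied by `lora_r=16` and `lora_alpha=32`. It assumes the standard LoRA formulation (adapter matrices of shape `r x d_in` and `d_out x r`, scaled by `alpha / r`) and a hypothetical 4096 x 4096 projection; the layers training-hub actually adapts are not specified here.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def lora_scaling(lora_alpha: int, r: int) -> float:
    """Standard LoRA scales the adapter update by alpha / r."""
    return lora_alpha / r


HIDDEN = 4096  # hypothetical projection size, chosen only for illustration

full_params = HIDDEN * HIDDEN                                     # one dense weight matrix
adapter_params = lora_trainable_params(HIDDEN, HIDDEN, r=16)      # its LoRA adapter
fraction = adapter_params / full_params                           # share that is trainable
scale = lora_scaling(lora_alpha=32, r=16)

print(f"adapter: {adapter_params:,} of {full_params:,} params ({fraction:.2%}), scale={scale}")
```

Under these assumptions the adapter holds 131,072 parameters against 16,777,216 in the dense matrix, which is under 1% of the weights for that layer.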

### OSFT (Continual Learning)

Adapt a model to new data while limiting catastrophic forgetting of what it already knows:

```python
from training_hub import osft

result = osft(
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    data_path="./domain_data.jsonl",
    ckpt_output_dir="./checkpoints",
    unfreeze_rank_ratio=0.25,
    effective_batch_size=16,
    learning_rate=2e-5,
)
```

For all parameters, see [ALGORITHMS.md](ALGORITHMS.md).

## Hardware Support

| Hardware | VRAM | Best For |
|----------|------|----------|
| RTX 5090 | 32GB | 8B LoRA, 70B QLoRA |
| DGX Spark | 128GB | 70B SFT |
| H100 | 80GB | 14B SFT, 70B LoRA |
| 8×H100 | 640GB | 70B SFT |

```bash
# Check if your config fits
python scripts/estimate_memory.py \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --method lora \
  --hardware h100 \
  --num-gpus 8
```

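The pairings in the table are rough guides; confirm a specific model and method with `estimate_memory.py` before launching. For hardware-specific configs, see [HARDWARE.md](HARDWARE.md).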

## Scaling

**Multi-GPU:**
```python
# "..." stands for the usual sft arguments; nproc_per_node is one process per GPU
result = sft(..., nproc_per_node=8)
```

**Multi-node:**
```python
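# Run on every node with its own node_rank (0, 1, ...); all nodes share one endpoint on node 0
result = sft(..., nnodes=2, node_rank=0, nproc_per_node=8, rdzv_endpoint="<node-0-address>:29500")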
```
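
One way to keep a single script identical across nodes is to derive the per-node values from the environment. The sketch below builds the distributed keyword arguments shown in the multi-node call; the helper name is hypothetical, and reading `MASTER_ADDR` and `NODE_RANK` from the environment is an assumption about how your launcher exposes them, not something training-hub requires.

```python
import os


def distributed_kwargs(env=None, nnodes=2, nproc_per_node=8, port=29500):
    """Build the multi-node keyword arguments for a training call.

    Assumes the launcher exports MASTER_ADDR (address of node 0, reachable
    from every node) and NODE_RANK (this node's index, starting at 0).
    """
    env = os.environ if env is None else env
    master_addr = env.get("MASTER_ADDR")
    if not master_addr:
        raise ValueError("MASTER_ADDR must be set to the address of node 0")
    node_rank = int(env.get("NODE_RANK", "0"))
    if not 0 <= node_rank < nnodes:
        raise ValueError(f"NODE_RANK {node_rank} is outside 0..{nnodes - 1}")
    return {
        "nnodes": nnodes,
        "node_rank": node_rank,
        "nproc_per_node": nproc_per_node,
        # Every node must pass the same endpoint.
        "rdzv_endpoint": f"{master_addr}:{port}",
    }


# Example: the second of two 8-GPU nodes.
kwargs = distributed_kwargs({"MASTER_ADDR": "10.0.0.1", "NODE_RANK": "1"})
world_size = kwargs["nnodes"] * kwargs["nproc_per_node"]  # total training processes
```

The resulting dictionary can be unpacked into the training call, for example `sft(model_path=..., data_path=..., ckpt_output_dir=..., **kwargs)`.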

For Slurm, Kubernetes, and datacenter deployments, see [SCALE.md](SCALE.md).

## Algorithm Selection

| Scenario | Method |
|----------|--------|
| First-time fine-tuning, large dataset | SFT |
| Memory constrained | LoRA |
| Very large model (70B+), limited VRAM | QLoRA (LoRA with `load_in_4bit=True`) |
| Preserve existing capabilities | OSFT |
| Domain adaptation, small dataset | OSFT |
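
The table can be read as a small decision function. The sketch below is one possible encoding; the function name, the argument names, and the order in which the rules are checked are assumptions, since the table does not say which scenario wins when several apply.

```python
def choose_method(
    *,
    model_params_b: float,
    memory_constrained: bool = False,
    preserve_capabilities: bool = False,
    small_domain_dataset: bool = False,
) -> str:
    """Pick a training method from the scenarios in the selection table.

    model_params_b is the model size in billions of parameters.
    """
    # Assumption: continual-learning needs take priority over memory concerns.
    if preserve_capabilities or small_domain_dataset:
        return "osft"
    if memory_constrained:
        # Very large models on limited VRAM: LoRA on a 4-bit base (QLoRA).
        return "qlora" if model_params_b >= 70 else "lora"
    # Default: full fine-tuning.
    return "sft"
```

For example, `choose_method(model_params_b=70, memory_constrained=True)` returns `"qlora"`, while an 8B model with no constraints falls through to `"sft"`.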

## Documentation

| Topic | File |
|-------|------|
| Hardware profiles & configs | [HARDWARE.md](HARDWARE.md) |
| All algorithm parameters | [ALGORITHMS.md](ALGORITHMS.md) |
| Data formats & conversion | [DATA-FORMATS.md](DATA-FORMATS.md) |
| Datacenter & cluster setup | [SCALE.md](SCALE.md) |
| Model evaluation | [EVALUATION.md](EVALUATION.md) |
| vLLM inference & serving | [INFERENCE.md](INFERENCE.md) |
| Advanced techniques | [ADVANCED.md](ADVANCED.md) |
| Model-specific configs | [MODELS.md](MODELS.md) |
| Troubleshooting | [TROUBLESHOOTING.md](TROUBLESHOOTING.md) |
| Distributed training | [DISTRIBUTED.md](DISTRIBUTED.md) |

## Utility Scripts

| Script | Purpose |
|--------|---------|
| `recommend_config.py` | Generate optimal config for model + hardware |
| `estimate_memory.py` | Estimate GPU memory requirements |
| `validate_dataset.py` | Validate JSONL dataset format |
| `convert_to_jsonl.py` | Convert CSV, Alpaca, ShareGPT to JSONL |
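
To show what such a conversion involves, the sketch below maps Alpaca-style rows (`instruction`, optional `input`, `output`) to the `messages` structure. It is an illustration only, not the code in `convert_to_jsonl.py`; the function names and the way `instruction` and `input` are joined are assumptions.

```python
import json


def alpaca_to_messages(row: dict) -> dict:
    """Convert one Alpaca-style row into a two-turn messages record."""
    instruction = row["instruction"].strip()
    extra = row.get("input", "").strip()
    # Assumption: append the optional input to the instruction after a blank line.
    prompt = f"{instruction}\n\n{extra}" if extra else instruction
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": row["output"].strip()},
        ]
    }


def to_jsonl(rows) -> str:
    """Serialize converted rows as JSONL text, one record per line."""
    return "\n".join(
        json.dumps(alpaca_to_messages(r), ensure_ascii=False) for r in rows
    ) + "\n"


sample = [
    {"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"},
    {"instruction": "Name a prime number.", "input": "", "output": "7"},
]
jsonl_text = to_jsonl(sample)
```

Each line of `jsonl_text` is then a standalone JSON object with a `messages` list of one user turn and one assistant turn.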

## Troubleshooting

**CUDA OOM:** Reduce `max_tokens_per_gpu`, switch to LoRA or QLoRA (`load_in_4bit=True`), or add GPUs.

**Dataset errors:** Run `python scripts/validate_dataset.py ./training_data.jsonl` before training.

**LoRA multi-GPU:** Launch the script with `torchrun --nproc-per-node=N script.py`.

**Training diverges:** Lower `learning_rate` (try 1e-5 for SFT, 1e-4 for LoRA).

For more, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md).
