veomni-new-model

Use this skill when adding support for a new model to VeOmni. Covers the full lifecycle: analyzing the HuggingFace model, creating model patches, defining parallel plans, writing configs, integrating with the trainer, and testing. Trigger: 'add model', 'support new model', 'integrate <model_name>', 'new model support'.

1,799 stars

byByteDance-Seed

View on GitHub Installation ↓

Best use case

veomni-new-model is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using veomni-new-model should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/veomni-new-model/SKILL.md --create-dirs "https://raw.githubusercontent.com/ByteDance-Seed/VeOmni/main/.agents/skills/veomni-new-model/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/veomni-new-model/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How veomni-new-model Compares

Feature / Agent	veomni-new-model	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

## Before You Start: Create Todos

Use TodoWrite to track all phases:

```
Phase 1: Analyze HF model             -> in_progress
Phase 2: Create model patch            -> pending
Phase 3: Define parallel plan          -> pending
Phase 4: Write training config         -> pending
Phase 5: Integrate with trainer        -> pending
Phase 6: Test                          -> pending
```

## Phase 1: Analyze HuggingFace Model

1. **Identify the model** on HuggingFace. Read its `config.json`, `modeling_*.py`, and any processor configs.

2. **Determine model category**:
   - Text-only LLM -> `veomni/models/transformers/<model_name>/`
   - Vision-Language -> `veomni/models/transformers/<model_name>/` + `veomni/data/multimodal/`
   - MoE model -> additional `veomni/distributed/moe/` integration
   - Diffusion model -> `veomni/models/diffusers/<model_name>/`
   - Omni model -> `veomni/models/seed_omni/`

3. **Check existing similar models**: Find the closest existing model in `veomni/models/transformers/` and use it as a reference. E.g., if adding a new Qwen variant, reference `qwen3/` or `qwen3_vl/`.

4. **Identify required patches**: VeOmni uses a patchgen system (`veomni/patchgen/`) to auto-generate model patches from HuggingFace models. Check if a patch spec already exists or if one needs to be created.

## Phase 2: Create Model Patch

1. **Create the model directory**: `veomni/models/transformers/<model_name>/`

2. **Required files**:
   - `__init__.py` — model registration
   - `modeling_<model_name>_patch.py` — monkey-patches for HF model (flash attention, sequence parallel, etc.)
   - `parallel_plan.py` — FSDP/FSDP2 sharding plan
   - `generated/` — auto-generated files from patchgen (do NOT edit manually)

3. **Patch patterns** — follow existing models:
   - Replace attention with flash attention (via `veomni/ops/flash_attn/`)
   - Add sequence parallel support (via `veomni/distributed/sequence_parallel/`)
   - Register model in `veomni/models/auto.py`

4. **Run patchgen** if applicable: `make patchgen`

## Phase 3: Define Parallel Plan

1. Create `parallel_plan.py` in the model directory.

2. Define FSDP/FSDP2 sharding strategy:
   - Which layers to wrap (typically transformer blocks)
   - Activation checkpointing granularity
   - Parameter dtype policies

3. If the model is MoE, define expert parallelism plan in addition to FSDP.

4. Reference existing parallel plans for guidance (e.g., `veomni/models/transformers/qwen3/parallel_plan.py`).

## Phase 4: Write Training Config

1. **Model config**: Create `configs/model_configs/<model_family>/<ModelName>.json` matching HuggingFace format.

2. **Training config**: Create YAML in the appropriate directory:
   - Text: `configs/text/<model_name>.yaml`
   - Multimodal: `configs/multimodal/<model_name>/<model_name>.yaml`
   - DiT: `configs/dit/<model_name>.yaml`

3. Config must include: model path, data config, optimizer settings, parallelism config, checkpoint settings.

4. **Verify against existing configs** — match the structure of similar model configs.

## Phase 5: Integrate with Trainer

1. Verify the model works with the appropriate trainer:
   - Text -> `TextTrainer` (`veomni/trainer/text_trainer.py`)
   - VLM -> `VLMTrainer` (`veomni/trainer/vlm_trainer.py`)
   - DiT -> `DitTrainer` (`veomni/trainer/dit_trainer.py`)

2. If the model needs custom data preprocessing:
   - Add transform in `veomni/data/data_transform.py` or `veomni/data/multimodal/`
   - Register the transform for the model

3. If the model needs custom collator logic:
   - Extend `veomni/data/data_collator.py`

## Phase 6: Test

1. **Create toy config**: Add `tests/toy_config/<model_name>_toy/config.json` with minimal parameters for fast testing.

2. **Unit tests**: Add tests in `tests/models/` to verify:
   - Model loads correctly via `veomni.models.auto`
   - Forward pass produces correct output shape
   - Model patch applies without errors

3. **E2e tests** (if feasible): Test a short training run using the toy config.

4. Run `make quality` and `pytest tests/models/`.

5. **Update documentation**:
   - Add usage example to `docs/` (training command, config reference).
   - Update `.agents/knowledge/architecture.md` if the model adds a new module or trainer path.
   - Update supported models table in project `README.md` if applicable.

## Common Pitfalls

- **Model registry**: Registration must happen at import time in `__init__.py`. If the model's `AutoConfig` type is not registered, `build_foundation_model()` will fail.
- **Generated files**: Never edit files in `generated/` directories — they are overwritten by patchgen. Edit the patch spec or the `modeling_*_patch.py` instead.
- **Tokenizer compatibility**: Some models require specific tokenizer versions or custom chat templates — verify in `veomni/data/chat_template.py`.
- **Transformers version**: New code must target v5. Use `is_transformers_version_greater_or_equal_to()` for v4 compat guards. v4-era APIs (e.g., `AutoModelForVision2Seq`) no longer exist in v5 — use v5 equivalents.

Related Skills

veomni-uv-update

1799

from ByteDance-Seed/VeOmni

Use this skill when updating dependencies managed by uv: bumping a package version, upgrading the uv tool itself, updating torch/CUDA stack, switching transformers version, or regenerating the lockfile. Trigger: 'update dependency', 'bump version', 'upgrade uv', 'update torch', 'update lockfile', 'uv sync fails'.

veomni-review

1799

from ByteDance-Seed/VeOmni

Use this skill before committing ANY code change — this is a mandatory gate in the commit flow. Also trigger proactively when: you've made changes across multiple files and want to check consistency, you're unsure if a fix is safe, a change touches shared infrastructure (BaseTrainer, distributed, model loading, data pipeline), or a change is larger than a few lines. The review launches a subagent that checks implementation quality, multi-file consistency, and known constraint violations, then rates the change as safe/needs-attention/risky.

veomni-new-op

1799

from ByteDance-Seed/VeOmni

Use this skill when adding a new optimized kernel or operator to veomni/ops/. Covers the full lifecycle: understanding VeOmni's ops architecture (monkey-patch + global function pointer pattern), implementing the kernel, registering it, adding tests, and documenting it. Trigger: 'add op', 'new kernel', 'add attention variant', 'new fused op', 'add triton kernel', 'optimize operator'.

veomni-develop

1799

from ByteDance-Seed/VeOmni

VeOmni-specific checklist for feature development and refactoring. Covers impact analysis across modalities, trainer hierarchy, data pipeline, and distributed code. Use before implementing any non-trivial change. For model-specific or ops-specific work, use veomni-new-model or veomni-new-op instead. Trigger: 'add feature', 'implement', 'refactor', 'reorganize', 'new capability'.

veomni-debug

1799

from ByteDance-Seed/VeOmni

Use this skill for ANY bug, error, crash, wrong output, loss divergence, gradient explosion, test failure, CUDA error, distributed training hang, checkpoint load failure, or unexpected behavior. Covers both quick fixes (clear root cause) and complex debugging (unclear cause). Trigger: 'fix bug', 'fix error', 'broken', 'crash', 'doesn't work', 'fails with', 'loss NaN', 'training hangs', 'FSDP error', 'OOM'.

create-pr

1799

from ByteDance-Seed/VeOmni

Create a pull request for the current branch. Handles uncommitted changes, generates a PR title matching the `[{modules}] {type}: {description}` format enforced by CI, and fills in the PR description template. Trigger: 'create pr', 'open pr', 'submit pr', 'make pr'.

foundation-models-on-device

144923

from affaan-m/everything-claude-code

苹果FoundationModels框架用于设备上的LLM——文本生成、使用@Generable进行引导生成、工具调用，以及在iOS 26+中的快照流。

DevelopmentClaude

hugging-face-model-trainer

31392

from sickn33/antigravity-awesome-skills

Train or fine-tune TRL language models on Hugging Face Jobs, including SFT, DPO, GRPO, and GGUF export.

AI Development & Self-ImprovementClaude

MCP Engineering — Complete Model Context Protocol System

3891

from openclaw/skills

Build, integrate, secure, and scale MCP servers and clients. From first server to production multi-tool architecture.

AI Infrastructure & Integrations

ml-model-eval-benchmark

3891

from openclaw/skills

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.

Machine Learning

RLM (Recursive Language Model) Skill

from richardwhiteii/rlm

The RLM (Recursive Language Model) Skill enables AI agents to process extremely large contexts (10M+ tokens) by recursively chunking, processing, and aggregating results, effectively overcoming context window limitations.

Data & ResearchClaude

threat-modeling-expert

31392

from sickn33/antigravity-awesome-skills

Expert in threat modeling methodologies, security architecture review, and risk assessment. Masters STRIDE, PASTA, attack trees, and security requirement extraction. Use PROACTIVELY for security architecture reviews, threat identification, or building secure-by-design systems.