upload-deployment
Complete reference for model upload and deployment. Covers HuggingFace upload, save strategies (LoRA, merged 16-bit, merged 4-bit), GGUF conversion, model merging, model cards, and the full upload workflow. Use when uploading models, creating GGUF files, merging LoRA adapters, or deploying to HuggingFace. This skill is about USING the upload/deployment tools via CLI — never modifying source code.
Best use case
upload-deployment is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using upload-deployment should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/upload-deployment/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
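The manual steps above can be scripted. This is a minimal sketch that assumes SKILL.md has already been downloaded; the `install_skill` function name is hypothetical, and the only fixed detail is the target path given in the installation steps.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a downloaded SKILL.md to the path the agent scans.
# Usage: install_skill path/to/SKILL.md [project_dir]
install_skill() {
  local src="$1"
  local project_dir="${2:-.}"
  local dest_dir="$project_dir/.claude/skills/upload-deployment"

  if [ ! -f "$src" ]; then
    echo "error: $src not found" >&2
    return 1
  fi
  mkdir -p "$dest_dir"
  cp "$src" "$dest_dir/SKILL.md"
  # Print the installed path so callers can confirm where it went.
  echo "$dest_dir/SKILL.md"
}
```

For example, `install_skill ~/Downloads/SKILL.md .` run from the project root, followed by restarting the agent.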
How upload-deployment Compares
| Feature / Agent | upload-deployment | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It is a CLI reference for uploading trained models to HuggingFace: choosing a save strategy (LoRA, merged 16-bit, merged 4-bit), converting to GGUF, merging LoRA adapters, and generating model cards, without modifying source code.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Upload & Deployment

Upload trained models to HuggingFace with optional GGUF conversion and model card generation.

For cloud training, provider-native storage remains the source of truth. Hugging Face Hub publishing is optional and only applies to `final_model`.

## Quick Reference

| Task | Command |
|------|---------|
| Interactive menu | `./run.sh` → Upload |
| Upload merged 16-bit | `python3 scripts/upload_model.py MODEL_PATH user/repo --save-method merged_16bit` |
| Upload with GGUF | `python3 scripts/upload_model.py MODEL_PATH user/repo --save-method merged_16bit --create-gguf` |
| Upload LoRA only | `python3 scripts/upload_model.py MODEL_PATH user/repo --save-method lora` |
| Merge LoRA manually | `./run.sh` → Merge LoRA |
| Convert to GGUF only | `./run.sh` → Convert |
| Cloud GGUF conversion | `python tuner.py cloud-run --job-config Trainers/recipes/gguf_conversion.yaml --yes` |
| Full pipeline | `./run.sh` → Full Pipeline (Train → Upload → Eval) |

## Save Strategies

| Strategy | Size (7B) | GPU Required | Best For |
|----------|-----------|--------------|----------|
| `lora_only` | ~100-500 MB | No | Sharing adapters, fast upload |
| `merged_16bit` | ~14 GB | Yes | Production inference, GGUF source |
| `merged_4bit` | ~4 GB | Yes | Smaller footprint, slight quality loss |

## GGUF Quantizations

| Format | Size (7B) | Quality | Use Case |
|--------|-----------|---------|----------|
| Q8_0 | ~7 GB | Highest | Best quality, more RAM |
| Q5_K_M | ~5 GB | High | Good balance |
| Q4_K_M | ~4 GB | Good | Most popular, efficient |

## Key Directories

- `scripts/upload_model.py`: generic upload entry point
- `scripts/cloud_gguf_convert.py`: cloud GGUF conversion CLI (download → convert → upload)
- `Trainers/recipes/gguf_conversion.yaml`: HF Jobs recipe (`target: cloud`) for cloud GGUF conversion
- `shared/upload/`: upload orchestrator and strategies
- `shared/upload/converters/`: GGUF and WebGPU converters
- `shared/model_loading/`: model loading and LoRA merge utilities

## Progressive Reference

Load the specific reference you need:

| Reference | When to Load | Path |
|-----------|-------------|------|
| **Upload Workflow** | Uploading to HuggingFace, full process | `reference/upload-workflow.md` |
| **GGUF Conversion** | Creating GGUF files, quantization options | `reference/gguf-conversion.md` |
| **Model Merging** | Merging LoRA into base, preparing for GRPO | `reference/model-merging.md` |
| **Local Mac GGUF Workflow** | Pull from HF bucket, merge locally on macOS, create GGUF, and place into LM Studio/Ollama | `reference/local-mac-bucket-to-gguf.md` |
| **Model Cards** | Documentation, lineage, manifests | `reference/model-cards.md` |
| **Cloud Training** | Provider-native storage, optional final-model publish, artifact discovery | `../fine-tuning/reference/cloud-training.md` |

## Common Patterns

**Standard upload after SFT:**

```bash
python3 scripts/upload_model.py \
  Trainers/sft/sft_output/TIMESTAMP/final_model \
  username/model-name \
  --save-method merged_16bit \
  --create-gguf
```

**Merge LoRA for GRPO continuation:**

```bash
# Option 1: run the shared merge utility from the interactive menu
./run.sh    # then choose "Merge LoRA"
# Option 2: set lora_path in the GRPO config; the GRPO trainer merges automatically
```

**Cloud GGUF conversion (when local RAM is insufficient):**

```bash
# 1. Upload the merged model to HF first (if it is not already there).
# 2. Edit the env vars in the job YAML or override them at runtime:
#    GGUF_MODEL_REPO: the HF repo with the merged model
#    GGUF_QUANT_TYPE: q8_0, q5_k_m, or q4_k_m
python tuner.py cloud-run --job-config Trainers/recipes/gguf_conversion.yaml --yes
# 3. The GGUF file is uploaded back to the same HF repo under gguf/.
```

**Cloud GGUF conversion (direct script, outside cloud-run):**

```bash
python scripts/cloud_gguf_convert.py \
  --model-repo user/model-name \
  --quant q8_0 \
  --upload-to user/model-name
```

**Upload with evaluation results:**

```bash
# Evaluate first
python -m Evaluator.cli --backend unsloth --model path/to/model \
  --lineage eval_lineage.json --upload-to-hf user/model --update-model-card
```

## Output Structure

After upload, the HuggingFace repo contains:

```
username/model-name/
├── lora/                    # LoRA adapters (if lora_only)
├── merged-16bit/            # Full model (if merged_16bit)
├── gguf/                    # GGUF quantizations (if --create-gguf)
│   ├── model-Q4_K_M.gguf
│   ├── model-Q5_K_M.gguf
│   ├── model-Q8_0.gguf
│   └── model-mmproj.gguf    # Vision projector (VL models only)
├── upload_manifest.json     # Upload metadata
├── training_lineage.json    # Training provenance
└── README.md                # Auto-generated model card
```

Cloud artifact policy:

- Default: artifacts stay in provider-native storage:
  - `hf_jobs`: Hugging Face Bucket
  - `modal`: Modal Volume
  - `runpod`: RunPod Network Volume
- Optional publish: when enabled, only `final_model` is pushed to the target HF repo

## Environment Variables

```bash
HF_TOKEN=hf_...  # Required for uploads
```

## Tips

- Always use `merged_16bit` as the source for GGUF conversion (best quality).
- The reliable GGUF converter merges LoRA once, then creates all quants (~10 min saved).
- Vision-language models automatically get an `mmproj.gguf` for the vision projector.
- On macOS, bucket-backed cloud adapters are often easiest to handle one model at a time: pull the `final_model`, merge locally, create the quant you actually need first, then clean up temp files before moving to the next model.
- If the local machine lacks `unsloth`, a plain `transformers` + `peft` merge venv is an acceptable fallback for text models before llama.cpp conversion.
- For merged local models, call the lower-level llama.cpp conversion path directly; the current reliable converter's top-level `convert()` flow assumes it starts from a LoRA adapter.
- LM Studio on this repo owner's Mac uses `~/.lmstudio/models/<publisher>/<model-folder>/`; placing the `.gguf` there plus an optional `config.json` is enough for local testing after a refresh or restart.
- Qwen 3.5 adapters may need a `ConditionalGeneration` merge path instead of `AutoModelForCausalLM`; if the adapter keys live under `language_model.*`, inspect the base architecture before merging.
- If llama.cpp reports that a merged model's architecture is unsupported, update the local `Trainers/llama.cpp` checkout before retrying conversion; support for newer model families is often gated by the converter rather than by the merge.
- On WSL, temp files use the native WSL filesystem to avoid NTFS performance issues.
- `training_lineage.json` is auto-generated and includes model, LoRA, dataset, and hardware info.
- Use `upload_manifest.json` to verify what was uploaded.
- The upload orchestrator handles the whole process, so prefer `./run.sh` → Upload over manual commands.
- Cloud jobs never rely on the remote container filesystem as the only copy; inspect provider-native storage first, then publish `final_model` if needed.
- If local GGUF conversion runs out of memory (common on machines with <32 GB RAM), use the cloud GGUF job (`cpu-upgrade` flavor, 32 GB RAM, no GPU needed).
- The cloud GGUF script uses pure Python conversion (llama.cpp `convert_hf_to_gguf.py`), so no compilation is required.
- Some models (e.g., Gemma 4) may need tokenizer config patching before conversion; the cloud script handles known quirks automatically.
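The sizes in the Save Strategies and GGUF Quantizations tables follow a rule of thumb: on-disk size in GB ≈ parameters (billions) × bits per weight ÷ 8. The sketch below is an approximation under stated assumptions: the bits-per-weight values are assumed (16 for `merged_16bit`, 4 for `merged_4bit`, about 8.5 for Q8_0, which stores one 16-bit scale per block of 32 8-bit weights, and roughly 5.7 and 4.8 for the mixed Q5_K_M and Q4_K_M recipes), and the `estimate_gb` and `bits_for` helpers are hypothetical.

```bash
#!/usr/bin/env bash
# Rough on-disk size: params (billions) * bits per weight / 8 = GB.
# Real files also carry metadata, tokenizer files, and some tensors
# kept at higher precision, so treat the result as a lower bound.
estimate_gb() {
  local params_b="$1" bits="$2"
  awk -v p="$params_b" -v b="$bits" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Approximate bits per weight per format (assumed values, not exact).
bits_for() {
  case "$1" in
    merged_16bit) echo 16 ;;
    merged_4bit)  echo 4 ;;
    Q8_0)         echo 8.5 ;;
    Q5_K_M)       echo 5.7 ;;
    Q4_K_M)       echo 4.8 ;;
    *) echo "unknown format: $1" >&2; return 1 ;;
  esac
}

# Print estimates for a 7B model, matching the tables' reference size.
for fmt in merged_16bit merged_4bit Q8_0 Q5_K_M Q4_K_M; do
  printf '%-13s ~%s GB\n' "$fmt" "$(estimate_gb 7 "$(bits_for "$fmt")")"
done
```

For 7B this gives about 14.0, 3.5, 7.4, 5.0, and 4.2 GB. The 4-bit merge likely lands nearer the table's ~4 GB because embeddings and the output head are typically left unquantized.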
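The standard post-SFT upload can be wrapped so that a missing `HF_TOKEN` fails immediately instead of partway through. This is a minimal sketch: the `upload_final_model` function and its `DRY_RUN` and `CREATE_GGUF` switches are hypothetical, and only `scripts/upload_model.py` with its `--save-method` and `--create-gguf` flags comes from this reference. It adds `--create-gguf` only for `merged_16bit`, following the tip that the merged 16-bit model is the preferred GGUF source.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around scripts/upload_model.py.
# Usage: upload_final_model MODEL_PATH user/repo [save_method]
#   DRY_RUN=1      print the command instead of running it
#   CREATE_GGUF=1  append --create-gguf (merged_16bit only)
upload_final_model() {
  local model_path="$1" repo="$2" save_method="${3:-merged_16bit}"

  # HF_TOKEN is required for uploads; fail before doing any work.
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "error: HF_TOKEN is not set" >&2
    return 1
  fi

  # Rebuild the positional parameters as the command line to run,
  # which keeps paths with spaces correctly quoted.
  set -- python3 scripts/upload_model.py "$model_path" "$repo" --save-method "$save_method"

  if [ "${CREATE_GGUF:-0}" = "1" ]; then
    if [ "$save_method" = "merged_16bit" ]; then
      set -- "$@" --create-gguf
    else
      echo "warning: skipping --create-gguf; use merged_16bit as the GGUF source" >&2
    fi
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 CREATE_GGUF=1 upload_final_model Trainers/sft/sft_output/TIMESTAMP/final_model username/model-name` prints the same command as the standard upload pattern; drop `DRY_RUN=1` to run it.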
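The Tips give a rule of thumb for where to convert: machines with less than 32 GB of RAM often run out of memory during local GGUF conversion, and the cloud job covers that case. A minimal sketch of that decision, assuming Linux (`/proc/meminfo`) or macOS (`sysctl hw.memsize`) for RAM detection; `total_ram_gb`, `gguf_target`, and `valid_quant` are hypothetical helper names, the 32 GB threshold is the one quoted in the Tips, and the accepted quant types are the three listed for `GGUF_QUANT_TYPE`.

```bash
#!/usr/bin/env bash
# Total physical RAM in whole GiB (Linux or macOS); prints 0 if unknown.
total_ram_gb() {
  if [ -r /proc/meminfo ]; then
    # MemTotal is reported in kB.
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
  elif command -v sysctl >/dev/null 2>&1; then
    # hw.memsize is reported in bytes on macOS.
    echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
  else
    echo 0
  fi
}

# Pick "local" or "cloud" using the 32 GB rule of thumb from the Tips.
gguf_target() {
  local ram_gb="$1"
  if [ "$ram_gb" -ge 32 ]; then echo local; else echo cloud; fi
}

# Accept only the quant types listed for GGUF_QUANT_TYPE.
valid_quant() {
  case "$1" in
    q8_0|q5_k_m|q4_k_m) return 0 ;;
    *) return 1 ;;
  esac
}
```

A typical use is to run `python tuner.py cloud-run --job-config Trainers/recipes/gguf_conversion.yaml --yes` only when `gguf_target "$(total_ram_gb)"` prints `cloud`. Linux's `MemTotal` is slightly below installed RAM, so a 32 GB machine may report 31 and be routed to the cloud; lower the threshold if that matters.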
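Several Tips depend on knowing whether a local `final_model` is still a LoRA adapter or already a merged model: the reliable converter's top-level `convert()` expects an adapter, while merged models should go through the lower-level llama.cpp path. A minimal sketch, assuming the standard PEFT layout (`adapter_config.json` for adapters) and the standard Transformers layout (`config.json` for full models); `model_dir_kind` and `adapter_base_model` are hypothetical names. Reading `base_model_name_or_path` tells you which base architecture to inspect before merging, as in the Qwen 3.5 case.

```bash
#!/usr/bin/env bash
# Classify a local model directory by the files PEFT / Transformers write.
# Prints "lora-adapter", "merged-model", or "unknown" (and returns 1).
model_dir_kind() {
  local dir="$1"
  if [ -f "$dir/adapter_config.json" ]; then
    echo lora-adapter
  elif [ -f "$dir/config.json" ]; then
    echo merged-model
  else
    echo unknown
    return 1
  fi
}

# Print the base model recorded in a PEFT adapter_config.json.
# Assumes pretty-printed JSON with one key per line; use a real
# JSON parser (e.g. jq) for less regular files.
adapter_base_model() {
  sed -n 's/.*"base_model_name_or_path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    "$1/adapter_config.json"
}
```

For example, `model_dir_kind Trainers/sft/sft_output/TIMESTAMP/final_model` shows which conversion path applies before a long conversion starts.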
Related Skills
synthetic-data-generation
Complete reference for the SynthChat synthetic dataset generation system. Covers CLI commands (generate, improve, validate), scenario YAML authoring, rubric YAML authoring, settings configuration, evaluation, and full workflow. Use when generating datasets, writing rubrics/scenarios, configuring models/workers, improving dataset quality, or running evaluations. This skill is about USING the system via CLI and YAML — never modifying source code.
research-reporting
Create structured research notes from experiment runs and analysis artifacts. Use when creating a note at run launch, updating it as training/evaluation/loss stages finish, summarizing a finished run, comparing experiment outcomes, extracting hypotheses from eval/loss artifacts, or proposing next-run actions grounded in `.tracking/experiments/<id>/analysis/` outputs. This skill is about turning repo-native experiment evidence into stable, machine-readable markdown.
fine-tuning
Complete reference for the fine-tuning pipeline (SFT, KTO, GRPO), cloud HF Jobs workflows, autonomous experiment search, checkpoint evaluation, and LoRA surgery. Covers training CLI flags, YAML configuration, model presets, dataset requirements, LoRA settings, training monitoring, hyperparameter search, and post-training optimization. Use when training models, configuring training runs, choosing hyperparameters, running cloud experiments, inspecting HF jobs, or troubleshooting training issues. This skill is about USING the training system via CLI and YAML — never modifying source code.
evaluation
Complete reference for the config-first model evaluation system. Covers the Evaluator CLI, assertion-driven YAML scenarios, response views, backend configuration, presets, scoring, LLM-as-judge, model comparison, and HuggingFace integration. Use when evaluating models, writing test prompts, comparing training runs, or interpreting eval results. This skill is about USING the evaluation system via CLI and YAML.
dataset-publishing
Publish local dataset artifacts to a Hugging Face dataset repo. Use when uploading a JSONL dataset, pushing a filtered dataset variant, syncing a matching .metadata.json sidecar, or renaming a dataset file in the target repo. This skill is about USING the checked-in dataset publish script via CLI — never ad hoc Python.
case-studies
End-to-end case studies showing how to implement the full training pipeline for different skill types. Covers three complete worked examples — tool-calling training, essay-style training, and agentic search (RAG agent) training — demonstrating dataset design, synthetic generation, validation, fine-tuning, evaluation, and iteration. Use when onboarding to the project, understanding how all components fit together, explaining the pipeline to others, or planning a new training capability. This skill is about UNDERSTANDING the system holistically — reference the other skills for specific CLI commands.
deployment-patterns
Deployment workflows, CI/CD pipeline patterns, Docker containerization, health checks, rollback strategies, and production readiness checklists for web applications. Use when setting up deployment infrastructure or planning releases.
makepad-deployment
CRITICAL: Use for Makepad packaging and deployment. Triggers on: deploy, package, APK, IPA, 打包 (package), 部署 (deploy), cargo-packager, cargo-makepad, WASM, Android, iOS, distribution, installer, .deb, .dmg, .nsis, GitHub Actions, CI, action, marketplace
file-uploads
Expert at handling file uploads and cloud storage. Covers S3, Cloudflare R2, presigned URLs, multipart uploads, and image optimization. Knows how to handle large files without blocking.
expo-deployment
Deploy Expo apps to production
deployment-validation-config-validate
You are a configuration management expert specializing in validating, testing, and ensuring the correctness of application configurations. Create comprehensive validation schemas, implement configurat
deployment-procedures
Production deployment principles and decision-making. Safe deployment workflows, rollback strategies, and verification. Teaches thinking, not scripts.