pytorch-trainer
PyTorch model training skill with custom training loops, gradient management, and GPU optimization.
Best use case
pytorch-trainer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using pytorch-trainer should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/pytorch-trainer/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How pytorch-trainer Compares
| Feature / Agent | pytorch-trainer | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
PyTorch model training skill with custom training loops, gradient management, and GPU optimization.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# pytorch-trainer
## Overview
PyTorch model training skill with custom training loops, gradient management, GPU optimization, and integration with experiment tracking systems.
## Capabilities
- Custom training loop execution
- Learning rate scheduling (StepLR, CosineAnnealing, OneCycleLR, etc.)
- Gradient clipping and accumulation
- Mixed precision training (AMP)
- Checkpoint management and resumption
- DataLoader optimization
- Multi-GPU training (DataParallel, DistributedDataParallel)
- Early stopping with patience
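To make a few of the capabilities above concrete, here is a minimal sketch of a custom loop that combines gradient accumulation, gradient clipping, and cosine learning rate scheduling. It is an illustration under stated assumptions, not the skill's actual implementation: `train_one_epoch`, the toy linear model, and the random data are all hypothetical, and mixed precision and checkpointing are left out to keep it short.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, optimizer, loss_fn, accum_steps=1, max_grad_norm=None):
    """Run one epoch; return (mean loss per batch, number of optimizer steps)."""
    model.train()
    optimizer.zero_grad()
    total_loss, optimizer_steps = 0.0, 0
    num_batches = len(loader)
    for step, (inputs, targets) in enumerate(loader, start=1):
        loss = loss_fn(model(inputs), targets)
        # Divide by accum_steps so the summed gradients approximate an average
        # over the accumulation window.
        (loss / accum_steps).backward()
        total_loss += loss.item()
        # Step every accum_steps batches, and once more on a trailing partial window.
        if step % accum_steps == 0 or step == num_batches:
            if max_grad_norm is not None:
                nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
            optimizer.step()
            optimizer.zero_grad()
            optimizer_steps += 1
    return total_loss / num_batches, optimizer_steps


# Hypothetical toy setup: 64 random samples, 10 features, 2 classes.
torch.manual_seed(0)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=8, shuffle=True)
model = nn.Linear(10, 2)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=3)

history = []
for epoch in range(1, 4):
    mean_loss, steps = train_one_epoch(
        model, loader, optimizer, loss_fn, accum_steps=4, max_grad_norm=1.0
    )
    scheduler.step()  # one scheduler step per epoch
    history.append({"epoch": epoch, "trainLoss": mean_loss, "optimizerSteps": steps})
```

With 8 batches per epoch and `accum_steps=4`, the optimizer steps twice per epoch, so the effective batch size is 32 while only 8 samples are processed at a time.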
## Target Processes
- Model Training Pipeline with Experiment Tracking
- Distributed Training Orchestration
- AutoML Pipeline Orchestration
## Tools and Libraries
- PyTorch
- PyTorch Lightning (optional)
- torchvision, torchaudio, torchtext
- CUDA toolkit
## Input Schema
```json
{
  "type": "object",
  "required": ["modelPath", "dataConfig", "trainingConfig"],
  "properties": {
    "modelPath": {
      "type": "string",
      "description": "Path to model definition file"
    },
    "dataConfig": {
      "type": "object",
      "properties": {
        "trainPath": { "type": "string" },
        "valPath": { "type": "string" },
        "batchSize": { "type": "integer" },
        "numWorkers": { "type": "integer" }
      }
    },
    "trainingConfig": {
      "type": "object",
      "properties": {
        "epochs": { "type": "integer" },
        "learningRate": { "type": "number" },
        "optimizer": { "type": "string" },
        "scheduler": { "type": "string" },
        "mixedPrecision": { "type": "boolean" },
        "gradientClipping": { "type": "number" },
        "gradientAccumulation": { "type": "integer" }
      }
    },
    "checkpointConfig": {
      "type": "object",
      "properties": {
        "saveDir": { "type": "string" },
        "saveEvery": { "type": "integer" },
        "resumeFrom": { "type": "string" }
      }
    }
  }
}
```
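
A caller might want to check a context object against the required fields and types listed above before launching a run. The following is a minimal sketch in plain Python, not the skill's own validator: `validate_input`, `REQUIRED`, and `FIELD_TYPES` are hypothetical names, and the check is slightly stricter than JSON Schema (for example, `32.0` is rejected for an integer field).

```python
REQUIRED = ("modelPath", "dataConfig", "trainingConfig")

# Expected Python types for the typed fields in the input schema.
FIELD_TYPES = {
    "dataConfig": {"trainPath": str, "valPath": str, "batchSize": int, "numWorkers": int},
    "trainingConfig": {
        "epochs": int,
        "learningRate": (int, float),
        "optimizer": str,
        "scheduler": str,
        "mixedPrecision": bool,
        "gradientClipping": (int, float),
        "gradientAccumulation": int,
    },
    "checkpointConfig": {"saveDir": str, "saveEvery": int, "resumeFrom": str},
}


def _matches(value, expected):
    # bool is a subclass of int in Python, so only accept it for boolean fields.
    if isinstance(value, bool):
        return expected is bool
    return isinstance(value, expected)


def validate_input(config):
    """Return a list of problems; an empty list means the config passed."""
    if not isinstance(config, dict):
        return ["config: expected object"]
    errors = [f"missing required field: {name}" for name in REQUIRED if name not in config]
    if "modelPath" in config and not isinstance(config["modelPath"], str):
        errors.append("modelPath: expected string")
    for section, fields in FIELD_TYPES.items():
        if section not in config:
            continue
        values = config[section]
        if not isinstance(values, dict):
            errors.append(f"{section}: expected object")
            continue
        for key, expected in fields.items():
            if key in values and not _matches(values[key], expected):
                errors.append(f"{section}.{key}: wrong type")
    return errors


# A config with a string batch size and no trainingConfig yields two problems.
problems = validate_input({"modelPath": "models/resnet.py", "dataConfig": {"batchSize": "32"}})
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.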
## Output Schema
```json
{
  "type": "object",
  "required": ["status", "metrics", "checkpointPath"],
  "properties": {
    "status": {
      "type": "string",
      "enum": ["success", "error", "early_stopped"]
    },
    "metrics": {
      "type": "object",
      "properties": {
        "trainLoss": { "type": "number" },
        "valLoss": { "type": "number" },
        "trainAccuracy": { "type": "number" },
        "valAccuracy": { "type": "number" },
        "epochsTrained": { "type": "integer" },
        "trainingTime": { "type": "number" }
      }
    },
    "checkpointPath": {
      "type": "string"
    },
    "learningCurve": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "epoch": { "type": "integer" },
          "trainLoss": { "type": "number" },
          "valLoss": { "type": "number" }
        }
      }
    }
  }
}
```
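
To show how the `early_stopped` status and the `learningCurve` array could relate, here is a small sketch that applies patience-based early stopping to a sequence of per-epoch losses and assembles an object shaped like the schema above. The function name `run_with_early_stopping`, the 1-based epoch numbering, the checkpoint path, and the choice to report final-epoch metrics (rather than best-epoch) are all assumptions; the skill does not specify them.

```python
def run_with_early_stopping(epoch_losses, patience, checkpoint_path):
    """Consume (trainLoss, valLoss) pairs epoch by epoch and build an output object.

    Stops once valLoss has failed to improve for `patience` consecutive epochs.
    """
    best_val = float("inf")
    epochs_without_improvement = 0
    learning_curve = []
    status = "success"
    for epoch, (train_loss, val_loss) in enumerate(epoch_losses, start=1):
        learning_curve.append({"epoch": epoch, "trainLoss": train_loss, "valLoss": val_loss})
        if val_loss < best_val:
            best_val = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                status = "early_stopped"
                break
    if not learning_curve:
        raise ValueError("epoch_losses must contain at least one epoch")
    last = learning_curve[-1]
    return {
        "status": status,
        "metrics": {
            "trainLoss": last["trainLoss"],
            "valLoss": last["valLoss"],
            "epochsTrained": last["epoch"],
        },
        "checkpointPath": checkpoint_path,
        "learningCurve": learning_curve,
    }


# Validation loss improves for two epochs, then worsens twice in a row.
losses = [(0.9, 0.8), (0.7, 0.6), (0.6, 0.65), (0.5, 0.7), (0.4, 0.5)]
result = run_with_early_stopping(losses, patience=2, checkpoint_path="checkpoints/last.pt")
```

In this example training stops after epoch 4, so the fifth pair is never consumed and `learningCurve` holds four entries.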
## Usage Example
```javascript
{
  kind: 'skill',
  title: 'Train PyTorch model',
  skill: {
    name: 'pytorch-trainer',
    context: {
      modelPath: 'models/resnet.py',
      dataConfig: {
        trainPath: 'data/train',
        valPath: 'data/val',
        batchSize: 32,
        numWorkers: 4
      },
      trainingConfig: {
        epochs: 100,
        learningRate: 0.001,
        optimizer: 'AdamW',
        scheduler: 'cosine',
        mixedPrecision: true,
        gradientClipping: 1.0
      }
    }
  }
}
```
Related Skills
vqc-trainer
Variational quantum classifier training skill with gradient optimization
calibration-trainer
Probability calibration training skill for improving forecast accuracy and reducing overconfidence
tensorflow-trainer
TensorFlow/Keras model training skill with callbacks, distributed strategies, and TensorBoard integration.
sklearn-model-trainer
Scikit-learn model training skill with cross-validation, hyperparameter tuning, pipeline construction, and model serialization. Enables automated ML model development using scikit-learn's comprehensive toolkit.
ray-distributed-trainer
Distributed computing skill using Ray for parallel training, hyperparameter search, and resource management.
process-builder
Scaffold new babysitter process definitions following SDK patterns, proper structure, and best practices. Guides the 3-phase workflow from research to implementation.
babysitter
Orchestrate via @babysitter. Use this skill when asked to babysit a run, orchestrate a process or whenever it is called explicitly. (babysit, babysitter, orchestrate, orchestrate a run, workflow, etc.)
yolo
Run Babysitter autonomously with minimal manual interruption.
user-install
Install the user-level Babysitter Codex setup.
team-install
Install the team-pinned Babysitter Codex workspace setup.
retrospect
Summarize or retrospect on a completed Babysitter run.
resume
Resume an existing Babysitter run from Codex.