splitting-datasets
Splits datasets into training, validation, and test sets for ML model development. Use when a request mentions "split dataset", "train-test split", or "data partitioning".
Best use case
splitting-datasets is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using splitting-datasets should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/splitting-datasets/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How splitting-datasets Compares
| Feature / Agent | splitting-datasets | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It splits datasets into training, validation, and test sets for ML model development. It is triggered by requests such as "split dataset", "train-test split", or "data partitioning".
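As a rough illustration of what such a split involves, here is a minimal standard-library sketch of a seeded 70/15/15 partition. It is not the code the skill generates (the SKILL.md only says it uses standard ML libraries); the function name `split_rows`, its parameters, and the sample data are hypothetical.

```python
import random


def split_rows(rows, ratios=(0.70, 0.15, 0.15), seed=42):
    """Shuffle rows with a fixed seed and cut them into train/validation/test lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1.0")
    shuffled = list(rows)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remainder, so no row is dropped
    return train, validation, test


# Hypothetical data: 100 integer "rows" standing in for records of a CSV file.
train, validation, test = split_rows(range(100))
print(len(train), len(validation), len(test))
```

Giving the test set whatever remains after the first two cuts means rounding never silently discards a row.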
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Dataset Splitter

This skill provides automated assistance for dataset splitter tasks.

## Overview

This skill automates the process of dividing a dataset into subsets for training, validating, and testing machine learning models. It ensures proper data preparation and facilitates robust model evaluation.

## How It Works

1. **Analyze Request**: The skill analyzes the user's request to determine the dataset to be split and the desired proportions for each subset.
2. **Generate Code**: Based on the request, the skill generates Python code utilizing standard ML libraries to perform the data splitting.
3. **Execute Splitting**: The code is executed to split the dataset into training, validation, and testing sets according to the specified ratios.

## When to Use This Skill

This skill activates when you need to:

- Prepare a dataset for machine learning model training.
- Create training, validation, and testing sets.
- Partition data to evaluate model performance.

## Examples

### Example 1: Splitting a CSV file

User request: "Split the data in 'my_data.csv' into 70% training, 15% validation, and 15% testing sets."

The skill will:

1. Generate Python code to read the 'my_data.csv' file.
2. Execute the code to split the data according to the specified proportions, creating 'train.csv', 'validation.csv', and 'test.csv' files.

### Example 2: Creating a Train-Test Split

User request: "Create a train-test split of 'large_dataset.csv' with an 80/20 ratio."

The skill will:

1. Generate Python code to load 'large_dataset.csv'.
2. Execute the code to split the dataset into 80% training and 20% testing sets, saving them as 'train.csv' and 'test.csv'.

## Best Practices

- **Data Integrity**: Verify that the splitting process maintains the integrity of the data, ensuring no data loss or corruption.
- **Stratification**: Consider stratification when splitting imbalanced datasets to maintain class distributions in each subset.
- **Randomization**: Ensure the splitting process is randomized to avoid bias in the resulting datasets.

## Integration

This skill can be integrated with other data processing and model training tools within the Claude Code ecosystem to create a complete machine learning workflow.

## Prerequisites

- Appropriate file access permissions
- Required dependencies installed

## Instructions

1. Invoke this skill when the trigger conditions are met
2. Provide necessary context and parameters
3. Review the generated output
4. Apply modifications as needed

## Output

The skill produces structured output relevant to the task.

## Error Handling

- Invalid input: Prompts for correction
- Missing dependencies: Lists required components
- Permission errors: Suggests remediation steps

## Resources

- Project documentation
- Related skills and commands
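
The Stratification and Randomization practices described in the SKILL.md above could look roughly like the following. This is a minimal standard-library sketch under stated assumptions, not the code the skill produces; the function name `stratified_split`, the `label_of` callback, and the sample rows are all hypothetical. In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument may be the more usual choice.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, label_of, test_fraction=0.2, seed=0):
    """Split rows into (train, test), sampling each class separately."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    rng = random.Random(seed)               # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):          # sorted so class order is deterministic
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    rng.shuffle(train)                      # avoid class-ordered output
    rng.shuffle(test)
    return train, test


# Hypothetical imbalanced data: 90 rows of class "a", 10 rows of class "b".
rows = [(i, "a") for i in range(90)] + [(i, "b") for i in range(90, 100)]
train, test = stratified_split(rows, label_of=lambda r: r[1])
print(Counter(label for _, label in train))
print(Counter(label for _, label in test))
```

Because each class is sampled on its own, the 9:1 class ratio of the input is preserved in both subsets, and every input row ends up in exactly one of them. The sketch assumes labels can be sorted (for example, all strings).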
Related Skills
zinc-database
Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.
zarr-python
Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.
yeet
Use only when the user explicitly asks to stage, commit, push, and open a GitHub pull request in one flow using the GitHub CLI (`gh`).
xlsx
Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.
xan
High-performance CSV processing with xan CLI for large tabular datasets, streaming transformations, and low-memory pipelines.
writing-plans
Use when you have a spec or requirements for a multi-step task, before touching code.
writing-docs
Guides for writing and editing Remotion documentation. Use when adding docs pages, editing MDX files in packages/docs, or writing documentation content.
windows-hook-debugging
Diagnose and fix Claude Code plugin hook execution errors on Windows. Use when you encounter hook error, cannot execute binary file, .sh regex mismatches, or WSL/Git Bash conflicts.
weights-and-biases
Track ML experiments with automatic logging, visualize training in real-time, optimize hyperparameters with sweeps, and manage model registry with W&B - collaborative MLOps platform
webthinker-deep-research
Deep web research for VCO: multi-hop search+browse+extract with an auditable action trace and a structured report (WebThinker-style).
vscode-release-notes-writer
Guidelines for writing and reviewing Insiders and Stable release notes for Visual Studio Code.
visualization-best-practices
Visualization Best Practices - Auto-activating skill for Data Analytics. Triggers on: visualization best practices. Part of the Data Analytics skill category.