ml-experiment

Design and run machine learning experiments with proper evaluation using jupyter_execute, including training, benchmarking, and ablation studies

42 stars

by Zaoqu-Liu

Best use case

ml-experiment is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ml-experiment should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/prismer-ml-experiment/SKILL.md --create-dirs "https://raw.githubusercontent.com/Zaoqu-Liu/ScienceClaw/main/skills/prismer-ml-experiment/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/prismer-ml-experiment/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How ml-experiment Compares

Feature / Agent	ml-experiment	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Design and run machine learning experiments with proper evaluation using jupyter_execute, including training, benchmarking, and ablation studies

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# ML Experiment Skill

## Description
Design, implement, and evaluate machine learning experiments with reproducible workflows, proper baselines, and statistical analysis.

## Tools Used
- `jupyter_execute` - Execute ML code in Python (auto-switches to Jupyter)
- `jupyter_notebook` - Manage experiment notebooks
- `update_notebook` - Set up experiment cells
- `update_latex` - Write experiment results to papers
- `latex_compile` - Compile CS conference papers (auto-switches to LaTeX)
- `arxiv_to_prompt` - Read related work from arXiv papers
- `update_notes` - Write experiment logs and analysis summaries

## Capabilities

### Experiment Design
- Proper train/validation/test splits
- Cross-validation and bootstrap confidence intervals
- Ablation study design
- Hyperparameter search (grid, random, Bayesian)
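
As a minimal sketch of the first two items, the split and the folds can be built from shuffled index lists with a fixed seed. The function names `train_val_test_split` and `k_fold` and the default fractions are illustrative assumptions, not part of the skill or of any library.

```python
import random

def train_val_test_split(n_items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle indices once with a fixed seed, then cut three disjoint index lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = round(n_items * test_frac)
    n_val = round(n_items * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test

def k_fold(indices, k=5):
    """Yield (train_idx, held_out_idx) pairs; every index is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield rest, folds[i]

train, val, test = train_val_test_split(100, seed=42)
folds = list(k_fold(train, k=5))
print(len(train), len(val), len(test), len(folds))
```

Keeping the test indices out of `k_fold` means cross-validation only ever touches the training portion, so the test set stays untouched until the final evaluation.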

### Implementation
- PyTorch and TensorFlow model building
- Data loading and augmentation pipelines
- Training loops with logging and checkpointing
- Distributed training setup
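
The logging-and-checkpointing pattern can be sketched without any framework. The example below is an assumption-laden stand-in: it fits a one-variable linear model with plain SGD instead of a PyTorch or TensorFlow network, and the `train` function and JSON checkpoint layout are hypothetical. The part that carries over to real training code is the shape of the loop: resume if a checkpoint exists, log one line per epoch, and write the checkpoint atomically.

```python
import json
import logging
import os
import random
import tempfile

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("train")

def train(data, epochs, ckpt_path, lr=0.05):
    """Fit y = w*x + b by SGD, checkpointing every epoch and resuming if possible."""
    state = {"epoch": 0, "w": 0.0, "b": 0.0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
        log.info("resumed from epoch %d", state["epoch"])
    for epoch in range(state["epoch"], epochs):
        total = 0.0
        for x, y in data:
            err = state["w"] * x + state["b"] - y
            state["w"] -= lr * err * x
            state["b"] -= lr * err
            total += err * err
        state["epoch"] = epoch + 1
        log.info("epoch=%d mse=%.6f", state["epoch"], total / len(data))
        # Write to a temp file, then rename, so an interrupted run
        # never leaves a half-written checkpoint behind.
        tmp = ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, ckpt_path)
    return state

rng = random.Random(0)
data = [(x, 2.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(50))]
ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
train(data, epochs=20, ckpt_path=ckpt)           # first run stops after epoch 20
final = train(data, epochs=40, ckpt_path=ckpt)   # second run resumes at epoch 20
```

In a PyTorch version the `state` dictionary would typically also hold the optimizer state and the random-number-generator state, so that a resumed run matches an uninterrupted one.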

### Evaluation
- Standard metrics per task (accuracy, F1, BLEU, mAP, etc.)
- Statistical significance testing (paired t-test, bootstrap)
- Comparison with baselines
- Error analysis and visualization
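
One way to test a model against a baseline is a paired bootstrap over test examples, sketched below with the standard library only. The function name `paired_bootstrap`, the 95% level, and the synthetic correctness vectors are assumptions made for illustration. A paired t-test on the same per-example differences is the other option the list names and is not shown here.

```python
import random

def paired_bootstrap(correct_a, correct_b, n_resamples=2000, seed=0):
    """Compare two systems on the same resampled test sets.

    correct_a / correct_b: per-example 0/1 correctness, aligned by index.
    Returns the observed accuracy gap (A - B), a 95% percentile interval
    for that gap, and the fraction of resamples in which A did not beat B.
    """
    assert len(correct_a) == len(correct_b)
    n = len(correct_a)
    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    observed = sum(diffs) / n
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        gaps.append(sum(sample) / n)
    gaps.sort()
    lo = gaps[int(0.025 * n_resamples)]
    hi = gaps[int(0.975 * n_resamples) - 1]
    not_better = sum(g <= 0 for g in gaps) / n_resamples
    return observed, (lo, hi), not_better

# Synthetic correctness vectors, for illustration only.
rng = random.Random(1)
baseline = [1 if rng.random() < 0.70 else 0 for _ in range(500)]
model = [1 if rng.random() < 0.85 else 0 for _ in range(500)]
observed, (lo, hi), not_better = paired_bootstrap(model, baseline)
print(f"gap={observed:.3f} 95% CI [{lo:.3f}, {hi:.3f}] not_better={not_better:.4f}")
```

Resampling the per-example differences, not each system separately, keeps the pairing intact: both systems are always scored on the same examples.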

## Usage Patterns

### Run an Experiment
When user says: "Train a model for [task]"
1. Clarify dataset, metrics, and baselines
2. Implement data loading and preprocessing
3. Build model architecture
4. Train with proper logging
5. Evaluate and compare to baselines
6. Report results with confidence intervals
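
The six steps above can be sketched end to end on a toy problem. Everything in this example is a hypothetical stand-in: the synthetic one-feature dataset, the threshold "model", the majority-class baseline, and the names `ExperimentConfig` and `run`. It shows the shape of the workflow, not the skill's actual implementation.

```python
import random
from dataclasses import dataclass

@dataclass
class ExperimentConfig:            # step 1: fix data size, split, and seed up front
    n_examples: int = 600
    test_frac: float = 0.25
    seed: int = 0
    n_resamples: int = 1000

def load_data(cfg):                # step 2: synthetic stand-in for a real dataset
    rng = random.Random(cfg.seed)
    rows = []
    for _ in range(cfg.n_examples):
        label = rng.randrange(2)
        rows.append((rng.gauss(1.0 if label else -1.0, 1.0), label))
    n_test = round(cfg.n_examples * cfg.test_frac)
    return rows[n_test:], rows[:n_test]

def fit_threshold(train):          # steps 3-4: pick the best cut point on train
    best_t, best_acc = 0.0, -1.0
    for t, _ in train:
        acc = sum((x > t) == bool(y) for x, y in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def majority_baseline(train):      # baseline: most common training label
    ones = sum(y for _, y in train)
    return 1 if ones * 2 >= len(train) else 0

def accuracy_with_ci(hits, cfg):   # step 6: percentile bootstrap over test examples
    rng = random.Random(cfg.seed + 1)
    n = len(hits)
    stats = sorted(
        sum(hits[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(cfg.n_resamples)
    )
    lo = stats[int(0.025 * cfg.n_resamples)]
    hi = stats[int(0.975 * cfg.n_resamples) - 1]
    return sum(hits) / n, lo, hi

def run(cfg):
    train, test = load_data(cfg)
    t = fit_threshold(train)
    majority = majority_baseline(train)
    # step 5: score model and baseline on the same held-out examples
    model_hits = [int((x > t) == bool(y)) for x, y in test]
    base_hits = [int(y == majority) for _, y in test]
    return {
        "model": accuracy_with_ci(model_hits, cfg),
        "baseline": accuracy_with_ci(base_hits, cfg),
    }

results = run(ExperimentConfig())
for name, (acc, lo, hi) in results.items():
    print(f"{name}: acc={acc:.3f} 95% CI [{lo:.3f}, {hi:.3f}]")
```

Putting the seed and split sizes in one config object makes it possible to rerun the experiment exactly and to record the config alongside the results.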

### Reproduce a Paper
When user says: "Reproduce [paper title/arXiv ID]"
1. Fetch paper using arxiv_to_prompt
2. Extract key method details
3. Implement core algorithm
4. Run experiments matching paper setup
5. Compare results to reported numbers
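
For step 5, a small helper can make the comparison explicit. The sketch below assumes a relative tolerance of 2% and a hypothetical function name `compare_to_reported`. The numbers in the usage lines are placeholders and are not taken from any paper.

```python
def compare_to_reported(reported, reproduced, rel_tol=0.02):
    """Return one row per reported metric: (name, paper value, our value, gap, status)."""
    rows = []
    for metric, paper_value in reported.items():
        ours = reproduced.get(metric)
        if ours is None:
            # Metric appears in the paper but was not reproduced.
            rows.append((metric, paper_value, None, None, "missing"))
            continue
        gap = ours - paper_value
        within = abs(gap) <= rel_tol * abs(paper_value)
        rows.append((metric, paper_value, ours, gap, "ok" if within else "gap"))
    return rows

# Placeholder values for illustration only.
reported = {"accuracy": 0.900, "f1": 0.880, "bleu": 27.0}
reproduced = {"accuracy": 0.893, "f1": 0.851}
rows = compare_to_reported(reported, reproduced)
for metric, paper_value, ours, gap, status in rows:
    print(metric, paper_value, ours, status)
```

Metrics flagged `gap` or `missing` are the ones to investigate first, for example by checking preprocessing, the evaluation split, or the training budget against the paper's setup.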

Related Skills

zinc-database

from Zaoqu-Liu/ScienceClaw

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

zarr-python

from Zaoqu-Liu/ScienceClaw

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

Academic Writing

from Zaoqu-Liu/ScienceClaw

scientific-visualization

from Zaoqu-Liu/ScienceClaw

venue-templates

from Zaoqu-Liu/ScienceClaw

Access comprehensive LaTeX templates, formatting requirements, and submission guidelines for major scientific publication venues (Nature, Science, PLOS, IEEE, ACM), academic conferences (NeurIPS, ICML, CVPR, CHI), research posters, and grant proposals (NSF, NIH, DOE, DARPA). Use this skill when you are preparing manuscripts for journal submission, conference papers, research posters, or grant proposals and need venue-specific formatting requirements and templates.

vaex

from Zaoqu-Liu/ScienceClaw

Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.

uspto-database

from Zaoqu-Liu/ScienceClaw

Access USPTO APIs for patent/trademark searches, examination history (PEDS), assignments, citations, office actions, TSDR, for IP analysis and prior art searches.

uniprot-database

from Zaoqu-Liu/ScienceClaw

Direct REST API access to UniProt. Protein searches, FASTA retrieval, ID mapping, Swiss-Prot/TrEMBL. For Python workflows with multiple databases, prefer bioservices (unified interface to 40+ services). Use this for direct HTTP/REST work or UniProt-specific control.

umap-learn

from Zaoqu-Liu/ScienceClaw

UMAP dimensionality reduction. Fast nonlinear manifold learning for 2D/3D visualization, clustering preprocessing (HDBSCAN), supervised/parametric UMAP, for high-dimensional data.

treatment-plans

from Zaoqu-Liu/ScienceClaw

Generate concise (3-4 page), focused medical treatment plans in LaTeX/PDF format for all clinical specialties. Supports general medical treatment, rehabilitation therapy, mental health care, chronic disease management, perioperative care, and pain management. Includes SMART goal frameworks, evidence-based interventions with minimal text citations, regulatory compliance (HIPAA), and professional formatting. Prioritizes brevity and clinical actionability.

transformers

from Zaoqu-Liu/ScienceClaw

This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.

torchdrug

from Zaoqu-Liu/ScienceClaw

PyTorch-native graph neural networks for molecules and proteins. Use when building custom GNN architectures for drug discovery, protein modeling, or knowledge graph reasoning. Best for custom model development, protein property prediction, retrosynthesis. For pre-trained models and diverse featurizers use deepchem; for benchmark datasets use pytdc.