ml-experiment

Design and run machine learning experiments with proper evaluation using jupyter_execute, including training, benchmarking, and ablation studies

42 stars

Best use case

ml-experiment is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ml-experiment should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/prismer-ml-experiment/SKILL.md --create-dirs "https://raw.githubusercontent.com/Zaoqu-Liu/ScienceClaw/main/skills/prismer-ml-experiment/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/prismer-ml-experiment/SKILL.md inside your project
  3. Restart your AI agent; it will auto-discover the skill

How ml-experiment Compares

Feature / Agent         | ml-experiment | Standard Approach
Platform Support        | Not specified | Limited / Varies
Context Awareness       | High          | Baseline
Installation Complexity | Unknown       | N/A

Frequently Asked Questions

What does this skill do?

Design and run machine learning experiments with proper evaluation using jupyter_execute, including training, benchmarking, and ablation studies

Where can I find the source code?

The source code is on GitHub in the Zaoqu-Liu/ScienceClaw repository, at skills/prismer-ml-experiment/SKILL.md.

SKILL.md Source

# ML Experiment Skill

## Description
Design, implement, and evaluate machine learning experiments with reproducible workflows, proper baselines, and statistical analysis.

## Tools Used
- `jupyter_execute` - Execute ML code in Python (auto-switches to Jupyter)
- `jupyter_notebook` - Manage experiment notebooks
- `update_notebook` - Set up experiment cells
- `update_latex` - Write experiment results to papers
- `latex_compile` - Compile CS conference papers (auto-switches to LaTeX)
- `arxiv_to_prompt` - Read related work from arXiv papers
- `update_notes` - Write experiment logs and analysis summaries

## Capabilities

### Experiment Design
- Proper train/validation/test splits
- Cross-validation and bootstrap confidence intervals
- Ablation study design
- Hyperparameter search (grid, random, Bayesian)
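
A minimal sketch of the split and cross-validation items above, using only the Python standard library. The helper names `split_indices` and `kfold_indices` and the 80/10/10 fractions are illustrative assumptions, not part of the skill; a real experiment would more likely use a library splitter.

```python
import random

def split_indices(n, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle range(n) with a fixed seed and cut it into train/val/test.

    Hypothetical helper: the fractions are illustrative defaults.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # private RNG, so the split is reproducible
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test

def kfold_indices(indices, k=5):
    """Yield (train, val) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val

train, val, test = split_indices(100, seed=42)
print(len(train), len(val), len(test))  # 80 10 10

# Cross-validate on the training portion only; the test set stays untouched.
for fold, (tr, va) in enumerate(kfold_indices(train, k=5)):
    print(fold, len(tr), len(va))
```

Fixing the seed in a private `random.Random` instance keeps the split identical across runs without touching global random state, which is one way to keep baseline and ablation runs comparable.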

### Implementation
- PyTorch and TensorFlow model building
- Data loading and augmentation pipelines
- Training loops with logging and checkpointing
- Distributed training setup
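
As a rough illustration of "training loops with logging and checkpointing", the sketch below fits a one-variable linear model by gradient descent in plain Python. It deliberately avoids PyTorch and TensorFlow to stay self-contained; the function name `train`, the JSON checkpoint format, and the toy data are assumptions made for this example only.

```python
import json
import logging
import os
import tempfile

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("train")

def train(data, epochs, lr=0.5, ckpt_path=None, ckpt_every=10):
    """Fit y = w*x + b by full-batch gradient descent.

    Resumes from ckpt_path if the file exists (hypothetical JSON format).
    """
    state = {"epoch": 0, "w": 0.0, "b": 0.0}
    if ckpt_path and os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
        log.info("resumed from epoch %d", state["epoch"])
    n = len(data)
    for epoch in range(state["epoch"] + 1, epochs + 1):
        grad_w = grad_b = loss = 0.0
        for x, y in data:
            err = state["w"] * x + state["b"] - y
            loss += err * err / n        # mean squared error before the update
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        state["w"] -= lr * grad_w
        state["b"] -= lr * grad_b
        state["epoch"] = epoch
        if epoch % ckpt_every == 0:
            log.info("epoch %d loss %.6f", epoch, loss)
            if ckpt_path:
                with open(ckpt_path, "w") as f:
                    json.dump(state, f)  # checkpoint holds everything needed to resume
    return state

# Toy data generated from y = 2x + 1 with x in [0, 1].
data = [(k / 10, 2.0 * (k / 10) + 1.0) for k in range(11)]
ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
train(data, epochs=200, ckpt_path=ckpt)           # first run writes checkpoints
final = train(data, epochs=400, ckpt_path=ckpt)   # second run resumes at epoch 200
print(final)
```

In a PyTorch or TensorFlow loop the same pattern applies, except that the checkpoint would also need to hold the optimizer state and any random-number-generator state.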

### Evaluation
- Standard metrics per task (accuracy, F1, BLEU, mAP, etc.)
- Statistical significance testing (paired t-test, bootstrap)
- Comparison with baselines
- Error analysis and visualization
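
The bootstrap option for significance testing can be sketched as follows, again with the standard library only. `bootstrap_ci` and `paired_bootstrap` are hypothetical helper names, and the per-example correctness vectors are synthetic placeholders rather than real results. For the paired t-test, `scipy.stats.ttest_rel` is one existing option.

```python
import random
import statistics

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n))  # resample with replacement
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

def paired_bootstrap(scores_a, scores_b, **kwargs):
    """Mean of (a - b) over paired per-example scores, with a bootstrap CI.

    The difference is flagged as significant when the CI excludes zero.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    lo, hi = bootstrap_ci(diffs, **kwargs)
    return statistics.fmean(diffs), (lo, hi), not (lo <= 0.0 <= hi)

# Synthetic placeholder data: 1 = correct, 0 = wrong on the same 200 test items.
correct_a = [int(i % 10 != 0) for i in range(200)]  # 90% correct
correct_b = [int(i % 10 > 2) for i in range(200)]   # 70% correct
mean_diff, (lo, hi), significant = paired_bootstrap(correct_a, correct_b, seed=1)
print(f"accuracy gain {mean_diff:.3f}, CI [{lo:.3f}, {hi:.3f}], significant={significant}")
```

Resampling the paired differences, rather than each system's scores independently, keeps the comparison on the same test items, which is the point of a paired test.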

## Usage Patterns

### Run an Experiment
When the user says: "Train a model for [task]"
1. Clarify dataset, metrics, and baselines
2. Implement data loading and preprocessing
3. Build model architecture
4. Train with proper logging
5. Evaluate and compare to baselines
6. Report results with confidence intervals
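
One possible shape for steps 2 to 6, shown on a toy task so that it runs anywhere. Everything here is an assumption made for illustration: the synthetic data generator, the threshold "model", and the majority-class baseline stand in for whatever dataset, architecture, and baselines were agreed in step 1.

```python
import random
import statistics

def make_data(seed, n=200):
    """Hypothetical toy task: the feature is centered at +1 or -1 by label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = (1.0 if y else -1.0) + rng.gauss(0, 1)
        data.append((x, y))
    return data[: n // 2], data[n // 2:]  # train half, test half

def majority_baseline(train):
    """Baseline: always predict the most common training label."""
    label = round(statistics.fmean(y for _, y in train))
    return lambda x: label

def threshold_model(train):
    """'Model': threshold halfway between the two class means."""
    pos = statistics.fmean(x for x, y in train if y == 1)
    neg = statistics.fmean(x for x, y in train if y == 0)
    mid = (pos + neg) / 2
    return lambda x: int(x > mid)

def accuracy(predict, test):
    return statistics.fmean(int(predict(x) == y) for x, y in test)

def run(seeds):
    """Train and evaluate both systems once per seed."""
    results = {"baseline": [], "model": []}
    for seed in seeds:
        train, test = make_data(seed)
        results["baseline"].append(accuracy(majority_baseline(train), test))
        results["model"].append(accuracy(threshold_model(train), test))
    return results

results = run(seeds=range(5))
for name, accs in results.items():
    mean, std = statistics.fmean(accs), statistics.stdev(accs)
    print(f"{name}: {mean:.3f} +/- {std:.3f} over {len(accs)} seeds")
```

The printout gives the mean and standard deviation across seeds. With only a handful of seeds, a bootstrap or t-based interval is a better basis for the confidence intervals in step 6 than a normal approximation.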

### Reproduce a Paper
When the user says: "Reproduce [paper title/arXiv ID]"
1. Fetch the paper using `arxiv_to_prompt`
2. Extract key method details
3. Implement core algorithm
4. Run experiments matching paper setup
5. Compare results to reported numbers
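
Step 5 could be made mechanical with a small comparison helper like the one below. The function name `compare_to_reported`, the 2% relative tolerance, and all numbers are placeholders invented for this sketch. They do not come from any paper.

```python
def compare_to_reported(reported, reproduced, rel_tol=0.02):
    """Compare reproduced metrics against a paper's reported values.

    Returns one (metric, reported, reproduced, gap, status) row per metric.
    rel_tol is an assumed tolerance; choose it per metric in practice.
    """
    rows = []
    for metric, paper_value in reported.items():
        ours = reproduced.get(metric)
        if ours is None:
            rows.append((metric, paper_value, None, None, "missing"))
            continue
        gap = ours - paper_value
        within = abs(gap) <= rel_tol * abs(paper_value)
        rows.append((metric, paper_value, ours, gap, "match" if within else "differs"))
    return rows

# Placeholder values for illustration only.
reported = {"accuracy": 0.912, "f1": 0.880, "bleu": 27.3}
reproduced = {"accuracy": 0.905, "f1": 0.841}

for metric, paper_value, ours, gap, status in compare_to_reported(reported, reproduced):
    ours_text = "n/a" if ours is None else f"{ours:.3f}"
    gap_text = "n/a" if gap is None else f"{gap:+.3f}"
    print(f"{metric:10s} paper={paper_value:<8} ours={ours_text:<8} gap={gap_text:<8} {status}")
```

A fixed tolerance is a crude criterion. When the reproduction is run over several seeds, checking whether the reported number falls inside the reproduced confidence interval is usually more informative.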

Related Skills

zinc-database

42
from Zaoqu-Liu/ScienceClaw

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

zarr-python

42
from Zaoqu-Liu/ScienceClaw

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

Academic Writing

42
from Zaoqu-Liu/ScienceClaw


scientific-visualization

42
from Zaoqu-Liu/ScienceClaw


venue-templates

42
from Zaoqu-Liu/ScienceClaw

Access comprehensive LaTeX templates, formatting requirements, and submission guidelines for major scientific publication venues (Nature, Science, PLOS, IEEE, ACM), academic conferences (NeurIPS, ICML, CVPR, CHI), research posters, and grant proposals (NSF, NIH, DOE, DARPA). Use this skill when preparing manuscripts for journal submission, conference papers, research posters, or grant proposals that need venue-specific formatting requirements and templates.

vaex

42
from Zaoqu-Liu/ScienceClaw

Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.

uspto-database

42
from Zaoqu-Liu/ScienceClaw

Access USPTO APIs for patent/trademark searches, examination history (PEDS), assignments, citations, office actions, TSDR, for IP analysis and prior art searches.

uniprot-database

42
from Zaoqu-Liu/ScienceClaw

Direct REST API access to UniProt. Protein searches, FASTA retrieval, ID mapping, Swiss-Prot/TrEMBL. For Python workflows with multiple databases, prefer bioservices (unified interface to 40+ services). Use this for direct HTTP/REST work or UniProt-specific control.

umap-learn

42
from Zaoqu-Liu/ScienceClaw

UMAP dimensionality reduction. Fast nonlinear manifold learning for 2D/3D visualization, clustering preprocessing (HDBSCAN), supervised/parametric UMAP, for high-dimensional data.

treatment-plans

42
from Zaoqu-Liu/ScienceClaw

Generate concise (3-4 page), focused medical treatment plans in LaTeX/PDF format for all clinical specialties. Supports general medical treatment, rehabilitation therapy, mental health care, chronic disease management, perioperative care, and pain management. Includes SMART goal frameworks, evidence-based interventions with minimal text citations, regulatory compliance (HIPAA), and professional formatting. Prioritizes brevity and clinical actionability.

transformers

42
from Zaoqu-Liu/ScienceClaw

This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.

torchdrug

42
from Zaoqu-Liu/ScienceClaw

PyTorch-native graph neural networks for molecules and proteins. Use when building custom GNN architectures for drug discovery, protein modeling, or knowledge graph reasoning. Best for custom model development, protein property prediction, retrosynthesis. For pre-trained models and diverse featurizers use deepchem; for benchmark datasets use pytdc.