experimental-data-analysis
Statistical analysis and reporting for experimental datasets; use when you need to interpret experimental results, test significance (t-tests/ANOVA), or generate reproducible reports.
Best use case
experimental-data-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using experimental-data-analysis should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download `SKILL.md` from GitHub.
- Place it at `.claude/skills/experimental-data-analysis/SKILL.md` inside your project.
- Restart your AI agent; it will auto-discover the skill.
How experimental-data-analysis Compares
| Feature / Agent | experimental-data-analysis | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Statistical analysis and reporting for experimental datasets; use when you need to interpret experimental results, test significance (t-tests/ANOVA), or generate reproducible reports.
Where can I find the source code?
The source code is in the GitHub repository https://github.com/aipoch/medical-research-skills.
SKILL.md Source
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
## When to Use
- You have experimental results in CSV form and need a reproducible end-to-end analysis workflow (clean → test → report).
- You need to compare two conditions (independent or paired) and determine statistical significance with effect sizes.
- You need to compare 3+ groups (one-way) or multiple factors (multi-way) using ANOVA and post-hoc multiple comparisons.
- You must validate assumptions (normality, homogeneity of variance) and document them in a report.
- You need standardized run outputs (timestamped run directories) for traceability and auditing.
## Key Features
- Reproducible, run-based execution that writes all artifacts into `outputs/runs/<timestamp>/`.
- Data preparation guidance: missing values, outliers, and variable type identification (continuous/categorical; grouping factors).
- Descriptive statistics: means, standard deviations, confidence intervals, and grouped summary tables.
- Inferential testing:
  - t-tests (independent/paired) and non-parametric alternatives when assumptions fail.
  - ANOVA (one-way and multi-way) with post-hoc testing (e.g., Tukey).
- Reporting outputs: test statistics, p-values, effect sizes, tables, charts, and explicit assumption notes.
- Reference materials for method selection and reporting templates:
  - `references/stats-method-selection.md`
  - `references/reporting-template.md`
## Dependencies
- Python 3.10+
- pandas >= 2.0
- numpy >= 1.24
- scipy >= 1.10
## Example Usage
The workflow is run-directory based. Initialize a new run, then analyze using the latest run by default.
```bash
# 1) Initialize a new run directory with sample inputs/config
python scripts/init_run.py
# 2) Run analysis (uses the latest outputs/runs/<timestamp>/ by default)
python scripts/analyze_experiment.py
```
Expected directory conventions:
- A new run directory is created at: `outputs/runs/<timestamp>/`
- Configuration file location: `outputs/runs/<timestamp>/config.json`
- All intermediate and final artifacts (config, inputs, outputs, figures, tables) must be written inside the run directory.
- Writing outside the run directory is prohibited.
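The "write only inside the run directory" rule can be enforced with a small path guard. The sketch below is illustrative only: `safe_output_path` is a hypothetical helper, not a function from the skill's scripts. It resolves the requested path and rejects anything that lands outside the run directory, such as `../` segments or absolute paths.

```python
from __future__ import annotations

from pathlib import Path


def safe_output_path(run_dir: str | Path, relative: str | Path) -> Path:
    """Resolve `relative` inside `run_dir`, refusing paths that escape it.

    Hypothetical helper for illustration; not part of the skill's scripts.
    """
    root = Path(run_dir).resolve()
    # Joining an absolute `relative` discards `root`; resolve() collapses "..".
    target = (root / relative).resolve()
    if not target.is_relative_to(root):  # Path.is_relative_to: Python 3.9+
        raise ValueError(f"refusing to write outside {root}: {target}")
    # Create parent folders (e.g. tables/, figures/) inside the run directory.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target
```

For example, `safe_output_path(run_dir, "tables/summary.csv")` returns a path under the run directory, while `safe_output_path(run_dir, "../summary.csv")` raises `ValueError`.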
## Implementation Details
### Reproducible Run Management
- Before each execution, run `scripts/init_run.py` to create `outputs/runs/<timestamp>/` and populate the initial inputs/config.
- Analysis scripts default to the latest run directory under `outputs/runs/` unless explicitly overridden (if supported by the script).
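A minimal sketch of the run-directory logic, assuming the scripts behave roughly as described above. `init_run` and `resolve_run_dir` are hypothetical names and do not reproduce the actual contents of `scripts/init_run.py` or `scripts/analyze_experiment.py`. The `%Y%m%d-%H%M%S-%f` timestamp format is also an assumption, chosen because its fixed-width fields make name order match creation order.

```python
from __future__ import annotations

import json
from datetime import datetime
from pathlib import Path

RUNS_ROOT = Path("outputs/runs")


def init_run(runs_root: Path = RUNS_ROOT, config: dict | None = None) -> Path:
    """Create a fresh <runs_root>/<timestamp>/ directory with a config.json."""
    # Assumed timestamp format: fixed-width, so name order equals time order.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    run_dir = Path(runs_root) / stamp
    run_dir.mkdir(parents=True, exist_ok=False)  # never reuse an existing run
    (run_dir / "config.json").write_text(json.dumps(config or {}, indent=2))
    return run_dir


def resolve_run_dir(runs_root: Path = RUNS_ROOT, override: str | None = None) -> Path:
    """Return the explicitly requested run directory, else the latest one."""
    if override is not None:
        return Path(override)
    runs = [p for p in Path(runs_root).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no run directories under {runs_root}")
    return max(runs, key=lambda p: p.name)  # latest by timestamped name
```

Using `exist_ok=False` means a second initialization never silently overwrites an earlier run, which keeps each run directory an immutable record.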
### Analysis Pipeline
1. **Data Preparation**
   - Handle missing values (e.g., drop, impute, or flag) according to the experimental design.
   - Detect and treat outliers (e.g., robust rules, domain thresholds), documenting any exclusions.
   - Identify variable roles:
     - Outcome variable(s): typically continuous measurements.
     - Grouping factors: categorical condition labels (treatment/control, timepoint, genotype, etc.).
2. **Descriptive Statistics**
   - Compute summary metrics per group:
     - Mean, standard deviation, and confidence intervals (commonly 95% CI).
   - Produce grouped summary tables suitable for reporting.
3. **Inferential Statistics**
   - **Two-group comparisons**
     - Use an independent t-test for separate groups.
     - Use a paired t-test for repeated measures / matched pairs.
     - If assumptions are violated, switch to an appropriate non-parametric alternative.
   - **Multi-group / multi-factor comparisons**
     - Use one-way ANOVA for a single factor with 3+ levels.
     - Use multi-way ANOVA when multiple factors are present.
   - **Multiple comparisons**
     - Apply post-hoc procedures (e.g., Tukey) after ANOVA when needed.
     - Define and document the multiple-comparison control strategy.
4. **Assumption Checks and Reporting Standards**
   - Validate and report:
     - Normality (per group or model residuals, as appropriate).
     - Homogeneity of variance.
   - Report, at minimum:
     - Test statistic, degrees of freedom (if applicable), p-value.
     - Effect size(s) and confidence intervals where applicable.
   - Retain analysis code and random seeds to ensure reproducibility.
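Step 1 (Data Preparation) can be sketched with pandas, one of the listed dependencies. This sketch assumes a long-format table with one outcome column and one grouping column. `prepare` is a hypothetical helper that drops rows with a missing outcome or group label and flags outliers with Tukey fences (values beyond 1.5 times the IQR from the quartiles), computed within each group. The fence rule is one common robust choice, not necessarily the skill's. Excluded rows are returned separately so every exclusion can be documented.

```python
import pandas as pd


def prepare(df: pd.DataFrame, outcome: str, group: str, k: float = 1.5):
    """Drop incomplete rows, then split off per-group IQR outliers.

    Returns (kept, excluded) so that every exclusion can be reported.
    """
    # Missing values: dropped here; imputing or flagging may suit other designs.
    complete = df.dropna(subset=[outcome, group])

    # Tukey fences computed within each group (an assumed, common robust rule).
    by_group = complete.groupby(group)[outcome]
    q1 = by_group.transform(lambda s: s.quantile(0.25))
    q3 = by_group.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    y = complete[outcome]
    outlier = (y < q1 - k * iqr) | (y > q3 + k * iqr)
    return complete[~outlier], complete[outlier]
```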
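For step 2 (Descriptive Statistics), a grouped summary with a t-based confidence interval for each group mean (mean plus or minus t * SD / sqrt(n), with t taken from Student's t distribution with n - 1 degrees of freedom) might look like the sketch below. `describe_groups` and its column names are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats


def describe_groups(df: pd.DataFrame, outcome: str, group: str,
                    confidence: float = 0.95) -> pd.DataFrame:
    """Per-group n, mean, SD, and a t-based confidence interval for the mean."""
    # pandas "std" uses ddof=1 (sample standard deviation).
    summary = df.groupby(group)[outcome].agg(n="count", mean="mean", sd="std")
    sem = summary["sd"] / np.sqrt(summary["n"])
    # Two-sided critical value from Student's t with n - 1 degrees of freedom.
    t_crit = stats.t.ppf((1 + confidence) / 2, summary["n"] - 1)
    summary["ci_low"] = summary["mean"] - t_crit * sem
    summary["ci_high"] = summary["mean"] + t_crit * sem
    return summary
```

Groups with a single observation get a NaN SD and CI, which is a reasonable signal to flag in the report rather than hide.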
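Two-group comparisons (step 3) combined with the assumption checks from step 4 could be sketched as follows. `compare_two` is a hypothetical helper. Several of its rules are choices made for this sketch rather than rules stated above: it gates on Shapiro-Wilk normality at `alpha`, switches to Welch's t-test when Levene's test rejects equal variances, and otherwise falls back to the Mann-Whitney U or Wilcoxon signed-rank test. Shapiro-Wilk needs at least three observations per group, and with small samples a non-significant normality test is only weak evidence of normality. The rank-test branches return no effect size here.

```python
import numpy as np
from scipy import stats


def compare_two(x, y, paired=False, alpha=0.05):
    """Two-group comparison with assumption gates and rank-test fallbacks.

    The gating rules (Shapiro-Wilk, Levene, both at `alpha`) are this
    sketch's assumptions, not a prescribed decision procedure.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if paired:
        diff = x - y
        if stats.shapiro(diff).pvalue > alpha:
            res = stats.ttest_rel(x, y)
            return {"test": "paired t", "statistic": res.statistic,
                    "df": diff.size - 1, "p": res.pvalue,
                    # Cohen's d_z: mean difference / SD of the differences.
                    "effect_size": diff.mean() / diff.std(ddof=1)}
        res = stats.wilcoxon(x, y)
        return {"test": "Wilcoxon signed-rank", "statistic": res.statistic, "p": res.pvalue}

    if all(stats.shapiro(g).pvalue > alpha for g in (x, y)):
        equal_var = stats.levene(x, y).pvalue > alpha
        res = stats.ttest_ind(x, y, equal_var=equal_var)  # Welch if not equal_var
        n1, n2 = x.size, y.size
        v1, v2 = x.var(ddof=1), y.var(ddof=1)
        if equal_var:
            df = n1 + n2 - 2
        else:  # Welch-Satterthwaite approximation
            df = (v1 / n1 + v2 / n2) ** 2 / (
                (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
        pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
        return {"test": "Student t" if equal_var else "Welch t",
                "statistic": res.statistic, "df": df, "p": res.pvalue,
                # Cohen's d with the pooled SD.
                "effect_size": (x.mean() - y.mean()) / pooled_sd}

    res = stats.mannwhitneyu(x, y, alternative="two-sided")
    return {"test": "Mann-Whitney U", "statistic": res.statistic, "p": res.pvalue}
```

Returning a plain dict keeps the test name, statistic, degrees of freedom, p-value, and effect size together, which maps directly onto the minimum reporting items in step 4.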
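For three or more groups, a one-way ANOVA with eta squared and Tukey HSD post-hoc comparisons can be sketched with SciPy alone, since `scipy.stats.tukey_hsd` is available from SciPy 1.8 and is therefore covered by the listed `scipy >= 1.10`. `one_way_anova` is a hypothetical helper. Several behaviors are simplifications for this sketch: it falls back to Kruskal-Wallis when either the normality or the equal-variance check fails, and it runs Tukey HSD only after a significant omnibus test, which is one common convention. Multi-way ANOVA is not covered, because SciPy has no built-in multi-way ANOVA and a model-based tool outside the listed dependencies would be needed.

```python
import numpy as np
from scipy import stats


def one_way_anova(groups: dict, alpha: float = 0.05) -> dict:
    """Omnibus test for 3+ groups, with Tukey HSD after a significant ANOVA."""
    names = list(groups)
    samples = [np.asarray(groups[k], dtype=float) for k in names]
    normal = all(stats.shapiro(s).pvalue > alpha for s in samples)
    equal_var = stats.levene(*samples).pvalue > alpha
    if not (normal and equal_var):
        # Simplification: one rank-based fallback for either violation.
        res = stats.kruskal(*samples)
        return {"test": "Kruskal-Wallis", "statistic": res.statistic, "p": res.pvalue}

    res = stats.f_oneway(*samples)
    pooled = np.concatenate(samples)
    ss_between = sum(s.size * (s.mean() - pooled.mean()) ** 2 for s in samples)
    ss_total = ((pooled - pooled.mean()) ** 2).sum()
    out = {"test": "one-way ANOVA", "statistic": res.statistic,
           "df": (len(samples) - 1, pooled.size - len(samples)),
           "p": res.pvalue, "eta_squared": ss_between / ss_total}
    if res.pvalue < alpha:
        # Tukey HSD: pairwise p-values with family-wise error control.
        tukey = stats.tukey_hsd(*samples)
        out["posthoc"] = {(names[i], names[j]): tukey.pvalue[i, j]
                          for i in range(len(names))
                          for j in range(i + 1, len(names))}
    return out
```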