experimental-data-analysis

Statistical analysis and reporting for experimental datasets; use when you need to interpret experimental results, test significance (t-tests/ANOVA), or generate reproducible reports.

by aipoch

Best use case

experimental-data-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using experimental-data-analysis should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/experimental-data-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/aipoch/medical-research-skills/main/scientific-skills/Data%20Analysis/experimental-data-analysis/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/experimental-data-analysis/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

Frequently Asked Questions

What does this skill do?

Statistical analysis and reporting for experimental datasets; use when you need to interpret experimental results, test significance (t-tests/ANOVA), or generate reproducible reports.

Where can I find the source code?

The source code is in the aipoch/medical-research-skills repository on GitHub: https://github.com/aipoch/medical-research-skills

SKILL.md Source

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You have experimental results in CSV form and need a reproducible end-to-end analysis workflow (clean → test → report).
- You need to compare two conditions (independent or paired) and determine statistical significance with effect sizes.
- You need to compare 3+ groups (one-way) or multiple factors (multi-way) using ANOVA and post-hoc multiple comparisons.
- You must validate assumptions (normality, homogeneity of variance) and document them in a report.
- You need standardized run outputs (timestamped run directories) for traceability and auditing.

## Key Features

- Reproducible, run-based execution that writes all artifacts into `outputs/runs/<timestamp>/`.
- Data preparation guidance: missing values, outliers, and variable type identification (continuous/categorical; grouping factors).
- Descriptive statistics: means, standard deviations, confidence intervals, and grouped summary tables.
- Inferential testing:
  - t-tests (independent/paired) and non-parametric alternatives when assumptions fail.
  - ANOVA (one-way and multi-way) with post-hoc testing (e.g., Tukey).
- Reporting outputs: test statistics, p-values, effect sizes, tables, charts, and explicit assumption notes.
- Reference materials for method selection and reporting templates:
  - `references/stats-method-selection.md`
  - `references/reporting-template.md`
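
A minimal sketch of the data-preparation and descriptive-statistics features, using pandas and scipy. It assumes long-format data with hypothetical `group` (categorical condition label) and `value` (continuous outcome) columns; the helper names `prepare` and `describe_groups` are illustrative and not part of the skill's scripts. Rows with missing values are dropped (listwise deletion), and outliers are flagged per group with Tukey's 1.5 × IQR fences, one common robust rule chosen here as an assumption. The confidence interval is the usual t-based interval for a group mean.

```python
import numpy as np
import pandas as pd
from scipy import stats


def prepare(data: pd.DataFrame, outcome: str = "value", group: str = "group") -> pd.DataFrame:
    """Drop incomplete rows and flag per-group outliers (column names are assumptions)."""
    # Listwise deletion is one option; imputation or flagging may suit other designs.
    clean = data.dropna(subset=[outcome, group]).copy()
    clean[group] = clean[group].astype("category")

    def iqr_flag(x: pd.Series) -> pd.Series:
        # Tukey's fences: values more than 1.5 * IQR beyond the quartiles.
        q1, q3 = x.quantile([0.25, 0.75])
        iqr = q3 - q1
        return (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)

    # Flag instead of dropping, so every exclusion is an explicit, documented decision.
    clean["is_outlier"] = clean.groupby(group, observed=True)[outcome].transform(iqr_flag)
    return clean


def describe_groups(data: pd.DataFrame, outcome: str = "value", group: str = "group",
                    confidence: float = 0.95) -> pd.DataFrame:
    """Per-group n, mean, SD, and a t-based confidence interval for the mean."""
    summary = data.groupby(group, observed=True)[outcome].agg(n="count", mean="mean", sd="std")
    sem = summary["sd"] / np.sqrt(summary["n"])
    # Two-sided critical value from the t distribution with n - 1 degrees of freedom.
    t_crit = stats.t.ppf((1 + confidence) / 2, df=summary["n"] - 1)
    summary["ci_low"] = summary["mean"] - t_crit * sem
    summary["ci_high"] = summary["mean"] + t_crit * sem
    return summary.reset_index()


# Small illustrative dataset (not real experimental data); one value is missing.
raw = pd.DataFrame({
    "group": ["control"] * 5 + ["treated"] * 5,
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 3.0, 4.0, 5.0, np.nan],
})
table = describe_groups(prepare(raw))
print(table)
```

Because outliers are only flagged, excluding them is a deliberate extra step (for example, filtering on `is_outlier` before calling `describe_groups`), which keeps the exclusion visible in the analysis code.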

## Dependencies

- Python 3.10+
- pandas >= 2.0
- numpy >= 1.24
- scipy >= 1.10

## Example Usage

The workflow is run-directory based: initialize a new run, then run the analysis, which uses the latest run directory by default.

```bash
# 1) Initialize a new run directory with sample inputs/config
python scripts/init_run.py

# 2) Run analysis (uses the latest outputs/runs/<timestamp>/ by default)
python scripts/analyze_experiment.py
```

Expected directory conventions:

- A new run directory is created at: `outputs/runs/<timestamp>/`
- Configuration file location: `outputs/runs/<timestamp>/config.json`
- All intermediate and final artifacts (config, inputs, outputs, figures, tables) must be written inside the run directory.
- Writing outside the run directory is prohibited.
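
One way to enforce the rule that nothing is written outside the run directory is to resolve every output path against the run directory and reject any path that escapes it, whether through `..` segments, an absolute path, or a symlink. The sketch below is illustrative; `safe_output_path` is a hypothetical helper, not a function provided by the skill's scripts.

```python
from __future__ import annotations

from pathlib import Path


def safe_output_path(run_dir: Path, relative: str | Path) -> Path:
    """Resolve `relative` inside run_dir and refuse anything that escapes it (hypothetical helper)."""
    base = Path(run_dir).resolve()
    # Joining an absolute path discards `base`; resolve() also collapses '..' and symlinks.
    target = (base / relative).resolve()
    if not target.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"refusing to write outside the run directory: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)  # e.g. create tables/ or figures/
    return target


# Usage (run_dir would normally be the current outputs/runs/<timestamp>/ directory):
# out = safe_output_path(run_dir, "tables/summary.csv")
# summary.to_csv(out, index=False)
```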

## Implementation Details

### Reproducible Run Management

- Before each execution, run `scripts/init_run.py` to create `outputs/runs/<timestamp>/` and populate initial inputs/config.
- Analysis scripts default to the latest run directory under `outputs/runs/` unless explicitly overridden (if supported by the script).
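
The internals of `scripts/init_run.py` are not shown here, so the following is only a sketch of the run-management behaviour described above, under stated assumptions: the `<timestamp>` format (`%Y%m%d-%H%M%S-%f`, chosen so directory names sort chronologically), the default `config.json` keys, and the helper names `init_run` and `latest_run` are all hypothetical.

```python
from __future__ import annotations

import json
from datetime import datetime
from pathlib import Path

RUNS_ROOT = Path("outputs/runs")


def init_run(root: Path = RUNS_ROOT, config: dict | None = None) -> Path:
    """Create a fresh <root>/<timestamp>/ directory with a config.json (hypothetical helper)."""
    # Fixed-width timestamps sort lexicographically in chronological order.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    run_dir = root / stamp
    run_dir.mkdir(parents=True, exist_ok=False)  # refuse to reuse an existing run
    # Hypothetical default keys; recording the seed lets a rerun reproduce random steps.
    cfg = config if config is not None else {"seed": 12345, "alpha": 0.05}
    (run_dir / "config.json").write_text(json.dumps(cfg, indent=2))
    return run_dir


def latest_run(root: Path = RUNS_ROOT) -> Path:
    """Return the newest run directory, mirroring the 'latest run by default' behaviour."""
    runs = sorted(p for p in root.iterdir() if p.is_dir()) if root.is_dir() else []
    if not runs:
        raise FileNotFoundError(f"no run directories under {root}; create one first")
    return runs[-1]
```

Storing the random seed in `config.json` ties each run directory to the settings used, which supports the reproducibility requirement in the analysis pipeline.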

### Analysis Pipeline

1. **Data Preparation**
   - Handle missing values (e.g., drop, impute, or flag) according to the experimental design.
   - Detect and treat outliers (e.g., robust rules, domain thresholds), documenting any exclusions.
   - Identify variable roles:
     - Outcome variable(s): typically continuous measurements.
     - Grouping factors: categorical condition labels (treatment/control, timepoint, genotype, etc.).

2. **Descriptive Statistics**
   - Compute summary metrics per group:
     - Mean, standard deviation, and confidence intervals (commonly 95% CI).
   - Produce grouped summary tables suitable for reporting.

3. **Inferential Statistics**
   - **Two-group comparisons**
     - Use an independent t-test for separate groups.
     - Use a paired t-test for repeated measures / matched pairs.
     - If assumptions are violated, switch to an appropriate non-parametric alternative.
   - **Multi-group / multi-factor comparisons**
     - Use one-way ANOVA for a single factor with 3+ levels.
     - Use multi-way ANOVA when multiple factors are present.
   - **Multiple comparisons**
     - Apply post-hoc procedures (e.g., Tukey) after ANOVA when needed.
     - Define and document the multiple-comparison control strategy.

4. **Assumption Checks and Reporting Standards**
   - Validate and report:
     - Normality (per group or model residuals, as appropriate).
     - Homogeneity of variance.
   - Report, at minimum:
     - Test statistic, degrees of freedom (if applicable), p-value.
     - Effect size(s) and confidence intervals where applicable.
   - Retain analysis code and random seeds to ensure reproducibility.
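
The assumption-driven test selection in steps 3 and 4 can be sketched with `scipy.stats` as follows. This is an illustrative decision rule, not the logic of `scripts/analyze_experiment.py`: `compare_groups` and `ComparisonResult` are hypothetical names, `alpha = 0.05` for the assumption checks is an assumption, Shapiro-Wilk and Levene are one common choice of normality and variance tests, and for 3+ groups any failed assumption falls back to Kruskal-Wallis (Welch's ANOVA and multi-way ANOVA are not shown). Effect sizes are Cohen's d (pooled SD), Cohen's d_z for paired data, and eta-squared for one-way ANOVA; the non-parametric branches report no effect size.

```python
from __future__ import annotations

from dataclasses import dataclass, field

import numpy as np
from scipy import stats


@dataclass
class ComparisonResult:
    test: str
    statistic: float
    p_value: float
    effect_size: float | None = None  # Cohen's d, d_z, or eta-squared depending on the test
    effect_name: str | None = None
    notes: list[str] = field(default_factory=list)


def compare_groups(samples, paired=False, alpha=0.05):
    """Choose a test from assumption checks (hypothetical helper; alpha is an assumption)."""
    samples = [np.asarray(s, dtype=float) for s in samples]
    if len(samples) < 2:
        raise ValueError("need at least two groups")
    notes: list[str] = []

    if paired:
        if len(samples) != 2:
            raise ValueError("this sketch only handles paired designs with two conditions")
        diff = samples[0] - samples[1]
        # Paired tests assume normality of the within-pair differences.
        if stats.shapiro(diff).pvalue > alpha:
            res = stats.ttest_rel(samples[0], samples[1])
            d_z = diff.mean() / diff.std(ddof=1)
            return ComparisonResult("paired t-test", res.statistic, res.pvalue, d_z, "Cohen's d_z", notes)
        notes.append("differences look non-normal; using Wilcoxon signed-rank")
        res = stats.wilcoxon(samples[0], samples[1])
        return ComparisonResult("Wilcoxon signed-rank", res.statistic, res.pvalue, notes=notes)

    # Shapiro-Wilk per group; Levene (median-centred by default) across groups.
    normal = all(stats.shapiro(s).pvalue > alpha for s in samples)
    equal_var = bool(stats.levene(*samples).pvalue > alpha)
    notes.append(f"normality ok: {normal}; equal variances: {equal_var}")

    if len(samples) == 2:
        a, b = samples
        if normal:
            res = stats.ttest_ind(a, b, equal_var=equal_var)  # Welch's t-test when variances differ
            pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                                / (len(a) + len(b) - 2))
            name = "Student t-test" if equal_var else "Welch t-test"
            d = (a.mean() - b.mean()) / pooled_sd
            return ComparisonResult(name, res.statistic, res.pvalue, d, "Cohen's d", notes)
        res = stats.mannwhitneyu(a, b, alternative="two-sided")
        return ComparisonResult("Mann-Whitney U", res.statistic, res.pvalue, notes=notes)

    if normal and equal_var:
        res = stats.f_oneway(*samples)
        pooled = np.concatenate(samples)
        ss_between = sum(len(s) * (s.mean() - pooled.mean()) ** 2 for s in samples)
        ss_total = ((pooled - pooled.mean()) ** 2).sum()
        # Tukey HSD for all pairwise comparisons, controlling the family-wise error rate.
        tukey = stats.tukey_hsd(*samples)
        notes.append(f"Tukey HSD pairwise p-values:\n{np.round(tukey.pvalue, 4)}")
        return ComparisonResult("one-way ANOVA", res.statistic, res.pvalue,
                                ss_between / ss_total, "eta-squared", notes)
    # Simplification: non-normality or unequal variances both fall back to Kruskal-Wallis.
    res = stats.kruskal(*samples)
    return ComparisonResult("Kruskal-Wallis", res.statistic, res.pvalue, notes=notes)


rng = np.random.default_rng(seed=42)  # fixed seed, recorded for reproducibility
control = rng.normal(10.0, 1.0, size=20)  # simulated data, for illustration only
treated = rng.normal(12.0, 1.0, size=20)
result = compare_groups([control, treated])
print(result)
```

Using assumption-test p-values to pick the main test is a pragmatic convention rather than a universal rule; whichever procedure is used, the `notes` field keeps the assumption outcomes available for the report.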

Related Skills

uspto-database

from aipoch/medical-research-skills

Access USPTO data (Patent Search, PEDS, TSDR, assignments) when you need to query patents/trademarks and retrieve prosecution or status information programmatically.

datamol

from aipoch/medical-research-skills

A Pythonic wrapper around RDKit with simplified interfaces and sensible defaults. Preferred for standard drug discovery workflows including SMILES parsing, standardization, descriptors, fingerprints, clustering, 3D conformer generation, and parallel processing. Returns native rdkit.Chem.Mol objects. For advanced control or custom parameters, use rdkit directly.

zinc-database

from aipoch/medical-research-skills

Access the ZINC (230M+ purchasable compounds) database when you need to look up compounds by ZINC ID/SMILES, run similarity/analog searches, or download 3D ready-to-dock structures for virtual screening and drug discovery.

uniprot-database

from aipoch/medical-research-skills

Direct REST API access to UniProt for protein search, entry retrieval, and identifier mapping; use when you need programmatic UniProtKB queries or cross-database ID conversion.

string-database

from aipoch/medical-research-skills

Access the STRING database to map identifiers, retrieve protein–protein interaction networks, and run functional/PPI enrichment when you need interaction context for a gene/protein set.

semantic-scholar-database

from aipoch/medical-research-skills

Access the Semantic Scholar Graph API to search papers and retrieve paper/author/citation data when you need literature discovery or citation graph exploration.

scite-database

from aipoch/medical-research-skills

Access Scite.ai Smart Citations to classify how a paper is cited (supporting, contrasting, mentioning) and assess scientific claims; use it when you need to evaluate a paper’s reliability or its acceptance in the literature.

research-hotspot-analysis

from aipoch/medical-research-skills

Analyzes research hotspots and recommends literature based on a disease or topic. Use when the user wants to identify current research trends, hot topics, or get literature recommendations for a specific medical field or disease.

pubchem-database-skill

from aipoch/medical-research-skills

Programmatic access to the PubChem database (via PUG-REST API and PubChemPy) for searching chemical compounds, retrieving physicochemical properties, performing structure similarity/substructure searches, and obtaining bioactivity data.

pdf-extract-experimental-materials

from aipoch/medical-research-skills

Extract experimental materials and instrument information from PDFs (or PDF-derived text/Markdown) into three CSV tables; use when a paper/report contains sections like Materials and Methods, Key Resources Table, Reagents, Antibodies, Consumables, Software, Equipment, Instruments, or Reagent Preparation.

pdb-database

from aipoch/medical-research-skills

Access the RCSB Protein Data Bank (PDB) to search, download, and programmatically retrieve 3D macromolecular structures and metadata; use when you need structure discovery (text/sequence/3D similarity) or automated structural data ingestion for structural biology and drug discovery workflows.

kegg-database

from aipoch/medical-research-skills

Direct access to KEGG via the REST API for academic-only pathway/gene/compound/drug queries; use when you need precise HTTP-level control or targeted KEGG ID mapping.