scientific-data-preprocessing

⚠️ CRITICAL USER EXPERIENCE-BASED SKILL - ALWAYS CONSULT BEFORE DATA PREPROCESSING ⚠️ Prevents catastrophic errors (88.9% error rate in V1.0 case study) through multi-level feature analysis, data leakage detection, and semantic validation. MANDATORY for: data preprocessing, feature engineering, standardization, normalization, interpolation, missing value handling, feature selection, or ANY data transformation task. Covers grouped time-series, cross-sectional, panel data. Detects: time travel leakage, causal inversion, ID misuse, semantic-numeric fallacies, distribution blindness. User's hard-won lessons from real project failures.


by foryourhealth111-pixel


Best use case

scientific-data-preprocessing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using scientific-data-preprocessing should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/scientific-data-preprocessing/SKILL.md --create-dirs "https://raw.githubusercontent.com/foryourhealth111-pixel/Vibe-Skills/main/bundled/skills/scientific-data-preprocessing/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/scientific-data-preprocessing/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How scientific-data-preprocessing Compares

Feature / Agent	scientific-data-preprocessing	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

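What does this skill do?

It guides an AI agent through data preprocessing for grouped time-series, cross-sectional, and panel data, using multi-level feature analysis, data leakage detection, and semantic validation to catch errors such as time travel leakage, causal inversion, ID misuse, and cross-group contamination.
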
Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Scientific Data Preprocessing Skill

⚠️ **CRITICAL: USER'S HARD-WON EXPERIENCE - MANDATORY CONSULTATION** ⚠️

This skill encapsulates painful lessons learned from real preprocessing disasters (88.9% error rate documented). **ALWAYS use this skill for planning, reflection, and validation when ANY data preprocessing is involved.**

**Why this skill is mandatory:**
- Based on actual project failures (V1.0, V2.0 case studies)
- Prevents data leakage that causes production disasters
- Catches semantic errors AI agents commonly make
- Saves weeks of debugging and model retraining

**When to invoke (DO NOT SKIP):**
- ✅ Before starting ANY data preprocessing task
- ✅ During preprocessing for reflection and validation
- ✅ After preprocessing for comprehensive audit
- ✅ When reviewing AI-generated preprocessing code

---

## Core Mission

Prevent catastrophic preprocessing errors in grouped time-series data by applying multi-level feature analysis and respecting data structure boundaries.

## When to Use This Skill

**MANDATORY consultation - trigger immediately when:**

### Data Preprocessing Tasks (ALWAYS)
- Any data cleaning, transformation, or preparation work
- Loading and preparing data for modeling
- Creating training/test splits
- Handling missing values (imputation, deletion)
- Feature scaling/normalization/standardization
- Encoding categorical variables
- Feature engineering or construction
- Feature selection or dimensionality reduction

### Data Structure Types (ALWAYS)
- Preprocessing time-series data with natural groupings (matches, sessions, patients, experiments)
- Sports analytics (tennis, basketball, etc.)
- Medical/clinical data with patient groupings
- Panel data or longitudinal studies
- Any grouped/hierarchical data structure

### Quality Assurance (ALWAYS)
- Auditing existing preprocessing for data leakage or semantic errors
- Reviewing AI-generated preprocessing code for common pitfalls
- Validating preprocessing before model training
- Debugging unexpected model performance

### Critical Checkpoints (NEVER SKIP)
- ✅ **BEFORE**: Planning preprocessing strategy
- ✅ **DURING**: Reflecting on decisions and checking for errors
- ✅ **AFTER**: Comprehensive validation and audit

**Trigger keywords that MUST invoke this skill:**
- "preprocess", "preprocessing", "data cleaning", "data preparation"
- "standardize", "normalize", "scale", "transform"
- "impute", "fill missing", "handle NaN"
- "encode", "one-hot", "categorical"
- "feature engineering", "feature selection", "feature construction"
- "train test split", "cross validation split"
- "interpolate", "smooth", "aggregate"

## Not For / Boundaries

This skill does NOT:
- Handle purely cross-sectional data (ungrouped, single timepoint)
- Make domain-specific feature engineering decisions (you decide business logic)
- Choose ML models (focuses on preprocessing only)
- Handle distributed/big data infrastructure (assumes data fits in memory)

Required inputs before proceeding:
1. Confirmation that data has groups (e.g., match_id, patient_id, session_id)
2. Understanding of whether goal is within-group (relative) or cross-group (absolute) comparison
3. Domain constraints on data ranges/units
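The first two required inputs can be checked before any transformation runs. The sketch below is a minimal illustration, assuming a pandas DataFrame and a hypothetical helper `check_required_inputs`; domain ranges (input 3) are left to the physical-meaning checks.

```python
import pandas as pd

def check_required_inputs(df: pd.DataFrame, group_col: str, goal: str) -> dict:
    """Hypothetical pre-flight check for the group column and the analysis goal."""
    if group_col not in df.columns:
        raise KeyError(f"Group column '{group_col}' not found")
    if goal not in ("relative", "absolute"):
        raise ValueError("goal must be 'relative' (within-group) or 'absolute' (cross-group)")
    sizes = df.groupby(group_col).size()
    if len(sizes) < 2:
        raise ValueError("Need at least two groups for grouped preprocessing")
    return {"n_groups": int(len(sizes)), "min_group_size": int(sizes.min()), "goal": goal}

df = pd.DataFrame({"match_id": [1, 1, 2, 2, 2], "speed_mph": [100.0, 110.0, 95.0, 120.0, 105.0]})
print(check_required_inputs(df, "match_id", "relative"))
```

Very small groups (see `min_group_size`) are worth noting early, since within-group statistics on a handful of rows are unstable.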

## Quick Reference

### Multi-Level Feature Analysis Framework

**Level 1: Data Type**
```python
# Check data types
df.dtypes  # int64, float64, object, etc.
```

**Level 2: Feature Type Classification**
```python
# Binary (0/1)
binary_features = [col for col in df.columns if df[col].nunique() == 2]

# Categorical (finite discrete values)
categorical_features = [col for col in df.select_dtypes(include='object').columns]

# Continuous (infinite possible values)
continuous_features = [col for col in df.select_dtypes(include=['float64', 'int64']).columns
                       if df[col].nunique() > 10]
```

**Level 3: Data Structure**
```python
# Check for grouping
print(f"Number of groups: {df['group_id'].nunique()}")
print(f"Avg points per group: {df.groupby('group_id').size().mean():.1f}")

# Check for time-series
df_sorted = df.sort_values(['group_id', 'timestamp'])
```

**Level 4: Physical Meaning**
```python
# Validate physical ranges
assert df['speed_mph'].max() < 200, "Speed exceeds physical limit"
assert df['distance_meters'].min() >= 0, "Negative distance impossible"
```
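Python drops `assert` statements when run with `-O`, so range checks that must always run can report violations instead. A minimal sketch, assuming a hypothetical `find_range_violations` helper and per-column `(low, high)` limits chosen by the analyst:

```python
import numpy as np
import pandas as pd

def find_range_violations(df, limits):
    """Return {column: [row labels]} for values outside [low, high]; NaNs are ignored."""
    report = {}
    for col, (low, high) in limits.items():
        values = df[col]
        out_of_range = values.notna() & ((values < low) | (values > high))
        if out_of_range.any():
            report[col] = df.index[out_of_range.to_numpy()].tolist()
    return report

df = pd.DataFrame({
    "speed_mph": [120.0, 250.0, np.nan],
    "distance_meters": [5.0, -1.0, 3.0],
})
limits = {"speed_mph": (0, 200), "distance_meters": (0, np.inf)}
print(find_range_violations(df, limits))  # {'speed_mph': [1], 'distance_meters': [1]}
```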

### Critical Processing Decision Tree

```python
# Decision: Within-group or global processing?
def choose_processing_scope(data, feature, goal):
    """
    goal = 'relative' → within-group (e.g., "this point was intense FOR THIS MATCH")
    goal = 'absolute' → global (e.g., "this was an intense point OVERALL")
    """
    if goal == 'relative':
        return 'within_group'
    elif goal == 'absolute':
        return 'global'
    else:
        raise ValueError("Goal must be 'relative' or 'absolute'")
```
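To see why the goal matters, the toy example below (made-up values) computes both z-scores by hand with pandas. `ddof=0` is used so the numbers match `StandardScaler`.

```python
import pandas as pd

df = pd.DataFrame({
    "match_id": ["A", "A", "A", "B", "B", "B"],
    "distance_run": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})
col = df["distance_run"]

# Absolute (global): one mean/std across all matches
df["z_global"] = (col - col.mean()) / col.std(ddof=0)

# Relative (within-group): mean/std recomputed for each match
grouped = df.groupby("match_id")["distance_run"]
df["z_within"] = (col - grouped.transform("mean")) / grouped.transform(lambda x: x.std(ddof=0))
print(df)
```

The value 30.0 in match A is about +1.22 within its match (the most intense point of that match) but slightly below zero globally, so the two scopes answer different questions.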

### Pattern 1: Within-Group Interpolation (CORRECT)

```python
import numpy as np
from scipy.interpolate import CubicSpline

# ✅ CORRECT: Interpolate within each group
# Assumes rows are already sorted in temporal order within each match
for match_id, group in df.groupby('match_id'):
    values = group['speed_mph'].to_numpy(dtype=float)
    valid = ~np.isnan(values)
    positions = np.arange(len(group))  # position within the match, not the global row number

    # A cubic spline needs enough anchor points; skip groups with nothing to fill
    if valid.sum() >= 4 and (~valid).any():
        # extrapolate=False leaves leading/trailing gaps as NaN instead of extrapolating
        cs = CubicSpline(positions[valid], values[valid], extrapolate=False)
        df.loc[group.index[~valid], 'speed_mph'] = cs(positions[~valid])
```

### Pattern 2: Global Interpolation (WRONG - Don't Do This)

```python
# ❌ WRONG: Cross-group interpolation
# This interpolates between match A's last point and match B's first point!
cs = CubicSpline(
    np.where(df['speed_mph'].notna())[0],  # ❌ All indices globally
    df['speed_mph'].dropna().values
)
df.loc[df['speed_mph'].isna(), 'speed_mph'] = cs(
    np.where(df['speed_mph'].isna())[0]
)
```
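The failure in Pattern 2 is easy to reproduce. This sketch uses pandas linear interpolation instead of a cubic spline to keep it short (a substitution, not the pattern's original method) on a made-up two-match table:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "match_id": ["A", "A", "A", "B", "B", "B"],
    "speed_mph": [100.0, 110.0, np.nan, np.nan, 130.0, 140.0],
})

# ❌ Global: the row order is treated as one continuous series across matches
df["global_fill"] = df["speed_mph"].interpolate(method="linear")

# ✅ Within-group: each match is interpolated independently
df["within_fill"] = df.groupby("match_id")["speed_mph"].transform(
    lambda s: s.interpolate(method="linear", limit_direction="both")
)
print(df)
```

Row 2 belongs to match A, yet its global fill (about 116.7) is pulled toward match B's 130.0, while the within-match fill stays at 110.0. With `limit_direction="both"`, edge gaps take the nearest observed value, which is itself a modeling choice.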

### Pattern 3: Within-Group Standardization (for Relative Analysis)

```python
from sklearn.preprocessing import StandardScaler

# ✅ CORRECT: Standardize within each match
for match_id in df['match_id'].unique():
    mask = df['match_id'] == match_id
    scaler = StandardScaler()

    # fit_transform returns an (n, 1) array; ravel() flattens it for a single column
    df.loc[mask, 'distance_run_std_within'] = scaler.fit_transform(
        df.loc[mask, ['distance_run']]
    ).ravel()

# Interpretation: z=+2 means "2 std above average FOR THIS MATCH"
```

### Pattern 4: Global Standardization (for Absolute Comparison)

```python
# ✅ CORRECT: Global standardization (when appropriate)
scaler = StandardScaler()
df['distance_run_std_global'] = scaler.fit_transform(df[['distance_run']])

# Interpretation: z=+2 means "2 std above average ACROSS ALL MATCHES"
```
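Global statistics become time travel leakage when they include rows the model is later evaluated on. A minimal sketch, assuming synthetic data and scikit-learn's `GroupShuffleSplit`, that splits by match first and then fits the scaler on training rows only:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "match_id": np.repeat(np.arange(10), 20),            # 10 synthetic matches, 20 points each
    "distance_run": rng.normal(30.0, 8.0, size=200),
})

# Split by match so no match contributes rows to both train and test
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(df, groups=df["match_id"]))
train, test = df.iloc[train_idx].copy(), df.iloc[test_idx].copy()

# Fit on training rows only, then reuse the same statistics for test rows
scaler = StandardScaler().fit(train[["distance_run"]])
train["distance_run_std_global"] = scaler.transform(train[["distance_run"]]).ravel()
test["distance_run_std_global"] = scaler.transform(test[["distance_run"]]).ravel()
```

Test rows will not have mean exactly 0, and that is expected: they are scaled with statistics they did not influence.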

### Pattern 5: Feature Type Processing Rules

```python
# Binary variables (0/1) - KEEP AS-IS
binary_cols = ['is_ace', 'is_winner', 'is_error']
# ❌ NEVER standardize these! They have semantic meaning as 0/1

# Categorical variables - ONE-HOT ENCODE
df_encoded = pd.get_dummies(df, columns=['server', 'serve_number'], dtype=int)

# Continuous variables - STANDARDIZE (within-group or global)
continuous_cols = ['distance_run', 'rally_count', 'speed_mph']
# ✅ Apply pattern 3 or 4 based on goal
```
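The same rules can be expressed as one scikit-learn `ColumnTransformer`, an alternative to `pd.get_dummies` that stores the fitted categories and statistics for reuse on new data. A sketch with made-up columns; the scaler here is global (Pattern 4), so within-match scaling still needs Pattern 3:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "is_ace": [0, 1, 0, 0],
    "server": [1, 2, 1, 2],
    "distance_run": [12.0, 30.5, 8.2, 20.1],
})

preprocess = ColumnTransformer(
    transformers=[
        ("continuous", StandardScaler(), ["distance_run"]),
        ("categorical", OneHotEncoder(handle_unknown="ignore"), ["server"]),
        ("binary", "passthrough", ["is_ace"]),  # 0/1 flags kept exactly as-is
    ],
    sparse_threshold=0,  # always return a dense array
)
X = preprocess.fit_transform(df)
```

Columns come out in transformer order (continuous, one-hot, binary), so `is_ace` is the last column and keeps its 0/1 values.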

### Pattern 6: Sliding Window Features (for Momentum)

```python
# ✅ CORRECT: Sliding window for momentum analysis
window = 10

df['win_rate_last10'] = df.groupby('match_id')['point_won'].transform(
    lambda x: x.rolling(window, min_periods=1).mean()
)

# ❌ WRONG: Cumulative features (loses temporal locality)
df['cumulative_points_won'] = df.groupby('match_id')['point_won'].cumsum()
# This just increases monotonically and correlates with point_number
```

### Pattern 7: Data Quality Validation

```python
def validate_data_quality(df, feature, expected_range, group_col='match_id'):
    """Validate before processing (raises AssertionError on range violations)"""
    # Check range (min/max ignore NaNs)
    assert df[feature].min() >= expected_range[0], f"{feature} below minimum"
    assert df[feature].max() <= expected_range[1], f"{feature} above maximum"

    # Check for anomalies: coefficient of variation above 1
    mean = df[feature].mean()
    std = df[feature].std()

    if std > abs(mean):
        print(f"⚠️ WARNING: {feature} has std > |mean| (highly skewed or errors)")

    # Check missing pattern: share of missing values within each group
    missing_frac = df[feature].isna().groupby(df[group_col]).mean()
    if (missing_frac > 0.5).any():
        print(f"⚠️ WARNING: {feature} has >50% missing in some groups")

# Example
validate_data_quality(df, 'speed_mph', expected_range=(50, 165))
```

### Pattern 8: Detect Processing Scope Automatically

```python
def detect_processing_scope(df, group_col, feature_col):
    """
    Recommend within-group vs global based on variance structure
    """
    # Variance components: average within-group variance vs total variance
    within_group_var = df.groupby(group_col)[feature_col].var().mean()
    global_var = df[feature_col].var()

    # Rough intraclass correlation (share of variance between groups);
    # clipped at 0 because this approximation can go slightly negative
    between_group_var = max(global_var - within_group_var, 0.0)
    icc = between_group_var / global_var

    if icc > 0.5:
        return 'within_group', f"High between-group variance (ICC={icc:.2f})"
    else:
        return 'global', f"Low between-group variance (ICC={icc:.2f})"

scope, reason = detect_processing_scope(df, 'match_id', 'distance_run')
print(f"Recommended: {scope} - {reason}")
```
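The variance split above is a quick approximation. For balanced groups, the classical one-way ANOVA estimator ICC(1) is an alternative; the sketch below assumes equal group sizes and no missing values, and `icc1_balanced` is a hypothetical name:

```python
import pandas as pd

def icc1_balanced(df, group_col, feature_col):
    """ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW) for g groups of equal size k."""
    grouped = df.groupby(group_col)[feature_col]
    sizes = grouped.size()
    if sizes.nunique() != 1:
        raise ValueError("this sketch assumes equal group sizes")
    k, g = int(sizes.iloc[0]), len(sizes)

    group_means = grouped.mean()
    grand_mean = df[feature_col].mean()
    # Between-group mean square: spread of group means around the grand mean
    msb = k * ((group_means - grand_mean) ** 2).sum() / (g - 1)
    # Within-group mean square: spread of rows around their own group mean
    msw = ((df[feature_col] - grouped.transform("mean")) ** 2).sum() / (g * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

toy = pd.DataFrame({"match_id": ["A"] * 3 + ["B"] * 3,
                    "distance_run": [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]})
print(icc1_balanced(toy, "match_id", "distance_run"))  # 1.0
```

When all variation is between groups the estimate is 1; when group means are identical it can go negative.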

### Pattern 9: Data Leakage Detection

```python
from pandas.api.types import is_float_dtype, is_numeric_dtype

def detect_data_leakage(df, target_col, feature_cols, id_cols):
    """
    Heuristic checks for data leakage and common AI pitfalls
    """
    issues = []

    # 1. ID Leakage: high-cardinality variables as features
    for col in feature_cols:
        if col in id_cols:
            issues.append(f"❌ FATAL: {col} is an ID - NEVER use as feature")
            continue

        # Check if it looks like an ID (>50% unique); floats are skipped because
        # continuous measurements such as speeds are naturally near-unique
        uniqueness = df[col].nunique() / len(df)
        if uniqueness > 0.5 and not is_float_dtype(df[col]):
            issues.append(f"⚠️ {col}: {uniqueness*100:.1f}% unique - possible ID leakage")

    # 2. Causal Inversion: near-perfect correlation with target
    for col in feature_cols:
        if col == target_col:
            continue
        if is_numeric_dtype(df[col]) and is_numeric_dtype(df[target_col]):
            corr = abs(df[col].corr(df[target_col]))
            if corr > 0.95:
                issues.append(f"❌ FATAL: {col} correlation={corr:.3f} - likely consequence of target!")

    # 3. Meaningless Numeric: codes treated as numbers
    for col in feature_cols:
        if is_numeric_dtype(df[col]):
            # Pattern: high values, many uniques, looks like a code
            if df[col].min() > 1000 and df[col].nunique() > 100:
                issues.append(f"⚠️ {col}: Looks like code (zipcode/ID) - should be categorical")

    # 4. Time Travel: check whether standardization used global statistics
    # (requires knowing whether the train/test split was done first)

    # Print report
    if issues:
        print("="*60)
        print("DATA LEAKAGE AUDIT")
        print("="*60)
        for issue in issues:
            print(issue)
        print("="*60)
    else:
        print("✅ No obvious leakage detected")

    return issues

# Example usage
issues = detect_data_leakage(
    df,
    target_col='point_won',
    feature_cols=['speed_mph', 'user_id', 'distance_run'],
    id_cols=['match_id', 'user_id']
)
```
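Check 4 is left open above because it needs the fitted scaler. Assuming a fitted scikit-learn `StandardScaler` (which exposes `mean_`) for a single column, a hypothetical helper can compare its mean with the training-row and full-data means:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def scaler_fit_on_train_only(scaler, train, full, col, tol=1e-8):
    """Hypothetical check: True if the scaler's mean matches the training rows only."""
    fitted_mean = float(scaler.mean_[0])
    if np.isclose(fitted_mean, train[col].mean(), rtol=0.0, atol=tol):
        return True
    if np.isclose(fitted_mean, full[col].mean(), rtol=0.0, atol=tol):
        return False  # time travel: held-out rows shaped the statistics
    raise ValueError("scaler mean matches neither the training rows nor the full data")

rng = np.random.default_rng(1)
full = pd.DataFrame({"speed_mph": rng.normal(110.0, 15.0, size=100)})
train = full.iloc[:70]  # assume the first 70 rows are the training split

leaky = StandardScaler().fit(full[["speed_mph"]])
clean = StandardScaler().fit(train[["speed_mph"]])
print(scaler_fit_on_train_only(leaky, train, full, "speed_mph"))  # False
print(scaler_fit_on_train_only(clean, train, full, "speed_mph"))  # True
```

This only inspects the mean; a scaler fit on some other subset raises instead of guessing.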

### Pattern 10: Distribution-Aware Scaling

```python
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.preprocessing import StandardScaler, RobustScaler

def smart_scaler_selection(df, col):
    """
    Choose scaler based on distribution characteristics
    """
    data = df[col].dropna()

    # Check distribution (scipy's kurtosis() returns excess kurtosis: 0 for a normal distribution)
    skewness = skew(data)
    kurt = kurtosis(data)

    print(f"{col}: skewness={skewness:.2f}, kurtosis={kurt:.2f}")

    if abs(skewness) < 0.5 and abs(kurt) < 3:
        # Roughly normal
        print("  → StandardScaler (data is roughly normal)")
        return StandardScaler(), None

    elif skewness > 1:
        # Right-skewed (long tail)
        print("  → Log transform + StandardScaler (right-skewed)")
        return StandardScaler(), 'log'

    else:
        # Heavy outliers or non-normal
        print("  → RobustScaler (heavy outliers)")
        return RobustScaler(), None

# Example usage (fits on all rows; after a train/test split, fit on training rows only)
for col in continuous_features:
    scaler, transform = smart_scaler_selection(df, col)

    if transform == 'log':
        # log1p assumes values > -1 (e.g. non-negative distances or counts)
        df[f'{col}_log'] = np.log1p(df[col])
        df[f'{col}_scaled'] = scaler.fit_transform(df[[f'{col}_log']])
    else:
        df[f'{col}_scaled'] = scaler.fit_transform(df[[col]])
```
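Wrapping the log step and the scaler in a scikit-learn `Pipeline` keeps both fitted on training rows and applied identically to test rows. A sketch on synthetic right-skewed data (gamma-distributed, an assumption for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

rng = np.random.default_rng(2)
# Synthetic right-skewed, non-negative feature
df = pd.DataFrame({"distance_run": rng.gamma(shape=1.5, scale=4.0, size=300)})
train, test = df.iloc[:200], df.iloc[200:]  # assume the split was done beforehand

log_then_scale = make_pipeline(FunctionTransformer(np.log1p), StandardScaler())
train_scaled = log_then_scale.fit_transform(train[["distance_run"]])  # fit on train only
test_scaled = log_then_scale.transform(test[["distance_run"]])        # reuse train statistics
```

`log_then_scale[-1]` is the fitted `StandardScaler`, whose `mean_` comes from the log-transformed training rows alone.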

## Examples

### Example 1: Tennis Match Preprocessing (Complete Pipeline)

**Input:**
- CSV with 7,284 rows, 31 matches
- Features: `speed_mph`, `distance_run`, `rally_count`, `is_ace`, `server`
- Goal: Analyze momentum (relative intensity within each match)

**Steps:**
```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# 1. Load and inspect
df = pd.read_csv('tennis_data.csv')
print(f"Matches: {df['match_id'].nunique()}")
print(f"Features: {df.dtypes}")

# 2. Classify features
binary_features = ['is_ace', 'is_winner', 'is_break_point']
categorical_features = ['server', 'serve_number']
continuous_features = ['distance_run', 'speed_mph', 'rally_count']

# 3. Validate data quality
for feat in continuous_features:
    print(f"\n{feat}:")
    print(df[feat].describe())
    # Check for impossible values
    if feat == 'speed_mph':
        assert df[feat].max() < 170, "Speed exceeds world record!"

# 4. Handle missing values (within-group)
for match_id in df['match_id'].unique():
    mask = df['match_id'] == match_id
    for feat in continuous_features:
        if df.loc[mask, feat].isna().any():
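            # Linear interpolation within the match; limit_direction='both' also fills
            # leading/trailing gaps with the nearest observed value
            df.loc[mask, feat] = df.loc[mask, feat].interpolate(method='linear', limit_direction='both')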

# 5. One-hot encode categorical
df = pd.get_dummies(df, columns=categorical_features, dtype=int)

# 6. Standardize continuous features WITHIN each match
for feat in continuous_features:
    df[f'{feat}_std'] = np.nan
    for match_id in df['match_id'].unique():
        mask = df['match_id'] == match_id
        scaler = StandardScaler()
        df.loc[mask, f'{feat}_std'] = scaler.fit_transform(
            df.loc[mask, [feat]]
        ).ravel()

# 7. Create sliding window features
window = 10
df['win_rate_last10'] = df.groupby('match_id')['point_won'].transform(
    lambda x: x.rolling(window, min_periods=1).mean()
)

# 8. KEEP binary features as 0/1 (don't transform!)
# binary_features are already correct

print("\n✅ Preprocessing complete!")
print(f"Final shape: {df.shape}")
print(f"Standardized features: {[f for f in df.columns if f.endswith('_std')]}")
```

**Expected output:**
- Binary features remain 0/1
- Categorical features one-hot encoded (e.g., `server_1`, `server_2`)
- Continuous features have both original and `_std` versions
- `_std` features have mean≈0, std≈1 WITHIN each match
- Sliding window features capture local momentum
- No missing values
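The expected-output list can be turned into an automated audit. A minimal sketch with a hypothetical `verify_preprocessing` helper; it uses `ddof=0` to match `StandardScaler`, and it will flag a match whose feature is constant (its standardized std is 0):

```python
import pandas as pd

def verify_preprocessing(df, group_col, binary_cols, std_cols, tol=1e-6):
    """Hypothetical audit of the expected outputs; returns a list of problems (empty = pass)."""
    problems = []
    for col in binary_cols:
        if not df[col].isin([0, 1]).all():
            problems.append(f"{col}: values other than 0/1")
    for col in std_cols:
        grouped = df.groupby(group_col)[col]
        means, stds = grouped.mean(), grouped.std(ddof=0)
        if (means.abs() > tol).any() or ((stds - 1).abs() > tol).any():
            problems.append(f"{col}: not mean 0 / std 1 within every {group_col}")
    n_missing = int(df.isna().sum().sum())
    if n_missing:
        problems.append(f"{n_missing} missing values remain")
    return problems

ok = pd.DataFrame({
    "match_id": ["A", "A", "B", "B"],
    "is_ace": [0, 1, 1, 0],
    "distance_run_std": [-1.0, 1.0, -1.0, 1.0],
})
print(verify_preprocessing(ok, "match_id", ["is_ace"], ["distance_run_std"]))  # []
```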

### Example 2: Detecting Cross-Group Contamination

**Input:**
- Preprocessed data where you suspect cross-group standardization

**Steps:**
```python
# Check if standardization was done correctly
def check_within_group_standardization(df, group_col, feature_std_col):
    """
    Verify that standardized feature has mean≈0, std≈1 within each group
    """
    # ddof=0 matches StandardScaler, so small groups are not wrongly flagged
    results = df.groupby(group_col)[feature_std_col].agg(
        mean='mean', std=lambda x: x.std(ddof=0)
    )

    # Within-group standardization: each group should have mean≈0, std≈1
    if (results['mean'].abs() < 0.1).all() and (results['std'].between(0.9, 1.1)).all():
        print("✅ CORRECT: Within-group standardization detected")
        return True

    # Otherwise (e.g. global standardization): group means and stds vary
    else:
        print("❌ WRONG: Not standardized within groups (likely global standardization)")
        print("Group means:", results['mean'].values[:5])
        print("Group stds:", results['std'].values[:5])
        return False

check_within_group_standardization(df, 'match_id', 'distance_run_std')
```

**Expected output:**
- CORRECT: All group means ≈ 0, all group stds ≈ 1
- WRONG: Group means vary widely, indicating global standardization

### Example 3: Fixing Cumulative Feature Error

**Input:**
- Existing pipeline using cumulative sums for momentum

**Steps:**
```python
# ❌ WRONG approach (existing code)
df['cumulative_wins'] = df.groupby('match_id')['point_won'].cumsum()

# Problem: This just counts total wins up to this point
# Doesn't capture recent momentum!

# ✅ CORRECT approach (fix)
# Replace cumulative with sliding window
window = 10
df['recent_win_rate'] = df.groupby('match_id')['point_won'].transform(
    lambda x: x.rolling(window, min_periods=1).mean()
)

# Compare
print("Cumulative (wrong):", df['cumulative_wins'].values[50:60])
print("Sliding window (correct):", df['recent_win_rate'].values[50:60])

# Cumulative: [25, 26, 26, 27, 28, ...] - monotonic
# Sliding window: [0.6, 0.7, 0.5, 0.6, ...] - fluctuates with momentum
```

**Expected output:**
- Cumulative features removed
- Sliding window features show local variations
- Momentum analysis now captures short-term trends

## References

- `references/index.md`: Navigation and overview
- `references/error-case-studies.md`: Real-world preprocessing disasters from tennis data
- `references/decision-trees.md`: Full decision trees for all preprocessing choices
- `references/validation-checklist.md`: Pre-processing validation checklist
- `references/ai-common-pitfalls.md`: AI-specific errors (data leakage, semantic fallacies, distribution blindness)

## Maintenance

⚠️ **CRITICAL NOTICE: USER'S PERSONAL EXPERIENCE-BASED SKILL** ⚠️

**This skill is NOT theoretical - it's based on real project failures:**
- **V1.0 disaster**: 88.9% error rate, weeks of wasted work
- **V2.0 issues**: Cross-group contamination, unreliable results
- **V3.0 success**: All errors fixed, production-ready

**Why this matters to you (Claude):**
- These are the EXACT errors AI agents commonly make
- User has already paid the price for these mistakes
- Ignoring this skill = repeating documented failures
- Following this skill = learning from experience without pain

**Authority level**: HIGHEST
- Based on user's hard-won lessons from actual project
- Validated through multiple iterations (V1.0 → V2.0 → V3.0)
- Every error documented with impact metrics
- Every fix validated with comprehensive testing

**Sources**:
- Primary: User's personal project (2024 MCM Problem C - Tennis Momentum Analysis)
- Secondary: Statistical best practices for grouped data
- Tertiary: Common AI preprocessing errors observed across domains

**Mandatory consultation**:
- ⚠️ ALWAYS consult before, during, and after any data preprocessing
- ⚠️ NEVER skip validation steps outlined in this skill
- ⚠️ When in doubt, err on the side of caution (use this skill)

**Last updated**: 2026-01-18 (V1.1)

**Known limits:**
- Assumes data fits in memory (not for big data infrastructure)
- Focused on numeric/categorical features (text/image preprocessing partially covered)
- Does not prescribe domain-specific feature engineering (user decides business logic)
- Requires basic understanding of statistics (mean, std, correlation)
