code-science

Scientific programming best practices including reproducible research, computational notebooks, version control for research, data management, HPC/parallel computing, and research software engineering. Use when user needs help with research code organization, reproducibility, scientific Python/R workflows, or computational infrastructure. Triggers on "reproducible research", "research code", "scientific computing", "HPC", "parallel computing", "Jupyter", "notebook", "data management plan", "research software", "code review for science".

564 stars

by beita6969


Best use case

code-science is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using code-science should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/code-science/SKILL.md --create-dirs "https://raw.githubusercontent.com/beita6969/ScienceClaw/main/skills/code-science/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/code-science/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill


Frequently Asked Questions

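What does this skill do?

It gives your AI agent best practices for scientific programming: research code organization, reproducibility, computational notebooks, version control and data management for research, parallel and HPC computing, and testing scientific code.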

Where can I find the source code?

The source is in the beita6969/ScienceClaw repository on GitHub, at skills/code-science/SKILL.md.

SKILL.md Source

# Scientific Programming

Best practices for research software and reproducible computation.

## Project Structure

```
project/
├── README.md              # Project overview, how to reproduce
├── LICENSE                # MIT, Apache 2.0, or GPL
├── requirements.txt       # or environment.yml (conda)
├── setup.py / pyproject.toml
├── data/
│   ├── raw/               # Never modify raw data
│   ├── processed/         # Cleaned/transformed data
│   └── external/          # Third-party data
├── src/ or scripts/
│   ├── data_processing.py
│   ├── analysis.py
│   ├── models.py
│   └── visualization.py
├── notebooks/             # Exploratory analysis
│   ├── 01_eda.ipynb
│   ├── 02_modeling.ipynb
│   └── 03_figures.ipynb
├── results/
│   ├── figures/
│   └── tables/
├── tests/
└── docs/
```
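The layout above can be created in one step so every project starts the same way. A minimal sketch, assuming a hypothetical `scaffold` helper; it creates only the directories shown (choosing `src/` over `scripts/`), never overwrites existing files, and drops a `.gitkeep` into empty folders because git does not track empty directories.

```python
from __future__ import annotations

from pathlib import Path

# Directories from the layout above; rename (e.g. scripts/ instead of src/) to taste
DIRECTORIES = [
    "data/raw", "data/processed", "data/external",
    "src", "notebooks", "results/figures", "results/tables",
    "tests", "docs",
]


def scaffold(root: str | Path) -> list[Path]:
    """Hypothetical helper: create the project skeleton without overwriting anything."""
    root = Path(root)
    created = []
    for rel in DIRECTORIES:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        # git ignores empty directories, so keep a placeholder in each empty one
        if not any(path.iterdir()):
            (path / ".gitkeep").touch()
        created.append(path)
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text(f"# {root.name}\n\nTODO: describe how to reproduce the results.\n")
    return created


if __name__ == "__main__":
    scaffold("project")
```

Running it twice is safe: `exist_ok=True` skips existing directories and the README is written only if missing.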

## Reproducibility Checklist

1. **Environment**: Pin all dependencies with versions
   ```bash
   pip freeze > requirements.txt
   # or conda
   conda env export > environment.yml
   ```

2. **Random seeds**: Set and document all random seeds
   ```python
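   import random

   import numpy as np

   SEED = 42
   random.seed(SEED)                  # Python's built-in RNG
   np.random.seed(SEED)               # legacy global NumPy RNG
   rng = np.random.default_rng(SEED)  # preferred: pass this Generator to your code
   # torch.manual_seed(SEED)          # PyTorch, if used
   # tf.random.set_seed(SEED)         # TensorFlow, if used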
   ```

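3. **Data versioning**: Use DVC or Git LFS for large data files
   ```bash
   dvc init
   dvc add data/raw/dataset.csv
   # dvc add writes a small .dvc pointer file and a .gitignore entry; commit both
   git add data/raw/dataset.csv.dvc data/raw/.gitignore
   git commit -m "Track raw dataset with DVC"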
   ```

4. **Configuration**: Separate config from code
   ```python
   # config.yaml
   # experiment:
   #   learning_rate: 0.001
   #   batch_size: 32
   #   epochs: 100
   import yaml  # pip install pyyaml

   with open('config.yaml') as f:
       config = yaml.safe_load(f)
   learning_rate = config['experiment']['learning_rate']
   ```

5. **Logging**: Record the parameters, progress, and results of every run
   ```python
   import logging
   logging.basicConfig(level=logging.INFO,
                       format='%(asctime)s %(levelname)s: %(message)s',
                       filename='experiment.log')
   logging.info('Experiment started')
   ```
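The checklist items can be combined into one record saved alongside each run's outputs, so a result can always be traced back to its seed, configuration, and package versions. A minimal sketch, assuming hypothetical `build_run_manifest` and `write_run_manifest` helpers and a JSON file; the field names and default package list are illustrative, not a standard.

```python
import json
import platform
import sys
from importlib import metadata
from pathlib import Path


def package_versions(names):
    """Return {name: version} for each package, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_run_manifest(seed, config, packages=("numpy", "pandas", "PyYAML")):
    """Hypothetical helper: collect what is needed to rerun one experiment."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "argv": sys.argv,
        "seed": seed,
        "config": config,
        "packages": package_versions(packages),
    }


def write_run_manifest(path, manifest):
    """Save the manifest as JSON, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))


if __name__ == "__main__":
    config = {"experiment": {"learning_rate": 0.001, "batch_size": 32, "epochs": 100}}
    write_run_manifest("results/run_manifest.json", build_run_manifest(42, config))
```

`pip freeze` captures the whole environment; the manifest is a per-run complement that also records the seed and configuration actually used.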

## Parallel Computing

```python
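from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from multiprocessing import Pool
import urllib.request

import numpy as np


def process_chunk(chunk):
    # CPU-bound stand-in for heavy_computation(); must be a top-level
    # function so it can be pickled and sent to worker processes
    return np.sum(np.sqrt(chunk))


def fetch_data(url):
    # I/O-bound: threads work well because the GIL is released while waiting
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


if __name__ == "__main__":  # required where workers are spawned (Windows, macOS)
    data_chunks = np.array_split(np.arange(1_000_000, dtype=float), 8)

    # Multiprocessing (CPU-bound)
    with Pool(processes=8) as pool:
        results = pool.map(process_chunk, data_chunks)

    # Concurrent futures (simpler API, same idea)
    with ProcessPoolExecutor(max_workers=8) as executor:
        results = list(executor.map(process_chunk, data_chunks))

    # For I/O-bound tasks (API calls, file reading)
    urls = ["https://example.org"]  # placeholder URLs
    with ThreadPoolExecutor(max_workers=20) as executor:
        pages = list(executor.map(fetch_data, urls))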
```
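A single global seed (checklist item 2) is not enough once random work is split across processes: each worker needs its own independent, reproducible stream. A sketch using NumPy's `SeedSequence.spawn`, with a toy Monte Carlo estimate of π standing in for real work; `simulate`, `run`, and the task counts are illustrative.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np

SEED = 42


def simulate(seed_seq, n_samples=10_000):
    """Toy Monte Carlo estimate of pi using its own random stream."""
    rng = np.random.default_rng(seed_seq)
    xy = rng.random((n_samples, 2))
    inside = np.count_nonzero((xy ** 2).sum(axis=1) <= 1.0)
    return 4.0 * inside / n_samples


def run(n_tasks=8, max_workers=4):
    # spawn() derives independent child seeds from one root seed, so task i
    # always gets the same stream no matter which worker runs it
    child_seeds = np.random.SeedSequence(SEED).spawn(n_tasks)
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(simulate, child_seeds))


if __name__ == "__main__":
    estimates = run()
    print(np.mean(estimates))
```

Because the seeds are tied to task indices rather than to worker processes, changing `max_workers` changes speed but not results.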

## Performance Optimization

```python
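# Profiling: whole-program view, sorted by cumulative time
import cProfile
import numpy as np

data = np.random.default_rng(0).random(1_000_000)  # example input array

def my_function():
    return np.sort(data)  # placeholder for the code you want to profile

cProfile.run('my_function()', sort='cumulative')

# Line profiling
# pip install line_profiler
# Add the @profile decorator to the target function, then: kernprof -l -v script.py

# NumPy vectorization (avoid Python loops)
# Bad: the interpreter handles one element at a time
result = [x**2 + 2*x + 1 for x in data]
# Good: one expression evaluated in NumPy's compiled loops (data must be an ndarray)
result = data**2 + 2*data + 1

# Memory profiling
# pip install memory_profiler
# Add the @profile decorator, then: python -m memory_profiler script.py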
```
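Before rewriting a loop, it is worth checking that the vectorized form gives the same answer and measuring the gain on your own machine. A sketch using `timeit` on synthetic data; the function names are hypothetical, and no timings are quoted because they vary by hardware and array size.

```python
import timeit

import numpy as np

data = np.random.default_rng(0).random(100_000)


def loop_version(values):
    # Python-level loop: one interpreter round-trip per element
    return np.array([x**2 + 2*x + 1 for x in values])


def vectorized_version(values):
    # Whole-array arithmetic executed in NumPy's compiled loops
    return values**2 + 2*values + 1


if __name__ == "__main__":
    # Correctness first: both forms must agree to floating-point tolerance
    assert np.allclose(loop_version(data), vectorized_version(data))
    t_loop = timeit.timeit(lambda: loop_version(data), number=5)
    t_vec = timeit.timeit(lambda: vectorized_version(data), number=5)
    print(f"loop: {t_loop:.3f}s  vectorized: {t_vec:.3f}s  speedup: {t_loop / t_vec:.0f}x")
```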

## Data Management

### FAIR Principles
- **Findable**: Persistent identifiers (DOI), rich metadata
- **Accessible**: Open protocols, authentication when needed
- **Interoperable**: Standard formats (CSV, JSON, HDF5, NetCDF)
- **Reusable**: Clear license, provenance, community standards
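One lightweight way to move toward Findable and Reusable is a metadata "sidecar" file stored next to each dataset. A minimal sketch, assuming a hypothetical `write_sidecar` helper and a made-up minimal field set; it is not a formal schema such as those required by data repositories (e.g. Zenodo), which you should follow when publishing.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical minimal field set; real repositories define their own schemas
REQUIRED = ("title", "creator", "license", "description")


def write_sidecar(data_path: Path, **metadata) -> Path:
    """Write <file>.meta.json next to a data file, refusing incomplete metadata."""
    missing = [key for key in REQUIRED if not metadata.get(key)]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    record = {
        "file": data_path.name,
        "format": data_path.suffix.lstrip("."),
        "created": datetime.now(timezone.utc).isoformat(),
        **metadata,
    }
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar


if __name__ == "__main__":
    write_sidecar(
        Path("data/processed/observations.csv"),  # hypothetical file
        title="Hourly temperature observations",
        creator="Research Group",
        license="CC-BY-4.0",
        description="Cleaned from data/raw; see src/data_processing.py",
        source="data/raw/observations_raw.csv",
    )
```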

### File Formats for Science
| Format | Best for | File size | Read/write speed |
|--------|----------|-----------|------------------|
| CSV | Small tabular, universal | Large | Slow |
| Parquet | Large tabular, columnar | Small | Fast |
| HDF5 | Multidimensional arrays | Small | Fast |
| NetCDF | Climate/geospatial | Small | Fast |
| FITS | Astronomy | Medium | Fast |
| Feather | DataFrame interchange | Small | Very fast |

```python
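import h5py
import numpy as np
import pandas as pd

# Parquet (recommended for large tabular data; requires pyarrow or fastparquet)
df = pd.DataFrame({'x': np.arange(5), 'y': np.linspace(0.0, 1.0, 5)})
df.to_parquet('data.parquet', compression='snappy')
df = pd.read_parquet('data.parquet')

# HDF5 (for arrays)
array = np.random.default_rng(0).random((100, 3))
with h5py.File('data.h5', 'w') as f:
    f.create_dataset('experiment1', data=array)
with h5py.File('data.h5', 'r') as f:
    loaded = f['experiment1'][:]  # read the dataset back into memory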
```

## Testing Scientific Code

```python
import numpy as np
import pytest

# compute_energy, simulate, solve_numerically, analytical_solution, compute,
# transform, initial_state, params and data stand in for your project's own code.

def test_conservation_law():
    """Conserved quantities (e.g. total energy) should not drift"""
    initial_energy = compute_energy(initial_state)
    final_energy = compute_energy(simulate(initial_state))
    np.testing.assert_allclose(initial_energy, final_energy, rtol=1e-6)

def test_known_solution():
    """Compare against analytical solution"""
    numerical = solve_numerically(params)
    analytical = analytical_solution(params)
    np.testing.assert_allclose(numerical, analytical, atol=1e-4)

def test_symmetry():
    """Result should be invariant under a symmetry transformation"""
    result1 = compute(data)
    result2 = compute(transform(data))
    # Floating-point results rarely match bit-for-bit; compare with a tolerance
    np.testing.assert_allclose(result1, result2, rtol=1e-12)
```
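The tests above rely on project-specific functions. A self-contained version of the same patterns, using a hypothetical `trapezoid` integrator as the code under test, shows each check against a known answer. The convergence-order test is an additional common pattern: a second-order method's error should drop about 4x when the step size halves.

```python
import numpy as np


def trapezoid(f, a, b, n=1_000):
    """Composite trapezoidal rule; hypothetical stand-in for solve_numerically."""
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)


def test_known_solution():
    # The integral of sin(x) over [0, pi] is exactly 2
    np.testing.assert_allclose(trapezoid(np.sin, 0.0, np.pi), 2.0, atol=1e-5)


def test_convergence_order():
    # Second-order method: halving h should cut the error by about 4x
    err_coarse = abs(trapezoid(np.sin, 0.0, np.pi, n=100) - 2.0)
    err_fine = abs(trapezoid(np.sin, 0.0, np.pi, n=200) - 2.0)
    assert 3.5 < err_coarse / err_fine < 4.5


def test_symmetry():
    # sin is odd, so its integral over a symmetric interval should vanish
    np.testing.assert_allclose(trapezoid(np.sin, -1.0, 1.0), 0.0, atol=1e-12)
```

Saved as `tests/test_integration.py`, these run with `pytest`; a convergence test often catches bugs that a single tolerance check misses.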

## Tips
- Raw data is sacred — never modify it, only create processed copies
- Use version control (git) from day one
- Write README before writing code
- Automate the full pipeline (Makefile or Snakemake)
- Document assumptions and decisions in code comments
- Use type hints for clarity in scientific code
- Publish code alongside papers (GitHub + Zenodo for DOI)
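To make "never modify raw data" checkable rather than a convention, record a checksum for every raw file once and verify it before each analysis. A sketch using SHA-256 from `hashlib`; the helper names and the manifest location `data/raw_checksums.json` are hypothetical. The manifest sits outside `data/raw/` so it does not hash itself.

```python
import hashlib
import json
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(raw_dir: Path) -> dict[str, str]:
    """Map each file under raw_dir (as a relative path) to its SHA-256 hash."""
    return {
        str(p.relative_to(raw_dir)): sha256sum(p)
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(raw_dir: Path, manifest: Path) -> list[str]:
    """Return files added, removed, or modified since the manifest was written."""
    recorded = json.loads(manifest.read_text())
    current = snapshot(raw_dir)
    return sorted(k for k in recorded.keys() | current.keys()
                  if recorded.get(k) != current.get(k))


if __name__ == "__main__":
    raw = Path("data/raw")
    manifest = Path("data/raw_checksums.json")  # hypothetical location
    if not manifest.exists():
        manifest.write_text(json.dumps(snapshot(raw), indent=2))
    else:
        print("Modified raw files:", changed_files(raw, manifest) or "none")
```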
