code-science
Scientific programming best practices including reproducible research, computational notebooks, version control for research, data management, HPC/parallel computing, and research software engineering. Use when user needs help with research code organization, reproducibility, scientific Python/R workflows, or computational infrastructure. Triggers on "reproducible research", "research code", "scientific computing", "HPC", "parallel computing", "Jupyter", "notebook", "data management plan", "research software", "code review for science".
Best use case
code-science is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using code-science should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/code-science/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How code-science Compares
| Feature / Agent | code-science | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Scientific Programming
Best practices for research software and reproducible computation.
## Project Structure
```
project/
├── README.md # Project overview, how to reproduce
├── LICENSE # MIT, Apache 2.0, or GPL
├── requirements.txt # or environment.yml (conda)
├── setup.py / pyproject.toml
├── data/
│ ├── raw/ # Never modify raw data
│ ├── processed/ # Cleaned/transformed data
│ └── external/ # Third-party data
├── src/ or scripts/
│ ├── data_processing.py
│ ├── analysis.py
│ ├── models.py
│ └── visualization.py
├── notebooks/ # Exploratory analysis
│ ├── 01_eda.ipynb
│ ├── 02_modeling.ipynb
│ └── 03_figures.ipynb
├── results/
│ ├── figures/
│ └── tables/
├── tests/
└── docs/
```
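The layout above can be created with a short script. This is a minimal sketch using only the standard library; `LAYOUT` and `scaffold` are hypothetical names, and it assumes the `src/` variant of the `src/ or scripts/` choice. It creates directories only and leaves README, LICENSE, and dependency files to you.

```python
from pathlib import Path

# Directory names copied from the tree above.
LAYOUT = [
    "data/raw",
    "data/processed",
    "data/external",
    "src",
    "notebooks",
    "results/figures",
    "results/tables",
    "tests",
    "docs",
]


def scaffold(root: Path) -> list[Path]:
    """Create the directory skeleton under `root` and return the created paths."""
    created = []
    for relative in LAYOUT:
        path = root / relative
        path.mkdir(parents=True, exist_ok=True)
        # A .gitkeep file lets git track directories that are still empty.
        (path / ".gitkeep").touch()
        created.append(path)
    return created
```

Calling `scaffold(Path("project"))` is safe to repeat because existing directories are left untouched.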
## Reproducibility Checklist
1. **Environment**: Pin all dependencies with versions
```bash
pip freeze > requirements.txt
# or conda
conda env export > environment.yml
```
2. **Random seeds**: Set and document all random seeds
```python
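import random

import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)  # seeds NumPy's legacy global RNG
rng = np.random.default_rng(SEED)  # preferred: an explicit, local Generator
# torch.manual_seed(SEED)
# tf.random.set_seed(SEED)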
```
3. **Data versioning**: Use DVC or git-lfs for large data
```bash
dvc init
dvc add data/raw/dataset.csv
# dvc add also writes a .gitignore so git skips the data file itself
git add data/raw/dataset.csv.dvc data/raw/.gitignore
```
4. **Configuration**: Separate config from code
```python
# config.yaml
# experiment:
#   learning_rate: 0.001
#   batch_size: 32
#   epochs: 100
import yaml

with open('config.yaml') as f:
    config = yaml.safe_load(f)
```
5. **Logging**: Record all experiments
```python
import logging
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s: %(message)s',
                    filename='experiment.log')
```
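The checklist items can be tied together in a single "run manifest" that is logged at the start of every experiment. The following is a minimal sketch using only the standard library, not a prescribed implementation. `ExperimentConfig`, `package_versions`, and `start_run` are hypothetical names, and `raw_config` is assumed to be the dict that `yaml.safe_load` returns for the `config.yaml` shown above.

```python
import json
import logging
import platform
import random
from dataclasses import asdict, dataclass
from importlib import metadata


@dataclass(frozen=True)
class ExperimentConfig:
    # Field names mirror the config.yaml keys shown above.
    learning_rate: float
    batch_size: int
    epochs: int


def package_versions(names):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def start_run(raw_config, seed, packages=()):
    """Build the config, seed a local RNG, and log a JSON manifest of the run."""
    # Unknown or missing keys raise TypeError here; value types are not checked.
    config = ExperimentConfig(**raw_config["experiment"])
    rng = random.Random(seed)  # local generator, global random state untouched
    manifest = {
        "seed": seed,
        "config": asdict(config),
        "python": platform.python_version(),
        "packages": package_versions(packages),
    }
    logging.getLogger("experiment").info(
        "run manifest: %s", json.dumps(manifest, sort_keys=True)
    )
    return config, rng, manifest
```

With `logging.basicConfig` set up as in item 5, each run then leaves one line in `experiment.log` that records the seed, parameters, and versions needed to repeat it.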
## Parallel Computing
```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from multiprocessing import Pool


def process_chunk(data):
    # heavy_computation is a placeholder for your own CPU-bound function
    return heavy_computation(data)


# The __main__ guard is required where workers are started with "spawn"
# (the default on Windows and macOS), because each worker re-imports this module.
if __name__ == "__main__":
    # Multiprocessing (CPU-bound)
    with Pool(processes=8) as pool:
        results = pool.map(process_chunk, data_chunks)

    # Concurrent futures (simpler API)
    with ProcessPoolExecutor(max_workers=8) as executor:
        results = list(executor.map(process_func, items))

    # For I/O-bound tasks (API calls, file reading)
    with ThreadPoolExecutor(max_workers=20) as executor:
        results = list(executor.map(fetch_data, urls))
```
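The snippets above assume the data has already been split into chunks. Here is one possible end-to-end sketch, with hypothetical helper names (`split_into_chunks`, `sum_of_squares`, `parallel_sum_of_squares`) and a toy workload standing in for the real computation.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split `items` into at most `n_chunks` contiguous, near-equal chunks."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, remainder = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `remainder` chunks take one extra item each.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Hypothetical stand-in for a CPU-bound computation on one chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, workers=4):
    """Map the chunk function over worker processes and combine the partial results."""
    chunks = split_into_chunks(items, workers)
    with ProcessPoolExecutor(max_workers=workers) as executor:
        partials = list(executor.map(sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(100_000))))
```

Sending a few large chunks rather than many single items keeps the pickling and inter-process overhead small relative to the work done in each task.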
## Performance Optimization
```python
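# Profiling (my_function is a placeholder for the code under test)
import cProfile
cProfile.run('my_function()', sort='cumulative')

# Line profiling
# pip install line_profiler
# @profile decorator, then: kernprof -l -v script.py

# NumPy vectorization (avoid loops)
import numpy as np
data = np.arange(1_000_000, dtype=float)
# Bad: Python-level loop over every element
result = [x**2 + 2*x + 1 for x in data]
# Good: one vectorized expression evaluated in compiled code
result = data**2 + 2*data + 1

# Memory profiling
# pip install memory_profiler
# @profile decorator, then: python -m memory_profiler script.py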
```
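`cProfile.run` takes a statement as a string and prints straight to stdout. When the report needs to be captured, for example to store it alongside results, the profiler can be driven programmatically. This is a sketch with hypothetical names (`slow_polynomial`, `profile_call`); it uses only the standard `cProfile` and `pstats` modules.

```python
import cProfile
import io
import pstats


def slow_polynomial(values):
    """Hypothetical hot spot: evaluates x**2 + 2*x + 1 one element at a time."""
    return [x**2 + 2 * x + 1 for x in values]


def profile_call(func, *args, top=5):
    """Run `func(*args)` under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the `top` entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

For instance, `result, report = profile_call(slow_polynomial, list(range(1000)))` returns the computed values together with a text report that can be written to `results/`.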
## Data Management
### FAIR Principles
- **Findable**: Persistent identifiers (DOI), rich metadata
- **Accessible**: Open protocols, authentication when needed
- **Interoperable**: Standard formats (CSV, JSON, HDF5, NetCDF)
- **Reusable**: Clear license, provenance, community standards
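
As a small illustration of the "rich metadata" and "provenance" points, a metadata sidecar file can be written next to each dataset. This is a sketch only: the field names and the helpers `sha256_of` and `write_sidecar` are hypothetical and do not follow any particular metadata standard, and a DOI would still come from a repository such as Zenodo.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, block_size=1 << 20):
    """Stream the file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def write_sidecar(data_path, *, title, license_id, source):
    """Write `<data file>.meta.json` next to the data file and return its path."""
    data_path = Path(data_path)
    record = {
        "title": title,
        "license": license_id,
        "source": source,
        "file": data_path.name,
        "bytes": data_path.stat().st_size,
        "sha256": sha256_of(data_path),
    }
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

The checksum lets a later reader confirm that the file they downloaded is the one the analysis actually used.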
### File Formats for Science
| Format | Best For | Size | Speed |
|--------|----------|------|-------|
| CSV | Small tabular, universal | Large | Slow |
| Parquet | Large tabular, columnar | Small | Fast |
| HDF5 | Multidimensional arrays | Small | Fast |
| NetCDF | Climate/geospatial | Small | Fast |
| FITS | Astronomy | Medium | Fast |
| Feather | DataFrame interchange | Small | Very fast |
```python
import h5py
import pandas as pd

# Parquet (recommended for large datasets; needs pyarrow or fastparquet installed)
df.to_parquet('data.parquet', compression='snappy')
df = pd.read_parquet('data.parquet')

# HDF5 (for arrays)
with h5py.File('data.h5', 'w') as f:
    f.create_dataset('experiment1', data=array)
```
## Testing Scientific Code
```python
import numpy as np
import pytest


def test_conservation_law():
    """Physical quantities should be conserved"""
    initial_energy = compute_energy(initial_state)
    final_energy = compute_energy(simulate(initial_state))
    np.testing.assert_allclose(initial_energy, final_energy, rtol=1e-6)


def test_known_solution():
    """Compare against analytical solution"""
    numerical = solve_numerically(params)
    analytical = analytical_solution(params)
    np.testing.assert_allclose(numerical, analytical, atol=1e-4)


def test_symmetry():
    """Result should be symmetric under transformation"""
    result1 = compute(data)
    result2 = compute(transform(data))
    # Exact equality is brittle for floating-point results, so compare with a
    # tolerance; use assert_array_equal only for integer or bit-exact outputs.
    np.testing.assert_allclose(result1, result2, rtol=1e-12)
```
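The test patterns above use placeholder functions. A self-contained instance of the same ideas, applied to a composite trapezoid rule, might look like the sketch below. `trapezoid` is a hypothetical function written for this example, and the tolerances are chosen for this integrand rather than as general recommendations.

```python
import math


def trapezoid(f, a, b, n=1000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))


def test_known_solution():
    # The integral of sin over [0, pi] is exactly 2.
    assert math.isclose(trapezoid(math.sin, 0.0, math.pi), 2.0, abs_tol=1e-5)


def test_antisymmetry():
    # Swapping the limits should flip the sign, up to rounding.
    forward = trapezoid(math.exp, 0.0, 1.0)
    backward = trapezoid(math.exp, 1.0, 0.0)
    assert math.isclose(forward, -backward, rel_tol=1e-12)


def test_convergence_order():
    # Halving the step should cut the error by about 4 for a second-order method.
    exact = math.e - 1.0
    coarse = abs(trapezoid(math.exp, 0.0, 1.0, n=50) - exact)
    fine = abs(trapezoid(math.exp, 0.0, 1.0, n=100) - exact)
    assert 3.5 < coarse / fine < 4.5
```

A convergence-order test often catches bugs that a single tolerance check misses, since an implementation can be close to the right answer while converging at the wrong rate.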
## Tips
- Raw data is sacred — never modify it, only create processed copies
- Use version control (git) from day one
- Write README before writing code
- Automate the full pipeline (Makefile or Snakemake)
- Document assumptions and decisions in code comments
- Use type hints for clarity in scientific code
- Publish code alongside papers (GitHub + Zenodo for DOI)
Related Skills
social-science-research
Orchestrates a social science research workflow from literature review through data collection, text analysis, statistical modeling, and report generation. Use when conducting empirical social science research, policy analysis, or mixed-methods studies. NOT for pure natural science analysis or clinical trial data.
social-science-analysis
Social science research methods including survey design, qualitative analysis, content analysis, network analysis, psychometrics, and mixed methods. Covers sociology, psychology, political science, education, and communication studies. Use when user designs surveys, analyzes qualitative data, does content analysis, builds scales, or uses mixed methods. Triggers on "survey design", "qualitative analysis", "content analysis", "Likert scale", "thematic analysis", "grounded theory", "factor analysis", "SEM", "structural equation", "psychometrics", "interview coding".
scienceclaw-verification
Verify scientific claims, check calculations, validate experimental designs, and fact-check citations. Use when: (1) checking a claim against evidence, (2) validating statistical analyses, (3) verifying experimental reproducibility claims, (4) fact-checking references, (5) adversarial review of research. NOT for: generating new content (use scienceclaw-generation), simple QA (use scienceclaw-qa).
scienceclaw-summarization
Summarize scientific papers, datasets, experimental results, and literature reviews. Use when: (1) condensing research papers, (2) creating literature reviews, (3) summarizing experimental findings, (4) meta-analysis synthesis, (5) creating executive summaries of research. NOT for: information extraction (use scienceclaw-ie), full paper retrieval (use scienceclaw-retrieval), or writing new content (use scienceclaw-generation).
scienceclaw-retrieval
Retrieve scientific information from databases, literature, and knowledge bases. Use when: (1) finding relevant papers, (2) querying scientific databases, (3) cross-referencing findings, (4) building bibliographies, (5) systematic literature search. NOT for: answering questions (use scienceclaw-qa), summarizing (use scienceclaw-summarization), or data analysis (use code-execution skill).
scienceclaw-reasoning
Perform multi-step scientific reasoning, proof construction, causal inference, and logical argumentation. Use when: (1) deriving conclusions from premises, (2) causal analysis, (3) mathematical proofs, (4) hypothesis evaluation, (5) counterfactual reasoning. NOT for: simple factual questions (use scienceclaw-qa), data analysis (use code-execution), or literature search (use scienceclaw-retrieval).
scienceclaw-qa
Answer scientific questions across all disciplines with evidence-based responses and citations. Use when: (1) user asks factual science questions, (2) needs explanation of concepts/theories/methods, (3) multi-step scientific reasoning needed. Covers natural sciences (physics, chemistry, biology, medicine, materials, astronomy, earth science, math, CS) and social sciences (economics, sociology, psychology, political science, linguistics, history, law, philosophy, education). NOT for: opinion-based questions, non-scientific queries, or when code execution is needed (use code-execution skill).
scienceclaw-prediction
Predict scientific properties, trends, and outcomes. Use when: user asks for property prediction, trend forecasting, or model-based estimation. NOT for: historical data lookup or real-time monitoring.
scienceclaw-ie
Extract structured information from scientific texts: entities, relations, data tables, methods, results. Use when: (1) parsing papers for key data, (2) extracting experimental parameters, (3) building knowledge graphs from literature, (4) NER on scientific documents, (5) extracting methods/results sections. NOT for: summarization (use scienceclaw-summarization), full text retrieval (use scienceclaw-retrieval).
scienceclaw-generation
Generate scientific hypotheses, experimental designs, and paper drafts. Use when: user asks to propose hypotheses, design experiments, or write scientific content. NOT for: data analysis or literature search.
scienceclaw-discovery
Identify research gaps, synthesize cross-disciplinary insights, and generate novel hypotheses. Use when: user asks about unexplored areas, cross-field connections, or new research directions. NOT for: routine literature review or data analysis.
scienceclaw-classification
Classify scientific content by discipline, methodology, topic, and quality. Use when: user asks to categorize papers, methods, or research outputs. NOT for: simple keyword tagging or non-scientific content.