notebook-writer

Create and document Jupyter notebooks for reproducible analyses

by diegosouzapw

Best use case

notebook-writer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using notebook-writer should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/notebook-writer/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/data-ai/notebook-writer/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/notebook-writer/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How notebook-writer Compares

Feature / Agent	notebook-writer	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Create and document Jupyter notebooks for reproducible analyses

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Notebook Writer Skill

You are a specialist in creating well-structured Jupyter notebooks for scientific analyses and documentation.

## When to Use This Skill

Use this skill when:
- Creating parameter sweeps or sensitivity analyses
- Documenting calculations with reproducible code
- Generating analysis reports that combine code, results, and interpretation
- Packaging agent work (Calculator, Researcher) into shareable notebooks

## Notebook Format: Jupytext Markdown

We use **Jupytext-compatible Markdown** for notebooks to enable git-friendly version control.

### Cell Markers

- **Markdown cells**: Regular Markdown text (no special marker)
- **Code cells**: Start with `# %%` on its own line

### Example Structure

```markdown
---
jupyter:
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
---

# Analysis Title

Brief description of what this notebook does.

## Section 1: Data Loading

# %%
import pandas as pd
import numpy as np

# %%
data = pd.read_csv('data.csv')
data.head()

## Section 2: Analysis

Explanation of the analysis approach.

# %%
# Perform calculation
result = np.mean(data['value'])
print(f"Mean: {result:.2f}")
```

## Python Utility API

Many projects provide `src/utils/notebook_builder.py` with helper functions for programmatic notebook creation.

### Core Function: create_notebook_markdown

```python
create_notebook_markdown(
    title: str,
    cells: List[Dict[str, str]],
    output_path: Path,
    kernelspec: Optional[Dict] = None
) -> Path
```

**Parameters:**
- `title`: Notebook title (becomes H1 header)
- `cells`: List of dicts with `'type'` (`'code'` or `'markdown'`) and `'content'`
- `output_path`: Where to save `.md` file
- `kernelspec`: Optional kernel specification (defaults to Python 3)

**Example:**
```python
from pathlib import Path
from src.utils.notebook_builder import create_notebook_markdown

cells = [
    {'type': 'markdown', 'content': '## Introduction\n\nThis analysis...'},
    {'type': 'code', 'content': 'import numpy as np'},
    {'type': 'code', 'content': 'x = np.linspace(0, 10)\nprint(x)'}
]

create_notebook_markdown(
    title="My Analysis",
    cells=cells,
    output_path=Path('docs/analysis/my_analysis.md')
)
```
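
The implementation of `create_notebook_markdown` is not shown here. As a minimal sketch, assuming the helper simply writes the YAML frontmatter, an H1 title, and then each cell in order (code cells prefixed with `# %%`), it might look like the following. The names `build_notebook_text`, `write_notebook`, and `DEFAULT_KERNELSPEC` are illustrative and not the project's actual code.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Assumed default, mirroring the frontmatter in the example structure
DEFAULT_KERNELSPEC = {
    "display_name": "Python 3",
    "language": "python",
    "name": "python3",
}


def build_notebook_text(
    title: str,
    cells: List[Dict[str, str]],
    kernelspec: Optional[Dict[str, str]] = None,
) -> str:
    """Render frontmatter, title, and cells as one Markdown string."""
    spec = kernelspec or DEFAULT_KERNELSPEC
    lines = ["---", "jupyter:", "  kernelspec:"]
    lines += [f"    {key}: {value}" for key, value in spec.items()]
    lines += ["---", "", f"# {title}", ""]
    for cell in cells:
        if cell["type"] == "code":
            lines.append("# %%")  # marker convention described in this document
        elif cell["type"] != "markdown":
            raise ValueError(f"Unknown cell type: {cell['type']!r}")
        lines += [cell["content"], ""]
    return "\n".join(lines)


def write_notebook(title, cells, output_path: Path, kernelspec=None) -> Path:
    """Create parent directories, write the file, and return its path."""
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(
        build_notebook_text(title, cells, kernelspec), encoding="utf-8"
    )
    return output_path
```

Keeping the rendering step separate from the file write makes the text easy to inspect or test before anything touches disk.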

### Template: Parameter Sweep

```python
create_parameter_sweep_notebook(
    param_name: str,
    param_range: str,
    calculation_code: str,
    output_path: Path
) -> Path
```

Creates a notebook with:
- Imports (numpy, matplotlib, pandas)
- Parameter range definition
- Your calculation code
- Visualization boilerplate

**Example:**
```python
from pathlib import Path
from src.utils.notebook_builder import create_parameter_sweep_notebook

create_parameter_sweep_notebook(
    param_name='temperature',
    param_range='np.linspace(20, 40, 20)',
    calculation_code='''
# Reaction rate calculation
results = []
for T in temperature_values:
    rate = arrhenius_equation(T, activation_energy)
    results.append(rate)
''',
    output_path=Path('analysis/temperature_sweep.md')
)
```
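
The calculation code above calls `arrhenius_equation` and reads `activation_energy` and `temperature_values`, none of which are defined in this document. A hedged sketch of what those definitions could look like, assuming temperatures are in °C and using placeholder values for the activation energy and pre-exponential factor:

```python
import math

GAS_CONSTANT = 8.314  # J/(mol*K)


def arrhenius_equation(temp_celsius: float, activation_energy: float,
                       pre_exponential: float = 1.0e13) -> float:
    """Rate constant k = A * exp(-Ea / (R * T)), with T converted to kelvin."""
    temp_kelvin = temp_celsius + 273.15
    return pre_exponential * math.exp(
        -activation_energy / (GAS_CONSTANT * temp_kelvin)
    )


activation_energy = 50_000.0  # J/mol, placeholder value for illustration
# 20 evenly spaced points from 20 to 40 °C (same grid as np.linspace(20, 40, 20))
temperature_values = [20.0 + i * (20.0 / 19) for i in range(20)]

results = [arrhenius_equation(T, activation_energy) for T in temperature_values]
```

In a generated notebook these definitions would need to appear in a cell before the sweep loop runs.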

### Template: Analysis Report

```python
create_analysis_report_notebook(
    analysis_title: str,
    sections: List[Dict[str, str]],
    output_path: Path
) -> Path
```

**Section dict keys:**
- `title`: Section heading (required)
- `description`: Explanatory text (optional)
- `code`: Code to execute (optional)
- `interpretation`: Results interpretation (optional)

**Example:**
```python
from src.utils.notebook_builder import create_analysis_report_notebook

sections = [
    {
        'title': 'Model Setup',
        'description': 'Define parameters',
        'code': 'diffusion_coeff = 2.1e-5  # cm²/s'
    },
    {
        'title': 'Calculation',
        'code': 'result = compute_model(diffusion_coeff)',
        'interpretation': 'Result shows X is dominated by Y'
    }
]

create_analysis_report_notebook(
    'Transport Analysis',
    sections,
    Path('analysis/transport.md')
)
```
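
How the section dicts map onto notebook cells is not spelled out above. One plausible mapping, offered only as a sketch, is a heading-plus-description Markdown cell, then an optional code cell, then an optional interpretation cell. The function name `sections_to_cells` is hypothetical.

```python
from typing import Dict, List


def sections_to_cells(sections: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Expand report sections into an ordered list of markdown and code cells."""
    cells = []
    for section in sections:
        heading = f"## {section['title']}"  # 'title' is required; KeyError if missing
        description = section.get("description")
        intro = f"{heading}\n\n{description}" if description else heading
        cells.append({"type": "markdown", "content": intro})
        if section.get("code"):
            cells.append({"type": "code", "content": section["code"]})
        if section.get("interpretation"):
            cells.append({"type": "markdown", "content": section["interpretation"]})
    return cells
```

The output uses the same `'type'`/`'content'` dict shape that `create_notebook_markdown` accepts, so the two could be chained.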

### Validation Function

```python
validate_notebook(notebook_path: Path) -> bool
```

Validates `.ipynb` structure using nbformat. Returns `True` if valid, raises exception if invalid.

**Example:**
```python
from pathlib import Path
from src.utils.notebook_builder import validate_notebook

validate_notebook(Path('analysis/notebook.ipynb'))
# Returns True or raises ValidationError
```

## Workflow

1. **Create notebook** using utility or manual Markdown
2. **Edit `.md` file** directly (agents write Markdown well)
3. **Convert to `.ipynb`**: `python3 -m jupytext --to ipynb notebook.md`
4. **Run in Jupyter**: `jupyter notebook notebook.ipynb`
5. **Sync changes back**: `python3 -m jupytext --sync notebook.ipynb` (bidirectional)

## Jupyter AI Integration

Modern Jupyter environments can provide AI-powered assistance to enhance productivity and reduce errors: JupyterLab 4.0+ through the Jupyter AI extension, and JetBrains IDEs through their AI Assistant.

### %%ai Magic Commands

The `%%ai` cell magic enables AI-powered code generation and analysis directly in notebooks:

```python
# %%
# Load the extension once per session so the %%ai magic is available
%load_ext jupyter_ai_magics

# %%
%%ai chatgpt
Generate a function to calculate the Pearson correlation coefficient between two arrays
```

**Key use cases:**
- **Code generation**: Generate boilerplate code, data transformations, or analysis functions
- **Data exploration**: Ask questions about DataFrames or arrays
- **Debugging assistance**: Get suggestions for fixing errors
- **Documentation**: Generate docstrings or explanations

### Providing Context for Better Results

AI assistants work best when given relevant context. **Always provide**:

1. **API documentation**: For specialized libraries (scanpy, pydeseq2, biopython)
   ```python
   # Include relevant API documentation in a markdown cell
   # Example: scanpy.pp.filter_cells(data, min_genes=200)
   ```

2. **Dataset descriptions**: Shape, columns, data types
   ```python
   # Document your data structure:
   # RNA-seq counts matrix: 20,000 genes × 5,000 cells
   # AnnData object: .X (sparse CSR matrix), .obs (cell metadata), .var (gene metadata)
   ```

3. **Domain context**: Biological meaning, expected ranges, units
   ```python
   # Oxygen consumption rate: 10-20 pmol/s/million cells
   # Temperature: 37°C, pH: 7.4
   ```

### Chat UI Assistance

The chat interface that the Jupyter AI extension adds to JupyterLab provides conversational help:

**Best practices:**
- Use for exploratory questions: "What's the best way to normalize this data?"
- Ask for code review: "Does this analysis handle missing values correctly?"
- Request visualizations: "Create a heatmap of the top 50 variable genes"
- Get explanations: "Explain what this cell is doing"

### When to Use AI Assistance vs. Manual Coding

**Use AI assistance for:**
- Boilerplate code (imports, data loading templates)
- Exploratory analysis (quick plots, summary statistics)
- Learning new library syntax
- Generating test data or examples

**Write code manually for:**
- Core analysis logic (hypothesis testing, modeling)
- Publication-quality figures (fine-grained control needed)
- Performance-critical sections (AI-generated code may not be optimal)
- Complex domain-specific algorithms

**Warning**: Always verify AI-generated code. Check for:
- Correct library syntax (APIs change frequently)
- Appropriate statistical methods (AI may suggest invalid tests)
- Proper handling of biological data (species, units, measurement context)

### JetBrains AI Assistant

For notebooks in PyCharm/DataSpell:

**Features:**
- **Explain cell**: Understand what code does (Alt+Enter → "Explain")
- **Create visualization**: Generate plots from data descriptions
- **Edit cell**: Refactor or improve code (Alt+Enter → "AI Actions")
- **Fix errors**: Get suggestions for runtime errors

**Access**: Right-click cell → "AI Assistant" or use AI chat sidebar

## Jupytext Configuration

Projects should include `.jupytext.toml` in repository root:

```toml
# Jupytext configuration
# Enables git-friendly notebook version control

# Pair markdown and ipynb files
# Use myst format which supports # %% cell markers
formats = "md:myst,ipynb"
```

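This tells Jupytext to:
- Recognize `.md` files as notebooks
- Use the MyST Markdown format for the `.md` half of each pair
- Keep each `.md` file paired with an `.ipynb` file; the pair is synchronized when you save in Jupyter with the Jupytext extension active, or when you run `jupytext --sync`

Note that in Jupytext, `# %%` is the cell marker of the percent script format (for example `py:percent`), while stock MyST Markdown marks code cells with `{code-cell}` directives. If a converted notebook shows `# %%` lines as headings instead of code cells, check how your project's tooling handles the markers.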

## Git Tracking Strategy

**Recommended `.gitignore` configuration:**
```gitignore
# Track .md notebooks (Jupytext source), ignore generated .ipynb files
*.ipynb
.ipynb_checkpoints/
```

**What's tracked:**
- ✅ `.md` notebook files (human-readable source)
- ❌ `.ipynb` files (generated JSON with embedded outputs)
- ❌ `.ipynb_checkpoints/` (Jupyter temp files)

**Rationale:** `.md` files produce readable git diffs. `.ipynb` files are JSON with embedded outputs and can be regenerated from `.md`.

## Common Operations

### Create from scratch (manual)
1. Write Markdown file with `# %%` markers
2. Add YAML frontmatter (kernel info)
3. Convert: `python3 -m jupytext --to ipynb file.md`

### Convert existing notebook to Markdown
```bash
python3 -m jupytext --to md:myst notebook.ipynb
```

### Edit existing notebook
**Option 1: Edit .md file directly** (recommended for agents)
```bash
# Edit notebook.md in text editor
# Then convert:
python3 -m jupytext --to ipynb notebook.md
```

**Option 2: Edit in Jupyter, sync back**
```bash
jupyter notebook notebook.ipynb
# Make changes in Jupyter
# Sync back to .md:
python3 -m jupytext --sync notebook.ipynb
```

### Validate structure
```bash
python3 -c "
from pathlib import Path
from src.utils.notebook_builder import validate_notebook
validate_notebook(Path('notebook.ipynb'))
print('✓ Valid')
"
```

### Convert multiple notebooks
```bash
# Convert all .md notebooks in a directory
python3 -m jupytext --to ipynb analysis/*.md

# Or sync all paired notebooks
python3 -m jupytext --sync analysis/*.ipynb
```

## Best Practices

1. **Title every notebook** with clear purpose
2. **Start with imports** in first code cell
3. **Explain calculations** with markdown cells before code
4. **Interpret results** with markdown cells after code
5. **Use meaningful variable names** (not x, y, z)
6. **Include units** in comments and axis labels
7. **Save outputs** (figures) to files for documentation

## Reproducibility Standards

Scientific notebooks must be fully reproducible. Every notebook should enable another researcher to:
1. Recreate your computational environment
2. Rerun your analysis and get identical results
3. Understand your data sources and processing steps

### Environment Documentation

**Every notebook must include an environment documentation cell:**

```python
# %%
# Environment Information
# Run: pip freeze > requirements.txt
# Or: conda env export > environment.yml

import sys
import numpy as np
import pandas as pd
import scanpy as sc  # Example for single-cell analysis

print(f"Python: {sys.version}")
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"Scanpy: {sc.__version__}")

# Include this output in your notebook for documentation
```
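
The cell above fails if any listed package is missing. As an alternative sketch, the standard library's `importlib.metadata` can report installed versions without importing the packages. The helper names `collect_versions` and `environment_report` are made up for this example.

```python
import platform
import sys
from importlib import metadata


def collect_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report(packages):
    """Return a printable, line-per-item summary of the environment."""
    lines = [
        f"Python: {sys.version.split()[0]}",
        f"Platform: {platform.platform()}",
    ]
    for name, version in collect_versions(packages).items():
        lines.append(f"{name}: {version or 'not installed'}")
    return "\n".join(lines)


print(environment_report(["numpy", "pandas", "scanpy"]))
```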

**Create environment files:**

```bash
# For pip users:
pip freeze > requirements.txt

# For conda users:
conda env export > environment.yml

# Include these files in your repository
```

**Document kernel selection:**
```markdown
## Computational Environment

- **Kernel**: Python 3.11 (project-env)
- **Dependencies**: See `requirements.txt` for full package list
- **Critical packages**: scanpy==1.10.0, numpy==1.26.3, pandas==2.2.0
```

### Random Seed Setting

**For any stochastic process, set random seeds:**

```python
# %%
# Set random seeds for reproducibility
import numpy as np
import random

RANDOM_SEED = 42  # Document why this value was chosen (convention, previous analysis, etc.)

np.random.seed(RANDOM_SEED)
random.seed(RANDOM_SEED)

# For machine learning (only if PyTorch is installed):
import torch
torch.manual_seed(RANDOM_SEED)

# For scanpy: pass the seed to each stochastic function via `random_state`,
# e.g. sc.tl.umap(adata, random_state=RANDOM_SEED)

print(f"Random seed set to {RANDOM_SEED}")
```

**Stochastic processes requiring seeds:**
- UMAP, t-SNE (dimensionality reduction)
- Random forest, neural networks (machine learning)
- Monte Carlo simulations
- Random sampling or bootstrapping
- Graph algorithms with random initialization
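
To see what seeding does for one of these cases, here is a small bootstrapping sketch using only the standard library. It uses a private `random.Random` generator rather than the global seed shown above, which is a design choice of this example and not something this skill requires. The measurement values are made up.

```python
import random


def bootstrap_means(values, n_resamples, seed):
    """Return the mean of each bootstrap resample, using a private seeded generator."""
    rng = random.Random(seed)  # local generator; leaves global random state untouched
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(values, k=len(values))  # sample with replacement
        means.append(sum(resample) / len(resample))
    return means


measurements = [12.1, 14.3, 11.8, 15.0, 13.2]  # made-up values for illustration
first_run = bootstrap_means(measurements, n_resamples=100, seed=42)
second_run = bootstrap_means(measurements, n_resamples=100, seed=42)
other_seed = bootstrap_means(measurements, n_resamples=100, seed=7)

print(first_run == second_run)  # same seed, same resamples
```

A generator owned by the function keeps the result independent of whatever other cells have drawn from the global random state.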

### Session Info Output

**End every notebook with a session info cell:**

```python
# %%
# Session Information (for reproducibility)
import session_info

session_info.show(
    dependencies=True,
    html=False
)

# Alternative for single-cell analysis:
# import scanpy as sc
# sc.logging.print_versions()
```

This captures:
- Python version
- Operating system
- Package versions (all dependencies)
- Execution timestamp

### File Path Best Practices

**Use relative paths and variables:**

```python
# %%
from pathlib import Path

# Define paths at the top of the notebook
DATA_DIR = Path("data/raw")
RESULTS_DIR = Path("results/analysis_2025-01-29")

# Ensure output directories exist
RESULTS_DIR.mkdir(parents=True, exist_ok=True)

# Use variables throughout
input_file = DATA_DIR / "counts.csv"
output_file = RESULTS_DIR / "normalized_counts.csv"
```

**Never use hardcoded absolute paths:**
```python
# BAD:
data = pd.read_csv("/Users/yourname/project/data.csv")

# GOOD:
data = pd.read_csv(DATA_DIR / "data.csv")
```
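
Relative paths resolve against the working directory, which can differ depending on where Jupyter was started. One way to make them robust, sketched here under the assumption that the repository root contains a `.jupytext.toml` file, is to locate the root first. `find_project_root` is a hypothetical helper, not part of any library mentioned in this document.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Optional[Path] = None,
                      marker: str = ".jupytext.toml") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    current = (start or Path.cwd()).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No {marker} found at or above {current}")
```

With this in place, `DATA_DIR = find_project_root() / "data" / "raw"` would point at the same directory whether the notebook runs from the repository root or from a subfolder.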

### Reproducibility Checklist

Before sharing or archiving a notebook:
- [ ] Environment documented (Python version, key package versions)
- [ ] `requirements.txt` or `environment.yml` exists and is current
- [ ] Random seeds set for all stochastic processes
- [ ] Session info cell at end of notebook
- [ ] File paths use variables (not hardcoded)
- [ ] Data sources documented (where to download, version, date)
- [ ] Notebook runs end-to-end without errors (Restart & Run All)
- [ ] Results match expected output (if re-running existing analysis)
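
Some checklist items can be checked roughly by scanning the notebook source. The sketch below is a heuristic only: the regular expressions cover just common cases (absolute paths under `/Users`, `/home`, or a Windows drive letter, plus a few well-known seed calls), and `reproducibility_warnings` is a hypothetical name.

```python
import re

# Patterns are heuristics: they catch common cases, not every possible one.
ABSOLUTE_PATH = re.compile(
    r"""["'](/(?:Users|home)/[^"']*|[A-Za-z]:\\[^"']*)["']"""
)
SEED_CALL = re.compile(
    r"\b(?:random\.seed|np\.random\.seed|torch\.manual_seed|default_rng)\s*\("
)


def reproducibility_warnings(notebook_text: str):
    """Return a list of human-readable warnings for a notebook's source text."""
    warnings = []
    for match in ABSOLUTE_PATH.finditer(notebook_text):
        warnings.append(f"Hardcoded absolute path: {match.group(1)}")
    if not SEED_CALL.search(notebook_text):
        warnings.append("No random seed call found")
    if "session_info" not in notebook_text and "print_versions" not in notebook_text:
        warnings.append("No session info cell found")
    return warnings
```

An empty list does not prove the notebook is reproducible. A full "Restart & Run All" is still needed.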

**Integration with other skills:**
- **notebook-debugger**: Use to verify end-to-end execution
- **bioinformatician**: Apply reproducibility standards to all computational biology analyses
- **copilot**: Review notebooks for reproducibility compliance

## Project-Specific Usage

Many projects have a `docs/NOTEBOOK-WORKFLOW.md` or similar document with project-specific examples and patterns. Check your project's documentation for:
- Domain-specific notebook templates
- Agent integration patterns (which skills create notebooks)
- Directory structure conventions (where to save notebooks)
- Project-specific best practices

## Troubleshooting

### Issue: Jupytext can't find format

**Error:** `Format 'percent' is not associated to extension '.md'`

**Fix:** Use `md:myst` format in `.jupytext.toml` (not `md:percent`). The percent format applies to script extensions such as `.py`, not to `.md`.

```toml
formats = "md:myst,ipynb"  # Correct
```

### Issue: Sync not working

**Symptom:** Changes to `.ipynb` don't appear in `.md`

**Solution:**
1. Check `.jupytext.toml` exists and has correct format
2. Run sync explicitly: `python3 -m jupytext --sync notebook.ipynb`
3. Check both files exist (create `.md` first if needed)

### Issue: Validation fails

**Error:** `nbformat.ValidationError`

**Causes:**
- Missing cell IDs (required in nbformat v4.5+)
- Invalid JSON structure
- Missing required fields

**Solution:** Use `notebook_builder.py` utility functions which handle validation automatically.
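
For the missing-cell-ID case specifically, a notebook written by other tooling can be patched directly. This is a standard-library sketch that assumes an nbformat 4 notebook stored as plain JSON. `add_missing_cell_ids` is a hypothetical helper and not part of `notebook_builder.py`.

```python
import json
import uuid
from pathlib import Path


def add_missing_cell_ids(notebook_path: Path) -> int:
    """Give every cell without an 'id' a fresh one; return how many were added."""
    notebook = json.loads(notebook_path.read_text(encoding="utf-8"))
    added = 0
    for cell in notebook.get("cells", []):
        if "id" not in cell:
            cell["id"] = uuid.uuid4().hex[:8]  # short alphanumeric id
            added += 1
    if added:
        # Cell ids belong to schema 4.5+, so raise the minor version if it is lower
        notebook["nbformat_minor"] = max(notebook.get("nbformat_minor", 0), 5)
        notebook_path.write_text(
            json.dumps(notebook, indent=1) + "\n", encoding="utf-8"
        )
    return added
```

Existing IDs are left untouched, and the file is rewritten only when at least one ID was added.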

### Issue: Git shows .ipynb files

**Symptom:** `.ipynb` files appearing in `git status`

**Fix:** Ensure `.gitignore` contains `*.ipynb`. Check with:
```bash
git check-ignore -v notebook.ipynb
```

## Error Prevention

### Common Issues

1. **Missing `# %%`**: Code cells must start with this marker
2. **Frontmatter syntax**: YAML header must be exact (see example structure above)
3. **Path handling**: Use `Path` objects, ensure directories exist
4. **Cell validation**: Use `notebook_builder.validate_notebook()` after creation

### Validation Checklist

Before finalizing a notebook:
- [ ] Frontmatter present with kernelspec
- [ ] All code cells have `# %%` marker
- [ ] Imports in first code cell
- [ ] Results interpreted with markdown
- [ ] Saved to appropriate location
- [ ] Validated with nbformat if creating .ipynb directly
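
The structural items on this checklist can be checked mechanically. A minimal sketch, assuming the notebook follows the layout shown in the example structure (frontmatter delimited by `---`, code cells marked with `# %%`); the function name `structure_problems` is illustrative.

```python
def structure_problems(notebook_text: str):
    """Return a list of structural problems found in a notebook's Markdown source."""
    problems = []
    lines = notebook_text.splitlines()

    # Frontmatter: first line '---', closed by a later '---', mentioning kernelspec
    if not lines or lines[0].strip() != "---":
        problems.append("Missing YAML frontmatter")
    else:
        try:
            end = lines.index("---", 1)
        except ValueError:
            problems.append("Frontmatter is not closed with '---'")
        else:
            if not any("kernelspec" in line for line in lines[1:end]):
                problems.append("Frontmatter has no kernelspec")

    # Code cells: at least one marker, and the first cell should begin with imports
    markers = [i for i, line in enumerate(lines) if line.strip() == "# %%"]
    if not markers:
        problems.append("No '# %%' code cell marker found")
    else:
        first_code = next(
            (line for line in lines[markers[0] + 1:] if line.strip()), ""
        )
        if not first_code.startswith(("import ", "from ")):
            problems.append("First code cell does not start with imports")
    return problems
```

The import check is deliberately simple: a first cell that opens with a comment is flagged even if imports follow on the next line.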

## Dependencies

**Required packages:**
```bash
pip3 install jupytext nbformat
```

**Check installation:**
```bash
pip3 list | grep -E "(jupytext|nbformat)"
```

**Tested versions:**
- jupytext: 1.19+
- nbformat: 5.10+
- Python: 3.9+

## Integration with Other Skills

Common patterns for skill integration:
- **Quantitative analysis skills**: Package calculations as reproducible notebooks with parameter sweeps
- **Research skills**: Document literature-derived parameters with citations in data notebooks
- **Planning skills**: Generate protocol notebooks with expected results and analysis templates
- **Review skills**: Check notebook code for correctness and best practices

---

Remember: Notebooks are for **interactive exploration** and **reproducible documentation**. For production code, use Python modules in `src/`.

**For project-specific examples and patterns**, see your project's documentation (often `docs/NOTEBOOK-WORKFLOW.md` or similar).
