biopython-advanced

Advanced Biopython modules for motifs, population genetics, sequence utilities, restriction analysis, clustering, and GenomeDiagram visualization; use when you need extended bioinformatics analysis beyond basic sequence I/O and alignment.

by aipoch

Best use case

biopython-advanced is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using biopython-advanced should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/biopython-advanced/SKILL.md --create-dirs "https://raw.githubusercontent.com/aipoch/medical-research-skills/main/scientific-skills/Data%20Analysis/biopython-advanced/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/biopython-advanced/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

SKILL.md Source

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

# biopython-advanced

## When to Use

- You need **motif discovery/statistics** (e.g., PWM/consensus, motif counts across multiple sequences).
- You want **restriction enzyme site analysis** (e.g., find cut sites for specific enzymes in a DNA sequence).
- You need **codon usage / sequence utility calculations** (e.g., codon frequency from CDS, GC content, basic sequence stats).
- You are working with **population genetics (PopGen)** utilities for advanced analyses.
- You need **advanced visualization** such as **GenomeDiagram**-style plots for genomic features.

## Key Features

- **Motif analysis** using Biopython’s `Bio.motifs` (counts, consensus, simple statistics).
- **Restriction analysis** using `Bio.Restriction` (enzyme lookup, cut site detection).
- **Sequence utilities** via `Bio.SeqUtils` (codon usage and related helpers).
- Access to additional advanced tools such as **CodonTable**, **SeqFeature**, and **IUPACData** when needed.
- Standardized workflow conventions:
  - Write configuration to `config/task_config.json` as an intermediate artifact.
  - Run tasks uniformly via `python scripts/<task_name>.py`.
  - Avoid stacking many CLI flags; keep parameters in config files.
  - Always use `encoding="utf-8"` for file I/O; JSON output uses `ensure_ascii=False`.

## Dependencies

Required:

- biopython (>=1.80)
- numpy (>=1.21)

Optional (for reporting/plotting):

- reportlab (>=3.6)
- matplotlib (>=3.5)

## Example Usage

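The following examples are complete, runnable scripts. Each one reads its own version of `config/task_config.json`, so replace the file's contents before switching tasks. All of them follow the same conventions: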
- configuration stored in `config/task_config.json`
- invoked as `python scripts/<task_name>.py`
- explicit UTF-8 encoding and `ensure_ascii=False` for JSON output

### 1) Motif Statistics

**config/task_config.json**
```json
{
  "task": "motif_stats",
  "sequences": ["ATGCATGCATGC", "ATGCGTGCATGC", "ATGCATGTATGC"]
}
```

**scripts/motif_stats.py**
```python
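import json
import os

from Bio import motifs
from Bio.Seq import Seq


def main():
    # Load task parameters from the shared config file.
    with open("config/task_config.json", "r", encoding="utf-8") as f:
        cfg = json.load(f)

    # motifs.create expects aligned sequences of equal length.
    seqs = [Seq(s) for s in cfg["sequences"]]
    m = motifs.create(seqs)

    result = {
        "alphabet": str(m.alphabet),
        "length": m.length,
        # m.counts maps each letter to a per-position list of counts.
        "counts": {letter: [int(c) for c in m.counts[letter]] for letter in m.alphabet},
        "consensus": str(m.consensus),
        "degenerate_consensus": str(m.degenerate_consensus),
    }

    # Create the output directory if it does not exist yet.
    os.makedirs("outputs", exist_ok=True)
    with open("outputs/motif_stats.json", "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    main()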
```

Run:
```bash
python scripts/motif_stats.py
```

### 2) Restriction Enzyme Cleavage Sites

**config/task_config.json**
```json
{
  "task": "restriction_sites",
  "sequence": "GAATTCGCGGAATTC",
  "enzymes": ["EcoRI", "BamHI"]
}
```

**scripts/restriction_sites.py**
```python
import json
import os

from Bio.Restriction import RestrictionBatch
from Bio.Seq import Seq


def main():
    # Load task parameters from the shared config file.
    with open("config/task_config.json", "r", encoding="utf-8") as f:
        cfg = json.load(f)

    seq = Seq(cfg["sequence"])
    # RestrictionBatch resolves enzyme names given as strings.
    batch = RestrictionBatch(cfg["enzymes"])
    # search() returns a dict of {enzyme: [cut positions]}.
    analysis = batch.search(seq)

    # Convert enzyme keys to strings for JSON serialization
    result = {str(enzyme): list(positions) for enzyme, positions in analysis.items()}

    # Create the output directory if it does not exist yet.
    os.makedirs("outputs", exist_ok=True)
    with open("outputs/restriction_sites.json", "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    main()
```

Run:
```bash
python scripts/restriction_sites.py
```

### 3) Codon Usage Frequency (CDS)

**config/task_config.json**
```json
{
  "task": "codon_usage",
  "cds": "ATGGCTGCTGCTGCTTAA"
}
```

**scripts/codon_usage.py**
```python
import json
import os
from collections import Counter


def main():
    # Load task parameters from the shared config file.
    with open("config/task_config.json", "r", encoding="utf-8") as f:
        cfg = json.load(f)

    # Normalize case and drop all whitespace (spaces, tabs, newlines).
    cds = "".join(cfg["cds"].split()).upper()
    # Split into in-frame triplets; trailing bases that do not fill a codon are ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - (len(cds) % 3), 3)]
    counts = Counter(codons)
    total = sum(counts.values())

    result = {
        "total_codons": total,
        "codon_counts": dict(sorted(counts.items())),
        # With an empty CDS the counts are empty, so no division by zero occurs.
        "codon_frequencies": {k: v / total for k, v in sorted(counts.items())},
        "note": "This example computes raw codon frequencies from the provided CDS. Validate CDS frame and stop codons for your use case."
    }

    # Create the output directory if it does not exist yet.
    os.makedirs("outputs", exist_ok=True)
    with open("outputs/codon_usage.json", "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    main()
```

Run:
```bash
python scripts/codon_usage.py
```

## Implementation Details

- **Configuration-first execution**
  - All task parameters are stored in `config/task_config.json` to keep CLI invocation stable and reproducible.
  - Scripts read the config as the single source of truth and write results to `outputs/*.json`.
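The configuration-first pattern can be sketched as a small pair of helpers. The names `load_config` and `write_result`, and the optional `expected_task` check, are hypothetical and not part of the skill; this is one possible shape under those assumptions, not the skill's own implementation.

```python
import json
from pathlib import Path


def load_config(path="config/task_config.json", expected_task=None):
    """Read the task config; optionally check that it targets the expected task."""
    with open(path, "r", encoding="utf-8") as f:
        cfg = json.load(f)
    # Hypothetical safeguard: all tasks share one config file, so a stale
    # config from another task is an easy mistake to make.
    if expected_task is not None and cfg.get("task") != expected_task:
        raise ValueError(
            f"config is for task {cfg.get('task')!r}, expected {expected_task!r}"
        )
    return cfg


def write_result(result, path):
    """Write a JSON result, creating the parent directory if it is missing."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with open(out, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
    return out
```

A script would then call `cfg = load_config(expected_task="motif_stats")` at the top of `main()` and `write_result(result, "outputs/motif_stats.json")` at the end.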

- **Motif statistics (`Bio.motifs`)**
  - A motif is created from aligned sequences of equal length.
  - Outputs typically include:
    - `counts`: per-position nucleotide counts
    - `consensus` and `degenerate_consensus`: derived consensus sequences
  - If sequences differ in length, you must align/trim/pad them before motif creation.
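To make the equal-length requirement and the meaning of `counts` concrete, here is a plain-Python sketch that does not use `Bio.motifs`. The function names `column_counts` and `majority_consensus` are hypothetical, and the tie-breaking rule (first letter in the alphabet wins) is an assumption of this sketch, so ties may resolve differently from Biopython's `consensus`.

```python
def column_counts(sequences, alphabet="ACGT"):
    """Count each letter at each position; all sequences must share one length."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError(
            f"sequences must have equal length, got lengths {sorted(lengths)}"
        )
    length = lengths.pop()
    counts = {letter: [0] * length for letter in alphabet}
    for seq in sequences:
        for i, base in enumerate(seq.upper()):
            if base not in counts:
                raise ValueError(f"unexpected letter {base!r} at position {i}")
            counts[base][i] += 1
    return counts


def majority_consensus(counts):
    """Pick the most frequent letter per column; ties go to the first letter."""
    letters = list(counts)
    length = len(counts[letters[0]])
    return "".join(
        max(letters, key=lambda letter: counts[letter][i]) for i in range(length)
    )


seqs = ["ATGCATGCATGC", "ATGCGTGCATGC", "ATGCATGTATGC"]
counts = column_counts(seqs)
print(counts["A"][4], counts["G"][4])  # column 4 holds A, G, A
print(majority_consensus(counts))
```

Checking lengths up front gives a clear error message before any motif object is built, which is usually easier to debug than a failure deep inside the library.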

- **Restriction analysis (`Bio.Restriction`)**
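  - `RestrictionBatch(enzymes).search(seq)` returns a dict that maps each enzyme to a list of cut positions; an empty list means the enzyme does not cut. Positions are 1-based, and the sequence is treated as linear unless `linear=False` is passed.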
  - Enzyme objects are converted to strings for JSON serialization.
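What a cut position means can be illustrated with a plain-Python sketch that does not use `Bio.Restriction`. The function name `find_cut_positions` is hypothetical. The recognition site `GAATTC` with the cut after the first base (G^AATTC) is EcoRI's; the sketch assumes a linear sequence, scans the given strand only, and reports the 1-based index of the first base after the cut.

```python
def find_cut_positions(sequence, site="GAATTC", cut_offset=1):
    """Return 1-based positions of the first base after each cut.

    cut_offset is the number of site bases that stay upstream of the cut
    (1 for EcoRI, which cuts G^AATTC).
    """
    seq = sequence.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        # start is 0-based; +cut_offset moves past the cut, +1 makes it 1-based.
        positions.append(start + cut_offset + 1)
        # Step by one so overlapping sites would still be found.
        start = seq.find(site, start + 1)
    return positions


print(find_cut_positions("GAATTCGCGGAATTC"))  # [2, 11]
print(find_cut_positions("GAATTCGCGGAATTC", site="GGATCC"))  # []
```

For real analyses prefer `Bio.Restriction`, which also handles enzymes with degenerate or non-palindromic sites and circular sequences; this sketch covers none of those cases.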

- **Codon usage**
  - The example computes codon frequencies by splitting the CDS into triplets in-frame.
  - Practical considerations:
    - Ensure the CDS length is a multiple of 3 (or decide how to handle remainder bases).
    - Confirm the correct reading frame and whether to include terminal stop codons.
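    - `Bio.Data.CodonTable` provides genetic code (translation) tables, not organism-specific codon usage frequencies. Use it to map codons to amino acids under the appropriate genetic code, and supply organism-specific usage data separately if you need it.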
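The practical checks listed above can be sketched as a small pre-flight validator in plain Python. The function name `check_cds` is hypothetical. The sketch assumes the standard genetic code (stop codons TAA, TAG, TGA) and an ATG start codon; organisms or organelles with other genetic codes or alternative start codons need different rules.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only (assumption)


def check_cds(cds):
    """Return a list of problems found; an empty list means none were detected."""
    seq = "".join(cds.split()).upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append(f"length {len(seq)} is not a multiple of 3")
    # Only complete in-frame codons are inspected.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons:
        problems.append("no complete codon")
        return problems
    if codons[0] != "ATG":
        problems.append(f"first codon is {codons[0]}, not ATG")
    if codons[-1] not in STOP_CODONS:
        problems.append("last complete codon is not a stop codon")
    internal = [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]
    if internal:
        problems.append(f"internal stop codon(s) at codon index {internal}")
    return problems


print(check_cds("ATGGCTGCTGCTGCTTAA"))  # []
```

Running a check like this before counting codons makes frame errors visible instead of silently skewing the frequencies.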

- **I/O requirements**
  - Always open files with `encoding="utf-8"`.
  - Use `json.dump(..., ensure_ascii=False)` to preserve non-ASCII characters in outputs.
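The effect of `ensure_ascii=False` is easy to see on a value with a non-ASCII character. The record below is sample data made up for illustration, not output from the skill.

```python
import json

# Illustrative record only; not produced by any script in this skill.
record = {"product": "β-galactosidase"}

escaped = json.dumps(record)                        # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)   # keeps the character as-is

print(escaped)   # {"product": "\u03b2-galactosidase"}
print(readable)  # {"product": "β-galactosidase"}

# Both forms decode to the same data; only the on-disk text differs.
assert json.loads(escaped) == json.loads(readable) == record
```

Because the readable form contains raw non-ASCII characters, the file must be opened with an explicit `encoding="utf-8"`; otherwise the platform's default encoding decides how it is written.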

- **Further reference**
  - See `references/advanced.md` for additional notes and module coverage (motifs/PopGen/SeqUtils/Restriction/Cluster, GenomeDiagram, CodonTable/SeqFeature/IUPACData).
