biopython-advanced
Advanced Biopython modules for motifs, population genetics, sequence utilities, restriction analysis, clustering, and GenomeDiagram visualization; use when you need extended bioinformatics analysis beyond basic sequence I/O and alignment.
Best use case
biopython-advanced is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using biopython-advanced should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/biopython-advanced/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How biopython-advanced Compares
| Feature / Agent | biopython-advanced | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Advanced Biopython modules for motifs, population genetics, sequence utilities, restriction analysis, clustering, and GenomeDiagram visualization; use when you need extended bioinformatics analysis beyond basic sequence I/O and alignment.
Where can I find the source code?
The source code is in the aipoch/medical-research-skills repository on GitHub; the link is given in the SKILL.md Source section.
SKILL.md Source
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
# biopython-advanced
## When to Use
- You need **motif discovery/statistics** (e.g., PWM/consensus, motif counts across multiple sequences).
- You want **restriction enzyme site analysis** (e.g., find cut sites for specific enzymes in a DNA sequence).
- You need **codon usage / sequence utility calculations** (e.g., codon frequency from CDS, GC content, basic sequence stats).
- You are working with **population genetics (PopGen)** utilities for advanced analyses (e.g., parsing GenePop-format genotype files with `Bio.PopGen.GenePop`).
- You need **advanced visualization** such as **GenomeDiagram**-style plots for genomic features.
## Key Features
- **Motif analysis** using Biopython’s `Bio.motifs` (counts, consensus, simple statistics).
- **Restriction analysis** using `Bio.Restriction` (enzyme lookup, cut site detection).
- **Sequence utilities** via `Bio.SeqUtils` (codon usage and related helpers).
- Access to additional advanced tools such as **CodonTable** (`Bio.Data.CodonTable`), **SeqFeature** (`Bio.SeqFeature`), and **IUPACData** (`Bio.Data.IUPACData`) when needed.
- Standardized workflow conventions:
  - Write configuration to `config/task_config.json` as an intermediate artifact.
  - Run tasks uniformly via `python scripts/<task_name>.py`.
  - Avoid stacking many CLI flags; keep parameters in config files.
  - Always use `encoding="utf-8"` for file I/O; JSON output uses `ensure_ascii=False`.
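The SeqFeature and IUPACData helpers listed above can be sketched briefly. This is a minimal example, assuming Biopython >= 1.80 (where `SimpleLocation` is the location class); the sequence and the reverse-strand CDS coordinates are hypothetical.

```python
from Bio.Data import IUPACData
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, SimpleLocation

seq = Seq("ATGAAACCCGGGTTTTAG")

# Hypothetical CDS on the reverse strand, 0-based and end-exclusive
feature = SeqFeature(SimpleLocation(3, 12, strand=-1), type="CDS")
# extract() slices seq[3:12] and reverse-complements it because strand == -1
minus_strand = feature.extract(seq)

# Expand IUPAC ambiguity codes into the bases they can represent
n_bases = IUPACData.ambiguous_dna_values["N"]
r_bases = IUPACData.ambiguous_dna_values["R"]

print(minus_strand, len(feature), n_bases, r_bases)
```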
## Dependencies
Required:
- biopython (>=1.80)
- numpy (>=1.21)
Optional (for reporting/plotting):
- reportlab (>=3.6)
- matplotlib (>=3.5)
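A quick environment check against the version floors above can save debugging time. This is a minimal sketch; the `version_tuple` helper is hypothetical and only compares the leading numeric components of a version string, which is sufficient for these minimums but is not a full version parser.

```python
import importlib.util
import re

import Bio
import numpy


def version_tuple(version):
    """Leading numeric components of a version string, e.g. '1.84.dev0' -> (1, 84)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


REQUIRED = {
    "biopython": (Bio.__version__, (1, 80)),
    "numpy": (numpy.__version__, (1, 21)),
}

for name, (installed, minimum) in REQUIRED.items():
    ok = version_tuple(installed) >= minimum
    print(f"{name} {installed}: {'OK' if ok else 'too old'}")

# Optional reporting/plotting backends: check presence without importing them
for name in ("reportlab", "matplotlib"):
    found = importlib.util.find_spec(name) is not None
    print(f"{name}: {'installed' if found else 'not installed (optional)'}")
```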
## Example Usage
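The following examples are complete, runnable scripts (run from the project root) that follow these conventions:
- configuration stored in `config/task_config.json` (each example expects its own contents in this file, so write the matching config before running a script)
- invoked as `python scripts/<task_name>.py`
- explicit UTF-8 encoding and `ensure_ascii=False` for JSON output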
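The config-read and JSON-write steps repeat in every script, so they could optionally be factored into a shared module. A minimal sketch, assuming a hypothetical `scripts/common.py` with hypothetical helper names `load_config` and `write_result`; the example scripts do not depend on it.

```python
import json
import os

CONFIG_PATH = "config/task_config.json"


def load_config(path=CONFIG_PATH):
    """Read the task configuration as UTF-8 JSON."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def write_result(result, path):
    """Write a result dict as UTF-8 JSON, creating parent directories first."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII text (e.g. "5′ UTR") human-readable
        json.dump(result, f, ensure_ascii=False, indent=2)
```

A script would then call `cfg = load_config()` and finish with `write_result(result, "outputs/<task_name>.json")`.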
### 1) Motif Statistics
**config/task_config.json**
```json
{
  "task": "motif_stats",
  "sequences": ["ATGCATGCATGC", "ATGCGTGCATGC", "ATGCATGTATGC"]
}
```
**scripts/motif_stats.py**
```python
import json
import os

from Bio import motifs
from Bio.Seq import Seq


def main():
    with open("config/task_config.json", "r", encoding="utf-8") as f:
        cfg = json.load(f)
    # motifs.create expects equal-length instances
    seqs = [Seq(s) for s in cfg["sequences"]]
    m = motifs.create(seqs)
    result = {
        "alphabet": str(m.alphabet),
        "length": m.length,
        # m.counts maps each letter to a list of per-position counts
        "counts": {k: [float(x) for x in v] for k, v in m.counts.items()},
        "consensus": str(m.consensus),
        "degenerate_consensus": str(m.degenerate_consensus),
    }
    os.makedirs("outputs", exist_ok=True)
    with open("outputs/motif_stats.json", "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    main()
```
Run:
```bash
python scripts/motif_stats.py
```
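Beyond counts and consensus, the same motif can be turned into a position weight matrix (PWM) and a log-odds scoring matrix (PSSM) for scanning other sequences. A minimal sketch, reusing the three instances from the config above; the pseudocount of 0.5, the target sequence, and the score threshold of 5.0 are illustrative assumptions, not recommended values.

```python
from Bio import motifs
from Bio.Seq import Seq

# Same three instances as config/task_config.json above
instances = [Seq("ATGCATGCATGC"), Seq("ATGCGTGCATGC"), Seq("ATGCATGTATGC")]
m = motifs.create(instances)

# Position weight matrix: counts plus a pseudocount, normalized per column
pwm = m.counts.normalize(pseudocounts=0.5)
# Log-odds scores (base 2) against a uniform background
pssm = pwm.log_odds()

# Scan a hypothetical target on both strands; reverse-strand hits get negative positions
target = Seq("TTATGCATGCATGCAA")
hits = [(int(pos), float(score)) for pos, score in pssm.search(target, threshold=5.0)]
for pos, score in hits:
    print(pos, round(score, 2))
```

Pseudocounts keep unseen bases from producing `-inf` log-odds scores; with only three instances, the choice of pseudocount noticeably shifts the scores.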
### 2) Restriction Enzyme Cleavage Sites
**config/task_config.json**
```json
{
  "task": "restriction_sites",
  "sequence": "GAATTCGCGGAATTC",
  "enzymes": ["EcoRI", "BamHI"]
}
```
**scripts/restriction_sites.py**
```python
import json
import os

from Bio.Restriction import RestrictionBatch
from Bio.Seq import Seq


def main():
    with open("config/task_config.json", "r", encoding="utf-8") as f:
        cfg = json.load(f)
    seq = Seq(cfg["sequence"])
    # Enzyme names are resolved against Biopython's bundled REBASE data
    batch = RestrictionBatch(cfg["enzymes"])
    # {enzyme: [cut positions]}; the sequence is treated as linear by default
    analysis = batch.search(seq)
    # Convert enzyme keys to strings for JSON; sort by name for stable output
    result = {
        str(enzyme): positions
        for enzyme, positions in sorted(analysis.items(), key=lambda item: str(item[0]))
    }
    os.makedirs("outputs", exist_ok=True)
    with open("outputs/restriction_sites.json", "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    main()
```
Run:
```bash
python scripts/restriction_sites.py
```
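For a single enzyme, the enzyme class itself exposes the recognition site, cut positions, and digest fragments. A minimal sketch using EcoRI (G^AATTC) on the example sequence from the config above; treating the sequence as linear is an assumption carried over from the default.

```python
from Bio.Restriction import EcoRI
from Bio.Seq import Seq

seq = Seq("GAATTCGCGGAATTC")

site = EcoRI.site          # recognition sequence
cuts = EcoRI.search(seq)   # 1-based position of the first base after each cut
# Linear digest products, converted to plain strings
fragments = [str(f) for f in EcoRI.catalyse(seq)]

print(site, cuts, fragments)
```

Because EcoRI cuts after the first G of each site, the fragments rejoin exactly into the input sequence.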
### 3) Codon Usage Frequency (CDS)
**config/task_config.json**
```json
{
  "task": "codon_usage",
  "cds": "ATGGCTGCTGCTGCTTAA"
}
```
**scripts/codon_usage.py**
```python
import json
import os
from collections import Counter


def main():
    with open("config/task_config.json", "r", encoding="utf-8") as f:
        cfg = json.load(f)
    # Uppercase and strip all whitespace (spaces, tabs, newlines)
    cds = "".join(cfg["cds"].upper().split())
    # In-frame triplets; a trailing partial codon is ignored
    codons = [cds[i:i + 3] for i in range(0, len(cds) - (len(cds) % 3), 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    result = {
        "total_codons": total,
        "codon_counts": dict(sorted(counts.items())),
        # Empty when there are no complete codons, so there is no division by zero
        "codon_frequencies": {k: v / total for k, v in sorted(counts.items())},
        "note": "This example computes raw codon frequencies from the provided CDS. Validate CDS frame and stop codons for your use case.",
    }
    os.makedirs("outputs", exist_ok=True)
    with open("outputs/codon_usage.json", "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    main()
```
Run:
```bash
python scripts/codon_usage.py
```
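The frame and stop-codon validation recommended for the codon usage task can be combined with `Bio.SeqUtils` helpers. A minimal sketch, assuming Biopython >= 1.80 (for `gc_fraction`) and the NCBI standard code (table 1) by default; the `check_cds` function and its messages are hypothetical, not part of the skill.

```python
from Bio.Data import CodonTable
from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction


def check_cds(cds, table_id=1):
    """Return a list of problems found in a CDS under an NCBI genetic code table."""
    table = CodonTable.unambiguous_dna_by_id[table_id]
    cds = cds.upper()
    problems = []
    if len(cds) % 3:
        problems.append(f"length {len(cds)} is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons:
        return problems + ["no complete codons"]
    if codons[0] not in table.start_codons:
        problems.append(f"first codon {codons[0]} is not a start codon")
    if codons[-1] not in table.stop_codons:
        problems.append(f"last codon {codons[-1]} is not a stop codon")
    # Stop codons anywhere before the final codon suggest a frame or sequence error
    internal = [i for i, c in enumerate(codons[:-1]) if c in table.stop_codons]
    if internal:
        problems.append(f"internal stop codon(s) at codon index {internal}")
    return problems


cds = "ATGGCTGCTGCTGCTTAA"
gc = gc_fraction(cds)                       # fraction of G+C bases
protein = str(Seq(cds).translate(table=1))  # "*" marks the stop codon
issues = check_cds(cds)
print(gc, protein, issues)
```

Passing a different `table_id` (e.g. 2 for vertebrate mitochondria) changes which codons count as starts and stops.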
## Implementation Details
- **Configuration-first execution**
  - All task parameters are stored in `config/task_config.json` to keep CLI invocation stable and reproducible.
  - Scripts read the config as the single source of truth and write results to `outputs/*.json`.
- **Motif statistics (`Bio.motifs`)**
  - A motif is created from aligned sequences of equal length.
  - Outputs typically include:
    - `counts`: per-position nucleotide counts
    - `consensus` and `degenerate_consensus`: derived consensus sequences
  - If sequences differ in length, you must align/trim/pad them before motif creation.
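- **Restriction analysis (`Bio.Restriction`)**
  - `RestrictionBatch(enzymes).search(seq)` returns a dict mapping each enzyme to a list of cut positions; each position is the 1-based index of the first base after the cut, and the sequence is treated as linear unless `linear=False` is passed.
  - Enzyme objects are converted to strings for JSON serialization.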
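- **Codon usage**
  - The example computes codon frequencies by splitting the CDS into in-frame triplets.
  - Practical considerations:
    - Ensure the CDS length is a multiple of 3 (or decide how to handle remainder bases; the example ignores a trailing partial codon).
    - Confirm the correct reading frame and whether to include terminal stop codons.
    - For non-standard genetic codes (e.g., mitochondrial codes), use `Bio.Data.CodonTable` to look up the start codons, stop codons, and codon-to-amino-acid mapping of the relevant NCBI translation table. It does not provide organism-specific codon usage frequencies.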
- **I/O requirements**
  - Always open files with `encoding="utf-8"`.
  - Use `json.dump(..., ensure_ascii=False)` to preserve non-ASCII characters in outputs.
- **Further reference**
  - See `references/advanced.md` for additional notes and module coverage (motifs/PopGen/SeqUtils/Restriction/Cluster, GenomeDiagram, CodonTable/SeqFeature/IUPACData).

Related Skills
biopython-entrez
Use Bio.Entrez to access NCBI databases (e.g., PubMed/GenBank) for searching, fetching summaries, and downloading records when your workflow needs to call the NCBI E-utilities API over the network.
biopython
A comprehensive toolbox for computational molecular biology; use it when you need programmatic sequence/structure parsing, batch bioinformatics pipelines, or automated NCBI/BLAST workflows.
biopython-structure
Use Bio.PDB to parse and analyze protein structures (PDB/mmCIF) for structural bioinformatics tasks; use when you need structure parsing, geometry calculations, or structural comparison/superposition.
biopython-sequence-io
Use Biopython to read/write/convert biological sequence files (FASTA/GenBank/FASTQ, etc.) and perform basic sequence operations; use when you need reliable sequence I/O, lightweight sequence manipulation, or scalable processing of large sequence datasets.
biopython-phylo
Use Bio.Phylo to read/write phylogenetic trees and perform visualization and statistics; use when tree parsing/conversion, pruning/rerooting, distance calculation, or plotting is required.
biopython-alignment
Sequence alignment and alignment file processing with Biopython (Bio.Align/Bio.AlignIO), triggered when you need global/local pairwise alignment, MSA read/write/format conversion, or alignment statistics/filtering.
skill-auditor
A comprehensive auditor for any agent skill — including Manus, OpenClaw/ClawHub, Claude, LobeHub, or custom SKILL.md-based skills. Use this skill whenever a user wants to evaluate, audit, review, score, or quality-check an agent skill before publishing, updating, or deploying. Covers two hard veto gates (structural redlines + research integrity redlines), static quality scoring across 25 criteria (ISO 25010 + OpenSSF + Agent), dynamic test input generation, multi-mode execution testing, multi-layer output evaluation with five specialized category rubrics (Evidence Insight / Protocol Design / Data Analysis / Academic Writing / Other), a Research Veto that applies to all four research categories, human eval viewer generation, actionable P0/P1/P2 optimization recommendations, and automatic skill improvement that outputs a polished, production-ready SKILL.md. Also use whenever a user says "audit my skill", "evaluate my skill", "improve my skill", or wants a corrected version after evaluation.
two-sample-mr-research-planner
Generates complete two-sample Mendelian randomization (MR) research designs from a user-provided research direction. Use when users want to design, plan, or build a study using two-sample MR to test causal relationships. Triggers: "design a two-sample MR study", "build a publishable MR paper", "test whether this biomarker causally affects this disease", "generate Lite/Standard/Advanced MR plans", "screen multiple exposures with MR", "bidirectional MR design", "causal inference using GWAS summary statistics", or "I want to study X and Y using MR". Always outputs four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.
research-proposal-generator
Generates a comprehensive research proposal design based on input literature, including hypothesis, mechanism verification, and budget. Use when the user wants to design a research project from a paper.
research-grants
Write competitive research proposals for NSF, NIH, DOE, DARPA, and Taiwan's NSTC when you need agency-compliant narratives, budgets, and review-criteria alignment for a specific solicitation/FOA/BAA.
protocol-standardization
Standardize fragmented experimental steps into reproducible protocol documents when you need method organization, lab SOP drafting, or cross-operator reproducibility; missing parameters must be explicitly marked as "To be supplemented/Not provided".
prospero-registration-helper
Assists researchers in generating PROSPERO registration content for meta-analyses from a title and optional protocol. Use when the user wants to draft a PROSPERO registration form.