biopython-alignment
Sequence alignment and alignment file processing with Biopython (Bio.Align/Bio.AlignIO), triggered when you need global/local pairwise alignment, MSA read/write/format conversion, or alignment statistics/filtering.
Best use case
biopython-alignment is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using biopython-alignment should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/biopython-alignment/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How biopython-alignment Compares
| Feature / Agent | biopython-alignment | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Sequence alignment and alignment file processing with Biopython (Bio.Align/Bio.AlignIO), triggered when you need global/local pairwise alignment, MSA read/write/format conversion, or alignment statistics/filtering.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
# biopython-alignment
## When to Use
- You need **global alignment** between two protein (or nucleotide) sequences and want a reproducible score and aligned strings.
- You need **local alignment** to find the best matching fragment/subsequence between two DNA/RNA/protein sequences.
- You need to **read, write, or convert** multiple sequence alignment (MSA) files (e.g., FASTA/Clustal/Stockholm) using Biopython I/O.
- You want to compute **alignment statistics** (e.g., identity, coverage, conservation per column) and filter alignments by thresholds.
- You need to apply **substitution matrices** (e.g., BLOSUM62) and tune gap penalties for biologically meaningful scoring.
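
The statistics-and-filtering use case above can be sketched as follows. This is a minimal illustration, not part of the skill's source: the helper names `identity_and_coverage` and `passes_thresholds`, the default thresholds, and the definitions themselves are assumptions (identity as identical positions over aligned gap-free columns, coverage as aligned columns over the length of the first sequence). Other definitions are common, so adjust them to your convention.

```python
from __future__ import annotations

from Bio.Align import PairwiseAligner


def identity_and_coverage(alignment) -> tuple[float, float]:
    """Return (identity, coverage) for one pairwise alignment.

    Assumed definitions (hypothetical, adjust as needed):
      identity = identical positions / aligned (gap-free) columns
      coverage = aligned columns / length of the first (target) sequence
    """
    target, query = alignment.target, alignment.query
    matches = 0
    aligned_cols = 0
    # alignment.aligned holds gap-free blocks as (start, end) pairs per sequence.
    for (t_start, t_end), (q_start, q_end) in zip(*alignment.aligned):
        aligned_cols += int(t_end - t_start)
        matches += sum(
            x == y for x, y in zip(target[t_start:t_end], query[q_start:q_end])
        )
    identity = matches / aligned_cols if aligned_cols else 0.0
    coverage = aligned_cols / len(target) if len(target) else 0.0
    return identity, coverage


def passes_thresholds(alignment, min_identity: float = 0.7,
                      min_coverage: float = 0.8) -> bool:
    """Keep an alignment only if it meets both (illustrative) thresholds."""
    identity, coverage = identity_and_coverage(alignment)
    return identity >= min_identity and coverage >= min_coverage


aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -2.0

best = aligner.align("ACGT", "ACGA")[0]
identity, coverage = identity_and_coverage(best)
print(f"identity={identity:.2f} coverage={coverage:.2f}")
print("keep:", passes_thresholds(best))
```

Computing both numbers from `alignment.aligned` keeps the helper independent of how the alignment is printed, so it should work for global and local alignments alike.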
## Key Features
- Pairwise alignment via `Bio.Align.PairwiseAligner` (global and local modes).
- Alignment scoring with configurable match/mismatch and gap penalties.
- Protein substitution matrices via `Bio.Align.substitution_matrices` (e.g., BLOSUM/PAM).
- MSA parsing and serialization via `Bio.AlignIO` (read/write/format conversion).
- Basic alignment statistics: identity, aligned length, coverage, and MSA column conservation.
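
The `Bio.AlignIO` read/write/convert feature listed above might look like the sketch below. The helper name `convert_alignment` and the in-memory `StringIO` handles are assumptions made to keep the example self-contained; with real files, `AlignIO.convert(in_file, in_format, out_file, out_format)` does the same in a single call.

```python
from io import StringIO

from Bio import AlignIO

# Small in-memory FASTA alignment (all sequences must have equal length).
fasta_text = ">s1\nACGTACGT\n>s2\nACGTTCGT\n>s3\nACGTACGA\n"


def convert_alignment(text: str, in_fmt: str, out_fmt: str) -> str:
    """Parse one alignment from text in in_fmt and serialize it as out_fmt."""
    msa = AlignIO.read(StringIO(text), in_fmt)  # exactly one alignment expected
    out = StringIO()
    AlignIO.write(msa, out, out_fmt)
    return out.getvalue()


clustal_text = convert_alignment(fasta_text, "fasta", "clustal")
stockholm_text = convert_alignment(fasta_text, "fasta", "stockholm")

# Round-trip check: the Stockholm text parses back to the same alignment shape.
roundtrip = AlignIO.read(StringIO(stockholm_text), "stockholm")
print(len(roundtrip), roundtrip.get_alignment_length())
print(clustal_text)
```

`AlignIO.read` raises an error if the input holds zero or several alignments; use `AlignIO.parse` when a file may contain more than one.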
## Dependencies
- `biopython>=1.81`
- `numpy>=1.21`
## Example Usage
```python
# -*- coding: utf-8 -*-
"""
Runnable examples for:
1) Global protein alignment
2) Local DNA alignment (best fragment)
3) MSA parsing + column conservation
Requires: biopython, numpy
"""
from __future__ import annotations

from io import StringIO

import numpy as np
from Bio.Align import PairwiseAligner
from Bio.Align import substitution_matrices
from Bio import AlignIO


def global_protein_alignment(seq_a: str, seq_b: str) -> None:
    """Align two protein sequences end-to-end with BLOSUM62 and affine gaps."""
    matrix = substitution_matrices.load("BLOSUM62")
    aligner = PairwiseAligner()
    aligner.mode = "global"
    aligner.substitution_matrix = matrix
    aligner.open_gap_score = -10.0
    aligner.extend_gap_score = -0.5
    alignments = aligner.align(seq_a, seq_b)
    best = alignments[0]
    print("=== Global protein alignment (best) ===")
    print("Score:", best.score)
    print(best)


def local_dna_alignment_best_fragment(seq_a: str, seq_b: str) -> None:
    """Find the best-matching region between two DNA sequences."""
    aligner = PairwiseAligner()
    aligner.mode = "local"
    aligner.match_score = 2.0
    aligner.mismatch_score = -1.0
    aligner.open_gap_score = -2.0
    aligner.extend_gap_score = -0.5
    best = aligner.align(seq_a, seq_b)[0]
    # aligned holds two sequences of (start, end) blocks:
    # (blocks_in_seq_a, blocks_in_seq_b). Each block is a gap-free segment,
    # so the full aligned region runs from the start of the first block to
    # the end of the last block.
    a_blocks, b_blocks = best.aligned
    a_start, a_end = int(a_blocks[0][0]), int(a_blocks[-1][1])
    b_start, b_end = int(b_blocks[0][0]), int(b_blocks[-1][1])
    print("=== Local DNA alignment (best) ===")
    print("Score:", best.score)
    print(best)
    print("Best fragment in seq_a:", seq_a[a_start:a_end], f"(coords {a_start}:{a_end})")
    print("Best fragment in seq_b:", seq_b[b_start:b_end], f"(coords {b_start}:{b_end})")


def msa_column_conservation(fasta_text: str) -> None:
    """Print per-column conservation for an aligned FASTA string."""
    handle = StringIO(fasta_text)
    msa = AlignIO.read(handle, "fasta")  # MultipleSeqAlignment
    # Convert to a 2D array of characters: shape (n_seqs, aln_len)
    arr = np.array([list(str(rec.seq)) for rec in msa], dtype="U1")
    n_seqs, aln_len = arr.shape
    # Conservation per column: fraction of the most common non-gap character.
    # Treat '-' as gap; ignore gaps when computing the most common residue.
    conservation = []
    for j in range(aln_len):
        col = arr[:, j]
        col = col[col != "-"]
        if col.size == 0:
            # All-gap column: report 0.0 rather than dividing by zero.
            conservation.append(0.0)
            continue
        values, counts = np.unique(col, return_counts=True)
        conservation.append(float(counts.max() / counts.sum()))
    print("=== MSA column conservation ===")
    print("n_seqs:", n_seqs, "aln_len:", aln_len)
    print("conservation:", [round(x, 3) for x in conservation])


def main() -> None:
    # 1) Global alignment (protein)
    seq_a = "MKTAYIAKQRQISFVKSHFSRQDILD"
    seq_b = "MKLAYIAKQRQISFVKSHFTRQDILN"
    global_protein_alignment(seq_a, seq_b)
    # 2) Local alignment (DNA)
    seq_a = "ATGCGTACGTTAGC"
    seq_b = "GGGATGCGTACGAAAC"
    local_dna_alignment_best_fragment(seq_a, seq_b)
    # 3) MSA conservation (FASTA)
    fasta_text = ">s1\nACGTACGT\n>s2\nACGTTCGT\n>s3\nACGTACGA\n"
    msa_column_conservation(fasta_text)


if __name__ == "__main__":
    main()
```
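
The example above keeps only `alignments[0]`. The sketch below illustrates what that index selects: `PairwiseAligner.align` yields every alignment that reaches the optimal score, so several equally good alignments can exist. The toy sequences and scoring values are assumptions chosen to produce a tie.

```python
from Bio.Align import PairwiseAligner

aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -1.0
aligner.extend_gap_score = -1.0

# "AAC" vs "AC": the single gap can sit opposite either of the two A's,
# which gives two alignments with the same score.
co_optimal = list(aligner.align("AAC", "AC"))

print("number of optimal alignments:", len(co_optimal))
for aln in co_optimal:
    print("score:", aln.score)
    print(aln)
```

If downstream results must not depend on which tied alignment comes first, consider reporting the number of co-optimal alignments alongside the score.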
## Implementation Details
- **Pairwise alignment engine**: uses `Bio.Align.PairwiseAligner`, which performs dynamic programming alignment under the selected mode:
  - `mode="global"`: aligns full-length sequences end-to-end.
  - `mode="local"`: finds the highest-scoring matching region (best subsequence pair).
- **Scoring configuration**:
  - For proteins, prefer `substitution_matrix` (e.g., `BLOSUM62`) plus gap penalties (`open_gap_score`, `extend_gap_score`).
  - For nucleotides, a simple scheme is common: `match_score`, `mismatch_score`, and gap penalties.
- **Selecting the best alignment**: `aligner.align(a, b)` returns the optimal alignments, all of which share the same best score; `[0]` picks the first of them. When several alignments tie, the score gives no reason to prefer one over another.
- **Local “best fragment” extraction**:
  - `alignment.aligned` returns aligned coordinate blocks for each sequence; each block `(start, end)` is a gap-free segment.
  - The first block covers the whole local alignment only when the alignment contains no gaps. To obtain the full aligned fragment, slice each original sequence from the start of its first block to the end of its last block.
- **MSA I/O and statistics**:
  - `Bio.AlignIO.read(handle, fmt)` parses an alignment into a `MultipleSeqAlignment`.
  - Column conservation can be computed as:
    `max_count(non-gap residues in column) / total_non_gap_count(column)`.
- **Operational conventions (recommended)**:
  - Store runtime configuration in `config/task_config.json` and invoke scripts as `python scripts/<task_name>.py`.
  - Avoid stacking many CLI `--` parameters; keep parameters in the config file.
  - Always specify `encoding="utf-8"` for file I/O; for JSON output use `ensure_ascii=False`.

Related Skills
biopython-entrez
Use Bio.Entrez to access NCBI databases (e.g., PubMed/GenBank) for searching, fetching summaries, and downloading records when your workflow needs to call the NCBI E-utilities API over the network.
sequence-alignment
A skill for performing sequence alignment using NCBI BLAST API. Supports nucleotide and protein sequence comparison against major biological databases.
biopython
A comprehensive toolbox for computational molecular biology; use it when you need programmatic sequence/structure parsing, batch bioinformatics pipelines, or automated NCBI/BLAST workflows.
biopython-structure
Use Bio.PDB to parse and analyze protein structures (PDB/mmCIF) for structural bioinformatics tasks; use when you need structure parsing, geometry calculations, or structural comparison/superposition.
biopython-sequence-io
Use Biopython to read/write/convert biological sequence files (FASTA/GenBank/FASTQ, etc.) and perform basic sequence operations; use when you need reliable sequence I/O, lightweight sequence manipulation, or scalable processing of large sequence datasets.
biopython-phylo
Use Bio.Phylo to read/write phylogenetic trees and perform visualization and statistics; use when tree parsing/conversion, pruning/rerooting, distance calculation, or plotting is required.
biopython-advanced
Advanced Biopython modules for motifs, population genetics, sequence utilities, restriction analysis, clustering, and GenomeDiagram visualization; use when you need extended bioinformatics analysis beyond basic sequence I/O and alignment.
skill-auditor
A comprehensive auditor for any agent skill — including Manus, OpenClaw/ClawHub, Claude, LobeHub, or custom SKILL.md-based skills. Use this skill whenever a user wants to evaluate, audit, review, score, or quality-check an agent skill before publishing, updating, or deploying. Covers two hard veto gates (structural redlines + research integrity redlines), static quality scoring across 25 criteria (ISO 25010 + OpenSSF + Agent), dynamic test input generation, multi-mode execution testing, multi-layer output evaluation with five specialized category rubrics (Evidence Insight / Protocol Design / Data Analysis / Academic Writing / Other), a Research Veto that applies to all four research categories, human eval viewer generation, actionable P0/P1/P2 optimization recommendations, and automatic skill improvement that outputs a polished, production-ready SKILL.md. Also use whenever a user says "audit my skill", "evaluate my skill", "improve my skill", or wants a corrected version after evaluation.
two-sample-mr-research-planner
Generates complete two-sample Mendelian randomization (MR) research designs from a user-provided research direction. Use when users want to design, plan, or build a study using two-sample MR to test causal relationships. Triggers:"design a two-sample MR study", "build a publishable MR paper", "test whether this biomarker causally affects this disease", "generate Lite/Standard/Advanced MR plans", "screen multiple exposures with MR", "bidirectional MR design", "causal inference using GWAS summary statistics", or "I want to study X and Y using MR". Always outputs four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.
research-proposal-generator
Generates a comprehensive research proposal design based on input literature, including hypothesis, mechanism verification, and budget. Use when the user wants to design a research project from a paper.
research-grants
Write competitive research proposals for NSF, NIH, DOE, DARPA, and Taiwan's NSTC when you need agency-compliant narratives, budgets, and review-criteria alignment for a specific solicitation/FOA/BAA.
protocol-standardization
Standardize fragmented experimental steps into reproducible protocol documents when you need method organization, lab SOP drafting, or cross-operator reproducibility; missing parameters must be explicitly marked as "To be supplemented/Not provided".