bio-pdb-structure-navigation
Navigate protein structure hierarchy using Biopython Bio.PDB SMCRA model. Use when accessing models, chains, residues, and atoms, iterating over structure levels, or extracting sequences from PDB files.
Best use case
bio-pdb-structure-navigation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using bio-pdb-structure-navigation should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/bio-pdb-structure-navigation/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
## Version Compatibility
Reference examples tested with: Biopython 1.83+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
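The version check described above can be scripted. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `version_tuple` are illustrative and not part of Biopython, and the distribution name `biopython` is assumed to be the one used by pip.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '1.83' or '1.85.dev0' into a tuple of its leading integer parts."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


found = installed_version("biopython")
if found is None:
    print("Biopython is not installed")
elif version_tuple(found) < (1, 83):
    print(f"Biopython {found} is older than the tested 1.83; check signatures")
else:
    print(f"Biopython {found} is within the tested range")
```

If the reported version is older than the tested one, `help()` on the specific class or function is the quickest way to confirm its signature before adapting an example.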
# Structure Navigation
**"Access residues and atoms in a PDB structure"** → Navigate the Structure-Model-Chain-Residue-Atom hierarchy to iterate over components, extract sequences, and access atomic coordinates.
- Python: `structure[0]['A'][100]['CA'].get_vector()` for direct access
Navigate the Structure-Model-Chain-Residue-Atom (SMCRA) hierarchy to access and iterate over structure components.
## Required Imports
```python
from Bio.PDB import PDBParser, PPBuilder, Selection
from Bio.Data.PDBData import protein_letters_3to1
```
## SMCRA Hierarchy
```
Structure
    |
    +-- Model (0, 1, ...)              # NMR ensembles, crystal asymmetric unit
            |
            +-- Chain (A, B, ...)      # Polypeptide chains, ligands
                    |
                    +-- Residue        # Amino acids, nucleotides, hetero groups
                            |
                            +-- Atom   # Individual atoms
```
## Accessing Hierarchy Levels
```python
from Bio.PDB import PDBParser
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
# Access by index/ID
model = structure[0] # First model
chain = model['A'] # Chain A
residue = chain[100] # Residue 100 (simple numbering)
residue = chain[(' ', 100, ' ')] # Full residue ID (hetfield, resseq, icode)
atom = residue['CA'] # C-alpha atom
```
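The access pattern above needs a `protein.pdb` file on disk. For quick experiments, a sketch like the following builds a tiny two-residue structure in memory and walks down the hierarchy. The `atom_line` helper, the structure ID `demo`, and the coordinates are made up for illustration; only `PDBParser`, `get_structure`, and the indexing calls come from Biopython, and passing a file handle instead of a path is assumed to work as in recent releases.

```python
from io import StringIO

from Bio.PDB import PDBParser


def atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Format one fixed-column PDB ATOM record (illustrative helper)."""
    padded_name = f" {name}"  # one- and two-letter names start in column 14
    return (
        f"ATOM  {serial:>5} {padded_name:<4} {resname:>3} {chain}{resseq:>4}    "
        f"{x:>8.3f}{y:>8.3f}{z:>8.3f}{1.0:>6.2f}{20.0:>6.2f}          {element:>2}"
    )


records = [
    atom_line(1, "N", "ALA", "A", 100, 0.0, 0.0, 0.0, "N"),
    atom_line(2, "CA", "ALA", "A", 100, 1.458, 0.0, 0.0, "C"),
    atom_line(3, "N", "GLY", "A", 101, 3.0, 1.0, 0.0, "N"),
    atom_line(4, "CA", "GLY", "A", 101, 4.2, 1.5, 0.0, "C"),
]
pdb_text = "\n".join(records) + "\n"

parser = PDBParser(QUIET=True)
structure = parser.get_structure("demo", StringIO(pdb_text))

model = structure[0]    # first (and only) model
chain = model["A"]      # chain A
residue = chain[100]    # integer shortcut for (' ', 100, ' ')
atom = residue["CA"]    # C-alpha atom

# The integer shortcut and the full residue ID refer to the same object
same_residue = chain[100] is chain[(" ", 100, " ")]
print(residue.get_resname(), atom.get_name(), same_residue)
print([r.id[1] for r in chain])
```

The integer shortcut only resolves residues with a blank hetfield and blank insertion code; waters, ligands, and residues such as `100A` need the full three-part ID.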
## Iterating Over Structure
```python
# Iterate all levels
for model in structure:
    for chain in model:
        for residue in chain:
            for atom in residue:
                print(f'{chain.id}:{residue.id[1]}:{atom.name}')
# Shortcut iterators (all levels below current)
for chain in structure.get_chains():
    print(f'Chain: {chain.id}')
for residue in structure.get_residues():
    print(f'Residue: {residue.resname}')
for atom in structure.get_atoms():
    print(f'Atom: {atom.name} at {atom.coord}')
```
## Residue Identification
```python
# Residue ID is a tuple: (hetfield, resseq, icode)
for residue in chain:
    hetfield, resseq, icode = residue.id
    # icode is a single space when there is no insertion code
    print(f'Residue {resseq}{icode.strip()}: {residue.resname}')
# hetfield values:
# ' ' - standard amino acid or nucleotide
# 'W' - water
# 'H_xxx' - hetero residue (ligand, modified residue)
# Filter standard residues only
standard_residues = [r for r in chain if r.id[0] == ' ']
# Filter water
waters = [r for r in chain if r.id[0] == 'W']
# Filter hetero atoms (ligands)
hetero = [r for r in chain if r.id[0].startswith('H_')]
```
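The hetfield rules above can be captured in a small helper that works on plain ID tuples, which makes them easy to check without a structure file. This is a sketch only: `classify_residue_id` is a hypothetical name, and the example IDs are invented.

```python
def classify_residue_id(residue_id):
    """Return (kind, label) for a Bio.PDB-style residue ID tuple."""
    hetfield, resseq, icode = residue_id
    if hetfield == " ":
        kind = "standard"
    elif hetfield == "W":
        kind = "water"
    elif hetfield.startswith("H_"):
        kind = "hetero"
    else:
        kind = "unknown"
    # Drop the blank insertion code so 100 prints as '100' and 100A as '100A'
    label = f"{resseq}{icode.strip()}"
    return kind, label


example_ids = [
    (" ", 100, " "),      # ordinary residue
    (" ", 100, "A"),      # same number with insertion code A
    ("W", 301, " "),      # water
    ("H_HEM", 200, " "),  # hetero group
]
for residue_id in example_ids:
    print(residue_id, "->", classify_residue_id(residue_id))
```

With a parsed structure, the same function can be applied as `classify_residue_id(residue.id)` inside any of the loops shown earlier.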
## Atom Properties
```python
for atom in residue:
    print(f'Name: {atom.name}')
    print(f'Element: {atom.element}')
    print(f'Coordinates: {atom.coord}')
    print(f'B-factor: {atom.bfactor}')
    print(f'Occupancy: {atom.occupancy}')
    # Use get_full_id(); the full_id attribute may still be None on atoms
    print(f'Full ID: {atom.get_full_id()}')
    print(f'Serial number: {atom.serial_number}')
```
## Getting Full Identifiers
```python
# Full hierarchical ID from any entity
atom = structure[0]['A'][100]['CA']
print(atom.get_full_id())
# ('protein', 0, 'A', (' ', 100, ' '), ('CA', ' '))
# Components: (structure_id, model_id, chain_id, residue_id, atom_id)
```
## Checking for Children
```python
# Check if entity has child
if chain.has_id(100):
    residue = chain[100]
# Check if residue has atom
if residue.has_id('CA'):
    ca = residue['CA']
# Get list of all children
chains = structure[0].get_list()
residues = chain.get_list()
atoms = residue.get_list()
```
## Getting Parent Entity
```python
# Navigate up hierarchy
atom = structure[0]['A'][100]['CA']
residue = atom.get_parent()
chain = residue.get_parent()
model = chain.get_parent()
structure = model.get_parent()
```
## Extracting Polypeptide Sequences
```python
from Bio.PDB import PDBParser, PPBuilder
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
ppb = PPBuilder()
for pp in ppb.build_peptides(structure):
    seq = pp.get_sequence()
    print(f'Polypeptide: {seq}')
    print(f'Length: {len(seq)}')
# Get all sequences as list
sequences = [pp.get_sequence() for pp in ppb.build_peptides(structure)]
```
## Using CaPPBuilder for Broken Chains
```python
from Bio.PDB import CaPPBuilder
# Use when backbone is incomplete
# Connects residues if CA atoms are within 4.3 Angstroms
ppb = CaPPBuilder()
for pp in ppb.build_peptides(structure):
    print(f'Fragment: {pp.get_sequence()}')
```
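The CA-CA distance criterion mentioned above can be illustrated on plain coordinates. The following is a simplified sketch of the idea, not Biopython's implementation: `split_by_ca_distance` is a hypothetical helper, the coordinates are invented, and the 4.3 Angstrom cutoff is taken from the comment in the block above.

```python
import math


def split_by_ca_distance(ca_coords, cutoff=4.3):
    """Group consecutive CA positions into fragments.

    A new fragment starts whenever two neighbouring CA atoms are not
    closer than `cutoff` (in Angstroms).
    """
    fragments = []
    current = []
    for coord in ca_coords:
        if current and math.dist(current[-1], coord) >= cutoff:
            fragments.append(current)
            current = []
        current.append(coord)
    if current:
        fragments.append(current)
    return fragments


# Three residues spaced 3.8 A apart, a 12.4 A gap, then two more residues
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (20.0, 0.0, 0.0),
    (23.8, 0.0, 0.0),
]
fragments = split_by_ca_distance(coords)
print([len(fragment) for fragment in fragments])  # [3, 2]
```

With a parsed chain, the input could be built as `[tuple(r['CA'].coord) for r in chain if r.has_id('CA')]`, which gives a quick way to see where a builder is likely to break the chain.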
## Converting Residue Names
```python
from Bio.Data.PDBData import protein_letters_3to1
# Three-letter to one-letter conversion
three_letter = 'ALA'
one_letter = protein_letters_3to1.get(three_letter, 'X')
print(f'{three_letter} -> {one_letter}') # ALA -> A
# Build sequence manually
sequence = ''
for residue in chain:
    if residue.id[0] == ' ':  # Standard residue
        code = protein_letters_3to1.get(residue.resname, 'X')
        sequence += code
print(f'Sequence: {sequence}')
```
## Using Selection.unfold_entities
```python
from Bio.PDB import Selection
# Extract entities at specific level
# Codes: S=structure, M=model, C=chain, R=residue, A=atom
# Get all residues from structure
residues = Selection.unfold_entities(structure, 'R')
print(f'Total residues: {len(residues)}')
# Get all atoms from a chain
atoms = Selection.unfold_entities(chain, 'A')
print(f'Atoms in chain: {len(atoms)}')
# Get all chains from model
chains = Selection.unfold_entities(model, 'C')
```
## Handling Disordered Atoms
```python
# Check for disorder
# DisorderedAtom wrappers return 2; plain atoms return 0 or 1
if atom.is_disordered() == 2:
    print(f'Atom {atom.name} has multiple conformations')
    print(f'Alt locations: {atom.disordered_get_id_list()}')
    # Select specific conformation (raises KeyError if altloc 'A' is absent)
    atom.disordered_select('A')
    print(f'Coord for alt A: {atom.coord}')
    # Get all conformations
    for altloc in atom.disordered_get_id_list():
        atom.disordered_select(altloc)
        print(f'  {altloc}: {atom.coord}')
    # Get unpacked list (all conformations)
    all_atoms = atom.disordered_get_list()
```
## Handling Disordered Residues
```python
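# Point mutations at the same position are stored as a DisorderedResidue.
# is_disordered() returns 2 for those and 1 for an ordinary residue that
# only contains disordered atoms, which has no disordered_* methods.
if residue.is_disordered() == 2:
    print(f'Disordered residue at {residue.id}')
    names = residue.disordered_get_id_list()
    print(f'Alternative residues: {names}')
    # Select specific residue type if it is one of the alternatives
    if 'ALA' in names:
        residue.disordered_select('ALA')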
```
## Finding Specific Atoms
```python
# Get backbone atoms
backbone_names = ['N', 'CA', 'C', 'O']
for residue in chain:
    backbone = [residue[name] for name in backbone_names if residue.has_id(name)]
# Get all C-alpha atoms
ca_atoms = [r['CA'] for r in structure.get_residues() if r.has_id('CA')]
print(f'Found {len(ca_atoms)} CA atoms')
# Get sidechain atoms
for residue in chain:
    sidechain = [a for a in residue if a.name not in ['N', 'CA', 'C', 'O']]
```
## Filtering by Residue Type
```python
# Get only amino acids
amino_acids = [r for r in chain if r.id[0] == ' ']
# Get specific amino acid types
arginines = [r for r in chain if r.resname == 'ARG']
charged = [r for r in chain if r.resname in ['ARG', 'LYS', 'ASP', 'GLU']]
# Get hetero atoms
ligands = [r for r in chain if r.id[0].startswith('H_')]
for lig in ligands:
    print(f'Ligand: {lig.resname} at position {lig.id[1]}')
```
## Counting Entities
```python
# Count at each level
n_models = len(list(structure.get_models()))
n_chains = len(list(structure.get_chains()))
n_residues = len(list(structure.get_residues()))
n_atoms = len(list(structure.get_atoms()))
print(f'Models: {n_models}, Chains: {n_chains}')
print(f'Residues: {n_residues}, Atoms: {n_atoms}')
# Count per chain
for chain in structure.get_chains():
    n_res = len([r for r in chain if r.id[0] == ' '])
    print(f'Chain {chain.id}: {n_res} amino acids')
```
## Working with NMR Ensembles
```python
# NMR structures have multiple models
parser = PDBParser(QUIET=True)
structure = parser.get_structure('nmr', 'nmr_structure.pdb')
n_models = len(list(structure.get_models()))
print(f'NMR ensemble with {n_models} conformers')
# Iterate over models
for model in structure:
    # Each model is a separate conformation
    ca_coords = [r['CA'].coord for r in model.get_residues() if r.has_id('CA')]
    print(f'Model {model.id}: {len(ca_coords)} CA atoms')
```
## Related Skills
- structure-io - Parse and write structure files
- geometric-analysis - Measure distances, angles, RMSD
- structure-modification - Modify coordinates and properties
- sequence-manipulation/seq-objects - Work with extracted sequences