bio-molecular-descriptors
Calculates molecular descriptors and fingerprints using RDKit. Computes Morgan fingerprints (ECFP), MACCS keys, Lipinski properties, QED drug-likeness, TPSA, and 3D conformer descriptors. Use when featurizing molecules for machine learning or filtering by drug-likeness criteria.
Best use case
bio-molecular-descriptors is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using bio-molecular-descriptors should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/bio-molecular-descriptors/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How bio-molecular-descriptors Compares
| Feature / Agent | bio-molecular-descriptors | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
## Version Compatibility
Reference examples tested with: RDKit 2024.03+, numpy 1.26+, pandas 2.2+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
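A minimal sketch of the version check described above, assuming all three packages are importable in the current environment. The `installed` dictionary is an illustrative name, not part of any library.

```python
import numpy as np
import pandas as pd
import rdkit

# Collect installed versions to compare against the tested baseline
# (RDKit 2024.03+, numpy 1.26+, pandas 2.2+)
installed = {
    'rdkit': rdkit.__version__,
    'numpy': np.__version__,
    'pandas': pd.__version__,
}

for package, version in installed.items():
    print(f'{package}: {version}')
```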
# Molecular Descriptors
**"Calculate molecular fingerprints for my compound library"** → Compute structural fingerprints (Morgan/ECFP, MACCS keys) and physicochemical descriptors (Lipinski, QED, TPSA) for molecules, producing feature vectors for similarity analysis or ML models.
- Python: `AllChem.GetMorganFingerprintAsBitVect()`, `Descriptors.MolWt()`, `QED.qed()` (RDKit)
Calculate fingerprints and physicochemical properties for molecules.
## Morgan Fingerprints (ECFP)
**Goal:** Generate circular fingerprints that encode local chemical environments for similarity searching and ML models.
**Approach:** Use GetMorganFingerprintAsBitVect with a chosen radius (2 for ECFP4, 3 for ECFP6) and bit length, optionally including chirality information.
```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('CCO')

# ECFP4 = radius 2 (diameter = 2 * radius = 4)
# ECFP6 = radius 3 (diameter = 6)
ecfp4 = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
ecfp6 = AllChem.GetMorganFingerprintAsBitVect(mol, radius=3, nBits=2048)

# With stereochemistry information
ecfp4_chiral = AllChem.GetMorganFingerprintAsBitVect(
    mol, radius=2, nBits=2048, useChirality=True
)

# As an unfolded sparse count vector (for some ML methods)
ecfp4_counts = AllChem.GetMorganFingerprint(mol, radius=2)

# Convert the bit vector to a numpy array of 0/1 values
fp_array = np.array(ecfp4)
```
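Recent RDKit releases also provide a generator-based fingerprint API in `rdFingerprintGenerator`, and the older `AllChem` functions may log a deprecation warning there. The following is a sketch of the same ECFP4 calculation with that API, assuming RDKit 2024.03+. The variable names are illustrative.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

mol = Chem.MolFromSmiles('CCO')

# One generator per fingerprint configuration; it can be reused across molecules
morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

ecfp4_bits = morgan_gen.GetFingerprint(mol)           # folded bit vector
ecfp4_counts = morgan_gen.GetCountFingerprint(mol)    # folded count vector
ecfp4_array = morgan_gen.GetFingerprintAsNumPy(mol)   # numpy array of 0/1 values

# Chirality-aware variant
chiral_gen = rdFingerprintGenerator.GetMorganGenerator(
    radius=2, fpSize=2048, includeChirality=True
)
ecfp4_chiral = chiral_gen.GetFingerprint(mol)
```

Creating the generator once and reusing it avoids repeating the configuration for every molecule in a library.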
## MACCS Keys
```python
from rdkit.Chem import MACCSkeys
maccs = MACCSkeys.GenMACCSKeys(mol)  # 167 bits (bit 0 is unused; keys are 1-166)
# As numpy array
maccs_array = np.array(maccs)
```
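Since these fingerprints are intended for similarity analysis, here is a small sketch comparing two molecules by Tanimoto similarity on their MACCS keys. The two example molecules are arbitrary choices for illustration.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import MACCSkeys

ethanol = Chem.MolFromSmiles('CCO')
propanol = Chem.MolFromSmiles('CCCO')

maccs_a = MACCSkeys.GenMACCSKeys(ethanol)
maccs_b = MACCSkeys.GenMACCSKeys(propanol)

# Tanimoto = shared on-bits / total distinct on-bits, in the range 0-1
similarity = DataStructs.TanimotoSimilarity(maccs_a, maccs_b)
self_similarity = DataStructs.TanimotoSimilarity(maccs_a, maccs_a)

print(f'ethanol vs propanol: {similarity:.2f}')
```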
## Lipinski Properties
```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

mol = Chem.MolFromSmiles('CCO')

# Lipinski Rule of 5 properties
mw = Descriptors.MolWt(mol)        # Molecular weight (<=500)
logp = Descriptors.MolLogP(mol)    # LogP (<=5)
hbd = Lipinski.NumHDonors(mol)     # H-bond donors (<=5)
hba = Lipinski.NumHAcceptors(mol)  # H-bond acceptors (<=10)

# Check Lipinski compliance (strict: all four criteria must hold)
def passes_lipinski(mol):
    '''Check Lipinski Rule of 5 compliance.'''
    return (
        Descriptors.MolWt(mol) <= 500 and
        Descriptors.MolLogP(mol) <= 5 and
        Lipinski.NumHDonors(mol) <= 5 and
        Lipinski.NumHAcceptors(mol) <= 10
    )

# Additional properties
tpsa = Descriptors.TPSA(mol)  # Topological polar surface area
rotatable = Lipinski.NumRotatableBonds(mol)
```
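The Rule of 5 is often applied with a tolerance of one violation rather than requiring all four criteria. A sketch of that variant follows. The helper name `count_lipinski_violations` and the one-violation cutoff are assumptions for illustration, not part of RDKit.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def count_lipinski_violations(mol):
    '''Return how many of the four Rule of 5 criteria the molecule violates.'''
    checks = [
        Descriptors.MolWt(mol) > 500,
        Descriptors.MolLogP(mol) > 5,
        Lipinski.NumHDonors(mol) > 5,
        Lipinski.NumHAcceptors(mol) > 10,
    ]
    return sum(checks)

ethanol = Chem.MolFromSmiles('CCO')
violations = count_lipinski_violations(ethanol)

# Hypothetical filter: tolerate at most one violation
passes_with_one_allowed = violations <= 1
```

Returning a count instead of a boolean lets the caller choose how strict the filter should be.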
## QED Drug-Likeness
```python
from rdkit.Chem.QED import qed
# QED score (0-1 scale; higher is more drug-like, >0.5 is a common rough cutoff)
qed_score = qed(mol)
# qed() uses the weighted-mean weights by default
# Combines MW, LogP, HBA, HBD, PSA, rotatable bonds, aromatic rings, and structural alerts
```
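To see which inputs drive a QED score, the individual properties can be inspected with `QED.properties()`. This is a minimal sketch using aspirin as an arbitrary example molecule.

```python
from rdkit import Chem
from rdkit.Chem import QED

mol = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')  # aspirin, chosen for illustration

# Named tuple holding the eight properties that feed the QED score
props = QED.properties(mol)
qed_score = QED.qed(mol)

for name, value in props._asdict().items():
    print(f'{name}: {value}')
print(f'QED: {qed_score:.3f}')
```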
## Complete Descriptor Set
**Goal:** Calculate all available RDKit molecular descriptors for feature-rich ML input.
**Approach:** Build a MolecularDescriptorCalculator from the full descriptor list and apply it to each molecule, producing a descriptor DataFrame.
```python
from rdkit.Chem import Descriptors
from rdkit.ML.Descriptors import MoleculeDescriptors
# Get all available descriptor names
descriptor_names = [d[0] for d in Descriptors.descList]
# Create descriptor calculator
calculator = MoleculeDescriptors.MolecularDescriptorCalculator(descriptor_names)
# Calculate for a molecule
descriptors = calculator.CalcDescriptors(mol)
# As DataFrame
import pandas as pd
desc_df = pd.DataFrame([descriptors], columns=descriptor_names)
```
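For ML input, the full descriptor set is usually computed for many molecules and then cleaned. The sketch below assumes that dropping any column with missing or infinite values is acceptable for the task. The SMILES list and variable names are illustrative.

```python
import numpy as np
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.ML.Descriptors import MoleculeDescriptors

smiles_list = ['CCO', 'c1ccccc1', 'CC(=O)O']
mols = [Chem.MolFromSmiles(s) for s in smiles_list]

descriptor_names = [name for name, _ in Descriptors.descList]
calculator = MoleculeDescriptors.MolecularDescriptorCalculator(descriptor_names)

# One row of descriptor values per molecule
rows = [calculator.CalcDescriptors(m) for m in mols]
desc_df = pd.DataFrame(rows, columns=descriptor_names, index=smiles_list)

# Assumption: columns containing NaN or infinite values are simply dropped
clean_df = desc_df.replace([np.inf, -np.inf], np.nan).dropna(axis=1)
```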
## 3D Conformer Descriptors
**Goal:** Compute 3D shape descriptors (asphericity, eccentricity, radius of gyration) from molecular conformers.
**Approach:** Generate a 3D conformer with ETKDGv3, optimize geometry with MMFF, then calculate 3D descriptors from the conformer coordinates.
```python
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors3D
mol = Chem.MolFromSmiles('CCO')
mol = Chem.AddHs(mol)
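# Generate 3D conformer (ETKDGv3 is now default); EmbedMolecule returns -1 on failure
conf_id = AllChem.EmbedMolecule(mol, AllChem.ETKDGv3())
if conf_id == -1:
    raise ValueError('Conformer embedding failed')
# Optimize geometry with the MMFF force field
AllChem.MMFFOptimizeMolecule(mol)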
# 3D descriptors (require conformer)
# Asphericity: 0 = sphere, 1 = rod
asphericity = Descriptors3D.Asphericity(mol)
# Eccentricity
eccentricity = Descriptors3D.Eccentricity(mol)
# Inertial shape factor
isf = Descriptors3D.InertialShapeFactor(mol)
# Radius of gyration
rog = Descriptors3D.RadiusOfGyration(mol)
```
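Conformer embedding is stochastic and can fail, so a library-scale workflow may want a wrapper that fixes the random seed and returns `None` on failure. The following is a sketch under those assumptions. The function name `shape_descriptors` and the seed value are illustrative.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors3D

def shape_descriptors(smiles, seed=42):
    '''Return 3D shape descriptors for a SMILES string, or None on failure.'''
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    mol = Chem.AddHs(mol)

    params = AllChem.ETKDGv3()
    params.randomSeed = seed  # fixed seed for reproducible coordinates
    if AllChem.EmbedMolecule(mol, params) == -1:
        return None

    AllChem.MMFFOptimizeMolecule(mol)
    return {
        'asphericity': Descriptors3D.Asphericity(mol),
        'eccentricity': Descriptors3D.Eccentricity(mol),
        'inertial_shape_factor': Descriptors3D.InertialShapeFactor(mol),
        'radius_of_gyration': Descriptors3D.RadiusOfGyration(mol),
    }

result = shape_descriptors('CCCCO')
invalid = shape_descriptors('not a smiles')
```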
## Batch Descriptor Calculation
**Goal:** Calculate a standard set of descriptors across an entire compound library.
**Approach:** Iterate over molecules, compute selected descriptors for each, and collect results into a DataFrame.
```python
import pandas as pd
from rdkit.Chem import Descriptors
from rdkit.Chem.QED import qed

def calculate_descriptors_batch(molecules, descriptor_names=None):
    '''Calculate descriptors for multiple molecules.'''
    if descriptor_names is None:
        descriptor_names = ['MolWt', 'MolLogP', 'TPSA', 'NumHDonors',
                            'NumHAcceptors', 'NumRotatableBonds', 'qed']
    results = []
    for mol in molecules:
        # Keep a placeholder row so output rows stay aligned with the input
        if mol is None:
            results.append({d: None for d in descriptor_names})
            continue
        row = {}
        for name in descriptor_names:
            if name == 'qed':
                row[name] = qed(mol)
            else:
                # Look up the descriptor function by name on rdkit.Chem.Descriptors
                row[name] = getattr(Descriptors, name)(mol)
        results.append(row)
    return pd.DataFrame(results)
```
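The same batch pattern applies to fingerprints when a numeric feature matrix is needed for ML. This sketch assumes that unparsable SMILES should be skipped rather than kept as placeholder rows. The function name `fingerprint_matrix` is hypothetical.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

def fingerprint_matrix(smiles_list, radius=2, fp_size=2048):
    '''Return (matrix, kept_smiles); invalid SMILES are skipped.'''
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=fp_size)
    rows, kept = [], []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue
        rows.append(gen.GetFingerprintAsNumPy(mol))
        kept.append(smi)
    if not rows:
        # No valid molecules: return an empty matrix with the right width
        return np.zeros((0, fp_size), dtype=np.uint8), kept
    return np.vstack(rows), kept

matrix, kept = fingerprint_matrix(['CCO', 'not_a_smiles', 'c1ccccc1'])
print(matrix.shape, kept)
```

Returning the list of kept SMILES alongside the matrix makes it possible to map each row back to its input molecule.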
## Related Skills
- molecular-io - Load molecules for descriptor calculation
- similarity-searching - Use fingerprints for similarity
- admet-prediction - Predict ADMET from descriptors
- machine-learning/biomarker-discovery - ML on molecular features
Related Skills
molecular-dynamics
Run and analyze molecular dynamics simulations with OpenMM and MDAnalysis. Set up protein/small molecule systems, define force fields, run energy minimization and production MD, analyze trajectories (RMSD, RMSF, contact maps, free energy surfaces). For structural biology, drug binding, and biophysics.
bio-molecular-io
Reads, writes, and converts molecular file formats (SMILES, SDF, MOL2, PDB) using RDKit and Open Babel. Handles structure parsing, canonicalization, and full standardization pipeline including sanitization, normalization, and tautomer canonicalization. Use when loading chemical libraries, converting formats, or preparing molecules for analysis.
zinc-database
Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.
zarr-python
Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.
xlsx
Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.
writing-skills
Use when creating new skills, editing existing skills, or verifying skills work before deployment
writing-plans
Use when you have a spec or requirements for a multi-step task, before touching code
wikipedia-search
Search and fetch structured content from Wikipedia using the MediaWiki API for reliable, encyclopedic information
wellally-tech
Integrate digital health data sources (Apple Health, Fitbit, Oura Ring) and connect to WellAlly.tech knowledge base. Import external health device data, standardize to local format, and recommend relevant WellAlly.tech knowledge base articles based on health data. Support generic CSV/JSON import, provide intelligent article recommendations, and help users better manage personal health data.
weightloss-analyzer
Analyze weight-loss data, calculate metabolic rate, track energy deficit, and manage weight-loss phases
verification-before-completion
Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always