bio-substructure-search

Searches molecular libraries for substructure matches using SMARTS patterns with RDKit. Filters compounds by pharmacophore features, functional groups, or scaffold matches with atom mapping. Use when finding compounds containing specific chemical moieties or filtering libraries by structural features.

1,802 stars

Best use case

bio-substructure-search is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-substructure-search should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/bio-substructure-search/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-substructure-search/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/bio-substructure-search/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How bio-substructure-search Compares

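| Feature / Agent | bio-substructure-search | Standard Approach |
|-----------------|-------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |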

Frequently Asked Questions

What does this skill do?

Searches molecular libraries for substructure matches using SMARTS patterns with RDKit. Filters compounds by pharmacophore features, functional groups, or scaffold matches with atom mapping. Use when finding compounds containing specific chemical moieties or filtering libraries by structural features.

Where can I find the source code?

The source is in the FreedomIntelligence/OpenClaw-Medical-Skills repository on GitHub, under skills/bio-substructure-search/SKILL.md.

SKILL.md Source

## Version Compatibility

Reference examples tested with: RDKit 2024.03+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
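
The version check above can also be scripted. The following is a minimal sketch, assuming RDKit was installed from a distribution named `rdkit` (the usual pip package name; other install methods may not expose this metadata). The helper names `installed_version` and `meets_minimum` are illustrative and not part of RDKit.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(version, minimum=(2024, 3)):
    """Compare the leading year.month fields of an RDKit-style version string."""
    if version is None:
        return False
    parts = version.split('.')
    try:
        year_month = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        # Unexpected version format: treat as not verified
        return False
    return year_month >= minimum


version = installed_version('rdkit')
print(f'rdkit: {version}, meets 2024.03+: {meets_minimum(version)}')
```

If the check fails or the version cannot be determined, fall back to `help(module.function)` on the calls you intend to use.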

# Substructure Search

**"Filter my library for compounds containing a specific functional group"** → Search molecular collections for substructure matches using SMARTS patterns, identifying compounds that contain specified chemical moieties, scaffolds, or pharmacophore features.
- Python: `mol.HasSubstructMatch()`, `Chem.MolFromSmarts()` (RDKit)

Find molecules containing specific structural patterns using SMARTS.

## Basic Substructure Search

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('c1ccc(O)cc1CCO')

# Check if pattern exists
pattern = Chem.MolFromSmarts('[OH]')  # Hydroxyl group
has_hydroxyl = mol.HasSubstructMatch(pattern)
print(f'Contains hydroxyl: {has_hydroxyl}')

# Get all matches (atom indices)
matches = mol.GetSubstructMatches(pattern)
print(f'Hydroxyl positions: {matches}')
```

## Common SMARTS Patterns

| Pattern | SMARTS | Description |
|---------|--------|-------------|
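| Hydroxyl | `[OH]` | Alcohol/phenol; also matches the OH of carboxylic acids |
| Primary amine | `[NH2]` | Nitrogen with two hydrogens; also matches primary amides |
| Secondary amine | `[NH1]` | Nitrogen with one hydrogen; also matches secondary amides |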
| Carboxylic acid | `[CX3](=O)[OX2H1]` | COOH |
| Amide | `[CX3](=O)[NX3]` | C(=O)N |
| Benzene | `c1ccccc1` | Phenyl ring |
| Any aromatic | `[a]` | Any aromatic atom |
| Halogen | `[F,Cl,Br,I]` | Any halogen |

## Library Filtering

**Goal:** Filter a molecular library to retain only compounds containing (or lacking) a specific structural pattern.

**Approach:** Parse a SMARTS pattern and test each molecule for a substructure match, returning those that pass the inclusion or exclusion criterion.

```python
from rdkit import Chem

def filter_by_substructure(molecules, smarts, exclude=False):
    '''
    Filter molecules by substructure presence/absence.

    Args:
        molecules: List of RDKit mol objects
        smarts: SMARTS pattern string
        exclude: If True, return molecules WITHOUT the pattern
    '''
    pattern = Chem.MolFromSmarts(smarts)
    if pattern is None:
        raise ValueError(f'Invalid SMARTS: {smarts}')

    filtered = []
    for mol in molecules:
        if mol is None:
            continue
        has_match = mol.HasSubstructMatch(pattern)
        if exclude:
            if not has_match:
                filtered.append(mol)
        else:
            if has_match:
                filtered.append(mol)

    return filtered

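# Small illustrative library; replace with your own molecules
library = [Chem.MolFromSmiles(s)
           for s in ('CCN', 'CC(=O)NC', 'O=[N+]([O-])c1ccccc1', 'CCO')]

# Filter for amines: neutral N with three connections, excluding amide N.
# Without the +0 and !$(NC=O) terms the pattern would also match nitro and amide nitrogens.
amines = filter_by_substructure(library, '[NX3;H2,H1,H0;+0;!$(NC=O)]')

# Exclude reactive groups
clean = filter_by_substructure(library, '[N+]([O-])=O', exclude=True)  # No nitro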
```

## Multiple Pattern Filtering

**Goal:** Apply multiple inclusion and exclusion substructure filters to narrow a compound set.

**Approach:** Sequentially apply SMARTS-based inclusion filters (must match all) then exclusion filters (must match none) to progressively narrow the library.

```python
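from rdkit import Chem

def _compile(smarts):
    '''Parse a SMARTS string, raising on invalid input.'''
    pattern = Chem.MolFromSmarts(smarts)
    if pattern is None:
        raise ValueError(f'Invalid SMARTS: {smarts}')
    return pattern

def filter_multiple_patterns(molecules, include_patterns=None, exclude_patterns=None):
    '''
    Filter by multiple inclusion and exclusion patterns.

    A molecule is kept if it matches ALL include patterns and NONE of the
    exclude patterns. None entries (failed parses) are skipped.
    '''
    # Compile each pattern once instead of once per molecule
    includes = [_compile(s) for s in include_patterns or []]
    excludes = [_compile(s) for s in exclude_patterns or []]

    result = []
    for mol in molecules:
        if mol is None:
            continue
        if (all(mol.HasSubstructMatch(p) for p in includes)
                and not any(mol.HasSubstructMatch(p) for p in excludes)):
            result.append(mol)

    return result

# Find compounds with both a primary amine and a carboxylic acid (e.g. amino acids)
library = [Chem.MolFromSmiles(s) for s in ('NCC(=O)O', 'CCN', 'CC(=O)O')]  # illustrative
amino_acids = filter_multiple_patterns(
    library,
    include_patterns=['[NX3;H2]', '[CX3](=O)[OX2H1]']
)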
```

## Atom Mapping

```python
from rdkit import Chem

def get_substructure_atoms(mol, smarts):
    '''
    Get all atoms matching a pattern with their indices.
    '''
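    pattern = Chem.MolFromSmarts(smarts)
    if pattern is None:
        raise ValueError(f'Invalid SMARTS: {smarts}')
    # Each match is a tuple of molecule atom indices, ordered by query atom
    matches = mol.GetSubstructMatches(pattern)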

    results = []
    for match in matches:
        atoms = [mol.GetAtomWithIdx(i) for i in match]
        results.append({
            'indices': match,
            'symbols': [a.GetSymbol() for a in atoms]
        })

    return results

# Find and characterize all aromatic rings
mol = Chem.MolFromSmiles('c1ccc2c(c1)cccc2')
rings = get_substructure_atoms(mol, 'c1ccccc1')
print(f'Found {len(rings)} aromatic 6-membered rings')
```
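
The tuples returned by `GetSubstructMatches` are ordered by query atom: position `i` in a match holds the index of the molecule atom matched by query atom `i`. The sketch below makes that mapping explicit; the helper name `map_query_to_molecule` and the acetic acid example are illustrative choices, not part of the original skill.

```python
from rdkit import Chem


def map_query_to_molecule(mol, smarts):
    """Return one {query_atom_idx: molecule_atom_idx} dict per match."""
    pattern = Chem.MolFromSmarts(smarts)
    if pattern is None:
        raise ValueError(f'Invalid SMARTS: {smarts}')
    return [
        dict(enumerate(match))  # position in the tuple is the query atom index
        for match in mol.GetSubstructMatches(pattern)
    ]


# Acetic acid: atom 0 = CH3, 1 = carbonyl C, 2 = carbonyl O, 3 = hydroxyl O
mol = Chem.MolFromSmiles('CC(=O)O')
mappings = map_query_to_molecule(mol, '[CX3](=O)[OX2H1]')
print(mappings)
```

Here query atom 0 (the carbonyl carbon) maps to molecule atom 1, query atom 1 (the carbonyl oxygen) to atom 2, and query atom 2 (the hydroxyl oxygen) to atom 3.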

## Recursive SMARTS

```python
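from rdkit import Chem

# Recursive SMARTS for complex patterns.
# A $(...) expression matches only its FIRST atom; the rest describes
# the environment that atom must sit in.

# Phenyl attached to carbonyl (the matched atom is the first ring carbon,
# adjacent to the ring carbon that bears the carbonyl)
carbonyl_phenyl = '[$(c1ccccc1C(=O))]'

# Ortho-substituted phenyl: substituents on two ADJACENT ring atoms
ortho_pattern = '[$(c1cccc([*])c1[*])]'

# Electron-withdrawing group on aromatic. The nitro group is written in
# charge-separated form, which is how RDKit stores it after sanitization.
ewg_aromatic = '[$(c[$(C(=O)),$(C#N),$([N+](=O)[O-])])]'

mol = Chem.MolFromSmiles('c1ccc(C(=O)O)cc1')
pattern = Chem.MolFromSmarts(carbonyl_phenyl)
print(mol.HasSubstructMatch(pattern))  # True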
```
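
A recursive SMARTS expression behaves like a single-atom query: the `$(...)` environment constrains the atom but is not included in the returned match. The following sketch compares a plain pattern with its recursive form on benzoic acid; the molecule and the variable names are chosen here for illustration.

```python
from rdkit import Chem

# Benzoic acid: atom 0 = hydroxyl O, 1 = carboxyl C, 2 = carbonyl O, 3 = ring carbon
mol = Chem.MolFromSmiles('OC(=O)c1ccccc1')

plain = Chem.MolFromSmarts('cC(=O)')          # three query atoms
recursive = Chem.MolFromSmarts('[$(cC(=O))]')  # one query atom with an environment

plain_matches = mol.GetSubstructMatches(plain)
recursive_matches = mol.GetSubstructMatches(recursive)

print(plain_matches)      # aromatic C, carboxyl C, carbonyl O
print(recursive_matches)  # only the aromatic C
```

Use the plain form when you need the indices of every atom in the group (for highlighting, say), and the recursive form when you want to select one atom by its surroundings.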

## Visualization with Highlighting

```python
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D

def draw_with_highlights(mol, smarts, filename):
    '''Draw molecule with substructure highlighted.'''
    pattern = Chem.MolFromSmarts(smarts)
    match = mol.GetSubstructMatch(pattern)

    if not match:
        print('No match found')
        return

    drawer = rdMolDraw2D.MolDraw2DCairo(400, 300)
    drawer.DrawMolecule(mol, highlightAtoms=match)
    drawer.FinishDrawing()

    with open(filename, 'wb') as f:
        f.write(drawer.GetDrawingText())

# Highlight the carboxylic acid group in benzoic acid
mol = Chem.MolFromSmiles('c1ccc(C(=O)O)cc1')
draw_with_highlights(mol, '[CX3](=O)[OX2H1]', 'highlighted.png')
```

## Related Skills

- molecular-io - Load molecules for searching
- similarity-searching - Fingerprint-based searching
- admet-prediction - Filter before ADMET analysis
