molecular-similarity-search

Search for similar molecules using Tanimoto similarity with Morgan fingerprints to identify structurally related compounds.

157 stars

by InternScience

Best use case

molecular-similarity-search is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using molecular-similarity-search should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/molecular-similarity-search/SKILL.md --create-dirs "https://raw.githubusercontent.com/InternScience/DrClaw/main/drclaw/agent_hub/templates/chemistry/skills/molecular-similarity-search/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/molecular-similarity-search/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How molecular-similarity-search Compares

| Feature / Agent | molecular-similarity-search | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Search for similar molecules using Tanimoto similarity with Morgan fingerprints to identify structurally related compounds.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Molecular Similarity Search

## Usage

### 1. MCP Client Definition

```python
import asyncio
import json
from mcp.client.streamable_http import streamablehttp_client
from mcp import ClientSession

class DrugSDAClient:
    """DrugSDA-Tool MCP Client"""

    def __init__(self, server_url: str, api_key: str):
        self.server_url = server_url
        self.api_key = api_key
        self.session = None

    async def connect(self):
        """Establish connection and initialize session"""
        print(f"server url: {self.server_url}")
        try:
            self.transport = streamablehttp_client(
                url=self.server_url,
                headers={"SCP-HUB-API-KEY": self.api_key}
            )
            self.read, self.write, self.get_session_id = await self.transport.__aenter__()

            self.session_ctx = ClientSession(self.read, self.write)
            self.session = await self.session_ctx.__aenter__()

            await self.session.initialize()
            session_id = self.get_session_id()

            print(f"✓ connect success (session id: {session_id})")
            return True

        except Exception as e:
            print(f"✗ connect failure: {e}")
            return False

    async def disconnect(self):
        """Disconnect from server"""
        try:
            if self.session:
                await self.session_ctx.__aexit__(None, None, None)
            if hasattr(self, 'transport'):
                await self.transport.__aexit__(None, None, None)
            print("✓ disconnected")
        except Exception as e:
            print(f"✗ disconnect error: {e}")

    def parse_result(self, result):
        """Parse MCP tool call result"""
        try:
            if hasattr(result, 'content') and result.content:
                content = result.content[0]
                if hasattr(content, 'text'):
                    return json.loads(content.text)
            return str(result)
        except Exception as e:
            return {"error": f"parse error: {e}", "raw": str(result)}
```
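
To see how `parse_result` behaves without a live server, the sketch below copies its logic into a standalone function and feeds it stand-in objects built with `types.SimpleNamespace`. The function name `parse_tool_result`, the stand-ins, and the sample payload are assumptions for illustration only; they are not real server output.

```python
import json
from types import SimpleNamespace


def parse_tool_result(result):
    """Standalone copy of DrugSDAClient.parse_result, for offline illustration."""
    try:
        if hasattr(result, "content") and result.content:
            content = result.content[0]
            if hasattr(content, "text"):
                return json.loads(content.text)
        return str(result)
    except Exception as e:
        return {"error": f"parse error: {e}", "raw": str(result)}


# Hypothetical stand-ins for an MCP tool result (not real server output).
ok = SimpleNamespace(content=[SimpleNamespace(text='{"similarities": []}')])
bad_json = SimpleNamespace(content=[SimpleNamespace(text="not json")])
empty = SimpleNamespace(content=[])

parsed_ok = parse_tool_result(ok)         # dict decoded from the JSON text
parsed_bad = parse_tool_result(bad_json)  # dict with "error" and "raw" keys
parsed_empty = parse_tool_result(empty)   # plain-string fallback
```

Because the method can return a dict, an error dict, or a string, callers should check the return type before indexing into `similarities`.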

### 2. Molecular Similarity Search Workflow

This workflow searches for similar molecules using Tanimoto similarity calculated from Morgan fingerprints.

**Workflow Steps:**

1. **Define Target Molecule** - Specify the query SMILES
2. **Define Candidate Molecules** - Provide list of candidate SMILES
3. **Calculate Similarity** - Compute Tanimoto scores for all candidates
4. **Rank Results** - Sort by similarity score to find most similar molecules

**Implementation:**

```python
import asyncio


async def main():
    # Initialize client
    client = DrugSDAClient(
        "https://scp.intern-ai.org.cn/api/v1/mcp/2/DrugSDA-Tool",
        "<your-api-key>"
    )

    if not await client.connect():
        print("connection failed")
        return

    try:
        # Input: target molecule and candidate library
        target = "CCO"  # Ethanol
        candidates = [
            "CCCO",      # 1-Propanol
            "CCCCO",     # 1-Butanol
            "CC(C)O",    # Isopropanol
            "CCC(C)O",   # sec-Butanol
            "C1CC1",     # Cyclopropane
            "CC=O",      # Acetaldehyde
            "CCC(=O)O"   # Propanoic acid
        ]

        # Execute similarity calculation
        result = await client.session.call_tool(
            "calculate_smiles_similarity",
            arguments={
                "target_smiles": target,
                "candidate_smiles_list": candidates
            }
        )

        result_data = client.parse_result(result)
        # parse_result returns a str or an {"error": ...} dict on failure
        if not isinstance(result_data, dict) or "similarities" not in result_data:
            print(f"unexpected response: {result_data}")
            return
        similarities = result_data["similarities"]

        # Sort and display top 3 most similar molecules
        top3_smiles = sorted(similarities, key=lambda x: x["score"], reverse=True)[:3]

        print(f"Target molecule: {target}\n")
        print("Top 3 most similar molecules:")
        for i, item in enumerate(top3_smiles, 1):
            print(f"{i}. {item['smiles']} - Tanimoto score: {item['score']:.4f}")
    finally:
        # Always release the session and transport, even if the call fails
        await client.disconnect()


# Top-level await is not valid in a plain script, so run via asyncio
asyncio.run(main())
```

### Tool Descriptions

**DrugSDA-Tool Server:**
- `calculate_smiles_similarity`: Compute the Tanimoto similarity between a target molecule and each candidate using Morgan fingerprints
  - Args:
    - `target_smiles` (str): Query molecule SMILES string
    - `candidate_smiles_list` (list): List of candidate molecule SMILES strings
  - Returns:
    - `similarities` (list): One result entry per candidate, each containing:
      - `smiles` (str): Candidate SMILES string
      - `score` (float): Tanimoto similarity (0-1)
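
A minimal sketch of consuming that return structure: the helper below checks the response shape, drops entries below a cutoff, and returns the best matches. The name `rank_similarities`, its `top_k` and `min_score` parameters, and the sample scores are hypothetical; the scores are made up to show the data shape and are not real tool output.

```python
def rank_similarities(result_data, top_k=3, min_score=0.0):
    """Return up to top_k entries with score >= min_score, highest score first."""
    if not isinstance(result_data, dict) or "similarities" not in result_data:
        raise ValueError(f"unexpected tool response: {result_data!r}")
    hits = [
        item for item in result_data["similarities"]
        if float(item["score"]) >= min_score
    ]
    hits.sort(key=lambda item: float(item["score"]), reverse=True)
    return hits[:top_k]


# Made-up scores for illustration only; these are NOT real tool output.
example = {
    "similarities": [
        {"smiles": "C1CC1", "score": 0.10},
        {"smiles": "CCCO", "score": 0.60},
        {"smiles": "CC(C)O", "score": 0.45},
    ]
}
top2 = rank_similarities(example, top_k=2, min_score=0.2)
print([item["smiles"] for item in top2])
```

Raising on an unexpected shape keeps a parse failure from surfacing later as a confusing `KeyError` or `TypeError`.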

### Input/Output

**Input:**
- `target_smiles`: SMILES string of the query molecule
- `candidate_smiles_list`: List of SMILES strings to compare against

**Output:**
- List of similarity results:
  - `smiles`: Candidate molecule SMILES
  - `score`: Tanimoto similarity coefficient (0-1)
    - 1.0 = identical fingerprints (usually, but not necessarily, identical molecules)
    - Lower values mean fewer shared substructure features; see Similarity Interpretation for score bands

### Similarity Interpretation

Treat these bands as rules of thumb; useful cutoffs vary with the fingerprint settings and the chemical series.

- **Score > 0.85**: Very high similarity, likely the same scaffold
- **Score 0.7-0.85**: High similarity, similar pharmacophore
- **Score 0.5-0.7**: Moderate similarity, related structures
- **Score < 0.5**: Low similarity, different chemical space
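
The bands above can be applied programmatically with a small lookup. This is a sketch: the function name `similarity_band` is hypothetical, and the handling of the exact boundary values (0.85 and 0.7 counted as "high", 0.5 as "moderate") is an assumption, since the list does not say which band owns each edge.

```python
def similarity_band(score: float) -> str:
    """Map a Tanimoto score in [0, 1] to the interpretation bands listed above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"Tanimoto score must be within [0, 1], got {score}")
    if score > 0.85:
        return "very high"
    if score >= 0.7:   # assumption: boundary values fall into the higher band
        return "high"
    if score >= 0.5:
        return "moderate"
    return "low"


for s in (0.92, 0.85, 0.63, 0.31):
    print(f"{s:.2f} -> {similarity_band(s)}")
```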

### Use Cases

- Virtual screening and library filtering
- Scaffold hopping in drug design
- Chemical space exploration
- Lead compound identification
- Analog searching in compound databases
- Structure-activity relationship studies

### Performance Notes

- **Execution time**: <1 second for up to 1000 candidates
- **Fingerprint**: Morgan fingerprint (radius 2, 2048 bits)
- **Algorithm**: Tanimoto coefficient for binary fingerprints
- **Scalability**: Efficient for large compound libraries
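
For reference, the Tanimoto coefficient for two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below computes it in plain Python over sets of on-bit indices. It illustrates the formula only: the server's own implementation is not shown in this document, the bit indices are made up, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(bits_a, bits_b) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| over sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty fingerprints score 0.0
    return len(a & b) / len(union)


# Made-up on-bit indices standing in for 2048-bit Morgan fingerprints.
fp_query = {1, 5, 9, 42}
fp_candidate = {1, 5, 77}

# 2 shared bits out of 5 distinct bits
print(tanimoto(fp_query, fp_candidate))
```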

Related Skills

protein_similarity_search

157

from InternScience/DrClaw

Protein Similarity Search - Search for similar proteins: extract sequence from PDB, search structures with FoldSeek, find homologs with STRING, and check UniProt. Use this skill for bioinformatics tasks involving extract pdb sequence foldseek search get best similarity hits between species search uniprotkb entries. Combines 4 tools from 3 SCP server(s).

molecular_visualization_suite

157

from InternScience/DrClaw

Molecular Visualization Suite - Visualize molecules: convert SMILES to formats, visualize molecule, visualize protein, visualize complex. Use this skill for chemical visualization tasks involving convert smiles to format visualize molecule visualize protein visualize complex. Combines 4 tools from 1 SCP server(s).

molecular_docking_pipeline

157

from InternScience/DrClaw

Molecular Docking Pipeline - Complete docking workflow: retrieve protein structure, predict binding pockets, prepare receptor, and dock ligand. Use this skill for structural biology tasks involving retrieve protein data by pdbcode run fpocket convert pdb to pdbqt dock quick molecule docking. Combines 4 tools from 2 SCP server(s).

substructure_activity_search

157

from InternScience/DrClaw

Substructure-Activity Relationship - Analyze substructure-activity: ChEMBL substructure search, activity data, PubChem compounds, and similarity. Use this skill for medicinal chemistry tasks involving get substructure by smiles search activity search pubchem by smiles calculate smiles similarity. Combines 4 tools from 3 SCP server(s).

molecular_fingerprint_analysis

157

from InternScience/DrClaw

Molecular Fingerprint Analysis - Fingerprint analysis: topology descriptors, structure complexity, similarity calculation, and AromaticityAnalysis. Use this skill for cheminformatics tasks involving calculate mol topology calculate mol structure complexity calculate smiles similarity AromaticityAnalyzer. Combines 4 tools from 2 SCP server(s).

nsfc-research-foundation-writer

157

from InternScience/DrClaw

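Use when the user explicitly asks to "write/revise the research foundation" or to "arrange the research foundation, working conditions, and risk response". Writes or restructures section "(3) Research Foundation" of the NSFC proposal body, and arranges "Working Conditions" and "Research Risk Response" alongside it, using a chain of evidence to show that the project is feasible, that resources match the research content, and that the risk plans are actionable.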

nsfc-research-content-writer

157

from InternScience/DrClaw

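Use when the user explicitly asks to "write/revise the research content" or to "arrange the research content, innovation, and annual plan". Writes or restructures section "(2) Research Content" of the NSFC proposal body, and arranges "Features and Innovation" and the "Three-Year Annual Research Plan" alongside it, producing three extraTex files that drop directly into the LaTeX template.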

drugsda-mol-similarity

157

from InternScience/DrClaw

Compute the Tanimoto similarities between a target molecule and a list of candidate molecules using Morgan fingerprints.

scientific-literature-search

157

from InternScience/DrClaw

Search scientific literature and research papers using FlowSearch to find relevant academic articles and publications.

pubmed-article-search

157

from InternScience/DrClaw

Search PubMed database for scientific articles and publications to retrieve biomedical literature.

biomedical-web-search

157

from InternScience/DrClaw

Search biomedical literature and web content using Tavily search engine for research and clinical information.

academic-deep-research

157

from InternScience/DrClaw

Transparent, rigorous research with full methodology — not a black-box API wrapper. Conducts exhaustive investigation through mandated 2-cycle research per theme, APA 7th citations, evidence hierarchy, and 3 user checkpoints. Self-contained using native OpenClaw tools (web_search, web_fetch, sessions_spawn). Use for literature reviews, competitive intelligence, or any research requiring academic rigor and reproducibility.