protein-blast-search

Search for similar protein sequences in UniProt Swiss-Prot database using BLAST to identify homologous proteins and functional relationships.

157 stars

Best use case

protein-blast-search is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using protein-blast-search can expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/protein-blast-search/SKILL.md --create-dirs "https://raw.githubusercontent.com/InternScience/DrClaw/main/drclaw/agent_hub/templates/biochemistry/skills/protein-blast-search/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/protein-blast-search/SKILL.md inside your project
  3. Restart your AI agent; it will auto-discover the skill.

How protein-blast-search Compares

| Feature / Agent | protein-blast-search | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Search for similar protein sequences in UniProt Swiss-Prot database using BLAST to identify homologous proteins and functional relationships.

Where can I find the source code?

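The source is on GitHub in the InternScience/DrClaw repository, under drclaw/agent_hub/templates/biochemistry/skills/protein-blast-search/ (the same SKILL.md that the curl command in the Installation section downloads).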

SKILL.md Source

# Protein BLAST Sequence Similarity Search

## Usage

### 1. MCP Client Definition

```python
import json
import traceback

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


class BioInfoToolsClient:
    """BioInfo-Tools MCP client."""

    def __init__(self, server_url: str, api_key: str):
        self.server_url = server_url
        self.api_key = api_key
        self.session = None

    async def connect(self):
        """Open the streamable-HTTP transport and initialize an MCP session."""
        print(f"server url: {self.server_url}")
        try:
            # The API key is sent in the header expected by the SCP hub
            self.transport = streamablehttp_client(
                url=self.server_url,
                headers={"SCP-HUB-API-KEY": self.api_key},
            )
            self.read, self.write, self.get_session_id = await self.transport.__aenter__()

            self.session_ctx = ClientSession(self.read, self.write)
            self.session = await self.session_ctx.__aenter__()

            # MCP handshake; must complete before any tool call
            await self.session.initialize()
            self.session_id = self.get_session_id()

            print("✓ connected")
            return True

        except Exception as e:
            print(f"✗ connection failed: {e}")
            traceback.print_exc()
            return False

    async def disconnect(self):
        """Close the session first, then the underlying transport."""
        try:
            if self.session:
                await self.session_ctx.__aexit__(None, None, None)
            if hasattr(self, "transport"):
                await self.transport.__aexit__(None, None, None)
            print("✓ disconnected")
        except Exception as e:
            print(f"✗ disconnect error: {e}")

    def parse_result(self, result):
        """Parse an MCP tool-call result: decode the JSON in the first text block."""
        try:
            if hasattr(result, "content") and result.content:
                content = result.content[0]
                if hasattr(content, "text"):
                    return json.loads(content.text)
            # Return a dict (not a string) so callers can use .get() safely
            return {"error": "no text content in result", "raw": str(result)}
        except Exception as e:
            return {"error": f"parse error: {e}", "raw": str(result)}
```
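MCP tool results also carry an `isError` flag, which `parse_result` does not check. A minimal sketch of a stricter parser, run offline with `types.SimpleNamespace` stand-ins instead of real `CallToolResult` objects; the function name `parse_tool_result` and the sample payloads are hypothetical:

```python
import json
from types import SimpleNamespace


def parse_tool_result(result):
    """Decode the JSON payload of the first text block, honoring the isError flag."""
    blocks = getattr(result, "content", None) or []
    text = getattr(blocks[0], "text", None) if blocks else None
    # Tool-level failures come back with isError=True; report them as an error dict
    if getattr(result, "isError", False):
        return {"error": text or "tool reported an error", "raw": str(result)}
    if text is None:
        return {"error": "no text content", "raw": str(result)}
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        return {"error": f"parse error: {e}", "raw": str(result)}


# Offline stand-ins shaped like MCP CallToolResult objects (hypothetical payloads)
ok = SimpleNamespace(isError=False, content=[SimpleNamespace(text='{"success": true, "hits": []}')])
failed = SimpleNamespace(isError=True, content=[SimpleNamespace(text="timeout")])

print(parse_tool_result(ok))               # {'success': True, 'hits': []}
print(parse_tool_result(failed)["error"])  # timeout
```

Used in place of `client.parse_result(result)`, it returns an error dict rather than raising, so the `.get()` checks in the workflow keep working.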

### 2. Protein BLAST Search Workflow

This workflow searches for similar protein sequences in the UniProt Swiss-Prot database using BLAST, identifying homologous proteins and their functional relationships.

**Workflow Steps:**

1. **Validate Input** - Ensure the protein sequence uses valid single-letter amino acid codes
2. **Execute BLAST Search** - Query the UniProt Swiss-Prot database for similar sequences
3. **Parse Results** - Extract matching proteins with identity, E-value, and organism information
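The implementation below passes the sequence straight to `blast_search`, so step 1 is left implicit. A client-side check could look like the following sketch; the accepted alphabet (20 standard residues plus the IUPAC codes B, Z, X, U and O) is an assumption, since the server's rules are not documented here, and `validate_protein_sequence` is a hypothetical helper:

```python
import re

# Assumed alphabet: 20 standard residues plus B, Z, X (ambiguous) and U, O (Sec, Pyl)
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY" + "BZXUO")


def validate_protein_sequence(raw: str) -> str:
    """Return a cleaned, upper-case sequence or raise ValueError on invalid input."""
    # Drop FASTA header lines if present, then remove all whitespace
    lines = [ln for ln in raw.strip().splitlines() if not ln.startswith(">")]
    seq = re.sub(r"\s+", "", "".join(lines)).upper()
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted(set(seq) - VALID_RESIDUES)
    if bad:
        raise ValueError(f"invalid residue codes: {''.join(bad)}")
    return seq


print(validate_protein_sequence(">query\nmvhltpeek\nsavtalwgk"))  # MVHLTPEEKSAVTALWGK
```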

**Implementation:**

```python
import asyncio
from datetime import timedelta

# Input: protein sequence to search (human hemoglobin beta, HBB_HUMAN)
protein_sequence = """
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
"""


async def main():
    # Initialize client (BioInfoToolsClient from section 1)
    client = BioInfoToolsClient(
        "https://scp.intern-ai.org.cn/api/v1/mcp/17/BioInfo-Tools",
        "<your-api-key>"
    )

    if not await client.connect():
        print("connection failed")
        return

    try:
        # Steps 1 & 2: execute BLAST search against UniProt Swiss-Prot
        result = await client.session.call_tool(
            "blast_search",
            arguments={
                "sequence": protein_sequence.strip(),
                "sequence_id": "HBB_HUMAN",  # Optional identifier
                "evalue": 0.01,              # E-value threshold (default: 0.01)
                "max_hits": 50               # Maximum number of hits to return
            },
            read_timeout_seconds=timedelta(seconds=300)  # Allow up to 5 minutes
        )

        # Step 3: parse and display results
        result_data = client.parse_result(result)

        if isinstance(result_data, dict) and result_data.get("success"):
            print("✅ BLAST search completed successfully")
            print(f"Execution time: {result_data.get('time_seconds', '?')} seconds")
            print(f"Total hits found: {result_data.get('total_hits', 0)}\n")

            hits = result_data.get("hits", [])

            # Display the top 10 matches
            for i, hit in enumerate(hits[:10], 1):
                print(f"{i}. {hit['uniprot_id']} - {hit.get('organism', 'N/A')}")
                print(f"   Description: {hit['description']}")
                print(f"   Identity: {hit['identity_percent']:.1f}%")
                print(f"   E-value: {hit['evalue']:.2e}")
                print(f"   Alignment length: {hit['alignment_length']} aa\n")
        else:
            error = result_data.get("error", "Unknown error") if isinstance(result_data, dict) else result_data
            print(f"❌ BLAST search failed: {error}")
    finally:
        # Always release the session and transport, even if the call raises
        await client.disconnect()


# In Jupyter or another already-running event loop, use `await main()` instead
asyncio.run(main())
```

### Tool Descriptions

**BioInfo-Tools Server:**
- `blast_search`: Search for similar protein sequences in the UniProt Swiss-Prot database
  - Args:
    - `sequence` (str): Protein sequence in amino acid single-letter code
    - `sequence_id` (str, optional): Identifier for the query sequence
    - `evalue` (float, optional): E-value threshold (default: 0.01)
    - `max_hits` (int, optional): Maximum number of hits to return (default: 50)
  - Returns:
    - `success` (bool): Whether search completed successfully
    - `total_hits` (int): Number of matching sequences found
    - `hits` (list): List of matching proteins with details
    - `time_seconds` (float): Execution time
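`parse_result` returns whatever JSON the server sends, so a quick shape check against the fields above can fail early with a clear message. A sketch, assuming the payload is a JSON object with these fields and, on failure, an `error` string (which the workflow code reads but the list above does not document); `check_blast_response` and the sample payload are illustrative:

```python
def check_blast_response(payload) -> list:
    """Validate a parsed blast_search payload and return its hit list."""
    if not isinstance(payload, dict):
        raise TypeError(f"expected a JSON object, got {type(payload).__name__}")
    if not payload.get("success"):
        # Assumes failures carry an "error" string, as the workflow code expects
        raise RuntimeError(payload.get("error", "blast_search reported failure"))
    hits = payload.get("hits")
    if not isinstance(hits, list):
        raise TypeError("'hits' must be a list")
    total = payload.get("total_hits")
    if total is not None and not isinstance(total, int):
        raise TypeError("'total_hits' must be an int")
    return hits


# Illustrative payload shaped like the documented return fields
payload = {"success": True, "total_hits": 1, "hits": [{"uniprot_id": "P68871"}], "time_seconds": 12.3}
print(len(check_blast_response(payload)))  # 1
```

`total_hits` is not compared with `len(hits)`, since `max_hits` may cap the returned list.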

### Input/Output

**Input:**
- `sequence`: Protein sequence (amino acid single-letter code)
- `sequence_id`: Optional identifier for the query
- `evalue`: E-value threshold (lower = more stringent, default: 0.01)
- `max_hits`: Maximum number of results to return (default: 50)
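These inputs and defaults can be assembled in one place before calling `call_tool`. A sketch with a hypothetical helper `build_blast_arguments`; the defaults mirror the values above, while the validity rules (positive `evalue`, `max_hits` of at least 1) are assumptions, and any server-side upper limit on `max_hits` is not documented here:

```python
from typing import Optional


def build_blast_arguments(sequence: str, sequence_id: Optional[str] = None,
                          evalue: float = 0.01, max_hits: int = 50) -> dict:
    """Assemble the `arguments` dict for call_tool("blast_search", ...)."""
    seq = "".join(sequence.split()).upper()  # strip newlines/spaces from pasted sequences
    if not seq:
        raise ValueError("sequence must not be empty")
    if evalue <= 0:
        raise ValueError("evalue must be positive (lower = more stringent)")
    if not isinstance(max_hits, int) or max_hits < 1:
        raise ValueError("max_hits must be a positive integer")
    args = {"sequence": seq, "evalue": evalue, "max_hits": max_hits}
    if sequence_id:  # optional; omitted rather than sent empty
        args["sequence_id"] = sequence_id
    return args


print(build_blast_arguments("MVHL TPEE\nKSAV", evalue=1e-5))
# {'sequence': 'MVHLTPEEKSAV', 'evalue': 1e-05, 'max_hits': 50}
```

The returned dict can be passed as `arguments=` to `client.session.call_tool("blast_search", ...)`.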

**Output:**
- List of similar proteins, each containing:
  - `uniprot_id`: UniProt accession number
  - `description`: Protein description and name
  - `organism`: Species/organism name
  - `identity_percent`: Sequence identity percentage (0-100)
  - `evalue`: E-value (statistical significance, lower is better)
  - `alignment_length`: Length of sequence alignment
  - `query_coverage`: Percentage of query sequence covered
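With these fields, hits can be narrowed further on the client side. A sketch of a hypothetical `filter_hits` helper; the default E-value cut-off reuses the 1e-5 tier from the E-value interpretation list, the identity and coverage cut-offs are left to the caller, and the hit values are made up for illustration:

```python
def filter_hits(hits, max_evalue=1e-5, min_identity=0.0, min_coverage=None):
    """Keep hits that pass the cut-offs, sorted by E-value (most significant first)."""
    kept = []
    for hit in hits:
        if hit.get("evalue", float("inf")) > max_evalue:
            continue  # a missing E-value is treated as failing
        if hit.get("identity_percent", 0.0) < min_identity:
            continue
        if min_coverage is not None and hit.get("query_coverage", 0.0) < min_coverage:
            continue
        kept.append(hit)
    return sorted(kept, key=lambda h: h["evalue"])


hits = [  # illustrative values, not real BLAST output
    {"uniprot_id": "A", "evalue": 1e-80, "identity_percent": 92.5, "query_coverage": 100.0},
    {"uniprot_id": "B", "evalue": 3e-4, "identity_percent": 28.0, "query_coverage": 60.0},
    {"uniprot_id": "C", "evalue": 2e-12, "identity_percent": 41.0, "query_coverage": 95.0},
]
print([h["uniprot_id"] for h in filter_hits(hits, min_identity=40.0)])  # ['A', 'C']
```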

### E-value Interpretation

- **E-value < 1e-10**: Highly significant match, very likely homologous
- **E-value < 1e-5**: Significant match, likely homologous
- **E-value < 0.01**: Potentially homologous (default threshold)
- **E-value > 0.01**: May be spurious matches (not returned at the default threshold; raise `evalue` to include them)
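The tiers above translate directly into a lookup. A minimal sketch (`evalue_tier` is a hypothetical name); note that it places an E-value of exactly 0.01 in the last tier, a boundary the list leaves open:

```python
def evalue_tier(evalue: float) -> str:
    """Map an E-value to the interpretation tiers listed above."""
    if evalue < 1e-10:
        return "highly significant"      # very likely homologous
    if evalue < 1e-5:
        return "significant"             # likely homologous
    if evalue < 0.01:
        return "potentially homologous"  # passes the default threshold
    return "possibly spurious"


for e in (1e-50, 1e-7, 0.005, 0.5):
    print(f"{e:.0e} -> {evalue_tier(e)}")
```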

### Use Cases

- Identify protein function by homology
- Find evolutionarily related proteins
- Discover orthologs and paralogs across species
- Annotate unknown protein sequences
- Study protein evolution and phylogeny
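For the ortholog/paralog use case, a common first pass is to keep the best hit per organism and flag organisms with several hits (candidate paralogs). A sketch over the documented hit fields; this is a triage heuristic, not an orthology call (that usually needs reciprocal searches or phylogenetic analysis), and it assumes `organism` strings are spelled consistently. The helper name and hit values are illustrative:

```python
from collections import defaultdict


def best_hit_per_organism(hits):
    """Group hits by organism; return (best hit per organism, organisms with >1 hit)."""
    by_org = defaultdict(list)
    for hit in hits:
        by_org[hit.get("organism", "N/A")].append(hit)
    best = {
        org: min(group, key=lambda h: h.get("evalue", float("inf")))
        for org, group in by_org.items()
    }
    multi = sorted(org for org, group in by_org.items() if len(group) > 1)
    return best, multi


hits = [  # illustrative values only
    {"uniprot_id": "H1", "organism": "Homo sapiens", "evalue": 1e-90},
    {"uniprot_id": "H2", "organism": "Homo sapiens", "evalue": 1e-40},
    {"uniprot_id": "M1", "organism": "Mus musculus", "evalue": 1e-70},
]
best, multi = best_hit_per_organism(hits)
print({org: h["uniprot_id"] for org, h in best.items()})  # {'Homo sapiens': 'H1', 'Mus musculus': 'M1'}
print(multi)  # ['Homo sapiens']
```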

### Performance Notes

- **Typical execution time**: 10-90 seconds depending on sequence length and max_hits
- **Shorter sequences** (<50 aa): May return more non-specific matches
- **Longer sequences** (>500 aa): May take longer but provide more specific matches
- **Timeout recommendation**: Set to at least 300 seconds (5 minutes) for reliability
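The length guidance above can be checked before a query is sent. A sketch with a hypothetical `sequence_length_notes` helper that uses the 50 aa and 500 aa boundaries from these notes, plus a constant for the recommended 300-second timeout:

```python
from datetime import timedelta

# Recommended minimum for call_tool(..., read_timeout_seconds=...), per the notes above
MIN_TIMEOUT = timedelta(seconds=300)


def sequence_length_notes(sequence: str) -> list:
    """Return cautions for queries outside the 50-500 aa range discussed above."""
    n = len("".join(sequence.split()))  # ignore whitespace and line breaks
    notes = []
    if n < 50:
        notes.append(f"{n} aa: short query, expect more non-specific matches")
    elif n > 500:
        notes.append(f"{n} aa: long query, search may take longer")
    return notes


print(sequence_length_notes("MVHLTPEEK"))
print(MIN_TIMEOUT.total_seconds())  # 300.0
```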

Related Skills

protein_structure_analysis

from InternScience/DrClaw

Protein Structure Comprehensive Analysis - Comprehensive structure analysis: download PDB, extract chains, calculate geometry, quality metrics, and composition. Use this skill for structural biology tasks involving these tools: retrieve protein data by pdbcode, extract pdb chains, calculate pdb structural geometry, calculate pdb quality metrics, calculate pdb composition info. Combines 5 tools from 1 SCP server(s).

protein_solubility_optimization

from InternScience/DrClaw

Protein Solubility Optimization - Optimize protein solubility: calculate properties, predict solubility, predict hydrophilicity, and suggest mutations. Use this skill for protein engineering tasks involving these tools: calculate protein sequence properties, predict protein function, ComputeHydrophilicity, zero shot sequence prediction. Combines 4 tools from 3 SCP server(s).

protein_similarity_search

from InternScience/DrClaw

Protein Similarity Search - Search for similar proteins: extract sequence from PDB, search structures with FoldSeek, find homologs with STRING, and check UniProt. Use this skill for bioinformatics tasks involving these tools: extract pdb sequence, foldseek search, get best similarity hits between species, search uniprotkb entries. Combines 4 tools from 3 SCP server(s).

protein_quality_assessment

from InternScience/DrClaw

Protein Structure Quality Assessment - Assess structure quality: basic info, geometry analysis, quality metrics, composition, and visualization. Use this skill for structural biology tasks involving these tools: calculate pdb basic info, calculate pdb structural geometry, calculate pdb quality metrics, calculate pdb composition info, visualize protein. Combines 5 tools from 1 SCP server(s).

protein_property_comparison

from InternScience/DrClaw

Cross-Species Protein Comparison - Compare proteins across species: get orthologs from NCBI, compute properties for each, and compare similarity. Use this skill for comparative biology tasks involving these tools: get gene orthologs, calculate protein sequence properties, calculate smiles similarity, get homology id. Combines 4 tools from 3 SCP server(s).

protein_interaction_network

from InternScience/DrClaw

Protein Interaction Network Analysis - Build protein interaction network: map identifiers with STRING, get PPI network, compute enrichment, and link to KEGG pathways. Use this skill for systems biology tasks involving these tools: mapping identifiers, get string network interaction, get ppi enrichment, kegg link. Combines 4 tools from 2 SCP server(s).

protein_function_annotation

from InternScience/DrClaw

Protein Function Annotation Pipeline - Annotate protein function: UniProt metadata, InterPro domains, functional prediction, and GO enrichment. Use this skill for proteomics tasks involving these tools: query uniprot, query interpro, predict protein function, get functional enrichment. Combines 4 tools from 2 SCP server(s).

protein_engineering

from InternScience/DrClaw

Protein Engineering Workflow - Engineer a protein: predict structure, identify functional residues, predict beneficial mutations, and calculate properties. Use this skill for protein engineering tasks involving these tools: Protein structure prediction ESMFold, predict functional residue, zero shot sequence prediction, calculate protein sequence properties. Combines 4 tools from 2 SCP server(s).

protein_database_crossref

from InternScience/DrClaw

Protein Cross-Database Reference - Cross-reference protein: UniProt entry, NCBI gene, Ensembl xrefs, and PDB structure search. Use this skill for proteomics tasks involving these tools: get uniprotkb entry by accession, get gene metadata by gene name, get xrefs symbol, retrieve protein data by pdbcode. Combines 4 tools from 4 SCP server(s).

protein_complex_analysis

from InternScience/DrClaw

Protein Complex Visualization & Analysis - Analyze protein complex: download structure, visualize complex, extract chains, and calculate quality metrics. Use this skill for structural biology tasks involving these tools: retrieve protein data by pdbcode, visualize complex, extract pdb chains, calculate pdb basic info. Combines 4 tools from 1 SCP server(s).

protein_classification_analysis

from InternScience/DrClaw

Protein Classification Analysis - Classify protein: ChEMBL protein classification, UniProt entry, InterPro domains, and Ensembl biotypes. Use this skill for protein science tasks involving these tools: search protein classification, get uniprotkb entry by accession, query interpro, get info biotypes. Combines 4 tools from 4 SCP server(s).

full_protein_analysis

from InternScience/DrClaw

Full Protein Characterization - Complete protein characterization: validate sequence, compute all properties, predict structure, and analyze pockets. Use this skill for protein biochemistry tasks involving these tools: is valid protein sequence, analyze protein, ComputeProtPara, pred protein structure esmfold, run fpocket. Combines 5 tools from 4 SCP server(s).