protein-blast-search

Search for similar protein sequences in UniProt Swiss-Prot database using BLAST to identify homologous proteins and functional relationships.


by SpectrAI-Initiative


Best use case

protein-blast-search is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using protein-blast-search should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/protein-blast-search/SKILL.md --create-dirs "https://raw.githubusercontent.com/SpectrAI-Initiative/InnoClaw/main/.claude/skills/protein-blast-search/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/protein-blast-search/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How protein-blast-search Compares

Feature / Agent	protein-blast-search	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Search for similar protein sequences in UniProt Swiss-Prot database using BLAST to identify homologous proteins and functional relationships.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Protein BLAST Sequence Similarity Search

## Usage

### 1. MCP Server Definition

```python
import asyncio
import json
import traceback
from contextlib import AsyncExitStack
from mcp.client.streamable_http import streamablehttp_client
from mcp import ClientSession

class BioInfoToolsClient:
    """BioInfo-Tools MCP Client"""

    def __init__(self, server_url: str, api_key: str):
        self.server_url = server_url
        self.api_key = api_key
        self.session = None

    async def connect(self):
        """Establish connection and initialize session"""
        print(f"server url: {self.server_url}")
        try:
            self.transport = streamablehttp_client(
                url=self.server_url,
                headers={"SCP-HUB-API-KEY": self.api_key}
            )
            # The exit stack keeps the transport and session open until disconnect()
            self._stack = AsyncExitStack()
            await self._stack.__aenter__()
            self.read, self.write, self.get_session_id = await self._stack.enter_async_context(self.transport)

            self.session_ctx = ClientSession(self.read, self.write)
            self.session = await self._stack.enter_async_context(self.session_ctx)

            await self.session.initialize()

            print("✓ connected")
            return True

        except Exception as e:
            print(f"✗ connection failed: {e}")
            traceback.print_exc()
            return False

    async def disconnect(self):
        """Disconnect from server"""
        try:
            if hasattr(self, '_stack'):
                await self._stack.aclose()
            print("✓ disconnected")
        except Exception as e:
            print(f"✗ disconnect error: {e}")

    def parse_result(self, result):
        """Parse MCP tool call result into a dict"""
        try:
            if hasattr(result, 'content') and result.content:
                content = result.content[0]
                if hasattr(content, 'text'):
                    return json.loads(content.text)
            # No text content: still return a dict so callers can always use .get()
            return {"error": "no text content in result", "raw": str(result)}
        except Exception as e:
            return {"error": f"parse error: {e}", "raw": str(result)}
```
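
The client above relies on callers remembering to pair `connect()` with `disconnect()`. One way to make that pairing automatic is a small async context manager. This is a sketch, not part of the original skill: the helper name `connected` is hypothetical, and it only assumes the `connect()` / `disconnect()` methods shown above.

```python
from contextlib import asynccontextmanager


@asynccontextmanager
async def connected(client):
    """Yield a connected client and always disconnect afterwards.

    `client` is any object with async connect() -> bool and async
    disconnect() methods, such as BioInfoToolsClient (hypothetical helper).
    """
    if not await client.connect():
        raise ConnectionError("could not connect to MCP server")
    try:
        yield client
    finally:
        # Runs on normal exit and when the body raises
        await client.disconnect()
```

Usage would look like `async with connected(BioInfoToolsClient(url, key)) as client: ...`, with the tool calls placed inside the `async with` body.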

### 2. Protein BLAST Search Workflow

This workflow searches for similar protein sequences in the UniProt Swiss-Prot database using BLAST, identifying homologous proteins and their functional relationships.

**Workflow Steps:**

1. **Validate Input** - Ensure protein sequence is in valid amino acid format
2. **Execute BLAST Search** - Query UniProt Swiss-Prot database for similar sequences
3. **Parse Results** - Extract matching proteins with identity, E-value, and organism information

**Implementation:**

```python
import asyncio
from datetime import timedelta

# Assumes BioInfoToolsClient from section 1 is defined or imported in this module.

async def main():
    # Initialize client
    client = BioInfoToolsClient(
        "https://scp.intern-ai.org.cn/api/v1/mcp/17/BioInfo-Tools",
        "<your-api-key>"
    )

    if not await client.connect():
        print("connection failed")
        return

    try:
        # Input: protein sequence to search
        protein_sequence = (
            "MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"
        )

        # Steps 1 & 2: strip whitespace, then run BLAST against UniProt Swiss-Prot
        result = await client.session.call_tool(
            "blast_search",
            arguments={
                "sequence": protein_sequence.strip(),
                "sequence_id": "HBB_HUMAN",  # Optional identifier
                "evalue": 0.01,              # E-value threshold (default: 0.01)
                "max_hits": 50               # Maximum number of hits to return
            },
            read_timeout_seconds=timedelta(seconds=300)  # Allow up to 5 minutes
        )

        # Step 3: Parse and display results
        result_data = client.parse_result(result)

        if not isinstance(result_data, dict):
            print(f"❌ Unexpected response: {result_data}")
        elif result_data.get("success"):
            print("✅ BLAST search completed successfully")
            print(f"Execution time: {result_data.get('time_seconds', '?')} seconds")
            print(f"Total hits found: {result_data.get('total_hits', 0)}\n")

            hits = result_data.get("hits", [])

            # Display top matches
            for i, hit in enumerate(hits[:10], 1):
                print(f"{i}. {hit['uniprot_id']} - {hit.get('organism', 'N/A')}")
                print(f"   Description: {hit['description']}")
                print(f"   Identity: {hit['identity_percent']:.1f}%")
                print(f"   E-value: {hit['evalue']:.2e}")
                print(f"   Alignment length: {hit['alignment_length']} aa\n")
        else:
            print(f"❌ BLAST search failed: {result_data.get('error', 'Unknown error')}")
    finally:
        # Always release the connection, even if the search raises
        await client.disconnect()

# In a notebook or other running event loop, use `await main()` instead.
asyncio.run(main())
```
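
The implementation above only strips surrounding whitespace before submitting the sequence, so step 1 (input validation) is left to the caller. A minimal sketch of such a check follows. The function name `clean_protein_sequence` is hypothetical, and the accepted alphabet (the 20 standard residues plus the codes B, J, O, U, X, Z) is an assumption; the server may accept a different set.

```python
import re

# Assumption: 20 standard residues plus ambiguity/rare codes B, J, O, U, X, Z.
_ALLOWED = "ACDEFGHIKLMNPQRSTVWYBJOUXZ"
_VALID_SEQUENCE = re.compile(f"[{_ALLOWED}]+")


def clean_protein_sequence(raw: str) -> str:
    """Return an upper-case, whitespace-free sequence or raise ValueError."""
    # Drop FASTA header lines (starting with '>') if present
    lines = [ln for ln in raw.strip().splitlines()
             if not ln.lstrip().startswith(">")]
    # Remove all internal whitespace and normalise case
    seq = "".join("".join(lines).split()).upper()
    # A trailing '*' is sometimes used as a stop symbol; drop it
    seq = seq.rstrip("*")
    if not seq:
        raise ValueError("empty sequence")
    if not _VALID_SEQUENCE.fullmatch(seq):
        bad = sorted(set(seq) - set(_ALLOWED))
        raise ValueError(f"invalid residue codes: {''.join(bad)}")
    return seq
```

Calling `clean_protein_sequence(protein_sequence)` before `call_tool` would then reject malformed input locally instead of spending a round trip on it.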

### Tool Descriptions

**BioInfo-Tools Server:**
- `blast_search`: Search for similar protein sequences in UniProt Swiss-Prot database
  - Args:
    - `sequence` (str): Protein sequence in amino acid single-letter code
    - `sequence_id` (str, optional): Identifier for the query sequence
    - `evalue` (float, optional): E-value threshold (default: 0.01)
    - `max_hits` (int, optional): Maximum number of hits to return (default: 50)
  - Returns:
    - `success` (bool): Whether search completed successfully
    - `total_hits` (int): Number of matching sequences found
    - `hits` (list): List of matching proteins with details
    - `time_seconds` (float): Execution time

### Input/Output

**Input:**
- `sequence`: Protein sequence (amino acid single-letter code)
- `sequence_id`: Optional identifier for the query
- `evalue`: E-value threshold (lower = more stringent, default: 0.01)
- `max_hits`: Maximum number of results to return (default: 50)

**Output:**
- List of similar proteins, each containing:
  - `uniprot_id`: UniProt accession number
  - `description`: Protein description and name
  - `organism`: Species/organism name
  - `identity_percent`: Sequence identity percentage (0-100)
  - `evalue`: E-value (statistical significance, lower is better)
  - `alignment_length`: Length of sequence alignment
  - `query_coverage`: Percentage of query sequence covered
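
The output fields listed above can be loaded into a typed record before further filtering. The sketch below is illustrative only: `BlastHit` and `filter_hits` are hypothetical names, the field names are taken from the list above, and the default cut-offs (30% identity, E-value 1e-5) are assumptions rather than values recommended by the skill.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlastHit:
    uniprot_id: str
    description: str
    organism: str
    identity_percent: float
    evalue: float
    alignment_length: int
    query_coverage: Optional[float] = None  # treated as optional here

    @classmethod
    def from_dict(cls, d: dict) -> "BlastHit":
        """Build a hit from one entry of the `hits` list."""
        coverage = d.get("query_coverage")
        return cls(
            uniprot_id=d["uniprot_id"],
            description=d.get("description", ""),
            organism=d.get("organism", "N/A"),
            identity_percent=float(d["identity_percent"]),
            evalue=float(d["evalue"]),
            alignment_length=int(d["alignment_length"]),
            query_coverage=float(coverage) if coverage is not None else None,
        )


def filter_hits(hits, min_identity=30.0, max_evalue=1e-5):
    """Keep hits passing both cut-offs, most significant first."""
    parsed = [BlastHit.from_dict(h) for h in hits]
    kept = [h for h in parsed
            if h.identity_percent >= min_identity and h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.evalue)
```

For example, `filter_hits(result_data.get("hits", []))` would return `BlastHit` objects ordered by E-value.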

### E-value Interpretation

- **E-value < 1e-10**: Highly significant match, very likely homologous
- **E-value between 1e-10 and 1e-5**: Significant match, likely homologous
- **E-value between 1e-5 and 0.01**: Potentially homologous (0.01 is the default threshold)
- **E-value > 0.01**: May be a spurious match
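
These bands can be applied to each hit programmatically. The helper below is a small sketch with a hypothetical name, `interpret_evalue`; the source does not say how a value exactly on a boundary should be classified, so placing it in the less significant band is an assumption.

```python
def interpret_evalue(evalue: float) -> str:
    """Map an E-value to the interpretation bands listed above."""
    if evalue < 0:
        raise ValueError("E-value cannot be negative")
    if evalue < 1e-10:
        return "highly significant"
    if evalue < 1e-5:
        return "significant"
    if evalue < 0.01:
        return "potentially homologous"
    # Assumption: exactly 0.01 and anything above it is flagged as doubtful
    return "possibly spurious"
```

In the display loop, `interpret_evalue(hit["evalue"])` could be printed next to the raw E-value.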

### Use Cases

- Identify protein function by homology
- Find evolutionarily related proteins
- Discover orthologs and paralogs across species
- Annotate unknown protein sequences
- Study protein evolution and phylogeny
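
For the cross-species use cases, a common first pass is to keep only the most significant hit per organism. The sketch below does just that on the raw `hits` dictionaries; `best_hit_per_organism` is a hypothetical helper, and a best hit per species is only a rough starting point for candidate orthologs, not an orthology inference.

```python
def best_hit_per_organism(hits):
    """Return the lowest-E-value hit for each organism, most significant first."""
    best = {}
    for hit in hits:
        organism = hit.get("organism", "N/A")
        # Keep the hit with the smallest E-value seen so far for this organism
        if organism not in best or hit["evalue"] < best[organism]["evalue"]:
            best[organism] = hit
    return sorted(best.values(), key=lambda h: h["evalue"])
```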

### Performance Notes

- **Typical execution time**: 10-90 seconds depending on sequence length and max_hits
- **Shorter sequences** (<50 aa): May return more non-specific matches
- **Longer sequences** (>500 aa): May take longer but provide more specific matches
- **Timeout recommendation**: Set to at least 300 seconds (5 minutes) for reliability
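
Given the variable execution time noted above, a caller may want an overall time limit plus a retry. The following is a sketch under stated assumptions: `call_with_retry` is a hypothetical helper, the retry count and backoff are arbitrary, and because the source does not say which exceptions the MCP client raises on failure, only `asyncio.TimeoutError` (from the helper's own `asyncio.wait_for`) and `ConnectionError` are retried.

```python
import asyncio


async def call_with_retry(make_call, attempts=2, timeout_seconds=300.0,
                          backoff_seconds=5.0):
    """Await make_call() with a time limit, retrying on timeout or connection errors.

    `make_call` is a zero-argument function returning a fresh awaitable
    on every invocation (an awaitable cannot be awaited twice).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout_seconds)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts:
                # Linear backoff before the next attempt
                await asyncio.sleep(backoff_seconds * attempt)
    raise last_error
```

A call might be wrapped as `await call_with_retry(lambda: client.session.call_tool("blast_search", arguments=args))`, where `args` is the argument dictionary from the workflow.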

Related Skills

All of the following are from SpectrAI-Initiative/InnoClaw.

uniprot-protein-retrieval

Retrieve protein sequences and functional information from UniProt database by protein name, enabling protein analysis and bioinformatics workflows.

substructure_activity_search

Substructure-Activity Relationship - Analyze substructure-activity: ChEMBL substructure search, activity data, PubChem compounds, and similarity. Use this skill for medicinal chemistry tasks involving substructure search by SMILES, activity search, PubChem search by SMILES, and SMILES similarity calculation. Combines 4 tools from 3 SCP server(s).

scientific-literature-search

Search scientific literature and research papers using FlowSearch to find relevant academic articles and publications.

Researcher Rigor Gate

Use before plan submission, major plan revision, and major stage transitions. Verify alignment, feasibility, rigor, completeness, and prevent unjustified regressions to earlier workflow phases.

Researcher Replan And Recovery

Use when the workflow hits contradictions, missing evidence, failed runs, design flaws, or resource shifts. Diagnose the failure class, choose the narrowest safe correction, and escalate to the user when the core plan changes.

Researcher Plan Architect

Use when the Researcher must convert a confirmed scientific goal into a staged, executable research plan with role assignments, milestones, resources, checkpoints, and risk controls.

Researcher Dispatch Supervisor

Use after the user confirms the plan. Dispatch the next justified worker task, supervise progress, enforce artifact-backed completion, and keep the workflow aligned with the approved plan.

Researcher Context Audit

Use when the Researcher starts, resumes, or reaches a major decision point. Build a context inventory from workstation materials, prior messages, existing artifacts, requirements, and unfinished work.

Researcher Ambiguity Gate

Use when the research goal, evaluation target, scope, resources, timeline, or decision criteria are ambiguous, conflicting, or not operationally testable.

Research Ideation Full

Use when the user wants the full research ideation workflow grounded in one seed paper, including complete ideation, feasibility review, experiment planning, and final synthesis, or makes an equivalent ideation request in another language.

pubmed-article-search

Search PubMed database for scientific articles and publications to retrieve biomedical literature.

pubchem-smiles-search

Search PubChem database using SMILES strings to retrieve compound information and chemical properties.