comprehensive-protein-analysis

Comprehensive protein analysis combining InterProScan domain identification with BLAST similarity search to provide complete functional and evolutionary annotation.

157 stars

Best use case

comprehensive-protein-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using comprehensive-protein-analysis should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/comprehensive-protein-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/InternScience/DrClaw/main/drclaw/agent_hub/templates/biochemistry/skills/comprehensive-protein-analysis/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/comprehensive-protein-analysis/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How comprehensive-protein-analysis Compares

| Feature / Agent | comprehensive-protein-analysis | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Comprehensive protein analysis combining InterProScan domain identification with BLAST similarity search to provide complete functional and evolutionary annotation.

Where can I find the source code?

The source code is in the InternScience/DrClaw repository on GitHub; the raw SKILL.md URL appears in the installation command above.

SKILL.md Source

# Comprehensive Protein Analysis

## Usage

### 1. MCP Server Definition

Use the same `BioInfoToolsClient` class as defined in the protein-blast-search skill.

### 2. Comprehensive Protein Analysis Workflow

This workflow combines InterProScan domain analysis with BLAST similarity search to provide a complete functional and evolutionary annotation of a protein sequence.

**Workflow Steps:**

1. **Validate Input** - Check protein sequence format
2. **Run InterProScan** - Identify functional domains and GO terms
3. **Run BLAST Search** - Find similar sequences and homologs
4. **Integrate Results** - Combine domain and homology information for comprehensive annotation

**Implementation:**

```python
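from datetime import timedelta

# NOTE: this snippet uses top-level `await`, so run it inside an async function
# (e.g. `async def main()` driven by `asyncio.run(main())`) or an async-aware notebook.

# Initialize client
client = BioInfoToolsClient(
    "https://scp.intern-ai.org.cn/api/v1/mcp/17/BioInfo-Tools",
    "<your-api-key>"
)

if not await client.connect():
    # Raise SystemExit instead of calling the interactive-only exit() helper
    raise SystemExit("connection failed")

# Input: Protein sequence to analyze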
protein_sequence = """
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
"""

sequence_id = "INS_HUMAN"

# Steps 1-3: Run comprehensive analysis (InterProScan + BLAST).
# The only client-side input cleanup here is .strip(), which removes the surrounding newlines.
result = await client.session.call_tool(
    "analyze_protein",
    arguments={
        "sequence": protein_sequence.strip(),
        "sequence_id": sequence_id,
        "databases": ["Pfam"],     # InterProScan databases
        "evalue": 1e-5,             # BLAST E-value threshold (more stringent)
        "max_hits": 10              # BLAST max hits
    },
    read_timeout_seconds=timedelta(seconds=1200)  # Allow up to 20 minutes
)

# Step 4: Parse and display comprehensive results
result_data = client.parse_result(result)

print(f"{'='*80}")
print(f"Comprehensive Protein Analysis: {sequence_id}")
print(f"{'='*80}\n")

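# Defaults so the summary at the end works even if InterProScan or BLAST fails
domains, go_terms, hits = [], [], []

# InterProScan Results
ips_result = result_data.get("interproscan", {})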
if ips_result.get("success"):
    ips_data = ips_result.get("results", {})
    domains = ips_data.get('domains', [])
    go_terms = ips_data.get('go_terms', [])

    print("=== DOMAIN ANALYSIS (InterProScan) ===")
    print(f"Execution time: {ips_result.get('time_seconds', '?')} seconds")
    print(f"Domains found: {len(domains)}")
    print(f"GO annotations: {len(go_terms)}\n")

    if domains:
        print("Functional Domains:")
        for domain in domains:
            print(f"  • {domain.get('name', 'N/A')} ({domain.get('database', 'N/A')})")
            if domain.get('description'):
                print(f"    Description: {domain.get('description')}")
            locations = domain.get('locations', [])
            if locations:
                loc = locations[0]
                print(f"    Position: {loc.get('start')}-{loc.get('end')} aa")
        print()

    if go_terms:
        print("Gene Ontology Annotations:")
        for go in go_terms[:5]:  # Show top 5
            print(f"  • {go.get('id', 'N/A')}: {go.get('name', 'N/A')}")
            print(f"    Category: {go.get('category', 'N/A')}")
        if len(go_terms) > 5:
            print(f"  ... and {len(go_terms) - 5} more")
        print()
else:
    print(f"❌ InterProScan failed: {ips_result.get('error', 'Unknown')}\n")

# BLAST Results
blast_result = result_data.get("blast", {})
if blast_result.get("success"):
    hits = blast_result.get('hits', [])

    print("=== HOMOLOGY SEARCH (BLAST) ===")
    print(f"Execution time: {blast_result.get('time_seconds', '?')} seconds")
    print(f"Similar sequences found: {blast_result.get('total_hits', 0)}")
    print(f"E-value threshold: {1e-5}\n")

    if hits:
        print("Top Homologous Proteins:")
        for i, hit in enumerate(hits[:5], 1):
            print(f"  {i}. {hit['uniprot_id']} - {hit.get('organism', 'N/A')}")
            print(f"     Description: {hit['description']}")
            print(f"     Identity: {hit['identity_percent']:.1f}%, E-value: {hit['evalue']:.2e}")
        if len(hits) > 5:
            print(f"  ... and {len(hits) - 5} more matches")
        print()
    else:
        print("No significant homologs found (E-value threshold may be too stringent)\n")
else:
    print(f"❌ BLAST failed: {blast_result.get('error', 'Unknown')}\n")

# Summary
print("=== FUNCTIONAL SUMMARY ===")
if domains:
    print(f"Protein Family: {domains[0].get('name', 'Unknown')}")
if hits:
    most_similar = hits[0]
    print(f"Most Similar Protein: {most_similar['uniprot_id']} ({most_similar['identity_percent']:.1f}% identity)")
    print(f"Organism: {most_similar.get('organism', 'Unknown')}")
print(f"{'='*80}")

await client.disconnect()
```
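The implementation above relies on `.strip()` for input cleanup, while step 1 of the workflow calls for checking the sequence format. A minimal sketch of such a check follows. The helper name `clean_protein_sequence` and the accepted alphabet (the 20 standard residues plus the ambiguity codes B, J, O, U, X, Z) are assumptions for illustration, not part of the BioInfo-Tools server; adjust the alphabet to whatever the server actually accepts.

```python
import re

# Assumed alphabet: 20 standard amino acids plus common ambiguity/rare codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY" "BJOUXZ")


def clean_protein_sequence(raw: str) -> str:
    """Drop FASTA header lines and whitespace, uppercase, and validate residues."""
    lines = [line.strip() for line in raw.strip().splitlines()]
    body = "".join(line for line in lines if line and not line.startswith(">"))
    sequence = re.sub(r"\s+", "", body).upper()
    if not sequence:
        raise ValueError("empty sequence")
    invalid = sorted(set(sequence) - VALID_RESIDUES)
    if invalid:
        raise ValueError(f"invalid residue codes: {', '.join(invalid)}")
    return sequence
```

The cleaned string could then be passed as the `sequence` argument in place of `protein_sequence.strip()`.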

### Tool Descriptions

**BioInfo-Tools Server:**
- `analyze_protein`: Comprehensive protein analysis combining InterProScan and BLAST
  - Args:
    - `sequence` (str): Protein sequence in amino acid single-letter code
    - `sequence_id` (str, optional): Identifier for the query sequence
    - `databases` (list, optional): InterProScan databases (default: ["Pfam"])
    - `evalue` (float, optional): BLAST E-value threshold (default: 0.01)
    - `max_hits` (int, optional): Maximum BLAST hits (default: 10)
  - Returns:
    - `interproscan` (dict): InterProScan analysis results
      - `success` (bool): Whether InterProScan completed
      - `results` (dict): Domains and GO terms
      - `time_seconds` (float): Execution time
    - `blast` (dict): BLAST search results
      - `success` (bool): Whether BLAST completed
      - `hits` (list): Similar proteins
      - `total_hits` (int): Number of matches
      - `time_seconds` (float): Execution time

### Input/Output

**Input:**
- `sequence`: Protein sequence (amino acid single-letter code)
- `sequence_id`: Optional identifier for the query
- `databases`: List of InterProScan databases to query
- `evalue`: BLAST E-value threshold (lower = more stringent)
- `max_hits`: Maximum number of BLAST hits to return

**Output:**
- **InterProScan Results**:
  - Functional domains with positions
  - Protein family classifications
  - Gene Ontology annotations
- **BLAST Results**:
  - Homologous proteins across species
  - Sequence identity and alignment statistics
  - Evolutionary relationships

### Analysis Strategy

This comprehensive approach provides:

1. **Structural Information** (InterProScan):
   - Domain architecture and organization
   - Functional motifs and active sites
   - Protein family membership

2. **Evolutionary Context** (BLAST):
   - Homologs in other species
   - Sequence conservation patterns
   - Potential orthologs and paralogs

3. **Functional Prediction**:
   - Combining domain and homology information
   - GO term annotations for molecular function
   - Biological process involvement

### Performance Notes

- **Total execution time**: 2-20 minutes depending on sequence length
  - InterProScan: 30 seconds to 15 minutes
  - BLAST: 10-90 seconds
  - Both run sequentially in this workflow
- **Timeout recommendation**: Set to at least 1200 seconds (20 minutes)
- **E-value tuning**: Use lower E-values (e.g., 1e-10) for highly conserved proteins, higher (e.g., 0.01) for divergent families
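Because each run can take minutes, it may be cheaper to search once with a permissive threshold and tighten it locally than to re-run the tool per threshold. The sketch below assumes each hit is a dict with `evalue` and `identity_percent` keys, as used in the implementation above; `filter_hits` is a hypothetical helper, not a server tool.

```python
def filter_hits(hits, max_evalue=1e-10, min_identity=0.0):
    """Keep hits at or below max_evalue and at or above min_identity, best E-value first."""
    kept = [
        hit
        for hit in hits
        if hit.get("evalue", float("inf")) <= max_evalue
        and hit.get("identity_percent", 0.0) >= min_identity
    ]
    return sorted(kept, key=lambda hit: hit["evalue"])
```

For example, `filter_hits(hits, max_evalue=1e-10)` narrows a result set obtained at `evalue=0.01` to the strongest matches without another round trip.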

### Use Cases

- Complete functional annotation of unknown proteins
- Validate predicted protein functions
- Study protein evolution and conservation
- Identify potential drug targets
- Annotate proteomes and genome sequences
- Compare protein function across species

### Interpretation Tips

- **High domain coverage + high homology**: Well-characterized protein with known function
- **Domains but no homologs**: Novel protein with conserved domains, function can be inferred from domains
- **Homologs but no domains**: May need more sensitive domain detection or represents a novel fold
- **Neither domains nor homologs**: Potentially novel protein, may require experimental characterization
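The four cases above can be expressed as a small decision function. This is a simplified sketch: it tests only whether any domains or hits were returned, not how high the coverage or identity is, and `interpret_annotation` is a hypothetical helper name.

```python
def interpret_annotation(domains, hits):
    """Map presence or absence of domains and homologs to the four cases above."""
    has_domains, has_hits = bool(domains), bool(hits)
    if has_domains and has_hits:
        return "well-characterized: function supported by domains and homologs"
    if has_domains:
        return "conserved domains without homologs: infer function from domains"
    if has_hits:
        return "homologs without domains: try more sensitive domain detection"
    return "no domains or homologs: may need experimental characterization"
```

Calling `interpret_annotation(domains, hits)` after the summary section would append a one-line interpretation to the report.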

Related Skills

uniprot_deep_analysis

157
from InternScience/DrClaw

UniProt Deep Protein Analysis - Deep UniProt analysis: entry data, UniRef clusters, UniParc cross-references, and gene-centric view. Use this skill for protein science tasks involving get uniprotkb entry by accession get uniref cluster by id get uniparc entry by upi get gene centric by accession. Combines 4 tools from 1 SCP server(s).

proteome_analysis

Proteome-Level Analysis - Analyze at proteome level: get proteome from UniProt, gene-centric view, functional annotation from STRING. Use this skill for proteomics tasks involving get proteome by id get gene centric by proteome get functional annotation. Combines 3 tools from 2 SCP server(s).

protein_structure_analysis

Protein Structure Comprehensive Analysis - Comprehensive structure analysis: download PDB, extract chains, calculate geometry, quality metrics, and composition. Use this skill for structural biology tasks involving retrieve protein data by pdbcode extract pdb chains calculate pdb structural geometry calculate pdb quality metrics calculate pdb composition info. Combines 5 tools from 1 SCP server(s).

protein_solubility_optimization

Protein Solubility Optimization - Optimize protein solubility: calculate properties, predict solubility, predict hydrophilicity, and suggest mutations. Use this skill for protein engineering tasks involving calculate protein sequence properties predict protein function ComputeHydrophilicity zero shot sequence prediction. Combines 4 tools from 3 SCP server(s).

protein_similarity_search

Protein Similarity Search - Search for similar proteins: extract sequence from PDB, search structures with FoldSeek, find homologs with STRING, and check UniProt. Use this skill for bioinformatics tasks involving extract pdb sequence foldseek search get best similarity hits between species search uniprotkb entries. Combines 4 tools from 3 SCP server(s).

protein_quality_assessment

Protein Structure Quality Assessment - Assess structure quality: basic info, geometry analysis, quality metrics, composition, and visualization. Use this skill for structural biology tasks involving calculate pdb basic info calculate pdb structural geometry calculate pdb quality metrics calculate pdb composition info visualize protein. Combines 5 tools from 1 SCP server(s).

protein_property_comparison

Cross-Species Protein Comparison - Compare proteins across species: get orthologs from NCBI, compute properties for each, and compare similarity. Use this skill for comparative biology tasks involving get gene orthologs calculate protein sequence properties calculate smiles similarity get homology id. Combines 4 tools from 3 SCP server(s).

protein_interaction_network

Protein Interaction Network Analysis - Build protein interaction network: map identifiers with STRING, get PPI network, compute enrichment, and link to KEGG pathways. Use this skill for systems biology tasks involving mapping identifiers get string network interaction get ppi enrichment kegg link. Combines 4 tools from 2 SCP server(s).

protein_function_annotation

Protein Function Annotation Pipeline - Annotate protein function: UniProt metadata, InterPro domains, functional prediction, and GO enrichment. Use this skill for proteomics tasks involving query uniprot query interpro predict protein function get functional enrichment. Combines 4 tools from 2 SCP server(s).

protein_engineering

Protein Engineering Workflow - Engineer a protein: predict structure, identify functional residues, predict beneficial mutations, and calculate properties. Use this skill for protein engineering tasks involving Protein structure prediction ESMFold predict functional residue zero shot sequence prediction calculate protein sequence properties. Combines 4 tools from 2 SCP server(s).

protein_database_crossref

Protein Cross-Database Reference - Cross-reference protein: UniProt entry, NCBI gene, Ensembl xrefs, and PDB structure search. Use this skill for proteomics tasks involving get uniprotkb entry by accession get gene metadata by gene name get xrefs symbol retrieve protein data by pdbcode. Combines 4 tools from 4 SCP server(s).

protein_complex_analysis

Protein Complex Visualization & Analysis - Analyze protein complex: download structure, visualize complex, extract chains, and calculate quality metrics. Use this skill for structural biology tasks involving retrieve protein data by pdbcode visualize complex extract pdb chains calculate pdb basic info. Combines 4 tools from 1 SCP server(s).