interproscan-domain-analysis

Analyze protein sequences using InterProScan to identify functional domains, protein families, and Gene Ontology (GO) annotations.

370 stars

by SpectrAI-Initiative


Best use case

interproscan-domain-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using interproscan-domain-analysis should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/interproscan-domain-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/SpectrAI-Initiative/InnoClaw/main/.claude/skills/interproscan-domain-analysis/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/interproscan-domain-analysis/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How interproscan-domain-analysis Compares

Feature / Agent	interproscan-domain-analysis	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Analyze protein sequences using InterProScan to identify functional domains, protein families, and Gene Ontology (GO) annotations.

Where can I find the source code?

The source code is on GitHub in the SpectrAI-Initiative/InnoClaw repository, under .claude/skills/interproscan-domain-analysis/.

SKILL.md Source

# InterProScan Protein Domain Analysis

## Usage

### 1. MCP Server Definition

Use the same `BioInfoToolsClient` class as defined in the protein-blast-search skill.

### 2. InterProScan Domain Analysis Workflow

This workflow analyzes protein sequences using InterProScan to identify functional domains, protein families, binding sites, and associated Gene Ontology annotations.

**Workflow Steps:**

1. **Validate Sequence** - Check protein sequence format and length
2. **Run InterProScan** - Identify domains using multiple signature databases
3. **Extract Annotations** - Parse domain locations, families, and GO terms
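
Step 1 can also be done locally before the tool call, so that malformed input fails fast instead of consuming a long-running request. The following is a minimal sketch, not part of the skill itself: the helper name `clean_protein_sequence`, the accepted alphabet (20 standard residues plus the codes B, Z, X, U, O), and the minimum length of 10 are all assumptions, and the server may apply different rules.

```python
import re

# Assumed alphabet: 20 standard amino acids plus ambiguity/rare codes B, Z, X, U, O.
_ALPHABET = "ACDEFGHIKLMNPQRSTVWYBZXUO"
_VALID_RESIDUES = re.compile(f"[{_ALPHABET}]+")


def clean_protein_sequence(raw: str, min_length: int = 10) -> str:
    """Return an uppercase, whitespace-free sequence or raise ValueError."""
    # Drop FASTA header lines if present, then remove all whitespace.
    lines = [ln for ln in raw.strip().splitlines() if not ln.startswith(">")]
    sequence = "".join("".join(lines).split()).upper()
    # Some exports end the sequence with '*' (stop codon marker); strip it.
    sequence = sequence.rstrip("*")

    if len(sequence) < min_length:
        raise ValueError(
            f"sequence too short: {len(sequence)} aa (minimum {min_length})"
        )
    if not _VALID_RESIDUES.fullmatch(sequence):
        bad = sorted(set(sequence) - set(_ALPHABET))
        raise ValueError(f"invalid residue codes: {', '.join(bad)}")
    return sequence
```

The cleaned string can then be passed as the `sequence` argument in place of `protein_sequence.strip()`.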

**Implementation:**

```python
import asyncio
from datetime import timedelta

# BioInfoToolsClient is not defined in this snippet; define or import it first
# (same class as in the protein-blast-search skill).

# Input: protein sequence to analyze
protein_sequence = """
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
"""


async def main():
    # Initialize client
    client = BioInfoToolsClient(
        "https://scp.intern-ai.org.cn/api/v1/mcp/17/BioInfo-Tools",
        "<your-api-key>"
    )

    if not await client.connect():
        print("connection failed")
        return

    try:
        # Steps 1 & 2: Run InterProScan analysis
        result = await client.session.call_tool(
            "interproscan_analyze",
            arguments={
                "sequence": protein_sequence.strip(),
                "sequence_id": "HBB_HUMAN",        # Optional identifier
                "databases": ["Pfam"],              # Signature databases to use
                "goterms": True                     # Include GO term annotations
            },
            read_timeout_seconds=timedelta(seconds=900)  # Allow up to 15 minutes
        )

        # Step 3: Parse and display results
        result_data = client.parse_result(result)

        if not result_data.get("success"):
            print(f"❌ InterProScan analysis failed: {result_data.get('error', 'Unknown error')}")
            return

        results = result_data.get("results", {})
        domains = results.get("domains", [])
        go_terms = results.get("go_terms", [])

        print("✅ InterProScan analysis completed successfully")
        print(f"Execution time: {result_data.get('time_seconds', '?')} seconds")
        print(f"Domains found: {len(domains)}")
        print(f"GO annotations: {len(go_terms)}\n")

        # Display domain information
        if domains:
            print("=== Functional Domains ===\n")
            for i, domain in enumerate(domains, 1):
                print(f"{i}. {domain.get('name', 'N/A')}")
                print(f"   Accession: {domain.get('accession', 'N/A')}")
                print(f"   Database: {domain.get('database', 'N/A')}")
                if domain.get('description'):
                    print(f"   Description: {domain.get('description')}")

                # Display domain locations
                locations = domain.get('locations', [])
                if locations:
                    print("   Locations:")
                    for loc in locations:
                        print(f"     - Position {loc.get('start')}-{loc.get('end')} aa")
                        # Compare against None so that a score of 0 is still shown
                        if loc.get('score') is not None:
                            print(f"       Score: {loc.get('score')}")
                print()

        # Display GO annotations
        if go_terms:
            print("=== Gene Ontology Annotations ===\n")

            # Group by category
            by_category = {}
            for go in go_terms:
                by_category.setdefault(go.get('category', 'UNKNOWN'), []).append(go)

            for category, terms in by_category.items():
                print(f"{category}:")
                for go in terms:
                    print(f"  - {go.get('id', 'N/A')}: {go.get('name', 'N/A')}")
                print()
    finally:
        # Always release the connection, even if the tool call raises
        await client.disconnect()


# In a notebook or another environment with a running event loop,
# call `await main()` instead.
asyncio.run(main())
```

### Tool Descriptions

**BioInfo-Tools Server:**
- `interproscan_analyze`: Analyze protein sequence using InterProScan
  - Args:
    - `sequence` (str): Protein sequence in amino acid single-letter code
    - `sequence_id` (str, optional): Identifier for the query sequence
    - `databases` (list, optional): Signature databases to query (default: ["Pfam"])
    - `goterms` (bool, optional): Include GO term annotations (default: True)
  - Returns:
    - `success` (bool): Whether analysis completed successfully
    - `results` (dict): Analysis results containing domains and GO terms
    - `time_seconds` (float): Execution time

### Input/Output

**Input:**
- `sequence`: Protein sequence (amino acid single-letter code)
- `sequence_id`: Optional identifier for the query
- `databases`: List of signature databases (e.g., ["Pfam", "SMART", "PRINTS"])
- `goterms`: Whether to include Gene Ontology annotations

**Output:**
- `domains`: List of identified protein domains, each containing:
  - `name`: Domain or family name
  - `accession`: Database accession number
  - `database`: Source database (e.g., "PFAM", "SMART")
  - `description`: Functional description
  - `locations`: List of domain positions in the sequence
    - `start`: Start position (amino acid number)
    - `end`: End position (amino acid number)
    - `score`: Match score (if available)
- `go_terms`: List of GO annotations, each containing:
  - `id`: GO identifier (e.g., "GO:0020037")
  - `name`: GO term name
  - `category`: GO category (MOLECULAR_FUNCTION, BIOLOGICAL_PROCESS, or CELLULAR_COMPONENT)
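
The nested `domains` structure above can be flattened into position-sorted rows, which is convenient for reading a domain architecture left to right. This is a sketch under stated assumptions: the helper name `domain_architecture` is hypothetical, the field names follow the output description above, and the example payload uses made-up placeholder values rather than real InterProScan output.

```python
from typing import Any


def domain_architecture(results: dict[str, Any]) -> list[tuple[int, int, str, str]]:
    """Flatten results["domains"] into (start, end, accession, name) rows sorted by position."""
    rows = []
    for domain in results.get("domains", []):
        for loc in domain.get("locations", []):
            start, end = loc.get("start"), loc.get("end")
            if start is None or end is None:
                continue  # skip locations without coordinates
            rows.append(
                (int(start), int(end),
                 domain.get("accession", "N/A"), domain.get("name", "N/A"))
            )
    return sorted(rows)


# Made-up payload for illustration only; not real InterProScan output.
example_results = {
    "domains": [
        {"name": "domain_B", "accession": "ACC0002", "database": "PFAM",
         "locations": [{"start": 120, "end": 180, "score": 1.2e-5}]},
        {"name": "domain_A", "accession": "ACC0001", "database": "PFAM",
         "locations": [{"start": 5, "end": 90}, {"start": 200, "end": 260}]},
    ],
    "go_terms": [],
}

for start, end, accession, name in domain_architecture(example_results):
    print(f"{start:>4}-{end:<4} {accession}  {name}")
```

A domain with several locations produces one row per location, so repeated domains appear in sequence order alongside the others.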

### Available Signature Databases

InterProScan integrates multiple signature databases:
- **Pfam**: Protein families based on HMMs
- **SMART**: Simple Modular Architecture Research Tool
- **PRINTS**: Protein fingerprints
- **ProSite**: Protein domains, families, and functional sites
- **SUPERFAMILY**: Structural and functional annotation
- And more...

Default: `["Pfam"]` for fastest results

### Performance Notes

- **Typical execution time**:
  - Short sequences (~150 aa): 30-60 seconds
  - Medium sequences (~400 aa): 2-4 minutes
  - Long sequences (~800+ aa): 5-15 minutes
- **Timeout recommendation**: Set to at least 900 seconds (15 minutes)
- **Multiple databases**: Querying more databases increases execution time but gives more comprehensive annotation
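
The timings above can be turned into a rough rule for choosing `read_timeout_seconds`. The sketch below is illustrative only: the helper name `suggest_read_timeout` is hypothetical, the length thresholds reuse the upper bounds quoted above (sequences between 400 and 800 aa are treated as long), and the linear scaling with the number of databases is an assumption, not a measured property of the server.

```python
from datetime import timedelta


def suggest_read_timeout(sequence_length: int, n_databases: int = 1) -> timedelta:
    """Pick a read timeout from the quoted upper bounds, never below 900 seconds."""
    if sequence_length <= 150:
        base_seconds = 60      # short sequences: up to about 60 s
    elif sequence_length <= 400:
        base_seconds = 240     # medium sequences: up to about 4 min
    else:
        base_seconds = 900     # long sequences: up to about 15 min
    # Assumption: cost grows roughly linearly with the number of databases.
    scaled = base_seconds * max(1, n_databases)
    # Respect the recommended floor of 900 seconds.
    return timedelta(seconds=max(900, scaled))
```

For example, `read_timeout_seconds=suggest_read_timeout(len(sequence), len(databases))` keeps the recommended 900-second floor for a single-database run and raises the limit when several databases are queried for a long sequence.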

### Use Cases

- Identify functional domains in novel protein sequences
- Predict protein function from domain composition
- Locate active sites and binding regions
- Annotate protein families and superfamilies
- Obtain GO term annotations for functional analysis
- Compare domain architecture across homologous proteins

### GO Term Categories

- **MOLECULAR_FUNCTION**: Molecular-level activities (e.g., "heme binding", "catalytic activity")
- **BIOLOGICAL_PROCESS**: Biological pathways and processes (e.g., "oxygen transport", "signal transduction")
- **CELLULAR_COMPONENT**: Cellular locations (e.g., "cytoplasm", "membrane")
