uniprot-database

Direct REST API access to UniProt for protein search, entry retrieval, and identifier mapping; use when you need programmatic UniProtKB queries or cross-database ID conversion.

53 stars

by aipoch

Best use case

uniprot-database is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using uniprot-database should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/uniprot-database/SKILL.md --create-dirs "https://raw.githubusercontent.com/aipoch/medical-research-skills/main/scientific-skills/Evidence%20Insight/uniprot-database/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/uniprot-database/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How uniprot-database Compares

Feature / Agent	uniprot-database	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Direct REST API access to UniProt for protein search, entry retrieval, and identifier mapping; use when you need programmatic UniProtKB queries or cross-database ID conversion.

Where can I find the source code?

The source code is on GitHub in the aipoch/medical-research-skills repository.

SKILL.md Source

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You need to search UniProtKB with Lucene-style queries (e.g., by gene name, organism, reviewed status).
- You want to fetch the full details of a specific protein entry by UniProt accession (e.g., `P12345`).
- You need to map identifiers between databases (e.g., gene names, Ensembl IDs, RefSeq IDs ↔ UniProt accessions).
- You are building pipelines that require automated protein annotation retrieval in JSON/TSV/FASTA formats.
- You need a lightweight client that talks directly to UniProt’s REST API without additional SDKs.

## Key Features

- **Protein search** via UniProtKB REST endpoint using Lucene query syntax.
- **Entry retrieval** by accession with selectable output formats.
- **Identifier mapping** between supported source/target databases using the UniProt ID mapping service.
- **Format control** (default `json`) for consistent downstream parsing.
- **Reference docs** for query syntax and available API fields:
  - `references/query_syntax.md`
  - `references/api_fields.md`
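
As a rough illustration of the Lucene-style queries the search feature accepts, the sketch below joins `field:value` filters with `AND`. The helper `build_query` is hypothetical and not part of this skill; the field names `gene`, `organism_id`, and `reviewed` are assumed to be valid UniProtKB query fields, so check them against `references/query_syntax.md` before relying on them.

```python
def build_query(**filters) -> str:
    """Join field:value filters with AND into a Lucene-style query string.

    Hypothetical helper for illustration; not part of the skill itself.
    """
    terms = []
    for field, value in filters.items():
        # Render Python booleans in lowercase, e.g. reviewed:true.
        if isinstance(value, bool):
            value = "true" if value else "false"
        value = str(value)
        # Quote values containing whitespace so they stay a single term.
        if any(ch.isspace() for ch in value):
            value = f'"{value}"'
        terms.append(f"{field}:{value}")
    return " AND ".join(terms)


query = build_query(gene="BRCA1", organism_id=9606, reviewed=True)
print(query)  # gene:BRCA1 AND organism_id:9606 AND reviewed:true
```

The resulting string can be passed as the `query` argument of `search_protein` in the example usage. More complex expressions (OR, NOT, ranges) are outside what this sketch covers.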

## Dependencies

- Python `>=3.8`
- `requests >=2.31.0`

## Example Usage

```python
import time
import requests

BASE = "https://rest.uniprot.org"

def search_protein(query: str, fmt: str = "json", size: int = 5):
    """
    Search UniProtKB using Lucene-style query syntax.
    """
    url = f"{BASE}/uniprotkb/search"
    params = {"query": query, "format": fmt, "size": size}
    r = requests.get(url, params=params, timeout=30)
    r.raise_for_status()
    return r.json() if fmt == "json" else r.text

def retrieve_entry(accession: str, fmt: str = "json"):
    """
    Retrieve a UniProtKB entry by accession.
    """
    url = f"{BASE}/uniprotkb/{accession}"
    params = {"format": fmt}
    r = requests.get(url, params=params, timeout=30)
    r.raise_for_status()
    return r.json() if fmt == "json" else r.text

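def id_mapping(from_db: str, to_db: str, ids, poll_interval_s: float = 1.0,
               max_wait_s: float = 300.0):
    """
    Map identifiers using UniProt ID Mapping.
    ids can be a list of strings or a comma-separated string.
    Raises TimeoutError if the job has not finished within max_wait_s.
    """
    if isinstance(ids, (list, tuple)):
        ids = ",".join(ids)

    # 1) Submit mapping job
    submit_url = f"{BASE}/idmapping/run"
    r = requests.post(
        submit_url,
        data={"from": from_db, "to": to_db, "ids": ids},
        timeout=30,
    )
    r.raise_for_status()
    job_id = r.json()["jobId"]

    # 2) Poll job status until it finishes, fails, or the deadline passes
    status_url = f"{BASE}/idmapping/status/{job_id}"
    deadline = time.monotonic() + max_wait_s
    while True:
        s = requests.get(status_url, timeout=30)
        s.raise_for_status()
        payload = s.json()
        status = payload.get("jobStatus")
        # A payload without "jobStatus" is treated as finished (for example,
        # when the status request was redirected to the results).
        if status in (None, "FINISHED"):
            break
        if status == "FAILED":
            raise RuntimeError(f"ID mapping failed: {payload}")
        # Guard against polling forever on a stuck or unexpected status.
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"ID mapping job {job_id} not finished after {max_wait_s}s "
                f"(last status: {status})"
            )
        time.sleep(poll_interval_s)

    # 3) Fetch results (JSON)
    results_url = f"{BASE}/idmapping/results/{job_id}"
    res = requests.get(results_url, params={"format": "json"}, timeout=30)
    res.raise_for_status()
    return res.json()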

if __name__ == "__main__":
    # Search example: human BRCA1
    search = search_protein("gene:BRCA1 AND organism_id:9606", size=3)
    print("Search results (first accessions):",
          [item["primaryAccession"] for item in search.get("results", [])])

    # Retrieve entry example
    entry = retrieve_entry("P38398")  # UniProt accession for human BRCA1 (example)
    print("Entry primaryAccession:", entry.get("primaryAccession"))
    print("Protein name:", entry.get("proteinDescription", {}).get("recommendedName", {}).get("fullName", {}).get("value"))

    # ID mapping example: gene name -> UniProtKB
    mapping = id_mapping(from_db="Gene_Name", to_db="UniProtKB", ids=["BRCA1"])
    print("Mapping results keys:", mapping.keys())
```

## Implementation Details

- **Search Protein**
  - Uses `GET /uniprotkb/search`
  - Key parameters:
    - `query`: Lucene-style query string (see `references/query_syntax.md`)
    - `format`: output format (default `json`)
    - Optional common parameters: `size`, `fields`, `sort`
  - Returns parsed JSON when `format=json`, otherwise raw text.
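
To make the optional `fields` and `sort` parameters more concrete, here is a minimal sketch that builds a parameter dictionary for a TSV search and parses the raw text that a non-JSON format returns. The helpers `build_search_params` and `parse_tsv` are hypothetical. The field names (`accession`, `gene_names`, `length`) and the sort string `length desc` are assumptions to verify against `references/api_fields.md`. The sample TSV is hand-written for illustration, not captured API output.

```python
import csv
import io


def build_search_params(query, fmt="tsv", size=5, fields=None, sort=None):
    """Build query parameters for GET /uniprotkb/search (hypothetical helper)."""
    params = {"query": query, "format": fmt, "size": size}
    if fields:
        # Assumption: the API takes a comma-separated list of field names.
        params["fields"] = ",".join(fields)
    if sort:
        params["sort"] = sort
    return params


def parse_tsv(text):
    """Turn tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


params = build_search_params(
    "gene:BRCA1 AND organism_id:9606",
    fields=["accession", "gene_names", "length"],  # assumed field names
    sort="length desc",                            # assumed sort syntax
)

# Hand-written sample standing in for the raw text of a TSV response.
sample_tsv = "Entry\tGene Names\tLength\nP38398\tBRCA1 RNF53\t1863\n"
rows = parse_tsv(sample_tsv)
print(params["fields"])  # accession,gene_names,length
print(rows[0]["Entry"])  # P38398
```

Restricting `fields` keeps responses small, which matters most when a pipeline only needs a few columns per entry.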

- **Retrieve Entry**
  - Uses `GET /uniprotkb/{accession}`
  - Key parameters:
    - `accession`: UniProt accession (e.g., `P12345`)
    - `format`: output format (default `json`)
  - Suitable for fetching full record details for a known accession.
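
When an entry is retrieved with `format=fasta`, the result is raw text rather than parsed JSON. The sketch below shows one way to split such text into a header and a sequence. `parse_fasta` and `accession_from_header` are hypothetical helpers, the `db|accession|name` header layout is an assumption about UniProt FASTA output, and the sample record is invented for illustration.

```python
def parse_fasta(text):
    """Return (header, sequence) for the first record in FASTA text."""
    header = None
    sequence_lines = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop after the first
            header = line[1:]
        elif header is not None:
            sequence_lines.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(sequence_lines)


def accession_from_header(header):
    """Extract the accession, assuming a 'db|accession|name ...' header."""
    parts = header.split("|")
    return parts[1] if len(parts) >= 3 else None


# Invented sample record; real text would come from a format=fasta request.
sample = ">sp|P12345|EXAMPLE_HUMAN Hypothetical protein\nMKTAYIAK\nQRQISFVK\n"
header, sequence = parse_fasta(sample)
print(accession_from_header(header))  # P12345
print(sequence)                       # MKTAYIAKQRQISFVK
```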

- **ID Mapping**
  - Uses UniProt's asynchronous mapping workflow:
    1. `POST /idmapping/run` with `from`, `to`, `ids`
    2. Poll `GET /idmapping/status/{jobId}` until finished
    3. Fetch `GET /idmapping/results/{jobId}?format=json`
  - `ids` accepts either a list or a comma-separated string.
  - Recommended parameters:
    - `poll_interval_s`: controls polling frequency to avoid excessive requests.
  - `from_db` / `to_db` must match UniProt-supported database identifiers (consult UniProt mapping documentation as needed).
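
Once the workflow above returns its JSON, the mapped pairs still need to be pulled out. The following sketch assumes the results payload contains a `results` list of objects with `from` and `to` keys plus an optional `failedIds` list, and that `to` may be either a plain string or an object carrying `primaryAccession`. Both are assumptions about the response shape to confirm against the UniProt mapping documentation. `summarize_mapping` is a hypothetical helper and the sample payload is hand-written.

```python
def summarize_mapping(payload):
    """Group mapped targets by source ID and collect unmapped IDs.

    Returns (mapped, failed), where mapped is {source_id: [target, ...]}.
    """
    mapped = {}
    for item in payload.get("results", []):
        target = item.get("to")
        # Assumption: richer result formats nest the accession in an object.
        if isinstance(target, dict):
            target = target.get("primaryAccession")
        mapped.setdefault(item.get("from"), []).append(target)
    failed = list(payload.get("failedIds", []))
    return mapped, failed


# Hand-written payload in the assumed shape, not real API output.
sample_payload = {
    "results": [
        {"from": "BRCA1", "to": "P38398"},
        {"from": "BRCA1", "to": {"primaryAccession": "A0A0A0AAA0"}},
    ],
    "failedIds": ["NOT_A_GENE"],
}
mapped, failed = summarize_mapping(sample_payload)
print(mapped)  # {'BRCA1': ['P38398', 'A0A0A0AAA0']}
print(failed)  # ['NOT_A_GENE']
```

Keeping one list per source ID reflects that a single identifier, such as a gene name, can map to several UniProt accessions.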

Related Skills

uspto-database

from aipoch/medical-research-skills

Access USPTO data (Patent Search, PEDS, TSDR, assignments) when you need to query patents/trademarks and retrieve prosecution or status information programmatically.

zinc-database

from aipoch/medical-research-skills

Access the ZINC (230M+ purchasable compounds) database when you need to look up compounds by ZINC ID/SMILES, run similarity/analog searches, or download 3D ready-to-dock structures for virtual screening and drug discovery.

string-database

from aipoch/medical-research-skills

Access the STRING database to map identifiers, retrieve protein–protein interaction networks, and run functional/PPI enrichment when you need interaction context for a gene/protein set.

semantic-scholar-database

from aipoch/medical-research-skills

Access the Semantic Scholar Graph API to search papers and retrieve paper/author/citation data when you need literature discovery or citation graph exploration.

scite-database

from aipoch/medical-research-skills

Access Scite.ai Smart Citations to classify how a paper is cited (supporting, contrasting, mentioning) and assess scientific claims; use it when you need to evaluate a paper’s reliability or its acceptance in the literature.

pubchem-database-skill

from aipoch/medical-research-skills

Programmatic access to the PubChem database (via PUG-REST API and PubChemPy) for searching chemical compounds, retrieving physicochemical properties, performing structure similarity/substructure searches, and obtaining bioactivity data.

pdb-database

from aipoch/medical-research-skills

Access the RCSB Protein Data Bank (PDB) to search, download, and programmatically retrieve 3D macromolecular structures and metadata; use when you need structure discovery (text/sequence/3D similarity) or automated structural data ingestion for structural biology and drug discovery workflows.

kegg-database

from aipoch/medical-research-skills

Direct access to KEGG via the REST API for academic-only pathway/gene/compound/drug queries; use when you need precise HTTP-level control or targeted KEGG ID mapping.

hmdb-database

from aipoch/medical-research-skills

Access the Human Metabolome Database (HMDB) to search metabolites by name/structure/ID and extract chemical/biological/clinical fields when you need metabolomics research data or automated HMDB XML mining.

gwas-database

from aipoch/medical-research-skills

Query the NHGRI-EBI GWAS Catalog to retrieve SNP–trait associations, study metadata, and (when available) summary statistics when you need evidence for a variant, trait/disease, gene, or genomic region.

gene-database

from aipoch/medical-research-skills

Query the NCBI Gene database via E-utilities and the NCBI Datasets API; use it when you need to search genes by symbol/ID and retrieve annotations (RefSeq, GO, location, phenotype) for single or batch gene lists.

fda-database

from aipoch/medical-research-skills

Query the openFDA API to retrieve FDA regulatory datasets (drugs, devices, adverse events, recalls, submissions, UNII) when you need programmatic safety/regulatory evidence for analysis or research.