string-database
Access the STRING database to map identifiers, retrieve protein–protein interaction networks, and run functional/PPI enrichment when you need interaction context for a gene/protein set.
Best use case
string-database is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Access the STRING database to map identifiers, retrieve protein–protein interaction networks, and run functional/PPI enrichment when you need interaction context for a gene/protein set.
Teams using string-database should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/string-database/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How string-database Compares
| Feature / Agent | string-database | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Access the STRING database to map identifiers, retrieve protein–protein interaction networks, and run functional/PPI enrichment when you need interaction context for a gene/protein set.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
## When to Use
- You have gene symbols (e.g., `TP53`) and need to resolve them to STRING protein identifiers for downstream analysis.
- You want to retrieve a protein–protein interaction (PPI) network (functional/physical) with confidence scores for one or more proteins.
- You need to find interaction partners for a target protein to expand a candidate list (e.g., add top N neighbors).
- You want to perform functional enrichment (GO/KEGG/Reactome, etc.) for a protein set to interpret biological themes.
- You need a quick static visualization (PNG/SVG) of a STRING network for reports or notebooks.
## Key Features
- **ID Mapping**: Convert gene/protein names to STRING identifiers for a given organism.
- **Network Retrieval**: Fetch interaction edges with confidence scores from STRING.
- **Interaction Partners**: Expand a protein list by retrieving interaction partners.
- **Enrichment Analysis**:
- Functional enrichment (e.g., GO, KEGG, Reactome)
- PPI enrichment statistics
- Functional annotations (e.g., PFAM/SMART where supported by STRING endpoints)
- **Visualization**: Download static network images (PNG/SVG).
## Dependencies
- Python `>=3.8`
- `requests` (tested with `>=2.28`)
- `pandas` (tested with `>=1.5`)
Install:
```bash
pip install requests pandas
```
## Example Usage
```python
from scripts.string_api import StringClient
def main():
# STRING does not require a secret API key, but providing a caller identity is recommended.
client = StringClient(caller_identity="my_analysis_tool")
# 1) Map an identifier (e.g., TP53 in Homo sapiens; NCBI taxonomy ID 9606)
protein_id = client.map_id(identifier="TP53", species=9606)
print("Mapped ID:", protein_id)
# 2) Download a network image and expand by adding interaction partners
client.get_network_image(
identifiers=[protein_id],
output_file="tp53_network.png",
add_color_nodes=10, # add 10 partners
)
print("Saved network image to tp53_network.png")
# 3) Run PPI enrichment for the set
ppi_stats = client.get_ppi_enrichment(identifiers=[protein_id])
print("PPI enrichment:", ppi_stats)
if __name__ == "__main__":
main()
```
## Implementation Details
- **Client entry point**: `scripts/string_api.py` provides the main wrapper (e.g., `StringClient`) around the STRING REST API.
- **Caller identity**:
- STRING endpoints do **not** require an API key.
- A `caller_identity` string is strongly recommended (project name/email/URL) to support rate/load management.
- Pass it at initialization (e.g., `StringClient(caller_identity="my_email@example.com")`) or inject via environment variables in your own wrapper.
- **Organism selection**:
- Most operations require a species identifier (commonly NCBI taxonomy ID, e.g., `9606` for human).
- **Network retrieval and scoring**:
- Network endpoints return interactions with confidence scores; downstream filtering is typically done by applying a score threshold in your analysis code (if exposed by the wrapper).
- **Visualization**:
- Static images are retrieved directly from STRING image endpoints and written to disk (PNG/SVG depending on the method/parameters).
- **Reference documentation**:
- See `references/string_reference.md` for original API notes and endpoint details included with this skill.Related Skills
uspto-database
Access USPTO data (Patent Search, PEDS, TSDR, assignments) when you need to query patents/trademarks and retrieve prosecution or status information programmatically.
zinc-database
Access the ZINC (230M+ purchasable compounds) database when you need to look up compounds by ZINC ID/SMILES, run similarity/analog searches, or download 3D ready-to-dock structures for virtual screening and drug discovery.
uniprot-database
Direct REST API access to UniProt for protein search, entry retrieval, and identifier mapping; use when you need programmatic UniProtKB queries or cross-database ID conversion.
semantic-scholar-database
Access the Semantic Scholar Graph API to search papers and retrieve paper/author/citation data when you need literature discovery or citation graph exploration.
scite-database
Access Scite.ai Smart Citations to classify how a paper is cited (supporting, contrasting, mentioning) and assess scientific claims; use it when you need to evaluate a paper’s reliability or its acceptance in the literature.
pubchem-database-skill
Programmatic access to the PubChem database (via PUG-REST API and PubChemPy) for searching chemical compounds, retrieving physicochemical properties, performing structure similarity/substructure searches, and obtaining bioactivity data.
pdb-database
Access the RCSB Protein Data Bank (PDB) to search, download, and programmatically retrieve 3D macromolecular structures and metadata; use when you need structure discovery (text/sequence/3D similarity) or automated structural data ingestion for structural biology and drug discovery workflows.
kegg-database
Direct access to KEGG via the REST API for academic-only pathway/gene/compound/drug queries; use when you need precise HTTP-level control or targeted KEGG ID mapping.
hmdb-database
Access the Human Metabolome Database (HMDB) to search metabolites by name/structure/ID and extract chemical/biological/clinical fields when you need metabolomics research data or automated HMDB XML mining.
gwas-database
Query the NHGRI-EBI GWAS Catalog to retrieve SNP–trait associations, study metadata, and (when available) summary statistics when you need evidence for a variant, trait/disease, gene, or genomic region.
gene-database
Query the NCBI Gene database via E-utilities and the NCBI Datasets API; use it when you need to search genes by symbol/ID and retrieve annotations (RefSeq, GO, location, phenotype) for single or batch gene lists.
fda-database
Query the openFDA API to retrieve FDA regulatory datasets (drugs, devices, adverse events, recalls, submissions, UNII) when you need programmatic safety/regulatory evidence for analysis or research.