pubchem-database-skill

Programmatic access to the PubChem database (via PUG-REST API and PubChemPy) for searching chemical compounds, retrieving physicochemical properties, performing structure similarity/substructure searches, and obtaining bioactivity data.

53 stars

by aipoch


Best use case

pubchem-database-skill is best used when you need a repeatable AI agent workflow for PubChem queries rather than a one-off prompt.

Teams using pubchem-database-skill should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/pubchem-database-skill/SKILL.md --create-dirs "https://raw.githubusercontent.com/aipoch/medical-research-skills/main/scientific-skills/Evidence%20Insight/pubchem-database-skill/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/pubchem-database-skill/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How pubchem-database-skill Compares

| Feature / Agent | pubchem-database-skill | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

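What does this skill do?

It provides programmatic access to the PubChem database (via the PUG-REST API and PubChemPy) for searching chemical compounds, retrieving physicochemical properties, running similarity and substructure searches, and obtaining bioactivity data.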

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)


## When to Use

* You need to **search for chemical compounds** by name, CID, SMILES, InChI, or molecular formula.
* You want to **retrieve physicochemical properties** (e.g., molecular weight, LogP, TPSA, H-bond donors/acceptors).
* You need to **perform structure-based searches**, such as similarity or substructure queries.
* You want to **obtain bioactivity data** (e.g., assay summaries, target information) for a given compound.
* You are building an automated cheminformatics or drug discovery workflow that requires **programmatic access to PubChem**.

## Key Features

* **Flexible compound search** by name, CID, SMILES, InChI, or formula.
* **Property retrieval** via PubChem PUG-REST and PubChemPy (e.g., MW, LogP, Canonical SMILES).
* **Structure search**:

  * Similarity search
  * Substructure search
* **Bioactivity retrieval** linked to PubChem BioAssay records.
* **Rate-limit aware implementation** (respects PubChem’s limit of at most 5 requests per second).
* **Python function interface** for seamless integration into scientific pipelines.
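
The source does not show how `scripts/pubchem_ops.py` enforces the 5 requests/sec limit. One possible approach is a minimum-interval throttle, sketched below. The `Throttle` class and its parameters are hypothetical names used for illustration and are not part of the skill.

```python
import time


class Throttle:
    """Hypothetical helper: keeps call starts at least 1/max_per_sec seconds apart."""

    def __init__(self, max_per_sec=5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_sec
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # start time of the previous call, if any

    def wait(self):
        """Block until min_interval has elapsed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each HTTP request would keep a single-threaded client under the limit. A multi-threaded client would also need a lock around `wait()`.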

## Dependencies

Install the required Python packages:

```bash
uv pip install pubchempy requests
```

* `pubchempy` (version: not pinned)
* `requests` (version: not pinned)

## Example Usage

Primary module:

* `scripts/pubchem_ops.py`

### 1) Get compound properties

```bash
python -c "from scripts.pubchem_ops import get_properties; print(get_properties(query_value='Aspirin', query_type='name'))"
```

Or in Python:

```python
from scripts.pubchem_ops import get_properties

result = get_properties(query_value="Aspirin", query_type="name")
print(result)
```
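
For orientation, a property lookup like the one above corresponds to a PUG-REST URL of the form `compound/<namespace>/<identifier>/property/<names>/JSON`. The sketch below only builds that URL. `property_url` is a hypothetical helper, and whether `get_properties` builds its requests this way is an assumption. It covers `name` and `cid` lookups only, since SMILES, InChI, and formula inputs have extra encoding or request rules.

```python
from urllib.parse import quote

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(query_value, query_type="name",
                 properties=("MolecularWeight", "XLogP", "TPSA")):
    """Build a PUG-REST property URL (hypothetical helper, not part of the skill)."""
    # Percent-encode the identifier so names with spaces stay in one path segment.
    identifier = quote(str(query_value), safe="")
    names = ",".join(properties)
    return f"{PUG_BASE}/compound/{query_type}/{identifier}/property/{names}/JSON"


url = property_url("Aspirin")
```

Fetching `url` with any HTTP client should return JSON with the requested values under a `PropertyTable` key. Check the PUG-REST documentation for the exact property names you need.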

### 2) Structure search (similarity)

```bash
python -c "from scripts.pubchem_ops import structure_search; print(structure_search(query_value='CC(=O)OC1=CC=CC=C1C(=O)O', search_type='similarity'))"
```

Or in Python:

```python
from scripts.pubchem_ops import structure_search

smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"
result = structure_search(query_value=smiles, search_type="similarity")
print(result)
```
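
PUG-REST exposes synchronous structure searches through the `fastsimilarity_2d` and `fastsubstructure` operations. The sketch below shows how the two `search_type` values might map onto them. `structure_search_url` and its defaults are hypothetical, and the skill's own implementation may differ (for example, it may go through PubChemPy instead).

```python
from urllib.parse import quote, urlencode

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Assumed mapping from the skill's search_type values to PUG-REST operations.
OPERATIONS = {
    "similarity": "fastsimilarity_2d",
    "substructure": "fastsubstructure",
}


def structure_search_url(smiles, search_type="similarity",
                         threshold=90, max_records=10):
    """Build a PUG-REST fast-search URL that returns matching CIDs as JSON."""
    if search_type not in OPERATIONS:
        raise ValueError(f"search_type must be one of {sorted(OPERATIONS)}")
    params = {"MaxRecords": max_records}
    if search_type == "similarity":
        # Threshold is the minimum 2D Tanimoto similarity, in percent.
        params = {"Threshold": threshold, **params}
    path = f"compound/{OPERATIONS[search_type]}/smiles/{quote(smiles, safe='')}/cids/JSON"
    return f"{PUG_BASE}/{path}?{urlencode(params)}"
```

SMILES strings containing characters such as `/` or `#` may not survive in the URL path even when percent-encoded. Sending the structure in a POST body is the safer route for those.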

### 3) Get bioactivity data

```bash
python -c "from scripts.pubchem_ops import get_bioactivity; print(get_bioactivity(cid=2244))"
```

Or in Python:

```python
from scripts.pubchem_ops import get_bioactivity

result = get_bioactivity(cid=2244)
print(result)
```
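
The return shape of `get_bioactivity` is not documented here. If it wraps the PUG-REST assay summary endpoint (`compound/cid/<cid>/assaysummary/JSON`), the raw payload is a column/row table, and a small helper can turn it into records. The sketch below assumes that table layout. `table_to_records` is a hypothetical helper, and the sample is hand-written placeholder data, not real PubChem output.

```python
def table_to_records(payload):
    """Convert a column/row table payload into a list of dicts (hypothetical helper)."""
    table = payload["Table"]
    columns = table["Columns"]["Column"]
    # Pair each cell with its column name; a missing "Row" key means no rows.
    return [dict(zip(columns, row["Cell"])) for row in table.get("Row", [])]


# Placeholder sample in the assumed shape, for illustration only.
sample = {
    "Table": {
        "Columns": {"Column": ["AID", "CID", "Activity Outcome"]},
        "Row": [
            {"Cell": [1, 2244, "Active"]},
            {"Cell": [2, 2244, "Inactive"]},
        ],
    }
}

records = table_to_records(sample)
active = [r for r in records if r["Activity Outcome"] == "Active"]
```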

## Implementation Details

* **Primary script**: `scripts/pubchem_ops.py`
* **Data sources / endpoints**:

  * Compound & properties: `pubchem.ncbi.nlm.nih.gov/rest/pug`
  * Bioactivity: PubChem BioAssay endpoints
  * Python wrapper: `PubChemPy`
* **Supported operations**:

  * `get_properties`: retrieve physicochemical properties by name/CID/SMILES/InChI/formula.
  * `structure_search`: perform similarity or substructure search.
  * `get_bioactivity`: retrieve assay and bioactivity-related data by CID.
* **Input constraints**:

  * `query_type` must match supported types (e.g., `name`, `cid`, `smiles`, `inchi`, `formula`).
  * `search_type` must be `similarity` or `substructure`.
* **Error handling**:

  * Returns a structured error or `None` if the compound is not found.
  * Handles PubChem rate limits (≤ 5 requests/sec).
* **Troubleshooting considerations**:

  * Ensure network access to `pubchem.ncbi.nlm.nih.gov`.
  * Verify query format (e.g., valid SMILES or InChI) if results are empty.
* **Additional reference**:

  * API documentation pointers: `references/api_reference.md`
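
The input constraints and not-found behavior listed above can be sketched as follows. This is an illustration under assumptions, not the code in `scripts/pubchem_ops.py`: `check_query_type` and `fetch_json_or_none` are hypothetical names, and the sketch uses only the standard library (rather than `requests`) so it runs without extra packages.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen

SUPPORTED_QUERY_TYPES = {"name", "cid", "smiles", "inchi", "formula"}


def check_query_type(query_type):
    """Raise ValueError early instead of sending a malformed request."""
    if query_type not in SUPPORTED_QUERY_TYPES:
        raise ValueError(f"unsupported query_type: {query_type!r}")
    return query_type


def fetch_json_or_none(url, timeout=30, opener=urlopen):
    """Return parsed JSON, or None when the server answers 404 (not found)."""
    try:
        with opener(url, timeout=timeout) as response:
            return json.load(response)
    except HTTPError as err:
        if err.code == 404:
            return None
        raise  # other HTTP errors (e.g. 503 when throttled) are left to the caller
```

The `opener` parameter exists only so the function can be exercised without network access.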

Related Skills

uspto-database

from aipoch/medical-research-skills

Access USPTO data (Patent Search, PEDS, TSDR, assignments) when you need to query patents/trademarks and retrieve prosecution or status information programmatically.

zinc-database

from aipoch/medical-research-skills

Access the ZINC (230M+ purchasable compounds) database when you need to look up compounds by ZINC ID/SMILES, run similarity/analog searches, or download 3D ready-to-dock structures for virtual screening and drug discovery.

uniprot-database

from aipoch/medical-research-skills

Direct REST API access to UniProt for protein search, entry retrieval, and identifier mapping; use when you need programmatic UniProtKB queries or cross-database ID conversion.

string-database

from aipoch/medical-research-skills

Access the STRING database to map identifiers, retrieve protein–protein interaction networks, and run functional/PPI enrichment when you need interaction context for a gene/protein set.

semantic-scholar-database

from aipoch/medical-research-skills

Access the Semantic Scholar Graph API to search papers and retrieve paper/author/citation data when you need literature discovery or citation graph exploration.

scite-database

from aipoch/medical-research-skills

Access Scite.ai Smart Citations to classify how a paper is cited (supporting, contrasting, mentioning) and assess scientific claims; use it when you need to evaluate a paper’s reliability or its acceptance in the literature.

pdb-database

from aipoch/medical-research-skills

Access the RCSB Protein Data Bank (PDB) to search, download, and programmatically retrieve 3D macromolecular structures and metadata; use when you need structure discovery (text/sequence/3D similarity) or automated structural data ingestion for structural biology and drug discovery workflows.

kegg-database

from aipoch/medical-research-skills

Direct access to KEGG via the REST API for academic-only pathway/gene/compound/drug queries; use when you need precise HTTP-level control or targeted KEGG ID mapping.

hmdb-database

from aipoch/medical-research-skills

Access the Human Metabolome Database (HMDB) to search metabolites by name/structure/ID and extract chemical/biological/clinical fields when you need metabolomics research data or automated HMDB XML mining.

gwas-database

from aipoch/medical-research-skills

Query the NHGRI-EBI GWAS Catalog to retrieve SNP–trait associations, study metadata, and (when available) summary statistics when you need evidence for a variant, trait/disease, gene, or genomic region.

gene-database

from aipoch/medical-research-skills

Query the NCBI Gene database via E-utilities and the NCBI Datasets API; use it when you need to search genes by symbol/ID and retrieve annotations (RefSeq, GO, location, phenotype) for single or batch gene lists.

fda-database

from aipoch/medical-research-skills

Query the openFDA API to retrieve FDA regulatory datasets (drugs, devices, adverse events, recalls, submissions, UNII) when you need programmatic safety/regulatory evidence for analysis or research.