gene-database

Query the NCBI Gene database via E-utilities and the NCBI Datasets API; use it when you need to search genes by symbol/ID and retrieve annotations (RefSeq, GO, location, phenotype) for single or batch gene lists.

53 stars

byaipoch

View on GitHub Installation ↓

Best use case

gene-database is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using gene-database should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/gene-database/SKILL.md --create-dirs "https://raw.githubusercontent.com/aipoch/medical-research-skills/main/scientific-skills/Evidence Insight/gene-database/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/gene-database/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How gene-database Compares

Feature / Agent	gene-database	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You have a gene symbol (e.g., **BRCA1**) and need the correct **NCBI Gene ID** for a specific organism.
- You have an NCBI **Gene ID** and need consolidated metadata (aliases, RefSeq accessions, genomic location, GO, literature links).
- You need to **annotate a gene panel** (dozens to thousands of genes) with consistent identifiers and core annotations.
- You want to search genes by **biological context** (GO terms, phenotype/disease keywords, pathway terms) and then retrieve details for the hits.
- You are building a pipeline that must respect **NCBI rate limits** and handle retries for transient API failures.

## Key Features

- Symbol/name search with organism scoping using **E-utilities (ESearch)**.
- Gene record retrieval by ID using **E-utilities (EFetch/ESummary)** in JSON/XML/text-oriented outputs.
- Streamlined, gene-focused retrieval using the **NCBI Datasets API** (metadata + sequences/links in a single workflow).
- Batch lookup utilities with basic rate-limit awareness and output aggregation.
- Supports common annotation fields: nomenclature/aliases, RefSeq transcripts/proteins, genomic location, GO annotations, phenotype/disease keywords, and related literature references.

## Dependencies

- Python **3.9+**
- `requests` **>= 2.28**
- NCBI E-utilities (Entrez) HTTP API (public service)
- NCBI Datasets HTTP API (public service)
- Optional: NCBI API key (recommended for higher throughput)

## Example Usage

> The following examples assume the repository provides these scripts:
> - `scripts/query_gene.py`
> - `scripts/fetch_gene_data.py`
> - `scripts/batch_gene_lookup.py`

### 1) Search by symbol/name (E-utilities / ESearch)

```bash
python scripts/query_gene.py --search "BRCA1" --organism "human"
```

Example advanced query strings:

```bash
python scripts/query_gene.py --search "insulin[gene name] AND human[organism]"
python scripts/query_gene.py --search "dystrophin[gene name] AND muscular dystrophy[disease]"
python scripts/query_gene.py --search "human[organism] AND 17q21[chromosome]"
```

### 2) Retrieve gene information by Gene ID

Using E-utilities (format-oriented retrieval):

```bash
python scripts/query_gene.py --id 672 --format json
```

Using NCBI Datasets API (consolidated gene payload):

```bash
python scripts/fetch_gene_data.py --gene-id 672
```

Or by symbol + taxon:

```bash
python scripts/fetch_gene_data.py --symbol BRCA1 --taxon human
python scripts/fetch_gene_data.py --symbol TP53 --taxon "Homo sapiens" --output json
```

### 3) Batch lookup for gene list annotation

From a file of symbols (organism required for symbol disambiguation):

```bash
python scripts/batch_gene_lookup.py --file gene_list.txt --organism human
```

From a comma-separated list of Gene IDs:

```bash
python scripts/batch_gene_lookup.py --ids 672,7157,5594 --output results.json
```

## Implementation Details

### API selection guidance

- Use **E-utilities** when you need:
  - complex Entrez query syntax (fielded queries, boolean logic),
  - cross-database patterns,
  - fine control over search and retrieval steps (ESearch → ESummary/EFetch).
- Use **NCBI Datasets API** when you need:
  - a streamlined gene-centric retrieval path,
  - consolidated metadata (and often sequence-related links) with fewer round trips.

### Query patterns (E-utilities)

Typical fielded query components include:
- `"<SYMBOL>"` plus organism scoping: `BRCA1[gene name] AND human[organism]`
- GO term searches (example): `GO:0006915[biological process]`
- Phenotype/disease keywords (example): `diabetes[phenotype] AND mouse[organism]`
- Pathway keywords (example): `insulin signaling pathway[pathway]`

### Rate limits and API keys

- Without an API key (typical defaults):
  - E-utilities: ~**3 requests/sec**
  - Datasets API: ~**5 requests/sec**
- With an NCBI API key:
  - both can be used up to ~**10 requests/sec** (service-dependent)

Obtain an API key from: https://www.ncbi.nlm.nih.gov/account/

### Error handling recommendations

- Handle standard HTTP errors:
  - **400**: invalid/malformed query or parameters
  - **404**: Gene ID not found
  - **429**: rate limit exceeded
- Use **exponential backoff** with jitter for retries on 429/5xx.
- Cache results for repeated lookups (especially in batch annotation workflows).

### Output/data formats

Depending on endpoint/script options, gene data may be returned as:
- **JSON** (recommended for pipelines)
- **XML** (legacy/verbose metadata)
- **Text summaries**
- Sequence-oriented formats such as **FASTA** or **GenBank** (when supported by the chosen endpoint/workflow)

### Additional references

If present in the repository, consult:
- `references/api_reference.md` for endpoint/parameter details and response structures
- `references/common_workflows.md` for additional query patterns and end-to-end examples

Related Skills

research-proposal-generator

from aipoch/medical-research-skills

Generates a comprehensive research proposal design based on input literature, including hypothesis, mechanism verification, and budget. Use when the user wants to design a research project from a paper.

hypothesis-generation

from aipoch/medical-research-skills

Structured scientific hypothesis formulation from observations; use when you have experimental observations or preliminary data and need testable hypotheses with predictions, mechanisms, and validation experiments.

uspto-database

from aipoch/medical-research-skills

Access USPTO data (Patent Search, PEDS, TSDR, assignments) when you need to query patents/trademarks and retrieve prosecution or status information programmatically.

short-video-script-generator

from aipoch/medical-research-skills

Generate popular science short video scripts based on topic, duration, and style. Invoke when the user needs to create scripts for short science videos.

plan-generator

from aipoch/medical-research-skills

Automatically generates a Markdown final-exam review plan or lab experiment schedule when you provide a date range, tasks/items, and available daily hours (via interactive prompts or a one-time JSON input).

paper-tweet-generator

from aipoch/medical-research-skills

Generates a structured reading tweet from an academic paper (PDF, Word, or Text), highlighting specific product advantages. Use when the user wants to turn a document into a social media post or reading summary.

meeting-minutes-generator

from aipoch/medical-research-skills

Generates structured meeting minutes from text transcripts. Use when the user provides text content and wants a structured summary with a signature.

medical-case-report-generator

from aipoch/medical-research-skills

Generates a patient-friendly medical case report tweet from case images and disease name. Use when the user provides a medical case image and wants a structured report or tweet.

market-research-report-generator

from aipoch/medical-research-skills

Generates professional market research reports by analyzing business intent, decision levels, and conducting multi-source data retrieval (Web, PubMed, Clinical Trials).

expert-interview-generator

from aipoch/medical-research-skills

Generates a full expert interview article including introduction, Q&A body, and summary based on interview questions and expert background. Use when you have interview questions and an expert profile and need a polished article.

conference-tweet-generator

from aipoch/medical-research-skills

Generates academic conference tweets and summaries by filtering abstracts, translating content, and creating engaging titles. Use when you need to process conference abstracts into social media content.

academic-poster-generator

from aipoch/medical-research-skills

Complete workflow for generating academic research posters from PDF literature; use when you need to extract paper content from PDFs and produce a LaTeX-based poster (beamerposter/tikzposter/baposter) with mandatory figure generation and a final rendered HTML deliverable.