gene-database
Query the NCBI Gene database via E-utilities and the NCBI Datasets API; use it when you need to search genes by symbol/ID and retrieve annotations (RefSeq, GO, location, phenotype) for single or batch gene lists.
Best use case
gene-database is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using gene-database should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/gene-database/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How gene-database Compares
| Feature / Agent | gene-database | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Query the NCBI Gene database via E-utilities and the NCBI Datasets API; use it when you need to search genes by symbol/ID and retrieve annotations (RefSeq, GO, location, phenotype) for single or batch gene lists.
Where can I find the source code?
The source code is on GitHub at https://github.com/aipoch/medical-research-skills.
SKILL.md Source
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You have a gene symbol (e.g., **BRCA1**) and need the correct **NCBI Gene ID** for a specific organism.
- You have an NCBI **Gene ID** and need consolidated metadata (aliases, RefSeq accessions, genomic location, GO, literature links).
- You need to **annotate a gene panel** (dozens to thousands of genes) with consistent identifiers and core annotations.
- You want to search genes by **biological context** (GO terms, phenotype/disease keywords, pathway terms) and then retrieve details for the hits.
- You are building a pipeline that must respect **NCBI rate limits** and handle retries for transient API failures.

## Key Features

- Symbol/name search with organism scoping using **E-utilities (ESearch)**.
- Gene record retrieval by ID using **E-utilities (EFetch/ESummary)** in JSON/XML/text-oriented outputs.
- Streamlined, gene-focused retrieval using the **NCBI Datasets API** (metadata + sequences/links in a single workflow).
- Batch lookup utilities with basic rate-limit awareness and output aggregation.
- Supports common annotation fields: nomenclature/aliases, RefSeq transcripts/proteins, genomic location, GO annotations, phenotype/disease keywords, and related literature references.
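The ESearch step listed under Key Features can be sketched as follows. This is a minimal illustration, not the repository's `query_gene.py`: the helper names `build_esearch_url` and `extract_gene_ids` are hypothetical, and the sample payload only mimics the `esearchresult.idlist` shape of ESearch JSON output. No request is sent; the code only constructs the URL and parses a canned response.

```python
from typing import Optional
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(symbol: str, organism: str,
                      api_key: Optional[str] = None, retmax: int = 20) -> str:
    """Build an ESearch URL that searches db=gene for a symbol in one organism.

    Hypothetical helper for illustration; the fielded term follows the
    `<SYMBOL>[gene name] AND <organism>[organism]` pattern used in this document.
    """
    term = f"{symbol}[gene name] AND {organism}[organism]"
    params = {"db": "gene", "term": term, "retmode": "json", "retmax": retmax}
    if api_key:
        # The api_key parameter is optional and raises the allowed request rate.
        params["api_key"] = api_key
    return f"{ESEARCH_URL}?{urlencode(params)}"


def extract_gene_ids(payload: dict) -> list:
    """Return the Gene IDs from an ESearch JSON payload, or [] if there are none."""
    return list(payload.get("esearchresult", {}).get("idlist", []))


url = build_esearch_url("BRCA1", "human")

# Canned payload standing in for a real ESearch response (assumed shape).
sample = {"esearchresult": {"count": "1", "idlist": ["672"]}}
gene_ids = extract_gene_ids(sample)
```

Scoping by organism in the query term matters because the same symbol exists in many species; without it, the ID list mixes orthologs and the first hit is not guaranteed to be the human gene.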
## Dependencies

- Python **3.9+**
- `requests` **>= 2.28**
- NCBI E-utilities (Entrez) HTTP API (public service)
- NCBI Datasets HTTP API (public service)
- Optional: NCBI API key (recommended for higher throughput)

## Example Usage

> The following examples assume the repository provides these scripts:
> - `scripts/query_gene.py`
> - `scripts/fetch_gene_data.py`
> - `scripts/batch_gene_lookup.py`

### 1) Search by symbol/name (E-utilities / ESearch)

```bash
python scripts/query_gene.py --search "BRCA1" --organism "human"
```

Example advanced query strings:

```bash
python scripts/query_gene.py --search "insulin[gene name] AND human[organism]"
python scripts/query_gene.py --search "dystrophin[gene name] AND muscular dystrophy[disease]"
python scripts/query_gene.py --search "human[organism] AND 17q21[chromosome]"
```

### 2) Retrieve gene information by Gene ID

Using E-utilities (format-oriented retrieval):

```bash
python scripts/query_gene.py --id 672 --format json
```

Using NCBI Datasets API (consolidated gene payload):

```bash
python scripts/fetch_gene_data.py --gene-id 672
```

Or by symbol + taxon:

```bash
python scripts/fetch_gene_data.py --symbol BRCA1 --taxon human
python scripts/fetch_gene_data.py --symbol TP53 --taxon "Homo sapiens" --output json
```

### 3) Batch lookup for gene list annotation

From a file of symbols (organism required for symbol disambiguation):

```bash
python scripts/batch_gene_lookup.py --file gene_list.txt --organism human
```

From a comma-separated list of Gene IDs:

```bash
python scripts/batch_gene_lookup.py --ids 672,7157,5594 --output results.json
```

## Implementation Details

### API selection guidance

- Use **E-utilities** when you need:
  - complex Entrez query syntax (fielded queries, boolean logic),
  - cross-database patterns,
  - fine control over search and retrieval steps (ESearch → ESummary/EFetch).
- Use **NCBI Datasets API** when you need:
  - a streamlined gene-centric retrieval path,
  - consolidated metadata (and often sequence-related links) with fewer round trips.

### Query patterns (E-utilities)

Typical fielded query components include:

- `"<SYMBOL>"` plus organism scoping: `BRCA1[gene name] AND human[organism]`
- GO term searches (example): `GO:0006915[biological process]`
- Phenotype/disease keywords (example): `diabetes[phenotype] AND mouse[organism]`
- Pathway keywords (example): `insulin signaling pathway[pathway]`

### Rate limits and API keys

- Without an API key (typical defaults):
  - E-utilities: ~**3 requests/sec**
  - Datasets API: ~**5 requests/sec**
- With an NCBI API key:
  - both can be used up to ~**10 requests/sec** (service-dependent)

Obtain an API key from: https://www.ncbi.nlm.nih.gov/account/

### Error handling recommendations

- Handle standard HTTP errors:
  - **400**: invalid/malformed query or parameters
  - **404**: Gene ID not found
  - **429**: rate limit exceeded
- Use **exponential backoff** with jitter for retries on 429/5xx.
- Cache results for repeated lookups (especially in batch annotation workflows).

### Output/data formats

Depending on endpoint/script options, gene data may be returned as:

- **JSON** (recommended for pipelines)
- **XML** (legacy/verbose metadata)
- **Text summaries**
- Sequence-oriented formats such as **FASTA** or **GenBank** (when supported by the chosen endpoint/workflow)

### Additional references

If present in the repository, consult:

- `references/api_reference.md` for endpoint/parameter details and response structures
- `references/common_workflows.md` for additional query patterns and end-to-end examples
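The retry advice under "Error handling recommendations" (exponential backoff with jitter on 429/5xx) can be sketched as below. This is one possible shape, not the repository's implementation: `backoff_delay`, `call_with_retries`, and the zero-argument `send` callable returning `(status_code, body)` are hypothetical names, and the base delay and cap are arbitrary assumptions. The sleep function is injectable so the example runs without waiting or touching the network.

```python
import random
import time
from typing import Callable, Tuple


def is_retryable(status: int) -> bool:
    """Retry only on rate limiting (429) and server-side errors (5xx)."""
    return status == 429 or 500 <= status < 600


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(send: Callable[[], Tuple[int, object]],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical callable wrapping one HTTP request and returning
    (status_code, body). Statuses such as 400 and 404 are returned immediately,
    because repeating the same request would not change the outcome.
    """
    status, body = send()
    for attempt in range(max_attempts - 1):
        if not is_retryable(status):
            break
        sleep(backoff_delay(attempt))
        status, body = send()
    return status, body


# Simulated transport: rate-limited, then a server error, then success.
responses = iter([(429, None), (503, None), (200, {"gene_id": "672"})])
delays = []  # records the computed delays instead of actually sleeping
status, body = call_with_retries(lambda: next(responses), sleep=delays.append)
```

Jitter spreads retries from parallel workers so they do not all hit the service again at the same instant, which is the usual reason a batch job stays stuck at the rate limit.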
Related Skills
research-proposal-generator
Generates a comprehensive research proposal design based on input literature, including hypothesis, mechanism verification, and budget. Use when the user wants to design a research project from a paper.
hypothesis-generation
Structured scientific hypothesis formulation from observations; use when you have experimental observations or preliminary data and need testable hypotheses with predictions, mechanisms, and validation experiments.
uspto-database
Access USPTO data (Patent Search, PEDS, TSDR, assignments) when you need to query patents/trademarks and retrieve prosecution or status information programmatically.
short-video-script-generator
Generate popular science short video scripts based on topic, duration, and style. Invoke when the user needs to create scripts for short science videos.
plan-generator
Automatically generates a Markdown final-exam review plan or lab experiment schedule when you provide a date range, tasks/items, and available daily hours (via interactive prompts or a one-time JSON input).
paper-tweet-generator
Generates a structured reading tweet from an academic paper (PDF, Word, or Text), highlighting specific product advantages. Use when the user wants to turn a document into a social media post or reading summary.
meeting-minutes-generator
Generates structured meeting minutes from text transcripts. Use when the user provides text content and wants a structured summary with a signature.
medical-case-report-generator
Generates a patient-friendly medical case report tweet from case images and disease name. Use when the user provides a medical case image and wants a structured report or tweet.
market-research-report-generator
Generates professional market research reports by analyzing business intent, decision levels, and conducting multi-source data retrieval (Web, PubMed, Clinical Trials).
expert-interview-generator
Generates a full expert interview article including introduction, Q&A body, and summary based on interview questions and expert background. Use when you have interview questions and an expert profile and need a polished article.
conference-tweet-generator
Generates academic conference tweets and summaries by filtering abstracts, translating content, and creating engaging titles. Use when you need to process conference abstracts into social media content.
academic-poster-generator
Complete workflow for generating academic research posters from PDF literature; use when you need to extract paper content from PDFs and produce a LaTeX-based poster (beamerposter/tikzposter/baposter) with mandatory figure generation and a final rendered HTML deliverable.