reference-finder

Automatically finds and ranks PubMed references for each sentence in scientific text; use when you need titles, DOIs, and brief recommendation reasons from the PubMed E-utilities API.

53 stars

by aipoch


Best use case

reference-finder is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using reference-finder should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/reference-finder/SKILL.md --create-dirs "https://raw.githubusercontent.com/aipoch/medical-research-skills/main/scientific-skills/Evidence%20Insight/reference-finder/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/reference-finder/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How reference-finder Compares

Feature / Agent	reference-finder	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Automatically finds and ranks PubMed references for each sentence in scientific text; use when you need titles, DOIs, and brief recommendation reasons from the PubMed E-utilities API.

Where can I find the source code?

The source code is in the aipoch/medical-research-skills repository on GitHub: https://github.com/aipoch/medical-research-skills

SKILL.md Source

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You have a scientific paragraph and want suggested PubMed papers for **each sentence**.
- You need **top-ranked references** with **title, DOI, PMID, year**, and a short **why recommended** explanation.
- You are drafting or reviewing a manuscript and want quick **literature grounding** for key claims.
- You want a lightweight reference matcher that uses **only the official PubMed E-utilities API** (no third-party services).
- You need a scriptable tool for batch or CLI workflows to generate candidate citations.

## Key Features

- Sentence-level reference matching for scientific text.
- Returns the **top N (default: 3)** most relevant PubMed records per sentence.
- Outputs structured fields: **title, DOI, PMID, year, recommendation reason**.
- Relevance ranking based on:
  - keyword overlap / match strength,
  - publication year preference,
  - citation-count signal (when available/derivable).
- Safety constraints:
  - Network access restricted to `eutils.ncbi.nlm.nih.gov`.
  - No local filesystem writes except to `outputs/` during execution.
  - Request timeout set to **30 seconds** with clear error messages.
- Supports Python API usage and CLI usage (including interactive mode).

## Dependencies

- Python **3.x** (standard library only; no third-party packages required)

## Example Usage

### Python (direct call)

```python
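# Assumes the reference_finder module is importable from your working directory.
from reference_finder import find_references

text = "CRISPR-Cas9 gene editing has revolutionized biomedical research."

# Returns a ranked list of dicts with pmid, title, doi, year, and reason keys.
results = find_references(text)

for ref in results[:3]:
    print(f"- {ref['title']} ({ref['year']})")
    # A DOI is only present when PubMed records one, so fall back gracefully.
    print(f"  DOI: {ref.get('doi') or 'N/A'}")
    print(f"  PMID: {ref['pmid']}")
    print(f"  Reason: {ref['reason']}")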
```

### CLI (single input)

```bash
python scripts/find_refs.py "CRISPR-Cas9 gene editing has revolutionized biomedical research."
```

### CLI (interactive mode)

```bash
python scripts/find_refs.py
```

### Example output (JSON)

```json
[
  {
    "pmid": "PMID:",
    "title": "A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity",
    "doi": "10.1126/science.1225829",
    "year": 2012,
    "reason": "Highest keyword match for 'CRISPR-Cas9', foundational paper"
  }
]
```
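Because the skill may only write under `outputs/`, a run that saves results has to check its destination path. The sketch below is a minimal, hypothetical helper (`save_results` and its defaults are assumptions, not part of the skill) that writes records in the JSON shape shown above and refuses any path that resolves outside the output directory.

```python
import json
from pathlib import Path

# Field names mirror the example output; the helper itself is hypothetical.
FIELDS = ("pmid", "title", "doi", "year", "reason")


def save_results(results, filename="references.json", out_dir="outputs"):
    """Write ranked references as JSON, refusing paths outside out_dir."""
    base = Path(out_dir).resolve()
    target = (base / filename).resolve()
    # Enforce the "no writes outside outputs/" rule, even for names like "../x.json".
    if base not in target.parents:
        raise ValueError(f"refusing to write outside {base}: {target}")
    base.mkdir(parents=True, exist_ok=True)
    # Keep only the documented fields; missing fields (e.g. no DOI) become null.
    records = [{key: ref.get(key) for key in FIELDS} for ref in results]
    target.write_text(json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8")
    return target
```

Resolving both paths before comparing them is what catches `..` segments; a plain string prefix check would not.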

## Implementation Details

### Data flow

1. **Sentence splitting**: The input text is split into sentences (implementation-defined; typically punctuation-based).
2. **PubMed search (ESearch)**: For each sentence, a query is sent to:
   - `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi`
3. **Record retrieval (EFetch)**: The top candidate PMIDs are fetched via:
   - `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi`
4. **Field extraction**: Title, year, PMID, and DOI (when present) are extracted from the returned metadata.
5. **Ranking and selection**: Candidates are scored and the top **N** are returned with a short recommendation reason.
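Steps 1 and 4 can be sketched with the standard library alone. This is an illustration under assumptions, not the skill's code: the splitter is a naive punctuation rule (so abbreviations like "e.g." will split), and `split_sentences`/`parse_efetch` are hypothetical names. The element paths (`MedlineCitation/PMID`, `ArticleTitle`, `PubDate/Year`, `ArticleIdList/ArticleId[@IdType="doi"]`) follow PubMed's EFetch XML format.

```python
import re
import xml.etree.ElementTree as ET


def split_sentences(text):
    """Naive punctuation-based splitter: breaks after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def parse_efetch(xml_text):
    """Extract PMID, title, year, and DOI from a PubMed EFetch XML (retmode=xml) response."""
    records = []
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        # ArticleTitle may contain inline markup such as <i>, so join all text nodes.
        title_el = article.find("MedlineCitation/Article/ArticleTitle")
        title = "".join(title_el.itertext()).strip() if title_el is not None else None
        # PubDate holds either <Year> or a free-text <MedlineDate> like "2012 Jan-Feb".
        pub_date = "MedlineCitation/Article/Journal/JournalIssue/PubDate/"
        date_text = article.findtext(pub_date + "Year") or article.findtext(pub_date + "MedlineDate") or ""
        match = re.search(r"\d{4}", date_text)
        # The DOI, when present, is listed among the article IDs.
        doi = None
        for aid in article.iterfind("PubmedData/ArticleIdList/ArticleId"):
            if aid.get("IdType") == "doi":
                doi = (aid.text or "").strip() or None
                break
        records.append({
            "pmid": pmid,
            "title": title,
            "year": int(match.group()) if match else None,
            "doi": doi,
        })
    return records
```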

### Ranking signals

- **Keyword match**: Measures overlap between sentence terms and retrieved record metadata (e.g., title/abstract terms when available).
- **Publication year**: Used as a preference signal (e.g., favoring more recent work unless a classic/foundational match is strong).
- **Citation count**: Incorporated when available/derivable; otherwise treated as missing without failing the run.
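The skill names its three ranking signals but not how they are combined. The sketch below shows one plausible weighted sum; the weights, stopword list, 20-year recency decay, and log scaling of citations are all hypothetical choices for illustration. A missing year or citation count contributes zero rather than raising, matching the "treated as missing without failing the run" behavior described above.

```python
import math
import re
from datetime import date

# Hypothetical weights and stopwords; the skill does not specify them.
W_KEYWORD, W_YEAR, W_CITES = 0.6, 0.25, 0.15
STOPWORDS = {"the", "a", "an", "of", "in", "and", "has", "have", "is", "for", "to", "with"}


def terms(text):
    """Lowercased word tokens (hyphenated terms kept whole) minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()) if t not in STOPWORDS}


def score(sentence, record, current_year=None):
    current_year = current_year or date.today().year
    query = terms(sentence)
    doc = terms((record.get("title") or "") + " " + (record.get("abstract") or ""))
    # Keyword match: fraction of sentence terms found in the record's text.
    keyword = len(query & doc) / len(query) if query else 0.0
    # Recency: linear decay over 20 years, clamped to [0, 1]; unknown year scores 0.
    year = record.get("year")
    recency = min(1.0, max(0.0, 1 - (current_year - year) / 20)) if year else 0.0
    # Citations: log-scaled and capped at 1.0 (~1000 citations); missing means 0.
    cites = record.get("citations")
    cite_signal = min(1.0, math.log10(1 + cites) / 3) if cites is not None else 0.0
    return W_KEYWORD * keyword + W_YEAR * recency + W_CITES * cite_signal


def rank(sentence, records, top_n=3, current_year=None):
    """Return the top_n records for a sentence, highest score first."""
    return sorted(records, key=lambda r: score(sentence, r, current_year), reverse=True)[:top_n]
```

Weighting keyword overlap most heavily keeps a strong, older "foundational" match above a recent but off-topic paper, which is the trade-off the year signal above describes.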

### Operational constraints and safety

- **Allowed network host**: `eutils.ncbi.nlm.nih.gov` only.
- **Prohibited**: Any third-party URLs.
- **Filesystem**: Do not write outside `outputs/` during execution.
- **Rate limiting**: Space requests at a reasonable cadence (e.g., **~0.5s** apart, about 2 requests per second) to stay within NCBI's E-utilities limit of 3 requests per second without an API key.
- **Timeout**: **30 seconds** per request.
- **Error handling**: Return semantic, user-readable error messages for network/API/parse failures.
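The constraints above can be enforced in a single request wrapper. This is a minimal sketch, assuming `urllib` from the standard library; `PubMedError`, `RateLimiter`, `check_url`, and `fetch` are hypothetical names. The injectable clock and sleep functions only exist so the limiter can be tested without real waiting.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

ALLOWED_HOST = "eutils.ncbi.nlm.nih.gov"
TIMEOUT_S = 30
MIN_INTERVAL_S = 0.5


class PubMedError(RuntimeError):
    """User-readable failure for network, API, or parse problems."""


class RateLimiter:
    """Guarantees at least min_interval seconds between consecutive requests."""

    def __init__(self, min_interval=MIN_INTERVAL_S, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock, self.sleep = clock, sleep
        self._last = None

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def check_url(url):
    """Reject anything that is not HTTPS to the single allowed host."""
    parts = urllib.parse.urlsplit(url)
    if parts.scheme != "https" or parts.hostname != ALLOWED_HOST:
        raise PubMedError(f"Blocked request to non-allowed host: {parts.hostname!r}")
    return url


_limiter = RateLimiter()


def fetch(url):
    check_url(url)
    _limiter.wait()
    try:
        with urllib.request.urlopen(url, timeout=TIMEOUT_S) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as exc:  # must precede OSError: HTTPError subclasses it
        raise PubMedError(f"PubMed API returned HTTP {exc.code} for {url}") from exc
    except OSError as exc:  # URLError, connection errors, and socket timeouts
        raise PubMedError(f"Could not reach PubMed within {TIMEOUT_S}s: {exc}") from exc
```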

### Defaults

- **Top references per sentence**: 3
- **Endpoints**:
  - ESearch: `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi`
  - EFetch: `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi`
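With these defaults, the two E-utilities calls reduce to URL construction plus reading the PMID list from the ESearch response. The `db`, `term`, `retmax`, `retmode`, and `id` parameters and the `esearchresult.idlist` JSON path are standard E-utilities; the helper names and the 5x over-fetch factor are hypothetical. Passing the raw sentence as `term` is a simplification, since PubMed combines terms with AND and long sentences can return few hits.

```python
import json
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
TOP_N = 3


def esearch_url(sentence, retmax=TOP_N * 5):
    # Over-fetch candidates (hypothetical 5x factor) so ranking has room to reorder.
    params = {"db": "pubmed", "term": sentence, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def efetch_url(pmids):
    # EFetch accepts a comma-separated list of PMIDs; XML carries DOIs and dates.
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def pmids_from_esearch(json_text):
    """Pull the PMID list out of an ESearch JSON response."""
    return json.loads(json_text)["esearchresult"]["idlist"]
```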

### Related project files

- Main script: `scripts/find_refs.py`
- Tests: `tests/test_finder.py`
- Evaluation checklist: `references/evaluation-checklist.md`
- PubMed E-utilities documentation: https://www.ncbi.nlm.nih.gov/books/NBK25504/

Related Skills

reference-search

from aipoch/medical-research-skills

Multi-database literature search and search-strategy design that outputs structured, reproducible result lists; use when you need reference retrieval, systematic searching, review topic selection, or to construct a traceable search strategy.

cross-disciplinary-bridge-finder

from aipoch/medical-research-skills

Use when identifying collaboration opportunities across fields, finding experts in complementary disciplines, translating methodologies between scientific domains, or building interdisciplinary research teams. Identifies synergies between scientific disciplines, matches researchers with complementary expertise, and facilitates cross-domain collaborations. Supports interdisciplinary grant applications and innovative research team formation.

reference-style-sync

from aipoch/medical-research-skills

One-click synchronization and standardization of reference formats in literature management tools, intelligently fixing metadata errors.

figure-reference-checker

from aipoch/medical-research-skills

Use figure reference checker for academic writing workflows that need structured execution, explicit assumptions, and clear output boundaries.

two-sample-mr-exposure-screening-reference-grounded

from aipoch/medical-research-skills

Generates complete two-sample Mendelian randomization research designs from a user-provided outcome, exposure or exposure family, and robustness direction. Use when a study centers on summary-statistics causal inference with instrument selection, harmonization, IVW-primary estimation, complementary estimators, sensitivity analyses, optional multivariable upgrades, and conservative evidence interpretation. Covers five study patterns and always outputs Lite / Standard / Advanced / Publication+ with a recommended primary plan, stepwise workflow, figure plan, validation hierarchy, minimal executable version, publication upgrade path, and strictly verified literature retrieval.

single-gene-oncology-reference-grounded

from aipoch/medical-research-skills

Generates complete conventional single-gene oncology research designs from a user-provided cancer context, target gene, and validation direction. Use when a study centers on a fixed candidate gene and needs expression, prognosis, clinicopathologic association, functional interpretation, immune context, genomic or epigenetic context, optional drug-response hypotheses, and orthogonal validation. Covers five study patterns and always outputs Lite / Standard / Advanced / Publication+ with a recommended primary plan, stepwise workflow, figure plan, validation hierarchy, minimal executable version, publication upgrade path, and strictly verified literature retrieval.

single-compound-network-toxicology-disease-link-reference-grounded

from aipoch/medical-research-skills

Generates complete single-compound network-toxicology research designs from one exposure, one disease or toxic phenotype, and a validation direction. Use when a study centers on one compound–one disease link and needs target collection, overlap construction, enrichment, PPI hub prioritization, docking, optional transcriptomic cross-check, and conservative mechanistic synthesis. Covers five study patterns and always outputs Lite / Standard / Advanced / Publication+ with a recommended primary plan, stepwise workflow, figure plan, validation hierarchy, minimal executable version, publication upgrade path, and strictly verified literature retrieval.

comparative-network-toxicology-shared-mechanism-reference-grounded

from aipoch/medical-research-skills

Generates complete comparative network-toxicology research designs from a user-provided exposure pair, shared toxic phenotype, and validation direction. Use when a study centers on two related exposures under one outcome and needs target collection, shared-vs-specific target decomposition, enrichment, PPI hub prioritization, docking, optional transcriptomic cross-checks, and conservative mechanistic synthesis. Covers five study patterns and always outputs Lite / Standard / Advanced / Publication+ with a recommended primary plan, stepwise workflow, figure plan, validation hierarchy, minimal executable version, publication upgrade path, and strictly verified literature retrieval.

preprint-surveillance-finder

from aipoch/medical-research-skills

Tracks the latest preprints and emerging research topics related to your topic across bioRxiv, medRxiv, and arXiv. Use when a user wants to discover what is being published right now before it reaches journals, monitor competitor directions, spot new methodology trends, or get an early-warning scan of a research area. Triggers on phrases like "what's new in X", "latest preprints on Y", "emerging topics in Z", "monitor bioRxiv for", or "what are people working on in this field".

medical-research-gap-finder

from aipoch/medical-research-skills

Identifies real, evidence-audited, topic-specific research gaps in medical research by first retrieving and verifying literature from trusted sources, then mapping the current evidence landscape, rejecting pseudo-gaps, and converting only medium/high-confidence gaps into study-ready research opportunities. Always require real literature retrieval before formal gap claims. Never fabricate references, metadata, or findings.

bioinformatics-translational-opportunity-finder

from aipoch/medical-research-skills

Identifies translationally meaningful paths for bioinformatics findings by mapping omics or computational discoveries to diagnosis, stratification, prognosis, treatment-response, monitoring, or target-nomination use cases, while auditing bridge evidence, assayability, and validation burden. Use this skill when a user wants to know whether a bioinformatics finding can be framed as a stronger translational topic without overclaiming clinical relevance. Always separate statistical signal from translational value, and never imply clinical utility, targetability, or validation depth without explicit evidence support.

basic-discovery-translational-opportunity-finder

from aipoch/medical-research-skills

Finds translational opportunities that connect basic-research discoveries to clinically meaningful use cases such as diagnosis, stratification, prognosis, treatment response prediction, monitoring, or therapeutic development. Use this skill when a user wants to turn a mechanism finding, pathway signal, cellular phenotype, experimental observation, or omics discovery into a stronger translational research direction. Always separate mechanistic relevance from translational usability, and never present a basic finding as clinically actionable unless the evidence supports that level.