scienceclaw-retrieval

Retrieve scientific information from databases, literature, and knowledge bases. Use when: (1) finding relevant papers, (2) querying scientific databases, (3) cross-referencing findings, (4) building bibliographies, (5) systematic literature search. NOT for: answering questions (use scienceclaw-qa), summarizing (use scienceclaw-summarization), or data analysis (use the code-execution skill).

564 stars

by beita6969


Best use case

scienceclaw-retrieval is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using scienceclaw-retrieval should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/scienceclaw-retrieval/SKILL.md --create-dirs "https://raw.githubusercontent.com/beita6969/ScienceClaw/main/skills/scienceclaw-retrieval/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/scienceclaw-retrieval/SKILL.md inside your project
Restart your AI agent; it will auto-discover the skill


Frequently Asked Questions

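What does this skill do?

It retrieves scientific information from databases, literature repositories, and knowledge bases using structured search strategies, relevance ranking, and citation chaining.

Where can I find the source code?

In the beita6969/ScienceClaw repository on GitHub, at skills/scienceclaw-retrieval/SKILL.md (the same file the curl command above downloads).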


SKILL.md Source

# scienceclaw-retrieval

Retrieve scientific information from databases, literature repositories, and knowledge bases using structured search strategies, relevance ranking, and citation chaining.

## When to Use

- Finding relevant papers on a specific research topic or question
- Querying scientific databases (PubMed, arXiv, Semantic Scholar, CrossRef, OpenAlex)
- Cross-referencing findings across multiple sources and databases
- Building comprehensive bibliographies for a research project or review
- Conducting systematic literature searches with reproducible methodology
- Tracking citation networks to discover related or derivative work
- Locating datasets, code repositories, or supplementary materials linked to publications

## When NOT to Use

- Answering specific scientific questions -- use `scienceclaw-qa`
- Summarizing papers or synthesizing findings -- use `scienceclaw-summarization`
- Running data analysis or computations on retrieved data -- use the code-execution skill
- Extracting structured information from paper text -- use `scienceclaw-ie`
- Verifying claims or checking calculations -- use `scienceclaw-verification`

## Multi-Database Search Strategies

**Parallel Search**: For broad discovery, query PubMed, arXiv, Semantic Scholar, and OpenAlex simultaneously, collect results with DOIs and metadata, deduplicate by DOI, apply relevance ranking, then filter by date/type/discipline.
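
A minimal Python sketch of the fan-out and DOI deduplication steps. The fetchers are hypothetical placeholders for real API clients: each is assumed to be a callable that takes a query string and returns a list of dicts with at least `doi` and `title`. Ranking and filtering are left out here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, object]
# Hypothetical fetcher type: takes a query string, returns records with "doi" and "title".
Fetcher = Callable[[str], List[Record]]

def parallel_search(query: str, fetchers: Dict[str, Fetcher]) -> List[Record]:
    """Query every database concurrently and tag each hit with its source database."""
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {name: pool.submit(fetch, query) for name, fetch in fetchers.items()}
        # Collect in the order the fetchers were given so the output is deterministic.
        return [
            {**record, "source": name}
            for name, future in futures.items()
            for record in future.result()
        ]

def dedupe_by_doi(records: List[Record]) -> List[Record]:
    """Keep the first record for each case-insensitive DOI; DOI-less records pass through."""
    seen = set()
    unique: List[Record] = []
    for record in records:
        doi = str(record.get("doi") or "").lower()
        if doi in seen:
            continue
        if doi:
            seen.add(doi)
        unique.append(record)
    return unique
```

An exception raised inside a fetcher resurfaces from `future.result()`. In practice you would catch it per database and report the failure rather than silently dropping that source.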

**Sequential Refinement**: For targeted retrieval, start with a broad query to gauge the landscape, analyze initial results for recurring keywords and author clusters, refine with Boolean operators and filters, snowball via citation chaining on top hits, and stop at saturation (new queries return mostly known results).
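
The saturation stopping rule can be made explicit. This sketch makes three assumptions: each refinement round yields a batch of paper identifiers (DOIs, for example), a search counts as saturated once fewer than 10% of a batch is new, and the function names are illustrative. The 10% threshold is not a value taken from the skill.

```python
from typing import Iterable, List, Set, Tuple

def novelty_rate(known: Set[str], batch: Iterable[str]) -> float:
    """Share of a result batch that was not seen in earlier rounds."""
    batch_ids = set(batch)
    if not batch_ids:
        return 0.0
    return len(batch_ids - known) / len(batch_ids)

def refine_until_saturated(batches: Iterable[List[str]], threshold: float = 0.1) -> Tuple[Set[str], int]:
    """Absorb refinement rounds until one returns mostly already-known results."""
    known: Set[str] = set()
    rounds = 0
    for batch in batches:
        if known and novelty_rate(known, batch) < threshold:
            break  # saturation: new queries return mostly known results
        known |= set(batch)
        rounds += 1
    return known, rounds
```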

**Systematic Review Search**: Frame the research question with a PICO/PEO framework, construct Boolean queries with synonyms and controlled vocabulary, document every query, database, date, and result count for reproducibility, include grey literature (preprints, proceedings, registries), screen in two phases (title/abstract, then full text), and track numbers through a PRISMA flow diagram.
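
One way to turn a PICO/PEO breakdown into a Boolean query is to OR the synonyms within each concept and AND the concepts together. The sketch below does that and defines a record type for the search log. The concept names, the `[tiab]` field suffix (PubMed syntax), and the `SearchLogEntry` type are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

def quote(term: str) -> str:
    """Wrap multi-word terms in quotes so databases treat them as phrases."""
    return f'"{term}"' if " " in term else term

def build_boolean_query(concepts: Dict[str, List[str]], field: str = "") -> str:
    """OR synonyms inside each concept block, then AND the blocks together."""
    blocks = []
    for synonyms in concepts.values():
        if synonyms:  # skip concepts with no terms (e.g. an unused Comparison)
            blocks.append("(" + " OR ".join(quote(s) + field for s in synonyms) + ")")
    return " AND ".join(blocks)

@dataclass
class SearchLogEntry:
    """One reproducibility record: which query ran where, when, and how many hits it returned."""
    database: str
    query: str
    run_date: date
    result_count: int

pico = {
    "population": ["type 2 diabetes", "T2DM"],
    "intervention": ["metformin"],
    "comparison": [],
    "outcome": ["HbA1c", "glycemic control"],
}
query = build_boolean_query(pico, field="[tiab]")
print(query)
# ("type 2 diabetes"[tiab] OR T2DM[tiab]) AND (metformin[tiab]) AND (HbA1c[tiab] OR "glycemic control"[tiab])
```

The `result_count` of each `SearchLogEntry` should be the count the database actually reported for that run.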

## Database-Specific Query Syntax

- **PubMed**: MeSH terms via `[MeSH Terms]`, field tags `[tiab]`/`[au]`/`[dp]`, capitalized Boolean operators, `[pt]` for publication type. Example: `"machine learning"[tiab] AND "drug discovery"[tiab] AND "2023"[dp]`
- **arXiv**: Field prefixes `ti:`/`au:`/`abs:`/`cat:`, Boolean AND/OR/ANDNOT, category codes (cs.AI, q-bio.BM), trailing wildcards. Example: `ti:"neural network" AND cat:cs.LG AND au:bengio`
- **Semantic Scholar**: API parameters `query`/`year`/`fieldsOfStudy`/`venue`, field filtering, pagination via `offset`/`limit`, direct lookup by DOI or arXiv ID
- **CrossRef**: `/works?query=` endpoint, filters like `from-pub-date:2023,type:journal-article`, field queries `query.title=`/`query.author=`, sort by `relevance`/`published`/`is-referenced-by-count`
- **OpenAlex**: Entity endpoints `/works`/`/authors`/`/sources`, filters with commas (AND) or pipe (OR), `search=` for full-text, `group_by=` for aggregation, open access filtering via `open_access.is_oa:true`
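
Building request URLs programmatically keeps queries reproducible and avoids hand-escaping. This sketch only constructs URLs with the standard library and makes no network calls. The endpoints are the public ones for NCBI E-utilities `esearch`, the arXiv API, the Semantic Scholar Graph API paper search, and the CrossRef and OpenAlex `/works` endpoints. The parameter defaults (page sizes and the Semantic Scholar `fields` list) are assumptions. API keys, contact e-mail parameters, and rate limiting are omitted.

```python
from urllib.parse import urlencode

def pubmed_esearch_url(term: str, retmax: int = 20) -> str:
    # esearch returns matching PMIDs only; fetch the records separately (esummary/efetch).
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + urlencode(params)

def arxiv_query_url(search_query: str, start: int = 0, max_results: int = 20) -> str:
    # The arXiv API returns an Atom feed; search_query uses the ti:/au:/abs:/cat: prefixes.
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return "http://export.arxiv.org/api/query?" + urlencode(params)

def semantic_scholar_search_url(query: str, year: str = "", offset: int = 0, limit: int = 20,
                                fields: str = "title,year,externalIds,citationCount") -> str:
    params = {"query": query, "offset": offset, "limit": limit, "fields": fields}
    if year:
        params["year"] = year  # e.g. "2023" or a range such as "2019-2023"
    return "https://api.semanticscholar.org/graph/v1/paper/search?" + urlencode(params)

def crossref_works_url(query: str, filters: str = "", sort: str = "relevance", rows: int = 20) -> str:
    params = {"query": query, "sort": sort, "order": "desc", "rows": rows}
    if filters:
        params["filter"] = filters  # comma-separated, e.g. "from-pub-date:2023,type:journal-article"
    return "https://api.crossref.org/works?" + urlencode(params)

def openalex_works_url(search: str, filters: str = "") -> str:
    params = {"search": search}
    if filters:
        params["filter"] = filters  # commas AND filters together; "|" ORs values within one filter
    return "https://api.openalex.org/works?" + urlencode(params)
```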

## Relevance Ranking

Combine multiple scoring signals: textual similarity between query and title/abstract (primary), citation count with recency weighting, publication date, venue quality (impact factor or acceptance rate), author authority (h-index in subfield), and reference overlap with known relevant papers. For active fields, apply time-decayed citation scoring: `adjusted_score = citation_count / (current_year - publication_year + 1)`. Support user-guided re-ranking by marking papers as highly relevant, somewhat relevant, or not relevant, then refine queries using terms from top-marked papers.
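
A minimal sketch that combines two of these signals: text match and time-decayed citations. The text signal here is query-term coverage (the share of query words found in the title or abstract), a simple stand-in for TF-IDF or embedding similarity. The 0.7/0.3 weights and the log scaling of citation rates are illustrative assumptions. Venue, author, and reference-overlap signals would be added as further weighted terms.

```python
import math
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Paper:
    title: str
    abstract: str
    year: int
    citations: int

def words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def query_coverage(query: str, paper: Paper) -> float:
    """Share of query words found in the title or abstract (a simple text-match signal)."""
    q = words(query)
    return len(q & words(paper.title + " " + paper.abstract)) / len(q) if q else 0.0

def citations_per_year(paper: Paper, current_year: int) -> float:
    """Time-decayed score: citation_count / (current_year - publication_year + 1)."""
    return paper.citations / (current_year - paper.year + 1)

def rank(query: str, papers: List[Paper], current_year: int,
         weights: Optional[Dict[str, float]] = None) -> List[Paper]:
    w = weights or {"text": 0.7, "citations": 0.3}
    # log1p damps heavy-tailed citation rates; dividing by the max maps them into [0, 1].
    rates = [math.log1p(citations_per_year(p, current_year)) for p in papers]
    top = max(rates, default=0.0) or 1.0
    scores = [w["text"] * query_coverage(query, p) + w["citations"] * r / top
              for p, r in zip(papers, rates)]
    order = sorted(range(len(papers)), key=lambda i: scores[i], reverse=True)
    return [papers[i] for i in order]
```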

## Citation Chaining

- **Forward chaining (cited-by)**: From a seed paper, find all papers that cite it via Semantic Scholar or OpenAlex, filter by date/venue/topic, repeat for new relevant hits (limit depth to 2-3 hops)
- **Backward chaining (references)**: Extract the seed paper's reference list, score references by co-occurrence frequency across your relevant set, identify foundational works
- **Co-citation analysis**: Gather citation neighborhoods of 3-5 seed papers, find papers appearing in multiple neighborhoods as conceptually related candidates
- **Bibliographic coupling**: Find papers sharing high reference overlap with seed papers, indicating they address similar research questions
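
The four chaining strategies can be sketched over an in-memory citation graph. The graph format (a paper ID mapped to the set of IDs it references) is an assumption; in practice the edges would come from Semantic Scholar or OpenAlex lookups. The co-citation function follows this skill's neighborhood-overlap description rather than the classic pairwise co-citation count.

```python
from collections import Counter, deque
from typing import Dict, List, Set

# Assumed format: paper ID -> set of IDs that paper cites (its reference list).
Graph = Dict[str, Set[str]]

def citers_index(graph: Graph) -> Dict[str, Set[str]]:
    """Invert the reference graph so each paper maps to the papers that cite it."""
    index: Dict[str, Set[str]] = {}
    for paper, refs in graph.items():
        for ref in refs:
            index.setdefault(ref, set()).add(paper)
    return index

def forward_chain(graph: Graph, seed: str, max_depth: int = 2) -> Set[str]:
    """Forward chaining: breadth-first walk over citing papers, at most max_depth hops."""
    citers = citers_index(graph)
    found: Set[str] = set()
    queue = deque([(seed, 0)])
    while queue:
        paper, depth = queue.popleft()
        if depth == max_depth:
            continue
        for citer in citers.get(paper, set()):
            if citer != seed and citer not in found:
                found.add(citer)
                queue.append((citer, depth + 1))
    return found

def reference_frequency(graph: Graph, relevant: Set[str]) -> Counter:
    """Backward chaining: how often each reference recurs across the relevant set."""
    return Counter(ref for paper in relevant for ref in graph.get(paper, set()))

def co_citation_candidates(graph: Graph, seeds: List[str], min_hits: int = 2) -> Set[str]:
    """Papers found in the neighborhoods (references plus citers) of several seeds."""
    citers = citers_index(graph)
    counts: Counter = Counter()
    for seed in seeds:
        neighborhood = graph.get(seed, set()) | citers.get(seed, set())
        counts.update(neighborhood - set(seeds))
    return {paper for paper, hits in counts.items() if hits >= min_hits}

def coupling_strength(graph: Graph, a: str, b: str) -> float:
    """Bibliographic coupling as the Jaccard overlap of two reference lists."""
    refs_a, refs_b = graph.get(a, set()), graph.get(b, set())
    union = refs_a | refs_b
    return len(refs_a & refs_b) / len(union) if union else 0.0
```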

## Deduplication

- **DOI-based**: Primary key for deduplication; prefer the record with richest metadata when merging
- **Fuzzy title matching**: For records without DOIs, normalize titles (lowercase, strip punctuation and articles), treat two titles as a match when token Jaccard similarity > 0.85 or edit-distance ratio > 0.90, then verify by checking author overlap and publication year
- **Preprint-publication linking**: Match arXiv preprints to journal versions via DOI metadata or title matching, prefer published version but retain preprint if it has additional content (appendices, code), flag substantial differences between versions
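
A sketch of the three deduplication rules above. `difflib.SequenceMatcher.ratio()` stands in for the edit-distance ratio; it is a similarity ratio, not Levenshtein distance, so treat the 0.90 threshold as approximate. The record fields (`doi`, `title`, `authors`, `year`), the surname-based author check, and the one-year tolerance for preprint lag are assumptions. When DOIs differ (as between an arXiv preprint and its journal version), the pair falls through to fuzzy matching, and the record with richer metadata is kept.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List

ARTICLES = {"a", "an", "the"}

def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop articles."""
    words = re.sub(r"[^\w\s]", " ", title.lower()).split()
    return " ".join(w for w in words if w not in ARTICLES)

def titles_match(t1: str, t2: str, jaccard_min: float = 0.85, ratio_min: float = 0.90) -> bool:
    n1, n2 = normalize_title(t1), normalize_title(t2)
    if not n1 or not n2:
        return False  # never match on missing titles
    s1, s2 = set(n1.split()), set(n2.split())
    jaccard = len(s1 & s2) / len(s1 | s2)
    return jaccard > jaccard_min or SequenceMatcher(None, n1, n2).ratio() > ratio_min

def surnames(record: Dict) -> set:
    # Crude author key: last whitespace-separated token of each name, lowercased.
    return {name.split()[-1].lower() for name in record.get("authors", []) if name.strip()}

def same_work(r1: Dict, r2: Dict) -> bool:
    doi1, doi2 = (r1.get("doi") or "").lower(), (r2.get("doi") or "").lower()
    if doi1 and doi1 == doi2:
        return True
    # No shared DOI (missing, or preprint vs. journal DOI): fuzzy title match plus verification.
    if not titles_match(r1.get("title", ""), r2.get("title", "")):
        return False
    close_years = abs((r1.get("year") or 0) - (r2.get("year") or 0)) <= 1  # assumed preprint lag
    return bool(surnames(r1) & surnames(r2)) and close_years

def richness(record: Dict) -> int:
    """Number of non-empty metadata fields, used to choose which duplicate to keep."""
    return sum(1 for v in record.values() if v not in (None, "", [], {}))

def deduplicate(records: List[Dict]) -> List[Dict]:
    kept: List[Dict] = []
    for record in records:
        for i, existing in enumerate(kept):
            if same_work(record, existing):
                if richness(record) > richness(existing):
                    kept[i] = record  # prefer the record with richer metadata
                break
        else:
            kept.append(record)
    return kept
```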

## Integration with Specialized Skills

- **PubMed**: biomedical and life sciences, MeSH controlled vocabulary, structured abstracts, clinical trial metadata
- **arXiv**: physics, math, CS, quantitative biology preprints, open-access full-text PDFs, new submission monitoring
- **Semantic Scholar**: cross-disciplinary search, citation graph features, TLDR summaries, influential citation filtering
- **CrossRef**: DOI resolution, comprehensive metadata, funding and license data, reference lists, citation ambiguity resolution
- **OpenAlex**: large-scale bibliometrics, trend discovery, open-access links via Unpaywall, concept tagging for topic filtering

## Output Format

### Single Query Result
```
Query: [Search query text]
Database(s): [Databases searched]
Total Results: [Count]
After Deduplication: [Count]

Top Results:
  1. [Title] | [Authors] | [Year] | [Venue]
     DOI: [DOI] | Citations: [Count]
     Relevance: [Score] | Abstract: [First 200 chars...]
```
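
A small formatter that renders already-ranked, deduplicated records in the single-query layout. The record keys are assumptions. Missing fields are printed as `n/a` rather than filled in, in line with the zero-hallucination rule.

```python
from typing import Dict, List

def fmt(value: object) -> str:
    """Render a missing field as 'n/a' instead of guessing a value."""
    return "n/a" if value in (None, "", []) else str(value)

def format_single_query(query: str, databases: List[str], total: int,
                        results: List[Dict], top_n: int = 5) -> str:
    """Render ranked, deduplicated records in the Single Query Result layout."""
    lines = [
        f"Query: {query}",
        f"Database(s): {', '.join(databases)}",
        f"Total Results: {total}",
        f"After Deduplication: {len(results)}",
        "",
        "Top Results:",
    ]
    for i, r in enumerate(results[:top_n], start=1):
        authors = ", ".join(r.get("authors") or []) or "n/a"
        abstract = r.get("abstract") or ""
        snippet = abstract[:200] + ("..." if len(abstract) > 200 else "")
        score = r.get("score")
        score_text = f"{score:.2f}" if isinstance(score, (int, float)) else "n/a"
        lines.append(f"  {i}. {fmt(r.get('title'))} | {authors} | {fmt(r.get('year'))} | {fmt(r.get('venue'))}")
        lines.append(f"     DOI: {fmt(r.get('doi'))} | Citations: {fmt(r.get('citations'))}")
        lines.append(f"     Relevance: {score_text} | Abstract: {fmt(snippet)}")
    return "\n".join(lines)
```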

### Systematic Search Report
```
Search Strategy Report
======================
Research Question: [PICO-formatted question]
Date Executed: [Date]

Database Searches:
  - PubMed: [Query] -> [N results]
  - arXiv: [Query] -> [N results]
  - Semantic Scholar: [Query] -> [N results]

Total Retrieved: [N]
After Deduplication: [N]
After Title/Abstract Screening: [N]
Final Included: [N]

Included Papers:
  [Numbered list with full bibliographic details]
```

## Zero-Hallucination Rule

ALL factual claims, citations, database results, and scientific data presented to the user MUST come from actual tool results (API calls, code execution, web search) in this conversation. NEVER fabricate or "fill in" details from training data. If a tool returns no results or partial data, report exactly what happened.

Related Skills

scientific-retrieval


from beita6969/ScienceClaw

Retrieve and recommend relevant documents from financial, historical, and scientific archives

scienceclaw-verification


from beita6969/ScienceClaw

Verify scientific claims, check calculations, validate experimental designs, and fact-check citations. Use when: (1) checking a claim against evidence, (2) validating statistical analyses, (3) verifying experimental reproducibility claims, (4) fact-checking references, (5) adversarial review of research. NOT for: generating new content (use scienceclaw-generation), simple QA (use scienceclaw-qa).

scienceclaw-summarization


from beita6969/ScienceClaw

Summarize scientific papers, datasets, experimental results, and literature reviews. Use when: (1) condensing research papers, (2) creating literature reviews, (3) summarizing experimental findings, (4) meta-analysis synthesis, (5) creating executive summaries of research. NOT for: information extraction (use scienceclaw-ie), full paper retrieval (use scienceclaw-retrieval), or writing new content (use scienceclaw-generation).

scienceclaw-reasoning


from beita6969/ScienceClaw

Perform multi-step scientific reasoning, proof construction, causal inference, and logical argumentation. Use when: (1) deriving conclusions from premises, (2) causal analysis, (3) mathematical proofs, (4) hypothesis evaluation, (5) counterfactual reasoning. NOT for: simple factual questions (use scienceclaw-qa), data analysis (use code-execution), or literature search (use scienceclaw-retrieval).

scienceclaw-qa


from beita6969/ScienceClaw

Answer scientific questions across all disciplines with evidence-based responses and citations. Use when: (1) user asks factual science questions, (2) needs explanation of concepts/theories/methods, (3) multi-step scientific reasoning needed. Covers natural sciences (physics, chemistry, biology, medicine, materials, astronomy, earth science, math, CS) and social sciences (economics, sociology, psychology, political science, linguistics, history, law, philosophy, education). NOT for: opinion-based questions, non-scientific queries, or when code execution is needed (use code-execution skill).

scienceclaw-prediction


from beita6969/ScienceClaw

Predict scientific properties, trends, and outcomes. Use when: user asks for property prediction, trend forecasting, or model-based estimation. NOT for: historical data lookup or real-time monitoring.

scienceclaw-ie


from beita6969/ScienceClaw

Extract structured information from scientific texts: entities, relations, data tables, methods, results. Use when: (1) parsing papers for key data, (2) extracting experimental parameters, (3) building knowledge graphs from literature, (4) NER on scientific documents, (5) extracting methods/results sections. NOT for: summarization (use scienceclaw-summarization), full text retrieval (use scienceclaw-retrieval).

scienceclaw-generation


from beita6969/ScienceClaw

Generate scientific hypotheses, experimental designs, and paper drafts. Use when: user asks to propose hypotheses, design experiments, or write scientific content. NOT for: data analysis or literature search.

scienceclaw-discovery


from beita6969/ScienceClaw

Identify research gaps, synthesize cross-disciplinary insights, and generate novel hypotheses. Use when: user asks about unexplored areas, cross-field connections, or new research directions. NOT for: routine literature review or data analysis.

scienceclaw-classification


from beita6969/ScienceClaw

Classify scientific content by discipline, methodology, topic, and quality. Use when: user asks to categorize papers, methods, or research outputs. NOT for: simple keyword tagging or non-scientific content.

xurl


from beita6969/ScienceClaw

A CLI tool for making authenticated requests to the X (Twitter) API. Use this skill when you need to post tweets, reply, quote, search, read posts, manage followers, send DMs, upload media, or interact with any X API v2 endpoint.

xlsx


from beita6969/ScienceClaw

Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.