tooluniverse-literature-deep-research

Conduct comprehensive literature research with target disambiguation, evidence grading, and structured theme extraction. Creates a detailed report with mandatory completeness checklist, biological model synthesis, and testable hypotheses. For biological targets, resolves official IDs (Ensembl/UniProt), synonyms, naming collisions, and gathers expression/pathway context before literature search. Default deliverable is a report file; for single factoid questions, uses a fast verification mode and may include an inline answer. Use when users need thorough literature reviews, target profiles, or to verify specific claims from the literature.

42 stars

by Zaoqu-Liu


Best use case

tooluniverse-literature-deep-research is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using tooluniverse-literature-deep-research should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/tooluniverse-literature-deep-research/SKILL.md --create-dirs "https://raw.githubusercontent.com/Zaoqu-Liu/ScienceClaw/main/skills/tooluniverse-literature-deep-research/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/tooluniverse-literature-deep-research/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How tooluniverse-literature-deep-research Compares

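| Feature / Agent | tooluniverse-literature-deep-research | Standard Approach |
|-----------------|---------------------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |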

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Literature Deep Research Strategy (Enhanced)

A systematic approach to comprehensive literature research that **starts with target disambiguation** to prevent missing details, uses **evidence grading** to separate signal from noise, and produces a **content-focused report** with mandatory completeness sections.

**KEY PRINCIPLES**:
1. **Target disambiguation FIRST** - Resolve IDs, synonyms, naming collisions before literature search
2. **Right-size the deliverable** - Use *Factoid / Verification Mode* for single, answerable questions; use full report mode for “deep research”
3. **Report-first output** - Default deliverable is a report file; an inline answer is allowed (and recommended) for Factoid / Verification Mode
4. **Evidence grading** - Grade every claim by evidence strength (mechanistic paper vs screen hit vs review vs text-mined)
5. **Mandatory completeness** - All checklist sections must exist, even if "unknown/limited evidence"
6. **Source attribution** - Every piece of information traceable to database/tool
7. **English-first queries** - Always use English terms for literature searches and tool calls, even if the user writes in another language. Fall back to original-language terms only if English returns no results. Respond in the user's language.

---

## Workflow Overview

```
User Query
  ↓
Phase 0: CLARIFY + MODE SELECT (factoid vs deep report)
  ↓
Phase 1: TARGET DISAMBIGUATION + PROFILE (default ON for biological targets)
  ├─ Resolve official IDs (Ensembl, UniProt, HGNC)
  ├─ Gather synonyms/aliases + known naming collisions
  ├─ Get protein length, isoforms, domain architecture
  ├─ Get subcellular location, expression, GO terms, pathways
  └─ Output: Target Profile section + Collision-aware search plan
  ↓
Phase 2: LITERATURE SEARCH (internal methodology, not shown)
  ├─ High-precision seed queries (build mechanistic core)
  ├─ Citation network expansion from seeds
  ├─ Collision-filtered broader queries
  └─ Theme clustering + evidence grading
  ↓
Phase 3: REPORT SYNTHESIS
  ├─ Progressive writing to [topic]_report.md
  ├─ Mandatory completeness checklist validation
  └─ Biological model + testable hypotheses
  ↓
Optional: methods_appendix.md (only if user requests)
```

---

## Phase 0: Initial Clarification

### Mandatory Questions

1. **Target type**: Is this a biological target (gene/protein), a general topic, or a disease?
2. **Scope**: Is this a *single factoid to verify* (“Which antibiotic?”, “Which strain?”, “Which year?”) or a comprehensive/deep review?
3. **Known aliases**: Any specific gene symbols or protein names you use?
4. **Constraints**: Open access only? Include preprints? Specific organisms?
5. **Methods appendix**: Do you want methodology details in a separate file?

### Mode Selection (CRITICAL)

Pick exactly one mode based on the user’s intent and the question structure:

1. **Factoid / Verification Mode** (single concrete question; answer should be a short phrase/sentence)
2. **Mini-review Mode** (narrow topic; 1–3 pages of synthesis)
3. **Full Deep-Research Mode** (use the full template + completeness checklist)

**Heuristic**:
- If the user asks “X has been evolved to be resistant to *which antibiotic*?” → **Factoid / Verification Mode**
- If the user asks “What does the literature say about X?” → **Full Deep-Research Mode**
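
The heuristic above can be sketched as a small keyword check. This is only an illustration: the function name `select_mode`, the cue lists, and the returned mode strings are assumptions, not part of the skill, and Mini-review Mode is left to judgment because the heuristic gives no cue for it.

```python
import re

# Hypothetical cue lists; tune them to the questions you actually receive.
REVIEW_CUES = re.compile(r"\b(literature|review|overview|deep research)\b", re.IGNORECASE)
FACTOID_CUES = re.compile(r"\b(which|who|when|what year|how many)\b", re.IGNORECASE)


def select_mode(question: str) -> str:
    """Return a mode label for a user question (illustrative heuristic only)."""
    # Review-style wording wins, because "what does the literature say" is open-ended.
    if REVIEW_CUES.search(question):
        return "full_deep_research"
    # A single "which/when/how many" question usually has a short answer.
    if FACTOID_CUES.search(question):
        return "factoid_verification"
    # Default to the thorough path when the intent is unclear.
    return "full_deep_research"
```

When the result is ambiguous, asking the Phase 0 scope question is safer than trusting the keyword match.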

### Factoid / Verification Mode (Fast Path)

**Goal**: Provide a correct, source-verified single answer, with minimal but explicit evidence attribution.

**Deliverables** (still file-backed):
1. `[topic]_factcheck_report.md` (≤ 1 page)
2. `[topic]_bibliography.json` (+ CSV) containing the key paper(s)

**Fact-check report template**:
```markdown
# [TOPIC]: Fact-check Report

*Generated: [Date]*
*Evidence cutoff: [Date]*

## Question
[User question]

## Answer
**[One-sentence answer]** [Evidence: ★★★/★★☆/★☆☆/☆☆☆]

## Source(s)
- [Primary paper citation: journal/year/PMID/DOI as available]

## Verification Notes
- [1–3 bullets: where in the paper the statement appears (Abstract/Results/Methods), and any key constraints]

## Limitations
- [If full text not available, or if only review evidence exists]
```

**Required verification behavior**:
- Prefer ToolUniverse literature tools (Europe PMC / PubMed / PMC / Semantic Scholar) over general web browsing.
- Use full-text snippet verification when possible (Europe PMC auto-snippet tier is ideal).
- Avoid adding extra claims (e.g., “not X”) unless the paper explicitly supports them.

**Suggested tool pattern**:
- `EuropePMC_search_articles(query=..., extract_terms_from_fulltext=[...])` to pull OA full-text snippets for the key terms.
- If OA snippets are unavailable, fall back to `PMC_search_papers` (if the paper is in PMC) or `SemanticScholar_search_papers` → `SemanticScholar_get_pdf_snippets`.

**Evidence grading (factoid)**:
- If the statement is explicitly made in a primary experimental paper (Results/Methods/Abstract): label **T1 (★★★)**.
- If it’s only in a review: label **T4 (☆☆☆)** and try to locate the primary source.

### Detect Target Type

| Query Pattern | Type | Action |
|---------------|------|--------|
| Gene symbol (EGFR, TP53, ATP6V1A) | Biological target | Phase 1 required |
| Protein name ("V-ATPase", "kinase") | Biological target | Phase 1 required |
| UniProt ID (P00533, Q93050) | Biological target | Phase 1 required |
| Disease, pathway, method | General topic | Phase 1 optional |
| "Literature on X" | Depends on X | Assess X |

---

## Phase 1: Target Disambiguation + Profile (Default ON)

**CRITICAL**: This phase prevents "missing target details" when literature is sparse or noisy.

### 1.1 Resolve Official Identifiers

Use these tools to establish canonical identity:

```
UniProt_search → Get UniProt accession for human protein
UniProt_get_entry_by_accession → Full entry with cross-references
UniProt_id_mapping → Map between ID types
ensembl_lookup_gene → Ensembl gene ID, biotype
MyGene_get_gene_annotation → NCBI Gene ID, aliases, summary
```

**Output for report**:
```markdown
## Target Identity

| Identifier | Value | Source |
|------------|-------|--------|
| Official Symbol | ATP6V1A | HGNC |
| UniProt | P38606 | UniProt |
| Ensembl Gene | ENSG00000114573 | Ensembl |
| NCBI Gene ID | 523 | NCBI |
| ChEMBL Target | CHEMBL2364682 | ChEMBL |

**Full Name**: V-type proton ATPase catalytic subunit A
**Synonyms/Aliases**: ATP6A1, VPP2, Vma1, VA68
```

### 1.2 Identify Naming Collisions

**CRITICAL**: Many gene names have collisions. Examples:
- **TRAG**: T-cell regulatory gene vs bacterial TraG conjugation protein
- **WDR7-7**: Could match gene WDR7 vs lncRNA
- **JAK**: Janus kinase vs Just Another Kinase
- **CAT**: Catalase vs chloramphenicol acetyltransferase

**Detection strategy**:
1. Search PubMed for `"[SYMBOL]"[Title]` - review first 20 titles
2. If >20% off-topic, identify collision terms
3. Build negative filter: `NOT [collision_term1] NOT [collision_term2]`
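
A minimal sketch of the 20% check in step 2, assuming "on-topic" can be approximated by a case-insensitive substring match against a few expected terms. The helper names and the term list are hypothetical; in practice the titles would come from the PubMed title search in step 1.

```python
def off_topic_fraction(titles, on_topic_terms):
    """Fraction of titles that contain none of the on-topic terms."""
    if not titles:
        return 0.0
    terms = [term.lower() for term in on_topic_terms]
    off_topic = sum(
        1 for title in titles
        if not any(term in title.lower() for term in terms)
    )
    return off_topic / len(titles)


def needs_collision_filter(titles, on_topic_terms, threshold=0.20):
    """True when more than `threshold` of the reviewed titles look off-topic."""
    return off_topic_fraction(titles, on_topic_terms) > threshold
```

Substring matching is crude, so the off-topic titles should still be read by eye to pick the actual collision terms.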

**Output for report**:
```markdown
### Known Naming Collisions

- Symbol "ATP6V1A" is unambiguous (no major collisions detected)
- Related but distinct: ATP6V0A1-4 (V0 subunits vs V1 subunits)
- Search filter applied: Include "vacuolar" OR "V-ATPase", exclude "V0 domain" when V1-specific
```

### 1.3 Protein Architecture & Domains

Use annotation tools (not literature):

```
InterPro_get_protein_domains → Domain architecture
UniProt_get_ptm_processing_by_accession → PTMs, active sites
proteins_api_get_protein → Additional protein features
```

**Output for report**:
```markdown
### Protein Architecture

| Domain | Position | InterPro ID | Function |
|--------|----------|-------------|----------|
| V-ATPase A subunit, N-terminal | 1-90 | IPR022879 | ATP binding |
| V-ATPase A subunit, catalytic | 91-490 | IPR005725 | Catalysis |
| V-ATPase A subunit, C-terminal | 491-617 | IPR022878 | Complex assembly |

**Length**: 617 aa | **Isoforms**: 2 (canonical P38606-1, variant P38606-2 missing aa 1-45)
**Active sites**: Lys-168 (ATP binding), Glu-261 (catalytic)

*Sources: InterPro, UniProt*
```

### 1.4 Subcellular Location

```
HPA_get_subcellular_location → Human Protein Atlas localization
UniProt_get_subcellular_location_by_accession → UniProt annotation
```

**Output for report**:
```markdown
### Subcellular Localization

| Location | Confidence | Source |
|----------|------------|--------|
| Lysosome membrane | High | HPA + UniProt |
| Endosome membrane | High | UniProt |
| Golgi apparatus | Medium | HPA |
| Plasma membrane (subset) | Low | Literature |

**Primary location**: Lysosomal/endosomal membranes (vacuolar ATPase complex)
*Sources: Human Protein Atlas, UniProt*
```

### 1.5 Baseline Expression

```
GTEx_get_median_gene_expression → Tissue expression (TPM)
HPA_get_rna_expression_by_source → HPA expression data
```

**Output for report**:
```markdown
### Baseline Tissue Expression

| Tissue | Expression (TPM) | Specificity |
|--------|------------------|-------------|
| Kidney cortex | 145.3 | Elevated |
| Liver | 98.7 | Medium |
| Brain - Cerebellum | 87.2 | Medium |
| Lung | 76.4 | Medium |
| Ubiquitous baseline | ~50 | Broad |

**Tissue Specificity**: Low (τ = 0.28) - broadly expressed housekeeping gene
*Source: GTEx v8*
```
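
The τ value in the example output is presumably the tau tissue-specificity index, which ranges from 0 (uniform expression) to 1 (expression in a single tissue). Assuming that definition, τ = Σ(1 − xᵢ / x_max) / (n − 1), it can be computed from a list of per-tissue values as sketched below. The function name is hypothetical, and whether to log-transform TPM values first is a choice this sketch leaves to the caller.

```python
def tau_index(expression_values):
    """Tau tissue-specificity index: 0 = broadly expressed, 1 = single-tissue."""
    values = list(expression_values)
    if len(values) < 2:
        raise ValueError("tau needs at least two tissues")
    x_max = max(values)
    if x_max <= 0:
        raise ValueError("tau is undefined when no tissue shows expression")
    # Sum each tissue's shortfall relative to the most highly expressing tissue.
    return sum(1 - x / x_max for x in values) / (len(values) - 1)
```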

### 1.6 GO Terms & Pathway Placement

```
GO_get_annotations_for_gene → GO annotations
Reactome_map_uniprot_to_pathways → Reactome pathways
kegg_get_gene_info → KEGG pathways
OpenTargets_get_target_gene_ontology_by_ensemblID → Open Targets GO
```

**Output for report**:
```markdown
### Functional Annotations (GO)

**Molecular Function**:
- ATP hydrolysis activity (GO:0016887) [Evidence: IDA]
- Proton-transporting ATPase activity (GO:0046961) [Evidence: IDA]

**Biological Process**:
- Lysosomal lumen acidification (GO:0007042) [Evidence: IMP]
- Autophagy (GO:0006914) [Evidence: IMP]
- Bone resorption (GO:0045453) [Evidence: IMP]

**Cellular Component**:
- Vacuolar proton-transporting V-type ATPase, V1 domain (GO:0000221) [Evidence: IDA]

### Pathway Involvement

| Pathway | Database | Significance |
|---------|----------|--------------|
| Lysosome | KEGG hsa04142 | Core component |
| Phagosome | KEGG hsa04145 | Acidification |
| Autophagy - animal | Reactome R-HSA-9612973 | mTORC1 regulation |

*Sources: GO Consortium, Reactome, KEGG*
```

---

## Phase 2: Literature Search (Internal Methodology)

**NOTE**: This methodology is kept internal. The report shows findings, not process.

### 2.1 Query Strategy: Collision-Aware Synonym Plan

#### Step 1: High-Precision Seed Queries (Build Mechanistic Core)

```
Query 1: "[GENE_SYMBOL]"[Title] AND (mechanism OR function OR structure)
Query 2: "[FULL_PROTEIN_NAME]"[Title] 
Query 3: "[UNIPROT_ID]" (catches supplementary materials)
```

**Purpose**: Get 15-30 high-confidence, mechanistic papers that are definitely on-target.
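
The three seed queries can be generated from the identifiers resolved in Phase 1. A small sketch follows; the function name is hypothetical and the output is plain query strings, not a call to any search tool.

```python
def build_seed_queries(gene_symbol, full_protein_name, uniprot_id):
    """Return the three high-precision seed queries as strings."""
    return [
        # Query 1: symbol in title plus mechanistic vocabulary
        f'"{gene_symbol}"[Title] AND (mechanism OR function OR structure)',
        # Query 2: full protein name in title
        f'"{full_protein_name}"[Title]',
        # Query 3: accession, which can surface supplementary materials
        f'"{uniprot_id}"',
    ]
```

For the running example this would be `build_seed_queries("ATP6V1A", "V-type proton ATPase catalytic subunit A", "P38606")`.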

#### Step 2: Citation Network Expansion (Especially for Sparse Targets)

Once you have 5-15 core PMIDs:
```
PubMed_get_cited_by → Papers citing each seed
PubMed_get_related → Computationally related papers  
EuropePMC_get_citations → Alternative citation source
EuropePMC_get_references → Backward citations from seeds
```

**Citation-network first option**: For older targets with deprecated terminology, citation expansion often outperforms keyword searching.

#### Step 3: Collision-Filtered Broader Queries

```
Broader query: "[GENE_SYMBOL]" AND ([pathway1] OR [pathway2] OR [function])
Apply collision filter: NOT [collision_term1] NOT [collision_term2]
```

Example for bacterial TraG collision:
```
"TRAG" AND (T-cell OR immune OR cancer) NOT plasmid NOT conjugation NOT bacterial
```
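
Assembling the broader query and its collision filter is plain string building, sketched below with a hypothetical helper name. It reproduces the TraG example when given the same terms.

```python
def build_broad_query(gene_symbol, context_terms, collision_terms):
    """Combine symbol, OR-joined context terms, and NOT-prefixed collision terms."""
    query = f'"{gene_symbol}" AND ({" OR ".join(context_terms)})'
    for term in collision_terms:
        query += f" NOT {term}"
    return query
```

Multi-word collision terms would need quoting before being passed in; this sketch does not add quotes on its own.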

### 2.2 Database Tools

**Literature Search** (use all relevant):
- `PubMed_search_articles` - Primary biomedical
- `PMC_search_papers` - Full-text
- `EuropePMC_search_articles` - European coverage
- `openalex_literature_search` - Broad academic
- `Crossref_search_works` - DOI registry
- `SemanticScholar_search_papers` - AI-ranked
- `BioRxiv_search_preprints` / `MedRxiv_search_preprints` - Preprints

**Citation Tools** (with failure handling):
- `PubMed_get_cited_by` - Primary (NCBI elink can be flaky)
- `EuropePMC_get_citations` - **Fallback** when PubMed fails
- `PubMed_get_related` - Related articles
- `EuropePMC_get_references` - Reference lists

**Annotation Tools** (not literature, but fill gaps):
- `UniProt_*` tools - Protein data
- `InterPro_get_protein_domains` - Domains
- `GTEx_*` tools - Expression
- `HPA_*` tools - Human Protein Atlas
- `OpenTargets_*` tools - Target-disease associations
- `GO_get_annotations_for_gene` - GO terms

### 2.3 Full-Text Verification Strategy

**WHEN TO USE**: When abstracts lack critical experimental details (exact drugs, cell lines, concentrations, specific protocols).

**Three-Tier Strategy**:

#### Tier 1: Auto-Snippet Mode (Europe PMC) - FASTEST

**Use for**: Exploratory queries with 3-5 specific terms

```python
results = EuropePMC_search_articles(
    query="bacterial antibiotic resistance evolution",
    limit=10,
    extract_terms_from_fulltext=["ciprofloxacin", "meropenem", "A. baumannii", "MIC"]
)

# Check which articles have full-text snippets
for article in results:
    if "fulltext_snippets" in article:
        # Snippets automatically extracted from OA full text
        for snippet in article["fulltext_snippets"]:
            # Use snippet["term"] and snippet["snippet"] for verification
            pass
```

**Advantages**:
- ✅ Single tool call (search + snippets)
- ✅ Bounded latency (max 3 OA articles, ~3-5 seconds total)
- ✅ No manual URL extraction
- ✅ Focused input (capped at 5 search terms)

**Limitations**:
- ❌ Only works for OA articles with fullTextXML
- ❌ Limited to first 3 OA articles
- ❌ Europe PMC coverage only (~30-40% OA)

**When to use**: Initial exploration, quick verification of 1-2 papers

#### Tier 2: Manual Two-Step (Semantic Scholar, ArXiv) - TARGETED

**Use for**: Specific high-value papers you identified from search

```python
# Step 1: Search
papers = SemanticScholar_search_papers(
    query="machine learning interpretability",
    limit=10
)

# Step 2: Extract from specific OA papers
for paper in papers:
    if paper.get("open_access_pdf_url"):
        snippets = SemanticScholar_get_pdf_snippets(
            open_access_pdf_url=paper["open_access_pdf_url"],
            terms=["SHAP", "gradient attribution", "layer-wise relevance"],
            window_chars=300
        )
        if snippets["status"] == "success":
            # Process snippets["snippets"]
            pass
```

**ArXiv variant** (100% OA, no paywall):

```python
# All arXiv papers are freely available
snippets = ArXiv_get_pdf_snippets(
    arxiv_id="2301.12345",
    terms=["attention mechanism", "self-attention", "layer normalization"],
    max_snippets_per_term=5
)
```

**Advantages**:
- ✅ Full control over which papers to process
- ✅ Adjustable window size (20-2000 chars)
- ✅ Works for Semantic Scholar (~15-20% OA PDFs) and ArXiv (100%)
- ✅ Can process any number of papers

**Limitations**:
- ❌ Two tool calls per article (search → extract)
- ❌ Manual loop needed
- ❌ Slower than auto-snippet mode

**When to use**: Thorough review of key papers, preprint analysis

#### Tier 3: Manual Download + Parse (Fallback) - SLOWEST

**Use for**: Paywalled content via institutional access

```python
# For paywalled PDFs accessible via institution
webpage_text = get_webpage_text_from_url(
    url="https://doi.org/10.1016/...",
    # Requires institutional proxy or VPN
)

# Extract relevant sections manually
if "Methods" in webpage_text:
    # Parse methods section
    pass
```

**Limitations**:
- ❌ Requires institutional access
- ❌ No snippet extraction (full HTML)
- ❌ Quality varies by publisher
- ❌ Slowest approach

**When to use**: Last resort for critical paywalled papers

#### Decision Matrix

| Scenario | Recommended Tier | Rationale |
|----------|------------------|-----------|
| Quick verification ("Which antibiotic?") | Tier 1 (Auto-snippet) | Fast, single call |
| Preprint deep-dive (arXiv, bioRxiv) | Tier 2 (Manual ArXiv) | 100% coverage, no paywall |
| High-value paper deep analysis | Tier 2 (Manual S2) | Precise control |
| Systematic review (50+ papers) | Tier 1 + Tier 2 | Auto for OA, manual for key papers |
| Paywalled critical paper | Tier 3 (Manual download) | Only option |

#### Best Practices

**1. Limit search terms to 3-5 specific keywords**:
- ✅ Good: `["ciprofloxacin 5 μg/mL", "HEK293 cells", "RNA-seq"]`
- ❌ Bad: `["drug", "method", "significant"]` (too broad)

**2. Check OA status before extraction**:
```python
if article.get("open_access") and article.get("fulltext_xml_url"):
    # Proceed with extraction
    pass
```

**3. Adjust window size for context**:
- Methods: 400-500 chars (full sentences)
- Quick verification: 150-200 chars
- Default: 220 chars (balanced)

**4. Handle failures gracefully**:
```python
if "fulltext_snippets" not in article:
    # Fallback: use abstract or skip
    print(f"No full text available: {article['title']}")
```

**5. Document full-text sources in report**:
```markdown
## Methods Verification

**Antibiotic concentrations** (verified from full text):
- Study A: Ciprofloxacin 5 μg/mL [PMC12345, Methods section]
- Study B: Meropenem 8 μg/mL [arXiv:2301.12345, Experimental Design]

*Note: Full-text verification performed on 8/15 OA papers (53% coverage)*
```

### 2.5 Tool Failure Handling

**Automatic retry strategy**:
```
Attempt 1: Call tool
If timeout/error:
  Wait 2 seconds
  Attempt 2: Retry
If still fails:
  Wait 5 seconds  
  Attempt 3: Try fallback tool
If fallback fails:
  Document "Data unavailable" in report
```
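
One way to express the retry strategy above in Python, as a sketch. It assumes the tool calls are wrapped in zero-argument callables; `call_with_fallback` and its parameters are hypothetical names, and the `sleep` argument is injectable only so the waits can be skipped in tests.

```python
import time


def call_with_fallback(primary, fallback=None, waits=(2, 5), sleep=time.sleep):
    """Try primary twice, then the fallback once; return None if all fail."""
    # Attempt 1: call the primary tool.
    try:
        return primary()
    except Exception:
        sleep(waits[0])
    # Attempt 2: retry the primary tool after the short wait.
    try:
        return primary()
    except Exception:
        sleep(waits[1])
    # Attempt 3: switch to the fallback tool after the longer wait.
    if fallback is not None:
        try:
            return fallback()
        except Exception:
            pass
    # Caller records "Data unavailable" in the report when None comes back.
    return None
```

Catching bare `Exception` keeps the sketch short; a real wrapper would likely narrow this to timeouts and transport errors so that programming mistakes still surface.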

**Fallback chains**:
| Primary Tool | Fallback 1 | Fallback 2 |
|--------------|------------|------------|
| `PubMed_get_cited_by` | `EuropePMC_get_citations` | OpenAlex citations |
| `PubMed_get_related` | SemanticScholar recommendations | Manual keyword search |
| `GTEx_get_median_gene_expression` | `HPA_get_rna_expression_by_source` | Document as unavailable |
| `Unpaywall_check_oa_status` | Europe PMC OA flags | OpenAlex OA field |

### 2.6 Open Access Handling (Best-Effort)

**If Unpaywall email provided**: Check OA status for all papers with DOIs

**If no Unpaywall email**: Use best-effort OA signals:
- Europe PMC: `isOpenAccess` field
- PMC: All PMC papers are free to read (treated here as OA; reuse licenses vary)
- OpenAlex: `is_oa` field
- DOAJ: All DOAJ papers are OA
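
These signals can be folded into one best-effort check, sketched below. The `isOpenAccess` and `is_oa` field names come from the list above; the `pmcid` and `in_doaj` keys, the function name, and the flat record shape are assumptions made for illustration. Europe PMC typically reports `isOpenAccess` as a "Y"/"N" string, so string values are normalized.

```python
def best_effort_is_oa(record):
    """Return True/False when an OA signal is present, None when nothing is known."""
    # Hypothetical keys: presence in PMC or DOAJ is treated as open access.
    if record.get("pmcid") or record.get("in_doaj"):
        return True
    # Field names taken from Europe PMC and OpenAlex respectively.
    for key in ("isOpenAccess", "is_oa"):
        if key in record:
            value = record[key]
            if isinstance(value, str):
                return value.strip().upper() in {"Y", "YES", "TRUE"}
            return bool(value)
    return None
```

Returning `None` rather than `False` for missing signals keeps "unknown" distinct from "closed", which matters when the report states its OA coverage.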

**Label in report**:
```markdown
*OA Status: Best-effort (Unpaywall not configured)*
```

---

## Phase 3: Evidence Grading

**CRITICAL**: Grade every claim by evidence strength to prevent low-signal mentions from diluting the report.

### Evidence Tiers

| Tier | Label | Description | Example |
|------|-------|-------------|---------|
| **T1** | ★★★ Mechanistic | In-target mechanistic study with direct experimental evidence | CRISPR KO + rescue |
| **T2** | ★★☆ Functional | Functional study showing role (may be in pathway context) | siRNA knockdown phenotype |
| **T3** | ★☆☆ Association | Screen hit, GWAS association, correlation | High-throughput screen |
| **T4** | ☆☆☆ Mention | Review mention, text-mined interaction, peripheral reference | Review article |
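
The tier table maps naturally onto a lookup used when formatting inline citations. A small sketch, with a hypothetical helper name and the bracketed tag format borrowed from the report examples in this document:

```python
# Tier codes, star labels, and names as listed in the table above.
EVIDENCE_TIERS = {
    "T1": ("★★★", "Mechanistic"),
    "T2": ("★★☆", "Functional"),
    "T3": ("★☆☆", "Association"),
    "T4": ("☆☆☆", "Mention"),
}


def format_evidence_tag(tier, pmid):
    """Build an inline evidence tag such as '[★★★ Mechanistic: PMID:12345678]'."""
    stars, label = EVIDENCE_TIERS[tier]  # KeyError on an unknown tier is intentional
    return f"[{stars} {label}: PMID:{pmid}]"
```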

### How to Apply

In report, label sections and claims:

```markdown
### Mechanism of Action

ATP6V1A is the catalytic subunit responsible for ATP hydrolysis in the V-ATPase 
complex [★★★ Mechanistic: PMID:12345678]. Loss-of-function mutations cause 
vacuolar pH dysregulation [★★★: PMID:23456789].

The target has been implicated in mTORC1 signaling through lysosomal amino acid 
sensing [★★☆ Functional: PMID:34567890], though direct interaction data is limited.

A genome-wide screen identified ATP6V1A as essential in cancer cell lines 
[★☆☆ Association: PMID:45678901, DepMap].
```

### Theme-Level Grading

For each theme section, summarize evidence quality:

```markdown
### 3.1 Lysosomal Acidification (12 papers)
**Evidence Quality**: Strong (8 mechanistic, 3 functional, 1 association)

[Theme content...]
```
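
The per-theme summary line can be derived from the tier assigned to each paper. In the sketch below the counting follows the example above, but the Strong/Moderate/Limited cut-offs are an assumption: the skill does not define them, so treat the thresholds as placeholders.

```python
from collections import Counter

TIER_NAMES = {"T1": "mechanistic", "T2": "functional", "T3": "association", "T4": "mention"}


def summarize_theme(tiers):
    """Summarize a theme's evidence, e.g. 'Strong (8 mechanistic, 3 functional)'."""
    counts = Counter(tiers)
    total = sum(counts.values())
    if total == 0:
        return "Limited (no papers)"
    parts = [f"{counts[t]} {name}" for t, name in TIER_NAMES.items() if counts[t]]
    # ASSUMED thresholds: at least half T1 is Strong, any T1/T2 is Moderate.
    if counts["T1"] / total >= 0.5:
        quality = "Strong"
    elif counts["T1"] + counts["T2"] > 0:
        quality = "Moderate"
    else:
        quality = "Limited"
    return f"{quality} ({', '.join(parts)})"
```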

---

## Report Structure: Mandatory Completeness Checklist

**CRITICAL**: This checklist/template applies to **Full Deep-Research Mode**. For **Factoid / Verification Mode**, use a short fact-check report (see Phase 0) and do not force the full 15-section template.

### Output Files

1. **`[topic]_report.md`** - Main narrative report (**Full Deep-Research Mode**)
2. **`[topic]_factcheck_report.md`** - Short verification report (**Factoid / Verification Mode**)
3. **`[topic]_bibliography.json`** - Full deduplicated bibliography (always created)
4. **`methods_appendix.md`** - Methodology details (ONLY if user requests)
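
Because the same paper often arrives from several search tools, the bibliography needs deduplication before it is written out. A minimal sketch, assuming each record is a dict with optional `doi`, `pmid`, and `title` keys (a hypothetical shape, not the skill's actual schema):

```python
def record_keys(record):
    """Identifiers used to recognize a paper; title is used only as a last resort."""
    keys = []
    if record.get("doi"):
        keys.append(("doi", record["doi"].strip().lower()))
    if record.get("pmid"):
        keys.append(("pmid", str(record["pmid"]).strip()))
    if not keys and record.get("title"):
        keys.append(("title", " ".join(record["title"].lower().split())))
    return keys


def dedupe_bibliography(records):
    """Keep the first occurrence of each paper, matching on any shared identifier."""
    seen, unique = set(), []
    for record in records:
        keys = record_keys(record)
        if keys and any(key in seen for key in keys):
            continue
        seen.update(keys)
        unique.append(record)
    return unique
```

Keeping the first occurrence means the order in which sources are merged decides which metadata survives, so merging the richest source first is a reasonable default.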

### Report Template

```markdown
# [TARGET/TOPIC]: Comprehensive Research Report

*Generated: [Date]*
*Evidence cutoff: [Date]*
*Total unique papers: [N]*

---

## Executive Summary

[2-3 paragraphs synthesizing key findings across all sections]

**Bottom Line**: [One-sentence actionable conclusion]

---

## 1. Target Identity & Aliases
*[MANDATORY - even for non-target topics, clarify scope]*

### 1.1 Official Identifiers
[Table of IDs or scope definition]

### 1.2 Synonyms and Aliases  
[List all known names - critical for complete literature coverage]

### 1.3 Known Naming Collisions
[Document collisions and how they were handled]

---

## 2. Protein Architecture
*[MANDATORY for protein targets; state "N/A - not a protein target" otherwise]*

### 2.1 Domain Structure
[Table of domains with positions, InterPro IDs]

### 2.2 Isoforms
[List isoforms, functional differences if known]

### 2.3 Key Structural Features
[Active sites, binding sites, PTMs]

### 2.4 Available Structures
[PDB entries, AlphaFold availability]

---

## 3. Complexes & Interaction Partners
*[MANDATORY]*

### 3.1 Known Complexes
[List complexes the protein participates in]

### 3.2 Direct Interactors
[Table of top interactors with evidence type and scores]

### 3.3 Functional Interaction Network
[Describe network context]

---

## 4. Subcellular Localization
*[MANDATORY]*

[Table of locations with confidence levels and sources]

---

## 5. Expression Profile
*[MANDATORY]*

### 5.1 Tissue Expression
[Table of top tissues with TPM values]

### 5.2 Cell-Type Expression
[If single-cell data available]

### 5.3 Disease-Specific Expression
[Expression changes in disease contexts]

---

## 6. Core Mechanisms
*[MANDATORY - this is the heart of the report]*

### 6.1 Molecular Function
[What the protein does biochemically]
**Evidence Quality**: [Strong/Moderate/Limited]

### 6.2 Biological Role
[Role in cellular/organismal context]
**Evidence Quality**: [Strong/Moderate/Limited]

### 6.3 Key Pathways
[Pathway involvement with evidence grades]

### 6.4 Regulation
[How the target is regulated]

---

## 7. Model Organism Evidence
*[MANDATORY]*

### 7.1 Mouse Models
[Knockout/knockin phenotypes, if any]

### 7.2 Other Model Organisms
[Yeast, fly, zebrafish, worm data if relevant]

### 7.3 Cross-Species Conservation
[Conservation and functional studies]

---

## 8. Human Genetics & Variants
*[MANDATORY]*

### 8.1 Constraint Scores
[pLI, LOEUF, missense Z - with interpretation]

### 8.2 Disease-Associated Variants
[ClinVar pathogenic variants]

### 8.3 Population Variants
[gnomAD notable variants]

### 8.4 GWAS Associations
[Any GWAS hits for the locus]

---

## 9. Disease Links
*[MANDATORY - include evidence strength]*

### 9.1 Strong Evidence (Genetic + Functional)
[Diseases with causal evidence]

### 9.2 Moderate Evidence (Association + Mechanism)
[Diseases with supporting evidence]

### 9.3 Weak Evidence (Association Only)
[Diseases with correlation/association only]

### 9.4 Evidence Summary Table

| Disease | Evidence Type | Score | Key Papers | Grade |
|---------|---------------|-------|------------|-------|
| [Disease 1] | Genetic + Functional | 0.85 | PMID:xxx | ★★★ |
| [Disease 2] | GWAS + Expression | 0.45 | PMID:yyy | ★★☆ |

---

## 10. Pathogen Involvement
*[MANDATORY - state "None identified" if not applicable]*

### 10.1 Viral Interactions
[Any viral exploitation or targeting]

### 10.2 Bacterial Interactions
[Any bacterial relevance]

### 10.3 Host Defense Role
[Role in immune response if any]

---

## 11. Key Assays & Readouts
*[MANDATORY]*

### 11.1 Biochemical Assays
[Available assays for target activity]

### 11.2 Cellular Readouts
[Cell-based assays and phenotypes]

### 11.3 In Vivo Models
[Animal models and endpoints]

---

## 12. Research Themes
*[MANDATORY - structured theme extraction]*

### 12.1 [Theme 1 Name] (N papers)
**Evidence Quality**: [Strong/Moderate/Limited]
**Representative Papers**: [≥3 papers or state "insufficient"]

[Theme description with evidence-graded citations]

### 12.2 [Theme 2 Name] (N papers)
[Same structure]

[Continue for all themes - require ≥3 representative papers per theme, or state "limited evidence"]

---

## 13. Open Questions & Research Gaps
*[MANDATORY]*

### 13.1 Mechanistic Unknowns
[What we don't understand about the target]

### 13.2 Therapeutic Unknowns
[What we don't know for drug development]

### 13.3 Suggested Priority Questions
[Ranked list of important unanswered questions]

---

## 14. Biological Model & Testable Hypotheses
*[MANDATORY - synthesis section]*

### 14.1 Integrated Biological Model
[3-5 paragraph synthesis integrating all evidence into coherent model]

### 14.2 Testable Hypotheses

| # | Hypothesis | Perturbation | Readout | Expected Result | Priority |
|---|------------|--------------|---------|-----------------|----------|
| 1 | [Hypothesis] | [Experiment] | [Measure] | [Prediction] | HIGH |
| 2 | [Hypothesis] | [Experiment] | [Measure] | [Prediction] | HIGH |
| 3 | [Hypothesis] | [Experiment] | [Measure] | [Prediction] | MEDIUM |

### 14.3 Suggested Experiments
[Brief description of key experiments to test hypotheses]

---

## 15. Conclusions & Recommendations
*[MANDATORY]*

### 15.1 Key Takeaways
[Bullet points of most important findings]

### 15.2 Confidence Assessment
[Overall confidence in the findings: High/Medium/Low with justification]

### 15.3 Recommended Next Steps
[Prioritized action items]

---

## References

*[Summary reference list in report - full bibliography in separate file]*

### Key Papers (Must-Read)
1. [Citation with PMID] - [Why important] [Grade: ★★★]
2. ...

### By Theme
[Organized reference lists]

---

## Data Limitations

- [Any databases that failed or returned no data]
- [Any known gaps in coverage]
- [OA status method used]

*Full methodology available in methods_appendix.md upon request.*
```

---



> **Extended Reference**: For detailed tool tables, examples, and templates, read `REFERENCE.md` in this skill directory.
> The agent can access it via: `read skills/tooluniverse-literature-deep-research/REFERENCE.md`

Related Skills

tooluniverse

from Zaoqu-Liu/ScienceClaw

Router skill for ToolUniverse tasks. First checks if specialized tooluniverse skills (34+ skills covering disease/drug/target research, clinical decision support, genomics, epigenomics, chemical safety, systems biology, and more) can solve the problem, then falls back to general strategies for using 1400+ scientific tools. Covers tool discovery, multi-hop queries, comprehensive research workflows, disambiguation, evidence grading, and report generation. Use when users need to research any scientific topic, find biological data, or explore drug/target/disease relationships.

tooluniverse-variant-interpretation

from Zaoqu-Liu/ScienceClaw

Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.

tooluniverse-variant-analysis

from Zaoqu-Liu/ScienceClaw

Production-ready VCF processing, variant annotation, mutation analysis, and structural variant (SV/CNV) interpretation for bioinformatics questions. Parses VCF files (streaming, large files), classifies mutation types (missense, nonsense, synonymous, frameshift, splice, intronic, intergenic) and structural variants (deletions, duplications, inversions, translocations), applies VAF/depth/quality/consequence filters, annotates with ClinVar/dbSNP/gnomAD/CADD via ToolUniverse, interprets SV/CNV clinical significance using ClinGen dosage sensitivity scores, computes variant statistics, and generates reports. Solves questions like "What fraction of variants with VAF < 0.3 are missense?", "How many non-reference variants remain after filtering intronic/intergenic?", "What is the pathogenicity of this deletion affecting BRCA1?", or "Which dosage-sensitive genes overlap this CNV?". Use when processing VCF files, annotating variants, filtering by VAF/depth/consequence, classifying mutations, interpreting structural variants, assessing CNV pathogenicity, comparing cohorts, or answering variant analysis questions.

tooluniverse-target-research

from Zaoqu-Liu/ScienceClaw

Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.

tooluniverse-systems-biology

from Zaoqu-Liu/ScienceClaw

Comprehensive systems biology and pathway analysis using multiple pathway databases (Reactome, KEGG, WikiPathways, Pathway Commons, BioModels). Performs pathway enrichment, protein-pathway mapping, keyword searches, and systems-level analysis. Use when analyzing gene sets, exploring biological pathways, or investigating systems-level biology.

tooluniverse-structural-variant-analysis

from Zaoqu-Liu/ScienceClaw

Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.

tooluniverse-statistical-modeling

from Zaoqu-Liu/ScienceClaw

Perform statistical modeling and regression analysis on biomedical datasets. Supports linear regression, logistic regression (binary/ordinal/multinomial), mixed-effects models, Cox proportional hazards survival analysis, Kaplan-Meier estimation, and comprehensive model diagnostics. Extracts odds ratios, hazard ratios, confidence intervals, p-values, and effect sizes. Designed to solve BixBench statistical reasoning questions involving clinical/experimental data. Use when asked to fit regression models, compute odds ratios, perform survival analysis, run statistical tests, or interpret model coefficients from provided data.
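
To make "odds ratios" and "confidence intervals" concrete, here is a minimal sketch that computes an odds ratio with a Wald 95% interval from a 2x2 table, using only the standard library. The `odds_ratio_ci` helper and the counts are hypothetical illustrations; the skill itself may well rely on a fitted logistic regression instead.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed cases      b: exposed non-cases
    c: unexposed cases    d: unexposed non-cases
    All four counts must be positive; z=1.96 gives a ~95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from any dataset.
odds_ratio, lower, upper = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval is built on the log scale and then exponentiated, which is why it is asymmetric around the point estimate.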

tooluniverse-spatial-transcriptomics

from Zaoqu-Liu/ScienceClaw

Analyze spatial transcriptomics data to map gene expression in tissue architecture. Supports 10x Visium, MERFISH, seqFISH, Slide-seq, and imaging-based platforms. Performs spatial clustering, domain identification, cell-cell proximity analysis, spatial gene expression patterns, tissue architecture mapping, and integration with single-cell data. Use when analyzing spatial transcriptomics datasets, studying tissue organization, identifying spatial expression patterns, mapping cell-cell interactions in tissue context, characterizing tumor microenvironment spatial structure, or integrating spatial and single-cell RNA-seq data for comprehensive tissue analysis.

tooluniverse-spatial-omics-analysis

from Zaoqu-Liu/ScienceClaw

Computational analysis framework for spatial multi-omics data integration. Given spatially variable genes (SVGs), spatial domain annotations, tissue type, and disease context from spatial transcriptomics/proteomics experiments (10x Visium, MERFISH, DBiTplus, Slide-seq, etc.), performs comprehensive biological interpretation including pathway enrichment, cell-cell interaction inference, druggable target identification, immune microenvironment characterization, and multi-modal integration. Produces a detailed markdown report with Spatial Omics Integration Score (0-100), domain-by-domain characterization, and validation recommendations. Uses 70+ ToolUniverse tools across 9 analysis phases. Use when users ask about spatial transcriptomics analysis, spatial omics interpretation, tissue heterogeneity, spatial gene expression patterns, tumor microenvironment mapping, tissue zonation, or cell-cell communication from spatial data.

tooluniverse-single-cell

from Zaoqu-Liu/ScienceClaw

Production-ready single-cell and expression matrix analysis using scanpy, anndata, and scipy. Performs scRNA-seq QC, normalization, PCA, UMAP, Leiden/Louvain clustering, differential expression (Wilcoxon, t-test, DESeq2), cell type annotation, per-cell-type statistical analysis, gene-expression correlation, batch correction (Harmony), trajectory inference, and cell-cell communication analysis. Also analyzes ligand-receptor interactions between cell types using OmniPath (CellPhoneDB, CellChatDB), scores communication strength, identifies signaling cascades, and handles multi-subunit receptor complexes. Integrates with ToolUniverse gene annotation tools (HPA, Ensembl, MyGene, UniProt) and enrichment tools (gseapy, PANTHER, STRING). Supports h5ad, 10X, CSV/TSV count matrices, and pre-annotated datasets. Use when analyzing single-cell RNA-seq data, studying cell-cell interactions, performing cell type differential expression, computing gene-expression correlations by cell type, analyzing tumor-immune communication, or answering questions about scRNA-seq datasets.

tooluniverse-sequence-retrieval

from Zaoqu-Liu/ScienceClaw

Retrieves biological sequences (DNA, RNA, protein) from NCBI and ENA with gene disambiguation, accession type handling, and comprehensive sequence profiles. Creates detailed reports with sequence metadata, cross-database references, and download options. Use when users need nucleotide sequences, protein sequences, or genome data, or mention GenBank, RefSeq, or EMBL accessions.

tooluniverse-sdk

from Zaoqu-Liu/ScienceClaw

Build AI scientist systems using the ToolUniverse Python SDK for scientific research. Use when users need to access 1000+ scientific tools through Python code, create scientific workflows, perform drug discovery, protein analysis, genomics analysis, literature research, or any computational biology task. Triggers include requests to use scientific tools programmatically, build research pipelines, analyze biological data, search literature, predict drug properties, or create AI-powered scientific workflows.