literature-matrix

Systematic research idea discovery through paper combination matrix. Use when finding research ideas, evaluating paper combinations, building unified theoretical frameworks, or generating code skeletons from combined methods.

1,174 stars

by foryourhealth111-pixel


Best use case

literature-matrix is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using literature-matrix should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/literature-matrix/SKILL.md --create-dirs "https://raw.githubusercontent.com/foryourhealth111-pixel/Vibe-Skills/main/bundled/skills/literature-matrix/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/literature-matrix/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How literature-matrix Compares

Feature / Agent	literature-matrix	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

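What does this skill do?

It supports systematic research idea discovery through a paper combination matrix: finding research ideas, evaluating paper combinations, building unified theoretical frameworks, and generating code skeletons from combined methods.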

Where can I find the source code?

The source code is in the foryourhealth111-pixel/Vibe-Skills repository on GitHub, at bundled/skills/literature-matrix/SKILL.md.

Related Guides

AI Agent for Product Research

Browse AI agent skills for product research, competitive analysis, customer discovery, and structured product decision support.

AI Agents for Startups

Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

SKILL.md Source

# literature-matrix

Systematic research idea discovery: collect N papers, evaluate all N×(N-1)/2 combinations via a 5-dimension scoring matrix, deep-analyze top candidates with full-text evidence, build unified theoretical frameworks, and generate code skeletons.

## When to Use This Skill

Trigger when any of these applies:
- User needs to systematically discover research ideas from literature
- User wants to evaluate combination potential between multiple papers
- User wants to build a unified theoretical framework (αA+(1-α)B) from two methods
- User needs to generate code skeletons for combined methods
- User mentions: "文献矩阵" (literature matrix), "论文组合" (paper combination), "找idea" (find an idea), "组合创新" (combinatorial innovation), "paper matrix"
- User invokes `/literature-matrix`

## Not For / Boundaries

**Will NOT:**
- Make final research decisions for the user (provides analysis and suggestions only)
- Guarantee any idea will be published (evaluates feasibility only)
- Bypass copyright to obtain paywalled papers (uses legal open-access channels only)
- Generate complete papers (provides framework drafts and code skeletons only)
- Fabricate data or analysis results
- Replace domain expert judgment on theoretical correctness

**Required inputs (ask if missing):**
1. Research domain and keywords
2. Time range (default: last 2 years)
3. Paper count (default: 40)

## Quick Reference

### Workflow (6 Phases)

```
Phase 0: Init → Phase 1: Collect Papers → Phase 2: Build Matrix → Phase 3: Deep Analysis → Phase 4: Framework → Phase 5: Code
  ↑                                                                                                                         |
  └─────────────────────────── Checkpoint resume (pause/resume at any phase) ───────────────────────────────────────────────┘
```

### Pattern 1: Initialize Session
```
1. Check ./paper_matrix/checkpoint.json for existing progress
2. Ask: domain, keywords, timerange, paper count, source mode, weight preset
3. Create directory: ./paper_matrix/{papers,analysis,ideas,frameworks,code}/
4. Save checkpoint
```
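A minimal Python sketch of Pattern 1, assuming the session config is a plain dict gathered in step 2. The helper name `init_session` and the `SUBDIRS` constant are hypothetical; the checkpoint keys follow Pattern 10.

```python
import json
from pathlib import Path

# Hypothetical constant: the subdirectories listed in Pattern 1, step 3.
SUBDIRS = ("papers", "analysis", "ideas", "frameworks", "code")


def init_session(root: Path, config: dict) -> dict:
    """Resume from root/checkpoint.json if it exists, else create a new session."""
    checkpoint_path = root / "checkpoint.json"

    # Step 1: reuse existing progress instead of starting over.
    if checkpoint_path.exists():
        return json.loads(checkpoint_path.read_text(encoding="utf-8"))

    # Step 3: create ./paper_matrix/{papers,analysis,ideas,frameworks,code}/
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)

    # Step 4: save a checkpoint shaped like Pattern 10 (Phase 0 done, Phase 1 next).
    checkpoint = {
        "version": "1.0",
        "current_phase": 1,
        "config": config,
        "phase_0": {"status": "completed"},
    }
    checkpoint_path.write_text(
        json.dumps(checkpoint, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return checkpoint
```

A second call with the same `root` returns the saved checkpoint instead of overwriting it, so an interrupted session can pick up where it stopped.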

### Pattern 2: Paper Search (Semantic Scholar API)
```
GET https://api.semanticscholar.org/graph/v1/paper/search
  ?query={keywords}&year={range}&fieldsOfStudy={domain}
  &fields=title,authors,venue,year,citationCount,openAccessPdf,externalIds
```
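The same request can be assembled with the Python standard library. The `limit` and `offset` paging parameters are not part of Pattern 2 and are included here as an assumption; check the Semantic Scholar Graph API documentation for current limits. The function name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"
FIELDS = "title,authors,venue,year,citationCount,openAccessPdf,externalIds"


def build_search_url(keywords: str, year_range: str, fields_of_study: str,
                     limit: int = 100, offset: int = 0) -> str:
    """Assemble the Pattern 2 search request (paging parameters are an addition)."""
    params = {
        "query": keywords,                  # e.g. "multi-omics integration"
        "year": year_range,                 # e.g. "2024-2026"
        "fieldsOfStudy": fields_of_study,   # e.g. "Biology"
        "fields": FIELDS,                   # metadata to return per paper
        "limit": limit,                     # assumed page size
        "offset": offset,                   # assumed paging offset
    }
    return f"{SEARCH_URL}?{urlencode(params)}"
```

The response is JSON, with matched papers expected under its `data` key. Sending the request (for example with `urllib.request.urlopen`) is left out so the sketch runs offline.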

### Pattern 3: Paper Screening Criteria
```
Each paper is scored on 4 criteria:
✅ Open-source (GitHub repo exists)
✅ Accessible (clear method description)
✅ Trending (high citation velocity)
✅ Recognized (top venue: oral/spotlight)
```
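Pattern 3 leaves the scoring rule open. One possible reading, sketched in Python, counts how many criteria a paper meets. The `Candidate` fields, the citations-per-year proxy for citation velocity, and the threshold of 20 are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    # Hypothetical record; year and citation_count map to the API fields in Pattern 2.
    title: str
    year: int
    citation_count: int
    github_url: Optional[str]  # Open-source: a linked repository, if any
    has_clear_method: bool     # Accessible: judged from the abstract/method section
    presentation: str          # Recognized: e.g. "oral", "spotlight", "poster"


def screening_score(p: Candidate, current_year: int,
                    min_citations_per_year: float = 20.0) -> int:
    """Return how many of the four Pattern 3 criteria the paper meets (0-4)."""
    years_since = max(1, current_year - p.year + 1)
    citations_per_year = p.citation_count / years_since  # proxy for citation velocity
    checks = (
        p.github_url is not None,                      # Open-source
        p.has_clear_method,                            # Accessible
        citations_per_year >= min_citations_per_year,  # Trending (threshold assumed)
        p.presentation in {"oral", "spotlight"},       # Recognized
    )
    return sum(checks)
```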

### Pattern 4: 5-Dimension Evaluation (per combination)
```
| Dimension           | Default Weight | What it measures                    |
|---------------------|----------------|-------------------------------------|
| Complementarity     | 0.25           | A's method solves B's limitation?   |
| Data Compatibility  | 0.20           | Shared data types/formats?          |
| Theory Unifiability | 0.20           | Natural unified framework exists?   |
| Innovation Delta    | 0.20           | 1+1>2 effect?                       |
| Implementation      | 0.15           | Code integration difficulty?        |

Weight presets (weights listed in table order: Complementarity, Data Compatibility,
Theory Unifiability, Innovation Delta, Implementation):
- 理论导向 (theory-oriented):       0.20, 0.15, 0.30, 0.25, 0.10
- 工程导向 (engineering-oriented):  0.25, 0.25, 0.10, 0.15, 0.25
- 快速发表 (fast publication):      0.30, 0.20, 0.15, 0.20, 0.15
- 自定义 (custom): user specifies all 5 weights
```
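A sketch of the weighted sum behind Pattern 4, assuming each dimension is scored on a 0-10 scale (the scale is not specified above). `PRESETS` copies the weights from the table and preset list; `weighted_score` is a hypothetical helper, and its check that weights sum to 1 is an assumed safeguard for custom weights (all listed presets satisfy it).

```python
from typing import Dict, Sequence

# Dimension order matches the Pattern 4 table.
DIMENSIONS = (
    "complementarity",
    "data_compatibility",
    "theory_unifiability",
    "innovation_delta",
    "implementation",
)

PRESETS = {
    "default": (0.25, 0.20, 0.20, 0.20, 0.15),
    "理论导向": (0.20, 0.15, 0.30, 0.25, 0.10),  # theory-oriented
    "工程导向": (0.25, 0.25, 0.10, 0.15, 0.25),  # engineering-oriented
    "快速发表": (0.30, 0.20, 0.15, 0.20, 0.15),  # fast publication
}


def weighted_score(scores: Dict[str, float], weights: Sequence[float]) -> float:
    """Weighted sum of the five dimension scores (assumed 0-10 each)."""
    if len(weights) != len(DIMENSIONS):
        raise ValueError("expected 5 weights")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * scores[d] for w, d in zip(weights, DIMENSIONS))
```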

### Pattern 5: Three-Layer Filtering
```
Layer 1 (Rule): Exclude same-author, same-subfield, already-cited pairs → ~50% removed
Layer 2 (AI):   Score remaining pairs on 5 dimensions via abstracts → rank by weighted sum
Layer 3 (User): Discuss top-30 with user → narrow to 15-20 candidates
```
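Layers 1 and 2 are mechanical enough to sketch. The `Paper` record and helper names are hypothetical; `itertools.combinations` enumerates the N×(N-1)/2 unordered pairs (780 for N = 40). Layer 3 stays a conversation with the user.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Paper:
    # Hypothetical record for the Layer 1 rules.
    pid: str
    authors: frozenset
    subfield: str
    cites: frozenset = frozenset()  # ids of papers this paper cites


Pair = Tuple[Paper, Paper]


def layer1_rule_filter(papers: List[Paper]) -> Iterator[Pair]:
    """Yield the unordered pairs that survive the Layer 1 exclusion rules."""
    for a, b in combinations(papers, 2):          # all N*(N-1)/2 pairs
        if a.authors & b.authors:                 # same author
            continue
        if a.subfield == b.subfield:              # same subfield
            continue
        if a.pid in b.cites or b.pid in a.cites:  # already cited
            continue
        yield a, b


def layer2_rank(pairs: Iterable[Pair], score_fn: Callable[[Pair], float],
                top_k: int = 30) -> List[Pair]:
    """Sort by weighted score (e.g. from abstracts) and keep the top_k for Layer 3."""
    return sorted(pairs, key=score_fn, reverse=True)[:top_k]
```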

### Pattern 6: Paper Acquisition (3 Levels)
```
L1 Auto:   arXiv PDF → PMC → Unpaywall → Semantic Scholar openAccessPdf
L2 Assist: Provide DOI + download path, ask user to fetch via library
L3 Fallback: Abstract-only analysis, mark as ⚠️ low confidence
```
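The fallback order can be expressed as a loop over fetchers. In this sketch each L1 source is a caller-supplied function from DOI to PDF bytes (`None` when nothing is found); the real arXiv, PMC, Unpaywall, and Semantic Scholar lookups are not implemented, and `acquire`, its `user_has_library_access` flag, and its return keys are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A fetcher maps a DOI to PDF bytes, or None if this source has no open copy.
Fetcher = Callable[[str], Optional[bytes]]


def acquire(doi: str, l1_sources: List[Tuple[str, Fetcher]],
            user_has_library_access: bool = False) -> dict:
    """Try L1 sources in order, then fall back to L2 (user fetch) or L3 (abstract only)."""
    for name, fetch in l1_sources:
        try:
            pdf = fetch(doi)
        except OSError:  # network failures (urllib's URLError is an OSError): try next source
            pdf = None
        if pdf:
            return {"level": "L1", "source": name, "pdf": pdf}

    if user_has_library_access:
        # L2: hand the DOI to the user and wait for a manual download.
        return {"level": "L2", "doi": doi, "action": "ask user to download via library"}

    # L3: analyse the abstract only and flag the result.
    return {"level": "L3", "doi": doi, "low_confidence": True, "warning": "⚠️ abstract-only"}
```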

### Pattern 7: Combination Types
```
Parallel:  f(x) = α·A(x) + (1-α)·B(x)         → convex combination
Serial:    f(x) = B(A(x))                        → pipeline framework
Nested:    f(x) = A(x; module=B)                  → modular architecture
Extension: f(x) = α·A + β·B + (1-α-β)·C          → simplex constraint
```
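The four combination types can be written as function combinators. This is an illustrative sketch with scalar toy models standing in for A, B, and C; the helper names are hypothetical, and the `module=` keyword mirrors the notation in Pattern 7 rather than any particular library.

```python
from typing import Callable

Model = Callable[[float], float]


def parallel(A: Model, B: Model, alpha: float) -> Model:
    """f(x) = α·A(x) + (1-α)·B(x); α=1 recovers A and α=0 recovers B."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return lambda x: alpha * A(x) + (1 - alpha) * B(x)


def serial(A: Model, B: Model) -> Model:
    """f(x) = B(A(x)): A's output feeds B."""
    return lambda x: B(A(x))


def nested(A_with_module: Callable[..., float], B: Model) -> Model:
    """f(x) = A(x; module=B): B is plugged in as a component of A."""
    return lambda x: A_with_module(x, module=B)


def extension(A: Model, B: Model, C: Model, alpha: float, beta: float) -> Model:
    """f(x) = α·A + β·B + (1-α-β)·C with (α, β, 1-α-β) on the probability simplex."""
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        raise ValueError("weights must be non-negative and sum to at most 1")
    return lambda x: alpha * A(x) + beta * B(x) + (1 - alpha - beta) * C(x)
```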

### Pattern 8: Non-trivial Justification Templates
```
Theoretical:  interaction term α(1-α)·h(A,B) exists
Experimental: performance at α∈(0,1) exceeds linear interpolation
Problem:      A+B solves what neither A nor B can alone
Computational: combination requires novel optimization
```
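The experimental template can be checked directly from an α sweep. A minimal sketch, assuming higher metric values are better and that the sweep includes α = 0 (pure B) and α = 1 (pure A), as in the Parallel form of Pattern 7; the function name and `margin` parameter are hypothetical.

```python
from typing import Dict, List


def beats_linear_interpolation(curve: Dict[float, float], margin: float = 0.0) -> List[float]:
    """Return the interior α values whose metric beats the line joining α=0 and α=1.

    curve maps α to a metric where higher is better; with f = α·A + (1-α)·B,
    α=1 is pure A and α=0 is pure B.
    """
    perf_b, perf_a = curve[0.0], curve[1.0]
    wins = []
    for alpha in sorted(curve):
        if 0.0 < alpha < 1.0:
            baseline = alpha * perf_a + (1 - alpha) * perf_b  # linear interpolation
            if curve[alpha] > baseline + margin:
                wins.append(alpha)
    return wins
```

An empty result means the sweep shows no gain beyond interpolating between A and B, so the Experimental justification does not hold for that sweep.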

### Pattern 9: Provenance Tagging
```
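L1 Metadata:  [来源: API元数据]              → high confidence         (tag reads: Source: API metadata)
L2 Content:   [来源: 论文全文, Section X]     → medium-high confidence  (tag reads: Source: full text, Section X)
L3 Inference: [推断: 基于[来源], 置信度: X]   → low-medium confidence   (tag reads: Inference: based on [source], confidence: X)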
```
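A small formatter keeps the tags consistent. This sketch is hypothetical: it reproduces the Chinese tag text exactly as Pattern 9 shows it, and expects the L3 `source` argument to be an L1 or L2 tag.

```python
def provenance_tag(level: str, *, section: str = "", source: str = "",
                   confidence: str = "") -> str:
    """Render a Pattern 9 provenance tag; the tag text is kept in Chinese as specified."""
    if level == "L1":
        return "[来源: API元数据]"  # Source: API metadata
    if level == "L2":
        return f"[来源: 论文全文, Section {section}]"  # Source: full text, Section X
    if level == "L3":
        # Inference: based on [source], confidence X
        return f"[推断: 基于{source}, 置信度: {confidence}]"
    raise ValueError(f"unknown provenance level: {level}")
```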

### Pattern 10: Checkpoint Save/Resume
```json
{"version":"1.0", "current_phase":2, "config":{...},
 "phase_0":{"status":"completed"},
 "phase_2":{"status":"in_progress","evaluated":450,"total":780}}
```
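Writing the checkpoint atomically avoids the "Interrupted during write" corruption listed under Troubleshooting. This sketch uses the common temp-file-then-`os.replace` technique (an implementation choice, not something the skill specifies) and assumes pairs are always evaluated in the same order, so the stored `evaluated` count is enough to resume. Both helper names are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state to a temp file beside path, then atomically replace path."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f, ensure_ascii=False, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the swap
        os.replace(tmp_name, path)  # readers see either the old or the new file
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)
        raise


def remaining_pairs(all_pairs: list, state: dict) -> list:
    """Assumes a fixed evaluation order, so the first `evaluated` pairs are done."""
    done = state.get("phase_2", {}).get("evaluated", 0)
    return all_pairs[done:]
```

With the Pattern 10 example (450 of 780 evaluated), `remaining_pairs` leaves the 330 pairs that Example 3 continues with.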

## Rules & Constraints

### MUST
- Attach a traceable link (Semantic Scholar/DOI/arXiv/PubMed) to every paper reference
- Tag every analytical conclusion with provenance level (L1/L2/L3) and confidence
- Save checkpoint after each phase completion
- Use Socratic dialogue: ask guiding questions, don't just present conclusions
- Proactively acquire papers when top candidates are identified
- Mark abstract-only analyses with ⚠️ low confidence warning

### SHOULD
- Use parallel Task agents to evaluate multiple combinations concurrently
- Generate heatmap visualization for the scoring matrix
- Suggest A+B+C extensions when A+B alone may lack novelty
- Link findings to user's existing project when project context is available
- Provide weight preset recommendations based on user's stated goals
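For the heatmap, pair scores first need to be laid out as a symmetric N×N grid. A minimal sketch with a hypothetical `score_matrix` helper; pairs removed by the Layer 1 rules and the diagonal are left as `None`, which a plotting call such as matplotlib's `imshow` would need converted to NaN first.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def score_matrix(paper_ids: Sequence[str],
                 pair_scores: Dict[Tuple[str, str], float]) -> List[List[Optional[float]]]:
    """Symmetric N×N matrix of weighted scores; unscored cells stay None."""
    index = {pid: i for i, pid in enumerate(paper_ids)}
    n = len(paper_ids)
    m: List[List[Optional[float]]] = [[None] * n for _ in range(n)]
    for (a, b), score in pair_scores.items():
        i, j = index[a], index[b]
        m[i][j] = m[j][i] = score  # A+B and B+A share one score
    return m
```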

### NEVER
- Present AI inference as established fact without provenance tag
- Skip user confirmation when narrowing candidates
- Attempt to download paywalled papers through unauthorized channels
- Generate a complete paper (only framework drafts and code skeletons)
- Omit source links from any paper reference

## Role: Socratic Research Mentor

Act as a proactive, patient, rigorous research mentor throughout the entire workflow.

**Behavioral principles:**
- Proactive: Don't wait for user questions. Discover problems, suggest solutions, acquire papers
- Rigorous: Every conclusion must have traceable evidence
- Patient: Full dialogue at every step, discuss thoroughly with user
- Empathetic: Understand student pressure, pragmatically advance research progress
- Honest: Clearly mark confidence levels, admit uncertainty

**Dialogue patterns by phase:**
- Discovery (Phase 1-2): Open-ended guidance — "I noticed Paper A's method and Paper B's limitation have potential complementarity. Does this make sense in your research context?"
- Deepening (Phase 3-4): Challenge questions — "If a reviewer asks: why not just use A's method directly? How would you respond?"
- Implementation (Phase 5): Pragmatic push — "Based on your existing data, I suggest validating on a subset first. Shall I generate the experiment code?"

See `references/dialogue-templates.md` for complete dialogue examples.

## Examples

### Example 1: Bioinformatics Multi-omics (Full Auto Search)

- **Input:** `/literature-matrix 多组学融合 耐药性检测 --papers 40 --timerange 2024-2026` (keywords: multi-omics integration, drug-resistance detection)
- **Steps:**
  1. Phase 0: Create `./paper_matrix/` directory, configure domain=bioinformatics, preset=快速发表 (fast publication)
  2. Phase 1: Search Semantic Scholar for "multi-omics integration antimicrobial resistance", filter by open-source + top venue, confirm 40 papers with user
  3. Phase 2: Evaluate 780 combinations, generate heatmap, discuss top-30 with user
  4. Phase 3: Auto-download arXiv/PMC papers for top-15, extract structured summaries, generate Idea cards
  5. Phase 4: For the selected idea (e.g., "graph attention + lipid profiling"), build a unified framework f(x) = α·GAT(x) + (1-α)·LipidNet(x) and prove both are special cases (α=1 gives GAT, α=0 gives LipidNet)
  6. Phase 5: Generate `base_framework.py`, `experiment.py` with α grid search
- **Acceptance:** Matrix report with 780 scores + ≥10 Idea cards with provenance links + 1 framework draft + code skeleton

### Example 2: ML Top Conference (Seed Expansion)

- **Input:** User provides 8 seed papers from NeurIPS 2025 oral presentations
- **Steps:**
  1. Phase 0: Configure source_mode=seed_expansion, domain=ML
  2. Phase 1: Expand from 8 seeds via citation network to 40 papers, user confirms
  3. Phase 2: Build the matrix with 理论导向 (theory-oriented) weights, filter and rank
  4. Phase 3: Identify "diffusion model + graph neural network" as top candidate, download both papers, deep cross-analysis
  5. Phase 4: Build framework where diffusion and GNN are special cases of a "generative message-passing" framework
  6. Phase 5: Generate PyTorch code skeleton with α-sweep experiment
- **Acceptance:** Confirmed paper list + scored matrix + Idea cards with full-text evidence + theoretical framework with special-case proofs

### Example 3: Resume from Checkpoint

- **Input:** `/literature-matrix --resume`
- **Steps:**
  1. Read `./paper_matrix/checkpoint.json`: Phase 2 in progress, 450/780 evaluated
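  2. Display progress: "检测到上次分析进度。Phase 2矩阵构建中，已评估450/780个组合。是否继续？" ("Previous analysis progress detected. Phase 2 matrix construction is in progress; 450/780 combinations evaluated. Continue?")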
  3. User confirms → continue evaluating remaining 330 combinations
  4. Complete Phase 2, proceed to Phase 3
- **Acceptance:** Seamless continuation from checkpoint, no duplicate work

### Example 4: Project-Linked Analysis

- **Input:** `/literature-matrix 脂质组学 机器学习 --link-project` (keywords: lipidomics, machine learning)
- **Steps:**
  1. Phase 0: Read CLAUDE.md, detect ECC multi-omics project context
  2. Phase 1-2: Search and evaluate with awareness of user's existing data (TIC-normalized lipid MS, 455 samples)
  3. Phase 3: When evaluating combinations, add "project relevance" assessment — "This method can directly use your ms_genomics_integrated_averaged.csv"
  4. Phase 4-5: Framework and code adapted to user's data format
- **Acceptance:** All Idea cards include a "与用户项目的关联" (relevance to the user's project) section + code skeleton loads user's actual data files

## Troubleshooting

| Symptom | Diagnosis | Fix |
|---------|-----------|-----|
| Semantic Scholar API returns empty | Keywords too specific or API rate limit | Broaden keywords, add retry with backoff |
| Too few open-access papers | Domain has low OA rate | Use L2 acquisition (ask user to download), expand time range |
| All combinations score low | Papers too similar or too different | Adjust paper selection: mix methods papers with application papers |
| Checkpoint corrupted | Interrupted during write | Delete checkpoint.json, restart from Phase 0 |
| α=0.5 not optimal | Combination is serial, not parallel | Switch to pipeline framework (serial type), not convex combination |
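For the first row, a retry loop with exponential backoff might look like the sketch below. It uses only the standard library; treating HTTP 429 and 5xx responses as retryable, the delay schedule, and the helper name are assumptions, and `opener`/`sleep` are injectable so the logic can be exercised offline.

```python
import json
import random
import time
import urllib.error
import urllib.request


def get_json_with_backoff(url, max_retries=5, base_delay=1.0,
                          opener=urllib.request.urlopen, sleep=time.sleep):
    """GET url and decode JSON, retrying rate-limit, server, and network errors."""
    for attempt in range(max_retries + 1):
        try:
            with opener(url, timeout=30) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:  # must precede URLError (its parent class)
            retryable = err.code == 429 or err.code >= 500
            if not retryable or attempt == max_retries:
                raise
        except urllib.error.URLError:
            if attempt == max_retries:
                raise
        # Exponential backoff (1s, 2s, 4s, ... for base_delay=1) plus random jitter.
        sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```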

## References

Detailed implementation guides:
- `references/index.md` — Navigation hub
- `references/workflow-phases.md` — Complete Phase 0-5 behavioral instructions
- `references/evaluation-system.md` — 5-dimension scoring, weight presets, prompt templates
- `references/paper-acquisition.md` — 3-level acquisition strategy with API details
- `references/theoretical-framework.md` — Combination types, non-trivial templates, α analysis
- `references/provenance-system.md` — 3-layer tracing, confidence levels, link requirements
- `references/checkpoint-system.md` — JSON schema, resume flow, error recovery
- `references/dialogue-templates.md` — Socratic dialogue examples per phase
- `references/output-templates.md` — Idea card, framework draft, code skeleton templates

## Maintenance

- Sources: Brainstorming session requirements (see `paper_matrix/REQUIREMENTS.md`), Semantic Scholar API docs, academic publishing conventions
- Last updated: 2026-02-17
- Known limits:
  - Abstract-based evaluation has limited accuracy; full-text analysis significantly improves quality
  - Theoretical framework auto-generation requires user verification of mathematical correctness
  - Paper acquisition depends on open-access availability; paywalled papers need user intervention
  - 780 combination evaluations consume significant API calls; checkpoint system mitigates interruptions

Related Skills

literature-review

from foryourhealth111-pixel/Vibe-Skills

Conduct comprehensive, systematic literature reviews using multiple academic databases (PubMed, arXiv, bioRxiv, Semantic Scholar, etc.). This skill should be used when conducting systematic literature reviews, meta-analyses, research synthesis, or comprehensive literature searches across biomedical, scientific, and technical domains. Creates professionally formatted markdown documents and PDFs with verified citations in multiple citation styles (APA, Nature, Vancouver, etc.).

confusion-matrix-generator

from foryourhealth111-pixel/Vibe-Skills

Confusion Matrix Generator: auto-activating skill for ML Training. Triggers on: confusion matrix generator. Part of the ML Training skill category.

zinc-database

from foryourhealth111-pixel/Vibe-Skills

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

zarr-python

from foryourhealth111-pixel/Vibe-Skills

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

yeet

from foryourhealth111-pixel/Vibe-Skills

Use only when the user explicitly asks to stage, commit, push, and open a GitHub pull request in one flow using the GitHub CLI (`gh`).

xlsx

from foryourhealth111-pixel/Vibe-Skills

Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.

xan

from foryourhealth111-pixel/Vibe-Skills

High-performance CSV processing with xan CLI for large tabular datasets, streaming transformations, and low-memory pipelines.

writing-plans

from foryourhealth111-pixel/Vibe-Skills

Use when you have a spec or requirements for a multi-step task, before touching code

writing-docs

from foryourhealth111-pixel/Vibe-Skills

Guides for writing and editing Remotion documentation. Use when adding docs pages, editing MDX files in packages/docs, or writing documentation content.

windows-hook-debugging

from foryourhealth111-pixel/Vibe-Skills

Diagnoses and fixes Claude Code plugin Hook execution errors on Windows. Use when encountering hook error, cannot execute binary file, .sh regex mismatches, or WSL/Git Bash conflicts.

weights-and-biases

from foryourhealth111-pixel/Vibe-Skills

Track ML experiments with automatic logging, visualize training in real time, optimize hyperparameters with sweeps, and manage a model registry with W&B, a collaborative MLOps platform.

webthinker-deep-research

from foryourhealth111-pixel/Vibe-Skills

Deep web research for VCO: multi-hop search+browse+extract with an auditable action trace and a structured report (WebThinker-style).