evidence-level-ranker

Ranks papers by evidence strength and methodological quality so literature can be compared and prioritized for citation without confusing design labels, validation depth, and actual reliability.

53 stars

by aipoch


Best use case

evidence-level-ranker is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using evidence-level-ranker should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/evidence-level-ranker/SKILL.md --create-dirs "https://raw.githubusercontent.com/aipoch/medical-research-skills/main/awesome-med-research-skills/Evidence%20Insight/evidence-level-ranker/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/evidence-level-ranker/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How evidence-level-ranker Compares

Feature / Agent	evidence-level-ranker	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Ranks papers by evidence strength and methodological quality so literature can be compared and prioritized for citation without confusing design labels, validation depth, and actual reliability.

Where can I find the source code?

The source code is on GitHub at https://github.com/aipoch/medical-research-skills.

SKILL.md Source

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

# Evidence Level Ranker

## Task

Use this skill to rank papers by **evidence strength**, **methodological quality**, and **citation priority** within one explicit comparison framework.

This skill should identify what kind of evidence each paper provides, how much methodological trust it deserves, how much validation or corroboration it contains, and whether it should be treated as a **high-priority anchor citation**, **context-setting citation**, **mechanistic support citation**, or **low-priority / caution citation**.

This skill must not equate study design labels with true evidentiary value automatically. A meta-analysis is not automatically decisive, an RCT is not automatically well-conducted, a cohort is not automatically weak, and a mechanism study is not automatically non-informative. The skill must rank literature based on the combination of **design family, execution quality, validation depth, bias control, and claim discipline**.

This skill is especially useful when the user needs to:
- prioritize citations for a manuscript, review, protocol, or slide deck;
- compare reviews, observational studies, interventional studies, mechanism papers, omics studies, and validation studies in one framework;
- identify which papers are most suitable for supporting strong claims versus background framing;
- avoid treating flashy but fragile findings as top-tier evidence.

## Reference Module Integration

Use reference modules as execution dependencies, not decoration.

- `references/study-design-identification.md` must support study design labeling before ranking begins.
- `references/result-reliability-principles.md` must support reliability judgment for design quality, bias, statistics, and validation.
- `references/validation-chain-rules.md` must support judgment of internal validation, external validation, orthogonal confirmation, replication, and implementation relevance.
- `references/claim-discipline-rules.md` must support separation of what a paper shows versus what it claims.
- `references/literature-integrity-rules.md` must govern all citation handling, publication details, and evidence statements.

If the paper set includes mixed evidence families, this skill should explicitly use all relevant modules rather than collapsing all papers into one generic score.
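
A minimal sketch of a pre-flight check for the reference modules listed above. The five paths come from the list; the `missing_reference_modules` helper, and the idea of checking for the files on disk relative to a skill directory, are assumptions made for illustration and are not part of the skill itself.

```python
from pathlib import Path

# Paths as listed in this section, relative to the skill directory.
REFERENCE_MODULES = [
    "references/study-design-identification.md",
    "references/result-reliability-principles.md",
    "references/validation-chain-rules.md",
    "references/claim-discipline-rules.md",
    "references/literature-integrity-rules.md",
]


def missing_reference_modules(skill_dir: str) -> list[str]:
    """Return the listed reference modules that are not files under skill_dir."""
    root = Path(skill_dir)
    return [rel for rel in REFERENCE_MODULES if not (root / rel).is_file()]
```

An empty return value would mean every listed module is present; anything else could be reported as a source of uncertainty before ranking starts.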

## Input Validation

Before ranking, confirm what the user is actually asking to compare.

Required or strongly preferred inputs:
- one paper, a set of papers, or a literature shortlist;
- the disease, intervention, target, biomarker, exposure, or question of interest;
- the intended downstream use, if provided: background citation, key evidence citation, manuscript support, protocol support, clinical justification, mechanism support, etc.

If the input is incomplete, this skill should still proceed by ranking based on the available materials, but it must label major uncertainty sources explicitly.

This skill should distinguish between:
- ranking **papers on the same question**;
- ranking **mixed papers across evidence families**;
- ranking for **citation priority**, not for treatment recommendation or clinical decision-making.
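
The "proceed, but label uncertainty" behavior described above could be sketched as follows. The function name, argument names, and note wording are hypothetical; raising an error when no paper is supplied is also an assumption, since the skill only states that ranking should proceed on the available materials.

```python
def input_uncertainties(papers, question=None, downstream_use=None):
    """List uncertainty labels for missing inputs; ranking still proceeds."""
    if not papers:
        # Assumption: with no paper at all there is nothing to rank.
        raise ValueError("at least one paper is needed to rank anything")
    notes = []
    if not question:
        notes.append("question of interest not stated; comparison scope is inferred")
    if not downstream_use:
        notes.append("downstream use not stated; citation roles are provisional")
    return notes
```

The returned notes are meant to be carried into the caveats part of the output rather than used to block the ranking.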

## Sample Triggers

- “Help me rank these papers by evidence level.”
- “Which of these studies should I cite first?”
- “Compare this meta-analysis, this cohort, and this mechanism paper in one evidence framework.”
- “Which papers are strongest for supporting my claim?”
- “Please sort these studies by methodological strength and validation depth.”
- “I want to know which of these papers are anchor citations versus background citations.”

## Core Function

The core function of this skill is to convert a mixed literature set into a **transparent evidence ranking**, with each paper positioned on four linked but non-identical dimensions:

1. **Evidence Family** — what kind of study this is.
2. **Methodological Quality** — how well the study was actually executed.
3. **Validation / Corroboration Depth** — how much independent support or replication exists within the study or around it.
4. **Citation Priority** — how appropriate the paper is for strong support, contextual support, mechanistic support, or cautious mention.

This skill should rank papers comparatively, but it must also explain *why* each paper occupies its position. Rankings without explicit reasoning are incomplete.
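
One way to picture the four dimensions is as a record that cannot exist without its reasoning. This is an illustrative sketch only: the class name, field names, and the use of free-text tiers are assumptions, and the skill does not prescribe any data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperAssessment:
    """One paper positioned on the four dimensions, plus the reasoning."""
    paper_id: str
    evidence_family: str
    methodological_quality: str
    validation_depth: str
    citation_priority: str
    rationale: str

    def __post_init__(self):
        # A position without an explanation is treated as incomplete.
        if not self.rationale.strip():
            raise ValueError(f"{self.paper_id}: rationale is required")
```

Keeping the four dimensions as separate fields, rather than one combined score, mirrors the point that they are linked but not identical.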

## Execution

### Step 1 — Identify the comparison scope
Determine whether the papers address:
- the same core question;
- related but not identical questions;
- different evidence roles within one argument.

Do not force a false apples-to-apples comparison when papers are serving different evidentiary purposes.

### Step 2 — Identify the true study design for each paper
For each paper, identify the actual study design using methods, not just the authors’ self-description.

Separate:
- systematic review / meta-analysis;
- randomized trial / non-randomized intervention;
- prospective cohort / retrospective cohort / case-control / cross-sectional / registry / real-world evidence;
- diagnostic / prognostic / predictive / validation study;
- mechanism experiment / animal study / cell study / omics discovery study / computational study.

Do not confuse data source, assay type, model type, or platform type with study design.
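
As a rough illustration of keeping the design label apart from the data source and platform, the sketch below stores them in separate fields and flags a mismatch between the authors' label and the methods-based label. All names here are hypothetical, and the plain string comparison is a simplification of what is really a reading judgment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DesignRecord:
    self_described: str                      # what the authors call the study
    methods_based: str                       # design inferred from the methods section
    data_source: Optional[str] = None        # e.g. claims data; recorded separately
    assay_or_platform: Optional[str] = None  # e.g. RNA-seq; recorded separately

    def label_mismatch(self) -> bool:
        """True when the authors' label differs from the methods-based design."""
        return self.self_described.strip().lower() != self.methods_based.strip().lower()
```

A mismatch does not say which label is right; it only marks the paper for a closer look at the methods.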

### Step 3 — Judge methodological quality
Assess how strong the methods actually are.

Review at least these dimensions when relevant:
- sampling logic and cohort definition;
- inclusion / exclusion logic;
- confounding control and bias handling;
- sample size adequacy relative to analytical complexity;
- outcome definition and comparator appropriateness;
- statistical discipline, including multiplicity and model burden;
- calibration, robustness, sensitivity analysis, and missing-data handling when relevant;
- reproducibility and transparency of key methods.

A higher-level design should not be ranked highly if execution is weak.

### Step 4 — Judge validation and corroboration depth
Assess how much the results are supported beyond the initial finding.

Distinguish:
- no meaningful validation;
- internal split / resampling only;
- external validation cohort;
- orthogonal assay confirmation;
- independent replication;
- prospective or implementation-relevant confirmation.

Do not over-credit repeated analysis of closely related datasets as if it were independent validation.
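
The caution about non-independent validation can be sketched as a simple downgrade rule. The six categories are taken from the list above; everything else is an assumption for illustration: the `credited_validation` name, the use of shared source identifiers as a stand-in for "closely related datasets", and the choice to downgrade to the internal-only category.

```python
from enum import Enum


class ValidationKind(Enum):
    NONE = "no meaningful validation"
    INTERNAL = "internal split / resampling only"
    EXTERNAL = "external validation cohort"
    ORTHOGONAL = "orthogonal assay confirmation"
    REPLICATION = "independent replication"
    PROSPECTIVE = "prospective or implementation-relevant confirmation"


def credited_validation(claimed: ValidationKind,
                        discovery_sources: set[str],
                        validation_sources: set[str]) -> ValidationKind:
    """Downgrade external/replication claims when the data sources overlap."""
    needs_independence = {ValidationKind.EXTERNAL, ValidationKind.REPLICATION}
    if claimed in needs_independence and discovery_sources & validation_sources:
        # Assumption: overlapping sources are credited as internal validation only.
        return ValidationKind.INTERNAL
    return claimed
```

In practice, relatedness between datasets is rarely visible from identifiers alone, so a check like this can only catch the most obvious overlaps.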

### Step 5 — Judge claim discipline
Check whether the paper’s conclusions stay inside the evidence boundary.

Flag overclaim patterns such as:
- association presented as causation;
- retrospective performance presented as clinical utility;
- exploratory biomarker framed as established marker;
- mechanism signal framed as therapeutic proof;
- subgroup result framed as generalizable finding.

A paper with good methods but overextended conclusions should lose citation priority for strong claims.
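
The overclaim patterns above map naturally onto a set of flags. The sketch below is hypothetical throughout (names and logic), and it treats any flag as disqualifying for strong-claim use, which is a stricter simplification of "should lose citation priority".

```python
from enum import Flag, auto


class Overclaim(Flag):
    NONE = 0
    ASSOCIATION_AS_CAUSATION = auto()
    RETROSPECTIVE_AS_CLINICAL_UTILITY = auto()
    EXPLORATORY_AS_ESTABLISHED = auto()
    MECHANISM_AS_THERAPEUTIC_PROOF = auto()
    SUBGROUP_AS_GENERALIZABLE = auto()


def suitable_for_strong_claims(good_methods: bool, flags: Overclaim) -> bool:
    """Good methods alone are not enough; any overclaim flag blocks strong-claim use."""
    return good_methods and flags == Overclaim.NONE
```

Flags combine with `|`, so a paper showing several patterns can carry all of them in one value.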

### Step 6 — Assign evidence position and citation role
For each paper, assign all of the following:
- evidence family;
- methodological quality tier;
- validation depth tier;
- claim-discipline judgment;
- citation role.

Recommended citation roles:
- **Anchor citation** — strongest paper(s) for supporting a central claim.
- **High-value support citation** — strong support but not the single best anchor.
- **Context-setting citation** — useful for framing the topic or background.
- **Mechanistic support citation** — useful for biological rationale rather than direct clinical inference.
- **Caution citation** — cite only with explicit limitations.

### Step 7 — Produce the comparative ranking
Rank the papers explicitly and explain the ranking logic.

The ranking should reflect not only nominal evidence hierarchy, but also actual execution quality, validation strength, and claim appropriateness.
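
A sketch of how a ranking might be organized without forcing every paper onto one ladder: papers are grouped by comparison scope and ordered only within each group. The function computes nothing on its own; it arranges positions and reasons that the reviewer has already assigned. The dictionary keys (`id`, `scope`, `position`, `reason`) are assumed names, not part of the skill.

```python
from collections import defaultdict


def rank_within_scope(papers):
    """Group papers by comparison scope and order each group by reviewer position.

    Each paper is a dict with 'id', 'scope', 'position' (reviewer-assigned
    order within its scope, 1 = strongest) and 'reason'.
    """
    groups = defaultdict(list)
    for p in papers:
        groups[p["scope"]].append(p)
    return {
        scope: [(p["id"], p["reason"]) for p in sorted(items, key=lambda p: p["position"])]
        for scope, items in groups.items()
    }
```

Keeping the reason next to each identifier makes it harder to present an order without its justification.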

### Step 8 — State limitations of the ranking itself
State explicitly where uncertainty limits the ranking.

Examples:
- incomplete access to methods or supplementary material;
- unclear validation independence;
- mixed study purposes that reduce direct comparability;
- missing statistical detail;
- ambiguous endpoint or cohort definitions.

## Mandatory Output Structure

Use the following structure every time.

### A. Ranking Objective
State what is being ranked, for what question, and for what downstream use.

### B. Evidence Family Map
List each paper with its true study design / evidence family.

### C. Methodological Quality Review
For each paper, summarize the main strengths and weaknesses affecting trustworthiness.

### D. Validation and Corroboration Review
State what validation exists and how much confidence it adds.

### E. Claim Discipline Review
State whether the paper’s stated conclusions stay within the evidence boundary.

### F. Comparative Evidence Ranking
Provide a ranked list from strongest to weakest **for the stated purpose**, with clear reasoning.

### G. Citation Priority Recommendation
For each paper, assign one citation role:
- Anchor citation
- High-value support citation
- Context-setting citation
- Mechanistic support citation
- Caution citation

### H. Key Reasons for the Ranking
State the main factors that drove the order.

### I. Ranking Uncertainties and Caveats
Explain where incomplete information or mixed evidence roles limit certainty.

### J. References and Verification Notes
When the user provides or references specific papers, preserve verified bibliographic details accurately.
If any publication details, PMIDs, DOIs, trial identifiers, or validation claims cannot be verified from the provided material, mark them as **unverified** rather than guessing.
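
Because the structure above is mandatory, a finished report can be checked for completeness. The section titles are copied from this section; the `missing_sections` helper and the plain substring match are assumptions for illustration.

```python
REQUIRED_SECTIONS = [
    "A. Ranking Objective",
    "B. Evidence Family Map",
    "C. Methodological Quality Review",
    "D. Validation and Corroboration Review",
    "E. Claim Discipline Review",
    "F. Comparative Evidence Ranking",
    "G. Citation Priority Recommendation",
    "H. Key Reasons for the Ranking",
    "I. Ranking Uncertainties and Caveats",
    "J. References and Verification Notes",
]


def missing_sections(report: str) -> list[str]:
    """Return required section titles that do not appear in the report text."""
    return [title for title in REQUIRED_SECTIONS if title not in report]
```

A check like this only confirms that the headings exist; it says nothing about whether each section is adequately filled in.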

## Hard Rules

1. Always separate **study design label** from **true evidence value**.
2. Never rank papers by journal prestige, citation count, or narrative confidence alone.
3. Never treat statistical significance as equivalent to methodological reliability.
4. Never treat a nominally high-tier design as automatically top-ranked if execution is weak.
5. Always separate **internal validation**, **external validation**, **orthogonal confirmation**, and **independent replication**.
6. Never treat exploratory results as established evidence without appropriate validation.
7. Always distinguish **clinical evidence**, **observational evidence**, **mechanistic evidence**, and **omics / computational evidence**.
8. Never confuse a paper’s usefulness for biological rationale with its usefulness for supporting a strong clinical or causal claim.
9. Never overstate the meaning of subgroup findings, secondary analyses, or post hoc signals.
10. Always evaluate whether the paper’s conclusion language exceeds what the methods support.
11. Never fabricate references, PMIDs, DOIs, trial names, approval status, guideline status, or validation claims.
12. Never present vague memory or field lore as literature-backed fact.
13. If bibliographic or methodological details cannot be verified from the provided material, label them as **unverified**, **unclear**, or **not reported**.
14. Never invent missing sample sizes, model settings, validation cohorts, or effect estimates.
15. Do not collapse heterogeneous papers into a single ladder without explaining the comparison logic.
16. If two papers serve different evidence roles, state that explicitly rather than forcing a simplistic rank order.
17. Treat the output as incomplete if the reasoning behind the ranking is not transparent.
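
Rules 11 to 14 can be approximated by refusing to pass through any bibliographic value that is not found in the provided material. The sketch below is deliberately crude and entirely hypothetical: the function name is invented, and a verbatim substring match is only a stand-in for real verification.

```python
def label_bibliographic_fields(claimed: dict[str, str], provided_text: str) -> dict[str, str]:
    """Keep a field as-is only if its value appears verbatim in the provided material."""
    labeled = {}
    for field, value in claimed.items():
        if value and value in provided_text:
            labeled[field] = value
        elif value:
            # Value was stated somewhere but cannot be confirmed from the material.
            labeled[field] = f"{value} (unverified)"
        else:
            # Nothing was supplied; do not invent a value.
            labeled[field] = "not reported"
    return labeled
```

Short values such as sample sizes can match unrelated text by accident, so a match here is weaker evidence than it looks and still deserves a manual check.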

## What This Skill Should Not Do

This skill should not:
- act as a treatment recommendation engine;
- turn evidence ranking into clinical advice;
- assume meta-analysis always outranks all primary research automatically;
- assume mechanism work is low value in every context;
- replace full risk-of-bias appraisal frameworks when a formal systematic review standard is required;
- produce fake precision when the paper set is heterogeneous or incompletely reported.

## Quality Standard

A high-quality output from this skill should:
- correctly identify what each paper actually is;
- explain why some papers deserve stronger citation priority than others;
- separate evidence family, methodological quality, validation depth, and claim discipline clearly;
- avoid flattening mixed literature into a misleading single-number score;
- make the ranking usable for manuscript writing, literature review, protocol framing, or evidence mapping;
- make uncertainty explicit whenever details are missing or not verifiable.

Related Skills

real-world-evidence-study-designer

from aipoch/medical-research-skills

Designs a structured real-world evidence study using EHR, claims, or registry data, with explicit handling of time zero, eligibility windows, exposure definitions, outcome windows, censoring, confounding control, and target-trial-emulation logic. Use this skill when the user needs study-type design and protocol framing for an observational clinical study based on routine-care data. Do not invent database fields, follow-up completeness, linkage, coding validity, or causal identifiability.

topic-evidence-mapper

from aipoch/medical-research-skills

Rapidly maps the evidence landscape around a medical topic by organizing major research streams, target populations, endpoints, methods, evidence density, and thin areas. Always use this skill when a user needs a structured evidence map of a medical topic before deeper reading, gap analysis, or study planning. Do not treat evidence mapping as formal gap identification.

drug-target-evidence-landscape

from aipoch/medical-research-skills

Organizes the evidence and competitive landscape around a drug, target, or pathway by separating disease relevance, tractability, preclinical evidence, clinical evidence, modality fit, and crowding. Always map what is biologically supported, what is druggable, what has actually advanced, and what remains strategically open. Never confuse target relevance with druggability, preclinical activity with clinical promise, or narrative excitement with validated development maturity. Never fabricate references, trial status, approval status, company activity, or asset metadata.

disease-mechanism-evidence-map

from aipoch/medical-research-skills

Systematically maps mechanism evidence for a disease from molecules to pathways, cell types, tissues, biological consequences, and clinical phenotypes. Always use this skill when a user needs a layered mechanism evidence chain rather than a flat summary or immediate gap analysis. Formal literature citations must be real and verifiable.

skill-auditor

from aipoch/medical-research-skills

A comprehensive auditor for any agent skill — including Manus, OpenClaw/ClawHub, Claude, LobeHub, or custom SKILL.md-based skills. Use this skill whenever a user wants to evaluate, audit, review, score, or quality-check an agent skill before publishing, updating, or deploying. Covers two hard veto gates (structural redlines + research integrity redlines), static quality scoring across 25 criteria (ISO 25010 + OpenSSF + Agent), dynamic test input generation, multi-mode execution testing, multi-layer output evaluation with five specialized category rubrics (Evidence Insight / Protocol Design / Data Analysis / Academic Writing / Other), a Research Veto that applies to all four research categories, human eval viewer generation, actionable P0/P1/P2 optimization recommendations, and automatic skill improvement that outputs a polished, production-ready SKILL.md. Also use whenever a user says "audit my skill", "evaluate my skill", "improve my skill", or wants a corrected version after evaluation.

two-sample-mr-research-planner

from aipoch/medical-research-skills

Generates complete two-sample Mendelian randomization (MR) research designs from a user-provided research direction. Use when users want to design, plan, or build a study using two-sample MR to test causal relationships. Triggers: "design a two-sample MR study", "build a publishable MR paper", "test whether this biomarker causally affects this disease", "generate Lite/Standard/Advanced MR plans", "screen multiple exposures with MR", "bidirectional MR design", "causal inference using GWAS summary statistics", or "I want to study X and Y using MR". Always outputs four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.

research-proposal-generator

from aipoch/medical-research-skills

Generates a comprehensive research proposal design based on input literature, including hypothesis, mechanism verification, and budget. Use when the user wants to design a research project from a paper.

research-grants

from aipoch/medical-research-skills

Write competitive research proposals for NSF, NIH, DOE, DARPA, and Taiwan's NSTC when you need agency-compliant narratives, budgets, and review-criteria alignment for a specific solicitation/FOA/BAA.

protocol-standardization

from aipoch/medical-research-skills

Standardize fragmented experimental steps into reproducible protocol documents when you need method organization, lab SOP drafting, or cross-operator reproducibility; missing parameters must be explicitly marked as "To be supplemented/Not provided".

prospero-registration-helper

from aipoch/medical-research-skills

Assists researchers in generating PROSPERO registration content for meta-analyses from a title and optional protocol. Use when the user wants to draft a PROSPERO registration form.

non-tumor-ml-research-planner

from aipoch/medical-research-skills

Generates complete non-tumor biomedical machine learning research designs from a user-provided research direction. Always use this skill when users want to plan bioinformatics + ML papers for non-cancer diseases (metabolic, cardiovascular, kidney, inflammatory, autoimmune, infectious, neurological, endocrine, wound healing, chronic multifactor), design diagnostic biomarker studies, combine GEO datasets with feature selection and ML modeling, or generate Lite/Standard/Advanced/Publication+ workload plans. Trigger for: "non-tumor ML study", "bioinformatics paper outside oncology", "key genes and diagnostic model for a disease", "pyroptosis/ferroptosis/senescence/autophagy + disease", "GEO datasets + machine learning", "RF + LASSO diagnostic model", "DEG + feature selection + validation", "immune infiltration + biomarker", "non-cancer biomarker paper". Trigger even for casual phrasings like "I want to study X using machine learning", "help me design a non-tumor bioinformatics paper", or "how do I build a diagnostic model for disease Y".

network-tox-docking-research-planner

from aipoch/medical-research-skills

Generates complete network toxicology + molecular docking research designs from a user-provided toxicant and disease/phenotype. Always use this skill when users want to investigate how an environmental toxicant, endocrine disruptor, heavy metal, food contaminant, pharmaceutical residue, or consumer product chemical may contribute to a disease through shared molecular targets, hub genes, pathways, and docking evidence. Trigger for: "network toxicology study", "toxicology mechanism paper", "target prediction + PPI + docking", "environmental pollutant and disease mechanism", "hub genes and docking for toxicant", "Lite/Standard/Advanced toxicology plan", "CTD + SwissTargetPrediction + GeneCards + STRING", "CB-Dock2 docking study", "triclosan/BPA/cadmium/PFAS + disease". Also triggers for Chinese phrasings: "网络毒理学研究设计" (network toxicology study design), "毒物机制论文" (toxicant mechanism paper), "靶点预测+PPI+对接" (target prediction + PPI + docking), "环境污染物与疾病机制" (environmental pollutants and disease mechanisms). Trigger even for casual phrasings like "I want to study how chemical X affects disease Y" or "help me design a toxicology paper". Always output four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.