scholar-evaluation

Implements the ScholarEval framework to evaluate scholarly documents; trigger when the user provides a PDF/DOCX/TXT file or pasted text and requests critique, scoring, or quality assessment.

by aipoch

Best use case

scholar-evaluation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using scholar-evaluation should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/scholar-evaluation/SKILL.md --create-dirs "https://raw.githubusercontent.com/aipoch/medical-research-skills/main/scientific-skills/Evidence%20Insight/scholar-evaluation/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/scholar-evaluation/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How scholar-evaluation Compares

Feature / Agent	scholar-evaluation	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Implements the ScholarEval framework to evaluate scholarly documents; trigger when the user provides a PDF/DOCX/TXT file or pasted text and requests critique, scoring, or quality assessment.

Where can I find the source code?

The source code is in the aipoch/medical-research-skills repository on GitHub: https://github.com/aipoch/medical-research-skills

SKILL.md Source

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- Evaluate a research paper, thesis, or proposal and produce a structured critique with scores.
- Generate actionable revision recommendations across core academic writing dimensions.
- Compare multiple drafts/versions of a manuscript using consistent rubric-based scoring.
- Assess submission readiness (e.g., for a conference/journal) and identify major weaknesses.
- Review a document provided as a PDF/DOCX/TXT file when the user expects automatic text extraction.

## Key Features

- **Automatic text extraction** from **PDF/DOCX/TXT** via `scripts/extract_text.py` (intended as the first step for file inputs).
- **ScholarEval rubric** with **8 evaluation dimensions** (see `references/evaluation_framework.md`).
- **Per-dimension scoring (1–5)** with qualitative feedback and concrete recommendations.
- **Weighted score calculation** via `scripts/calculate_scores.py` from a JSON score file.
- Produces a final report summarizing **strengths, weaknesses, and next steps**.

## Dependencies

- Python **3.10+**
- See `requirements.txt` for pinned Python package versions (install via `pip install -r requirements.txt`).

## Example Usage

### A) Evaluate a PDF/DOCX/TXT file (end-to-end)

1) Extract text (run this first for file inputs):
```bash
python scripts/extract_text.py "paper.pdf"
```

2) Create a scores JSON (example: `scores.json`):
```json
{
  "problem_formulation": 4,
  "literature_review": 3,
  "methodology": 4,
  "data_quality": 3,
  "analysis": 4,
  "results": 3,
  "writing_quality": 4,
  "citations": 3
}
```

3) Compute the weighted/aggregate score:
```bash
python scripts/calculate_scores.py --scores scores.json
```

4) Use the extracted text plus the rubric to generate the evaluation report:
- Apply the 8-dimension criteria from `references/evaluation_framework.md`
- Provide per-dimension justification, then summarize strengths/risks and prioritized revisions
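Step 4 leaves the report layout open. The following is a minimal sketch, assuming a Markdown report with one section per part of step 4; the `DimensionReview` and `EvaluationReport` classes and their section headings are hypothetical and not part of the skill's scripts.

```python
from dataclasses import dataclass, field


@dataclass
class DimensionReview:
    name: str            # e.g. "Methodology" (one of the 8 rubric dimensions)
    score: int           # 1-5 on the ScholarEval scale
    justification: str   # evidence from the extracted text supporting the score


@dataclass
class EvaluationReport:
    # Hypothetical report structure; the section names are an assumption.
    dimensions: list[DimensionReview]
    strengths: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)
    revisions: list[str] = field(default_factory=list)  # highest priority first

    def to_markdown(self) -> str:
        lines = ["# Evaluation Report", "", "## Per-Dimension Assessment"]
        for d in self.dimensions:
            lines.append(f"- **{d.name}** ({d.score}/5): {d.justification}")
        lines += ["", "## Strengths"] + [f"- {s}" for s in self.strengths]
        lines += ["", "## Weaknesses and Risks"] + [f"- {r}" for r in self.risks]
        lines += ["", "## Prioritized Revisions"]
        # Numbered list so the revision order carries the priority.
        lines += [f"{i}. {rev}" for i, rev in enumerate(self.revisions, start=1)]
        return "\n".join(lines)


if __name__ == "__main__":
    report = EvaluationReport(
        dimensions=[DimensionReview("Methodology", 4, "Design fits the research question.")],
        strengths=["Clear problem statement"],
        risks=["Small sample size"],
        revisions=["Add a power analysis"],
    )
    print(report.to_markdown())
```

Keeping justification and score together per dimension makes it harder to emit a score without the reasoning behind it.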

### B) Evaluate pasted text (no extraction)

If the user pastes text directly (e.g., abstract, full paper text), skip extraction and evaluate immediately using the 8 dimensions and the 1–5 scale.
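Deciding between route A and route B can be sketched as a small heuristic. This is an illustration only, assuming that a single-line input ending in `.pdf`, `.docx`, or `.txt` names a file and that anything else is pasted text; `choose_route` is a hypothetical helper, not part of the skill.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def choose_route(user_input: str) -> str:
    """Return "extract" for a file name (route A) or "pasted_text" (route B)."""
    text = user_input.strip()
    # Empty or multi-line input is treated as pasted document text.
    if not text or "\n" in text:
        return "pasted_text"
    # A single line ending in a supported extension is treated as a file name;
    # the extraction script is then expected to locate the file.
    if Path(text).suffix.lower() in SUPPORTED_EXTENSIONS:
        return "extract"
    return "pasted_text"
```

A sentence that ends with a period, such as "We propose a new method.", has no file suffix, so it falls through to `"pasted_text"`.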

## Implementation Details

### File ingestion protocol (for PDF/DOCX/TXT)

- For any user-provided file, run:
```bash
python scripts/extract_text.py "<filename-or-path>"
```
- The extraction script is designed to locate the file even if the full path is not provided.
- Use the extracted plain text as the sole input to the evaluation rubric and scoring.
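How the script locates and reads files is not documented here. The following is a hypothetical sketch of that behavior, not the contents of `scripts/extract_text.py`. It assumes a recursive filename search from a root directory as the locating strategy, and assumes `pypdf` and `python-docx` handle PDF and DOCX. Both are imported lazily, so TXT handling needs only the standard library.

```python
from pathlib import Path


def locate_file(name: str, search_root: str = ".") -> Path:
    """Use the path as given if it exists; otherwise search search_root recursively by file name."""
    direct = Path(name)
    if direct.is_file():
        return direct
    matches = sorted(p for p in Path(search_root).rglob(direct.name) if p.is_file())
    if not matches:
        raise FileNotFoundError(f"Could not locate {name!r} under {search_root!r}")
    return matches[0]  # first match in sorted order, for deterministic results


def extract_text(name: str, search_root: str = ".") -> str:
    path = locate_file(name, search_root)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        return path.read_text(encoding="utf-8", errors="replace")
    if suffix == ".pdf":
        from pypdf import PdfReader  # assumed dependency
        # extract_text() can return an empty string for image-only pages.
        return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    if suffix == ".docx":
        import docx  # python-docx, assumed dependency
        return "\n".join(p.text for p in docx.Document(str(path)).paragraphs)
    raise ValueError(f"Unsupported file type: {suffix}")
```

Scanned PDFs without a text layer will yield little or no text with this approach; OCR would be a separate step.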

### Evaluation dimensions (8)

The framework evaluates:
1. Problem Formulation
2. Literature Review
3. Methodology
4. Data Quality
5. Analysis
6. Results
7. Writing Quality
8. Citations

Detailed criteria and guidance are defined in:
- `references/evaluation_framework.md`

### Scoring scale (1–5)

- **1 — Poor**: Major flaws; not usable as-is.
- **2 — Weak**: Significant issues; major revision required.
- **3 — Average**: Acceptable baseline; improvement needed.
- **4 — Good**: Strong overall; minor issues.
- **5 — Excellent**: High quality; clear impact and rigor.
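The dimensions and scale can be encoded as data, so a scores dictionary can be checked before aggregation. This is a sketch. The snake_case keys mirror the `scores.json` example in section A. Two details are assumptions, not confirmed by the skill: that `calculate_scores.py` expects exactly these keys, and that only whole-number scores are allowed.

```python
DIMENSIONS = (
    "problem_formulation", "literature_review", "methodology", "data_quality",
    "analysis", "results", "writing_quality", "citations",
)

SCALE = {1: "Poor", 2: "Weak", 3: "Average", 4: "Good", 5: "Excellent"}


def validate_scores(scores: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the scores look usable."""
    problems = []
    missing = [d for d in DIMENSIONS if d not in scores]
    unknown = [k for k in scores if k not in DIMENSIONS]
    if missing:
        problems.append(f"missing dimensions: {', '.join(missing)}")
    if unknown:
        problems.append(f"unknown dimensions: {', '.join(unknown)}")
    for dim, value in scores.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value not in SCALE:
            problems.append(f"{dim}: {value!r} is not an integer from 1 to 5")
    return problems
```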

### Score calculation

- Raw per-dimension scores are stored in a JSON file and passed to:
```bash
python scripts/calculate_scores.py --scores <path_to_scores_json>
```
- The script computes an aggregate score from the per-dimension scores in the JSON file, applying any weighting logic configured in the script.
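The actual weighting in `scripts/calculate_scores.py` is not documented here. The following is a hypothetical sketch of a weighted mean over the same `scores.json` format. Equal weights are an assumption used as the default, and any custom weights are illustrative only.

```python
from __future__ import annotations

import json
from pathlib import Path


def weighted_score(scores: dict[str, float], weights: dict[str, float] | None = None) -> float:
    """Weighted mean of per-dimension scores; defaults to equal weights (an assumption)."""
    if weights is None:
        weights = {dim: 1.0 for dim in scores}
    # Every scored dimension needs a weight; a missing one raises KeyError.
    total_weight = sum(weights[dim] for dim in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[dim] * weights[dim] for dim in scores) / total_weight


def score_file(path: str, weights: dict[str, float] | None = None) -> float:
    """Load a scores JSON file and return the weighted score rounded to 2 decimals."""
    scores = json.loads(Path(path).read_text(encoding="utf-8"))
    return round(weighted_score(scores, weights), 2)
```

With the example scores from section A and equal weights, the result is 28 / 8 = 3.5. Doubling the hypothetical weights for `methodology` and `analysis` gives 36 / 10 = 3.6.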

Related Skills

semantic-scholar-database

from aipoch/medical-research-skills

Access the Semantic Scholar Graph API to search papers and retrieve paper/author/citation data when you need literature discovery or citation graph exploration.

skill-auditor

from aipoch/medical-research-skills

A comprehensive auditor for any agent skill — including Manus, OpenClaw/ClawHub, Claude, LobeHub, or custom SKILL.md-based skills. Use this skill whenever a user wants to evaluate, audit, review, score, or quality-check an agent skill before publishing, updating, or deploying. Covers two hard veto gates (structural redlines + research integrity redlines), static quality scoring across 25 criteria (ISO 25010 + OpenSSF + Agent), dynamic test input generation, multi-mode execution testing, multi-layer output evaluation with five specialized category rubrics (Evidence Insight / Protocol Design / Data Analysis / Academic Writing / Other), a Research Veto that applies to all four research categories, human eval viewer generation, actionable P0/P1/P2 optimization recommendations, and automatic skill improvement that outputs a polished, production-ready SKILL.md. Also use whenever a user says "audit my skill", "evaluate my skill", "improve my skill", or wants a corrected version after evaluation.

two-sample-mr-research-planner

from aipoch/medical-research-skills

Generates complete two-sample Mendelian randomization (MR) research designs from a user-provided research direction. Use when users want to design, plan, or build a study using two-sample MR to test causal relationships. Triggers: "design a two-sample MR study", "build a publishable MR paper", "test whether this biomarker causally affects this disease", "generate Lite/Standard/Advanced MR plans", "screen multiple exposures with MR", "bidirectional MR design", "causal inference using GWAS summary statistics", or "I want to study X and Y using MR". Always outputs four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.

research-proposal-generator

from aipoch/medical-research-skills

Generates a comprehensive research proposal design based on input literature, including hypothesis, mechanism verification, and budget. Use when the user wants to design a research project from a paper.

research-grants

from aipoch/medical-research-skills

Write competitive research proposals for NSF, NIH, DOE, DARPA, and Taiwan's NSTC when you need agency-compliant narratives, budgets, and review-criteria alignment for a specific solicitation/FOA/BAA.

protocol-standardization

from aipoch/medical-research-skills

Standardize fragmented experimental steps into reproducible protocol documents when you need method organization, lab SOP drafting, or cross-operator reproducibility; missing parameters must be explicitly marked as "To be supplemented/Not provided".

prospero-registration-helper

from aipoch/medical-research-skills

Assists researchers in generating PROSPERO registration content for meta-analyses from a title and optional protocol. Use when the user wants to draft a PROSPERO registration form.

non-tumor-ml-research-planner

from aipoch/medical-research-skills

Generates complete non-tumor biomedical machine learning research designs from a user-provided research direction. Always use this skill when users want to plan bioinformatics + ML papers for non-cancer diseases (metabolic, cardiovascular, kidney, inflammatory, autoimmune, infectious, neurological, endocrine, wound healing, chronic multifactor), design diagnostic biomarker studies, combine GEO datasets with feature selection and ML modeling, or generate Lite/Standard/Advanced/Publication+ workload plans. Trigger for: "non-tumor ML study", "bioinformatics paper outside oncology", "key genes and diagnostic model for a disease", "pyroptosis/ferroptosis/senescence/autophagy + disease", "GEO datasets + machine learning", "RF + LASSO diagnostic model", "DEG + feature selection + validation", "immune infiltration + biomarker", "non-cancer biomarker paper". Trigger even for casual phrasings like "I want to study X using machine learning", "help me design a non-tumor bioinformatics paper", or "how do I build a diagnostic model for disease Y".

network-tox-docking-research-planner

from aipoch/medical-research-skills

Generates complete network toxicology + molecular docking research designs from a user-provided toxicant and disease/phenotype. Always use this skill when users want to investigate how an environmental toxicant, endocrine disruptor, heavy metal, food contaminant, pharmaceutical residue, or consumer product chemical may contribute to a disease through shared molecular targets, hub genes, pathways, and docking evidence. Trigger for: "network toxicology study", "toxicology mechanism paper", "target prediction + PPI + docking", "environmental pollutant and disease mechanism", "hub genes and docking for toxicant", "Lite/Standard/Advanced toxicology plan", "CTD + SwissTargetPrediction + GeneCards + STRING", "CB-Dock2 docking study", "triclosan/BPA/cadmium/PFAS + disease". Also triggers for Chinese phrasings: "网络毒理学研究设计" (network toxicology study design), "毒物机制论文" (toxicant mechanism paper), "靶点预测+PPI+对接" (target prediction + PPI + docking), "环境污染物与疾病机制" (environmental pollutants and disease mechanisms). Trigger even for casual phrasings like "I want to study how chemical X affects disease Y" or "help me design a toxicology paper". Always outputs four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.

meta-protocol-writer

from aipoch/medical-research-skills

Generates a PROSPERO-compliant meta-analysis protocol from a title and PICOS. Use when the user wants to write a protocol for a systematic review or meta-analysis.

hypothesis-generation

from aipoch/medical-research-skills

Structured scientific hypothesis formulation from observations; use when you have experimental observations or preliminary data and need testable hypotheses with predictions, mechanisms, and validation experiments.

hypogenic

from aipoch/medical-research-skills

Automated LLM-driven hypothesis generation and testing for tabular datasets; use when you need systematic exploration of empirical patterns (e.g., fraud detection, content analysis) and want to combine literature insights with data-driven hypothesis evaluation.