scienceclaw-verification

Verify scientific claims, check calculations, validate experimental designs, and fact-check citations. Use when: (1) checking a claim against evidence, (2) validating statistical analyses, (3) verifying experimental reproducibility claims, (4) fact-checking references, (5) adversarial review of research. NOT for: generating new content (use scienceclaw-generation), simple QA (use scienceclaw-qa).

564 stars

by beita6969

Best use case

scienceclaw-verification is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using scienceclaw-verification should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/scienceclaw-verification/SKILL.md --create-dirs "https://raw.githubusercontent.com/beita6969/ScienceClaw/main/skills/scienceclaw-verification/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/scienceclaw-verification/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How scienceclaw-verification Compares

Feature / Agent	scienceclaw-verification	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

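What does this skill do?

It verifies scientific claims, checks calculations, validates experimental designs, and fact-checks citations.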

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# scienceclaw-verification

Verify scientific claims, check calculations, validate experimental designs, and fact-check citations with structured, evidence-based assessment workflows.

## When to Use

- Checking whether a specific scientific claim is supported by cited evidence
- Validating statistical analyses, p-values, effect sizes, or confidence intervals
- Verifying that experimental designs are sound and controls are adequate
- Fact-checking references to confirm they exist and support the claims attributed to them
- Conducting adversarial review of a manuscript or research report
- Assessing reproducibility based on reported methods and data availability
- Cross-checking numerical results against raw data or supplementary materials

## When NOT to Use

- Generating new research content, hypotheses, or manuscripts -- use `scienceclaw-generation`
- Answering general scientific questions -- use `scienceclaw-qa`
- Summarizing papers or findings -- use `scienceclaw-summarization`
- Retrieving papers or building bibliographies -- use `scienceclaw-retrieval`
- Extracting structured information from papers -- use `scienceclaw-ie`

## Verification Workflow

Every verification task passes through four stages: Claim Decomposition, Evidence Search, Assessment, and Verdict Synthesis.

### Stage 1: Claim Decomposition

Break the target claim into atomic, independently verifiable sub-claims. For each, identify the core assertion, extract quantitative components (numbers, thresholds, comparisons), note causal vs. correlational language, list cited evidence, and record scope qualifiers.
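As a rough sketch, the fields listed above could be captured in a small record type. The `SubClaim` class, its field names, and the example claim (including the `ref-1` placeholder citation) are hypothetical illustrations and are not defined by the skill itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SubClaim:
    """One atomic, independently verifiable piece of a larger claim (hypothetical type)."""

    assertion: str                                          # the core assertion
    quantities: list[str] = field(default_factory=list)     # numbers, thresholds, comparisons
    causal: bool = False                                    # True if the wording is causal, not correlational
    cited_evidence: list[str] = field(default_factory=list) # references the claim leans on
    scope: list[str] = field(default_factory=list)          # scope qualifiers


# Made-up target claim, for illustration only:
# "Drug X reduced systolic blood pressure by 12 mmHg in adults over 60 (ref-1)."
sub_claims = [
    SubClaim(
        assertion="Drug X reduces systolic blood pressure",
        causal=True,
        cited_evidence=["ref-1"],
        scope=["adults over 60"],
    ),
    SubClaim(
        assertion="The reduction is 12 mmHg",
        quantities=["12 mmHg"],
        cited_evidence=["ref-1"],
        scope=["adults over 60"],
    ),
]

for sc in sub_claims:
    print(sc.assertion, "| causal:", sc.causal, "| quantities:", sc.quantities)
```

Splitting the causal assertion from the numeric one lets each receive its own verdict later: the direction of the effect may be supported even if the magnitude is not.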

### Stage 2: Evidence Search

For each sub-claim: check whether the cited reference actually contains the claimed information, seek independent corroboration from other sources, verify raw data if available, review methodology for sufficient detail, and check whether others have replicated the finding. Use `scienceclaw-retrieval` to locate papers and `scienceclaw-ie` to extract specific data points.

### Stage 3: Assessment

Assign one of the following verdicts to each sub-claim:

| Verdict | Definition |
|---|---|
| **SUPPORTED** | Evidence directly and clearly supports the claim |
| **PARTIALLY SUPPORTED** | Evidence supports the claim with caveats or qualifications |
| **UNVERIFIABLE** | Insufficient evidence available to confirm or deny |
| **CONTRADICTED** | Evidence directly contradicts the claim |
| **MISLEADING** | Claim is technically true but presented in a deceptive context |
| **CALCULATION ERROR** | Numerical result does not match when recomputed from available data |

### Stage 4: Verdict Synthesis

Combine sub-claim assessments into an overall verdict with confidence level (high/medium/low), an evidence summary citing key supporting and contradicting sources, an error catalog categorized by severity, and recommendations for corrections or further verification.
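The skill does not prescribe how sub-claim verdicts roll up into the overall verdict. The sketch below shows one possible precedence rule, which is purely an assumption: any contradiction or calculation error dominates, unanimity is required for SUPPORTED or UNVERIFIABLE, and everything else falls back to PARTIALLY SUPPORTED. The `Verdict` enum and `synthesize` function are hypothetical names.

```python
from __future__ import annotations

from enum import Enum


class Verdict(Enum):
    SUPPORTED = "SUPPORTED"
    PARTIALLY_SUPPORTED = "PARTIALLY SUPPORTED"
    UNVERIFIABLE = "UNVERIFIABLE"
    CONTRADICTED = "CONTRADICTED"
    MISLEADING = "MISLEADING"
    CALCULATION_ERROR = "CALCULATION ERROR"


def synthesize(verdicts: list[Verdict]) -> Verdict:
    """Collapse sub-claim verdicts into one overall verdict (assumed rule, not the skill's)."""
    if not verdicts:
        return Verdict.UNVERIFIABLE
    # Assumption: a contradicted or miscalculated sub-claim sinks the whole claim.
    if any(v in (Verdict.CONTRADICTED, Verdict.CALCULATION_ERROR) for v in verdicts):
        return Verdict.CONTRADICTED
    if all(v is Verdict.SUPPORTED for v in verdicts):
        return Verdict.SUPPORTED
    if all(v is Verdict.UNVERIFIABLE for v in verdicts):
        return Verdict.UNVERIFIABLE
    # Mixed results, including any MISLEADING sub-claim, get a qualified verdict.
    return Verdict.PARTIALLY_SUPPORTED


print(synthesize([Verdict.SUPPORTED, Verdict.MISLEADING]).value)
```

A stricter or more lenient rule may suit a given review; whichever rule is used, the confidence level and error catalog should still be reported alongside the verdict.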

## Statistical Checking Patterns

Common statistical errors to check: p-value misinterpretation, unreported multiple comparison corrections (Bonferroni, FDR), underpowered studies claiming null results, missing effect sizes, confidence interval inconsistencies with reported estimates, incorrect degrees of freedom, violated distribution assumptions, and unacknowledged baseline imbalances in randomized trials.
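To check whether reported findings would survive a multiple comparison correction, the adjusted p-values can be recomputed directly. This is a minimal standard-library sketch of the Bonferroni and Benjamini-Hochberg (FDR) adjustments; the function names and the p-values are made up for illustration.

```python
from __future__ import annotations


def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


reported = [0.01, 0.04, 0.03, 0.005]  # hypothetical reported p-values
print("Bonferroni:", bonferroni(reported))
print("BH (FDR):  ", benjamini_hochberg(reported))
```

If a paper reports several tests as significant at 0.05 without mentioning a correction, comparing the adjusted values against the same threshold shows which findings depend on the omission.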

Numerical consistency checks: verify table percentages sum correctly, confirm sub-analysis sample sizes are consistent with total N, validate that means and SDs are plausible for the measurement scale, ensure figures match text, and recompute derived statistics (odds ratios, hazard ratios) from raw counts when available. Apply GRIM tests (is the reported mean achievable given integer-valued data and the sample size?) and SPRITE tests (are the reported summary statistics consistent with some plausible distribution of values?).
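Two of these checks are simple enough to sketch in a few lines: a GRIM-style test and an odds ratio recomputed from a 2x2 table. The function names and all numbers are hypothetical. The GRIM check assumes integer-valued responses and deliberately accepts either rounding direction at exact ties.

```python
def grim_consistent(mean: float, n: int, decimals: int = 2) -> bool:
    """Return True if some integer sum over n items rounds to the reported mean."""
    tolerance = 0.5 * 10 ** -decimals + 1e-9  # half a unit in the last reported digit
    nearest_total = round(mean * n)
    return any(
        abs(total / n - mean) <= tolerance
        for total in (nearest_total - 1, nearest_total, nearest_total + 1)
    )


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table: exposed (a events, b non-events), unexposed (c, d)."""
    return (a * d) / (b * c)


# Hypothetical reported values.
print(grim_consistent(5.19, 28))    # no integer sum over 28 items gives 5.19
print(grim_consistent(5.18, 28))    # 145 / 28 = 5.1786, which rounds to 5.18
print(odds_ratio(20, 80, 10, 90))   # (20 * 90) / (80 * 10)
```

A failed GRIM check does not prove misconduct; it may reflect a typo, an unreported exclusion, or non-integer data, so it is best recorded as an error to query rather than as a conclusion.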

## Adversarial Review Protocol

- **Methodology**: study design appropriateness, inclusion/exclusion criteria, control adequacy, randomization/blinding, confounder identification, measurement instrument validation
- **Results**: numerical consistency, effect size reporting, negative result inclusion, figure accuracy, outlier handling transparency
- **Citations**: reference existence verification (DOI check), claim-attribution accuracy, contrary findings acknowledgment, self-citation proportion
- **Logic**: conclusion-evidence alignment, generalization scope, alternative explanations, causal language justification
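Two of the citation checks above lend themselves to a quick offline pre-screen. The sketch below assumes a simplified DOI pattern (prefix `10.`, a 4 to 9 digit registrant code, a slash, then a non-empty suffix) and a naive author-name match. Both helpers and the sample values are hypothetical. A syntactically valid DOI still has to be resolved with an actual lookup before the reference counts as verified.

```python
from __future__ import annotations

import re

# Simplified shape check only; it says nothing about whether the DOI resolves.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(text: str) -> bool:
    """Return True if the string has the general shape of a DOI."""
    return bool(DOI_PATTERN.match(text.strip()))


def self_citation_share(manuscript_authors: list[str],
                        reference_author_lists: list[list[str]]) -> float:
    """Fraction of references sharing at least one author with the manuscript."""
    if not reference_author_lists:
        return 0.0
    own = {name.lower() for name in manuscript_authors}
    hits = sum(
        1 for authors in reference_author_lists
        if own & {name.lower() for name in authors}
    )
    return hits / len(reference_author_lists)


print(looks_like_doi("10.1234/example.5678"))  # placeholder DOI, not a real reference
print(self_citation_share(["A. Author"], [["A. Author", "B. Other"], ["C. Third"]]))
```

Exact string matching on names will miss initials and spelling variants, so the share it reports is a lower bound at best.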

## Discipline-Specific Verification Criteria

- **Biomedical**: CONSORT/STROBE/PRISMA compliance, trial registration verification, IRB approval, conflict of interest disclosure, dose-response plausibility
- **Machine Learning**: benchmark version/split verification, hyperparameter tuning leakage, baseline comparison fairness, ablation completeness, reproducibility artifact availability
- **Physics**: unit consistency, dimensional analysis, order-of-magnitude plausibility, conservation law compliance, uncertainty propagation, calibration documentation
- **Social Sciences**: pre-registration verification (OSF), power analysis adequacy, effect size comparison with meta-analyses, demand characteristics controls, replication status of foundational claims
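As one example of the physics criteria, a reported uncertainty on a derived quantity can be recomputed from its inputs. The sketch below assumes first-order propagation with independent uncertainties, in which relative uncertainties of a quotient add in quadrature. The function name and measurement values are made up.

```python
import math


def propagate_quotient(x: float, sx: float, y: float, sy: float) -> tuple[float, float]:
    """Return q = x / y and its uncertainty, assuming independent errors (first order)."""
    q = x / y
    relative = math.hypot(sx / x, sy / y)  # sqrt((sx/x)^2 + (sy/y)^2)
    return q, abs(q) * relative


# Hypothetical measurement: mass 10.0 +/- 0.1 g, volume 2.0 +/- 0.04 cm^3.
density, sigma = propagate_quotient(10.0, 0.1, 2.0, 0.04)
print(f"density = {density:.2f} +/- {sigma:.2f} g/cm^3")
```

If the paper's stated uncertainty is much smaller than the recomputed one, the likely explanations are a neglected input uncertainty or an unstated correlation between inputs, either of which is worth flagging.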

## Output Format

```
Verification Report
===================
Claim: [Original claim text]
Source: [Paper/report reference]
Date Verified: [Date]

Decomposed Sub-Claims:
  1. [Sub-claim] -- [VERDICT] -- [Brief justification]
  2. [Sub-claim] -- [VERDICT] -- [Brief justification]

Overall Verdict: [SUPPORTED | PARTIALLY SUPPORTED | UNVERIFIABLE | CONTRADICTED]
Confidence: [HIGH | MEDIUM | LOW]

Evidence Summary:
  Supporting: [Key supporting evidence with citations]
  Contradicting: [Key contradicting evidence with citations]

Errors Found:
  [Severity: CRITICAL | MAJOR | MINOR] -- [Description]

Recommendations:
  - [Suggested action items]
```
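For repeated runs, the template above can be filled programmatically. This is a minimal sketch: `render_report` and its parameters are hypothetical, the bracketed template fields are treated as placeholders to be replaced, and every value in the example call is invented.

```python
def render_report(claim, source, date_verified, sub_claims, overall,
                  confidence, supporting, contradicting, errors, recommendations):
    """Fill the Verification Report template and return it as one string."""
    lines = [
        "Verification Report",
        "===================",
        f"Claim: {claim}",
        f"Source: {source}",
        f"Date Verified: {date_verified}",
        "",
        "Decomposed Sub-Claims:",
    ]
    # sub_claims is a list of (text, verdict, justification) tuples.
    for i, (text, verdict, why) in enumerate(sub_claims, start=1):
        lines.append(f"  {i}. {text} -- {verdict} -- {why}")
    lines += [
        "",
        f"Overall Verdict: {overall}",
        f"Confidence: {confidence}",
        "",
        "Evidence Summary:",
        f"  Supporting: {supporting}",
        f"  Contradicting: {contradicting}",
        "",
        "Errors Found:",
    ]
    # errors is a list of (severity, description) tuples.
    for severity, description in errors:
        lines.append(f"  {severity} -- {description}")
    lines += ["", "Recommendations:"]
    lines += [f"  - {item}" for item in recommendations]
    return "\n".join(lines)


report = render_report(
    claim="Hypothetical claim text",
    source="ref-1 (placeholder)",
    date_verified="2024-01-01",
    sub_claims=[("Sub-claim A", "SUPPORTED", "matches cited table")],
    overall="SUPPORTED",
    confidence="MEDIUM",
    supporting="ref-1 (placeholder)",
    contradicting="none found",
    errors=[("MINOR", "rounding differs in one table cell")],
    recommendations=["Recheck the rounding against the raw data"],
)
print(report)
```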

## Zero-Hallucination Rule

ALL factual claims, citations, database results, and scientific data presented to the user MUST come from actual tool results (API calls, code execution, web search) in this conversation. NEVER fabricate or "fill in" details from training data. If a tool returns no results or partial data, report exactly what happened.

Related Skills

scienceclaw-summarization

564

from beita6969/ScienceClaw

Summarize scientific papers, datasets, experimental results, and literature reviews. Use when: (1) condensing research papers, (2) creating literature reviews, (3) summarizing experimental findings, (4) meta-analysis synthesis, (5) creating executive summaries of research. NOT for: information extraction (use scienceclaw-ie), full paper retrieval (use scienceclaw-retrieval), or writing new content (use scienceclaw-generation).

scienceclaw-retrieval

Retrieve scientific information from databases, literature, and knowledge bases. Use when: (1) finding relevant papers, (2) querying scientific databases, (3) cross-referencing findings, (4) building bibliographies, (5) systematic literature search. NOT for: answering questions (use scienceclaw-qa), summarizing (use scienceclaw-summarization), or data analysis (use code-execution skill).

scienceclaw-reasoning

Perform multi-step scientific reasoning, proof construction, causal inference, and logical argumentation. Use when: (1) deriving conclusions from premises, (2) causal analysis, (3) mathematical proofs, (4) hypothesis evaluation, (5) counterfactual reasoning. NOT for: simple factual questions (use scienceclaw-qa), data analysis (use code-execution), or literature search (use scienceclaw-retrieval).

scienceclaw-qa

Answer scientific questions across all disciplines with evidence-based responses and citations. Use when: (1) user asks factual science questions, (2) needs explanation of concepts/theories/methods, (3) multi-step scientific reasoning needed. Covers natural sciences (physics, chemistry, biology, medicine, materials, astronomy, earth science, math, CS) and social sciences (economics, sociology, psychology, political science, linguistics, history, law, philosophy, education). NOT for: opinion-based questions, non-scientific queries, or when code execution is needed (use code-execution skill).

scienceclaw-prediction

Predict scientific properties, trends, and outcomes. Use when: user asks for property prediction, trend forecasting, or model-based estimation. NOT for: historical data lookup or real-time monitoring.

scienceclaw-ie

Extract structured information from scientific texts: entities, relations, data tables, methods, results. Use when: (1) parsing papers for key data, (2) extracting experimental parameters, (3) building knowledge graphs from literature, (4) NER on scientific documents, (5) extracting methods/results sections. NOT for: summarization (use scienceclaw-summarization), full text retrieval (use scienceclaw-retrieval).

scienceclaw-generation

Generate scientific hypotheses, experimental designs, and paper drafts. Use when: user asks to propose hypotheses, design experiments, or write scientific content. NOT for: data analysis or literature search.

scienceclaw-discovery

Identify research gaps, synthesize cross-disciplinary insights, and generate novel hypotheses. Use when: user asks about unexplored areas, cross-field connections, or new research directions. NOT for: routine literature review or data analysis.

scienceclaw-classification

Classify scientific content by discipline, methodology, topic, and quality. Use when: user asks to categorize papers, methods, or research outputs. NOT for: simple keyword tagging or non-scientific content.

fact-verification

Verify scientific claims, political statements, and environmental assertions against evidence

xurl

A CLI tool for making authenticated requests to the X (Twitter) API. Use this skill when you need to post tweets, reply, quote, search, read posts, manage followers, send DMs, upload media, or interact with any X API v2 endpoint.

xlsx

Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.