AI Agent Skill HUB

Codex

research-quality

Assess source quality using GRADE methodology

104 stars

View on GitHub Installation ↓

Best use case

research-quality is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Assess source quality using GRADE methodology

Teams using research-quality should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/research-quality/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/research-quality/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/research-quality/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How research-quality Compares

Feature / Agent	research-quality	Standard Approach
Platform Support	Codex	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Assess source quality using GRADE methodology

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

AI Agents for Startups

Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.

AI Agent for YouTube Script Writing

Find AI agent skills for YouTube script writing, video research, content outlining, and repeatable channel production workflows.

SKILL.md Source

# Research Quality Command

Perform systematic GRADE evidence quality assessment on research sources.

## Instructions

When invoked, perform rigorous quality assessment:

1. **Load Source**
   - Accept REF-XXX identifier or file path
   - Load PDF and finding document
   - Extract frontmatter metadata
   - Determine source type and baseline quality

2. **Apply GRADE Framework**

   **Baseline Quality** (by source type):
   - Systematic review / Meta-analysis → HIGH
   - Randomized controlled trial → HIGH
   - Cohort study → MODERATE
   - Case-control study → MODERATE
   - Case series → LOW
   - Expert opinion → LOW
   - Preprint / Blog post → VERY LOW

   **Downgrade Factors** (each -1 level):
   - Risk of bias (study design flaws, conflicts of interest)
   - Inconsistency (conflicting results across studies)
   - Indirectness (different population, indirect comparisons)
   - Imprecision (small sample size, wide confidence intervals)
   - Publication bias (selective reporting)

   **Upgrade Factors** (each +1 level):
   - Large effect size (strong effect, dose-response relationship)
   - Confounding works against finding (makes result conservative)
   - Dose-response gradient present

3. **Calculate Final GRADE**
   ```
   Final GRADE = Baseline + Upgrades - Downgrades

   HIGH:      Strong confidence, unlikely to change with new evidence
   MODERATE:  Moderate confidence, may change with new evidence
   LOW:       Limited confidence, likely to change with new evidence
   VERY LOW:  Very uncertain, any estimate is very uncertain
   ```

4. **Generate Assessment Report**
   - Document baseline quality
   - List all downgrade/upgrade factors with justification
   - Calculate final GRADE level
   - Provide hedging language recommendations
   - Assess AIWG applicability

5. **Save Assessment**
   - Save to `.aiwg/research/quality-assessments/REF-XXX-assessment.yaml`
   - Update frontmatter in finding document if --update-frontmatter
   - Log in quality assessment index

6. **Check Existing Citations**
   - If --check-citations, scan corpus for citations of this source
   - Flag any violations (overclaiming beyond GRADE level)
   - Generate remediation suggestions

## Arguments

- `[ref-id or file-path]` - Source to assess (required)
- `--output [yaml|markdown]` - Output format (default: yaml)
- `--update-frontmatter` - Update finding document frontmatter with assessment
- `--check-citations` - Scan corpus for citation policy violations
- `--interactive` - Interactive assessment with prompts for each factor

## Examples

```bash
# Basic quality assessment
/research-quality REF-022

# Assessment with frontmatter update
/research-quality REF-022 --update-frontmatter

# Interactive assessment
/research-quality REF-022 --interactive

# Assessment with citation check
/research-quality REF-022 --check-citations --output markdown
```

## Expected Output

```
Assessing Quality: REF-022 - AutoGen
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 1: Determining baseline
  Source Type: arXiv preprint (later published in conference)
  Baseline Quality: MODERATE (conference paper)
  Note: Upgraded from VERY LOW due to peer review

Step 2: Applying GRADE framework

  Downgrade Factors:
    [✓] Risk of Bias: -0 (no significant bias detected)
        - Study design appropriate
        - No apparent conflicts of interest
        - Methodology clearly described

    [✓] Inconsistency: -0 (single study, no comparison)
        - No conflicting results to evaluate

    [!] Indirectness: -0 (directly applicable)
        - Population: Software development teams
        - Intervention: Multi-agent conversation framework
        - Direct relevance to AIWG agent orchestration

    [!] Imprecision: -1 (limited evaluation scope)
        - Small benchmark dataset
        - Limited real-world validation
        - No confidence intervals reported

    [✓] Publication Bias: -0 (mitigated)
        - Open preprint, full methodology disclosed
        - Negative results discussed

  Upgrade Factors:
    [!] Large Effect: +0 (moderate effect size)
        - Improvements shown but not exceptionally large
        - Effect sizes: 10-30% improvement range

    [✓] Dose-Response: +0 (not applicable)
        - No dose-response relationship to evaluate

    [✓] Confounding: +0 (no clear confounding against)

Step 3: Calculating final GRADE
  Baseline: MODERATE
  Downgrades: -1 (imprecision)
  Upgrades: +0
  ─────────────────────────
  Final GRADE: LOW

Step 4: Generating assessment report
  ✓ Assessment saved: .aiwg/research/quality-assessments/REF-022-assessment.yaml
  ✓ Frontmatter updated in finding document
  ✓ Quality index updated

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

GRADE Assessment: LOW
Confidence: Limited confidence in effect estimates

Appropriate Hedging Language:
  ✓ USE: "Limited evidence suggests...", "Preliminary findings indicate..."
  ✗ AVOID: "Research demonstrates...", "Evidence proves..."

Rationale:
  While AutoGen shows promising multi-agent collaboration patterns,
  the evidence base is limited to a single study with small-scale
  evaluation. Real-world effectiveness at scale requires further
  investigation.

AIWG Applicability:
  - Patterns are directly applicable to agent orchestration (HIGH)
  - Implementation risk is moderate due to limited production validation
  - Recommend: Pilot implementation with monitoring

Next Steps:
  1. Monitor for follow-up studies strengthening evidence base
  2. Plan validation studies within AIWG context
  3. Review citations of REF-022 in corpus: /research-quality REF-022 --check-citations
```

## Assessment YAML Output

```yaml
# .aiwg/research/quality-assessments/REF-022-assessment.yaml
ref_id: REF-022
assessment_date: "2026-02-03"
assessor: "quality-agent"

source_metadata:
  title: "AutoGen: Enabling Next-Gen LLM Applications..."
  source_type: peer_reviewed_conference
  year: 2023

grade_assessment:
  baseline: MODERATE
  baseline_rationale: "Peer-reviewed conference paper"

  downgrade_factors:
    - factor: imprecision
      severity: -1
      rationale: "Limited evaluation scope, small benchmarks"

  upgrade_factors: []

  final_grade: LOW
  confidence_statement: "Limited confidence in effect estimates"

hedging_language:
  appropriate:
    - "Limited evidence suggests"
    - "Preliminary findings indicate"
    - "Initial research shows"

  inappropriate:
    - "Research demonstrates"
    - "Evidence proves"
    - "Studies conclusively show"

aiwg_applicability:
  relevance: HIGH
  implementation_risk: MODERATE
  recommendation: "Pilot implementation with validation"

citation_guidance:
  template: |
    Research provides preliminary evidence for [claim]
    (@.aiwg/research/findings/REF-022-autogen.md), though
    broader validation is needed (GRADE: LOW).
```

## Citation Policy Integration

When --check-citations is used:

```
Checking citations of REF-022 across corpus...

Found 8 citations:

✓ COMPLIANT (5):
  - .aiwg/architecture/agent-orchestration-sad.md:142
    "Research suggests flexible conversation patterns..."
    Hedging: APPROPRIATE for LOW quality

  - .aiwg/requirements/UC-174-conversable-agent.md:23
    "Evidence indicates multi-agent collaboration is feasible..."
    Hedging: APPROPRIATE for LOW quality

✗ VIOLATIONS (3):
  - docs/agent-framework.md:78
    "Research demonstrates significant improvements..."
    Hedging: TOO STRONG for LOW quality
    Suggestion: Change to "Limited evidence suggests..."

  - .aiwg/architecture/adr-012-agent-protocol.md:45
    "Studies prove that conversation patterns enable..."
    Hedging: TOO STRONG for LOW quality
    Suggestion: Change to "Preliminary findings indicate..."

Remediation script generated:
  .aiwg/research/quality-assessments/REF-022-violations.sh
```

## Interactive Mode

When --interactive is used, prompts for each factor:

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GRADE Assessment: REF-022 (Interactive Mode)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Baseline Quality: MODERATE (conference paper)

────────────────────────────────────────────────────────────────────
Factor 1: Risk of Bias
────────────────────────────────────────────────────────────────────
Evaluate study design quality, conflicts of interest, methodology clarity.

Downgrade by 1 level? [y/N]: n
Rationale: Study design appropriate, no COI detected

────────────────────────────────────────────────────────────────────
Factor 2: Inconsistency
────────────────────────────────────────────────────────────────────
Evaluate consistency across studies (if multiple).

Downgrade by 1 level? [y/N]: n
Rationale: Single study, no comparison available

[... continues for all factors ...]
```

## References

- @$AIWG_ROOT/agentic/code/frameworks/research-complete/agents/quality-agent.md - Quality Assessment Agent
- @$AIWG_ROOT/src/research/services/quality-service.ts - GRADE implementation
- @.aiwg/research/docs/grade-assessment-guide.md - GRADE methodology
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/research/quality-dimensions.yaml - Quality schema
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/citation-policy.md - Hedging language requirements

Related Skills

Quality Filtering

from jmagly/aiwg

Accept/reject logic and quality scoring heuristics for media content

research-workflow

from jmagly/aiwg

Execute multi-stage research workflows

research-status

from jmagly/aiwg

Show research corpus health and statistics

research-query

from jmagly/aiwg

Search the local research corpus, read matching findings, and synthesize an answer with inline citations to REF-XXX sources. The "query" operation for the research pipeline.

research-quality-audit

from jmagly/aiwg

Audit research corpus for shallow stubs, incomplete sections, missing source files, and doc depth issues. Detects docs written from abstracts rather than full papers and optionally auto-dispatches expansion agents.

research-provenance

from jmagly/aiwg

Query provenance chains and artifact relationships

research-lint

from jmagly/aiwg

Run the research corpus lint ruleset to detect structural and referential integrity issues — orphan notes, missing frontmatter, broken references, missing GRADE assessments.

research-gap

from jmagly/aiwg

Analyze gaps in research coverage

research-gap-detect

from jmagly/aiwg

Build the mutual citation graph, find connected components, identify isolated clusters, and optionally search for bridge candidates and file gap issues. Automates the manual cluster analysis workflow.

research-document

from jmagly/aiwg

Generate summaries and literature notes from research papers

research-discover

from jmagly/aiwg

Search for research papers across academic databases

research-cite

from jmagly/aiwg

Generate properly formatted citation from research corpus