Codex

research-quality

Assess source quality using GRADE methodology

104 stars

Best use case

research-quality is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Teams using research-quality should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/research-quality/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/research-quality/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/research-quality/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How research-quality Compares

| Feature / Agent | research-quality | Standard Approach |
|---|---|---|
| Platform Support | Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Assess source quality using GRADE methodology

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Research Quality Command

Perform systematic GRADE evidence quality assessment on research sources.

## Instructions

When invoked, perform rigorous quality assessment:

1. **Load Source**
   - Accept REF-XXX identifier or file path
   - Load PDF and finding document
   - Extract frontmatter metadata
   - Determine source type and baseline quality

2. **Apply GRADE Framework**

   **Baseline Quality** (by source type):
   - Systematic review / Meta-analysis → HIGH
   - Randomized controlled trial → HIGH
   - Cohort study → MODERATE
   - Case-control study → MODERATE
   - Case series → LOW
   - Expert opinion → LOW
   - Preprint / Blog post → VERY LOW

   **Downgrade Factors** (each -1 level):
   - Risk of bias (study design flaws, conflicts of interest)
   - Inconsistency (conflicting results across studies)
   - Indirectness (different population, indirect comparisons)
   - Imprecision (small sample size, wide confidence intervals)
   - Publication bias (selective reporting)

   **Upgrade Factors** (each +1 level):
   - Large effect size (strong effect)
   - Confounding works against finding (makes result conservative)
   - Dose-response gradient present

3. **Calculate Final GRADE**
   ```
   Final GRADE = Baseline + Upgrades - Downgrades

   HIGH:      Strong confidence, unlikely to change with new evidence
   MODERATE:  Moderate confidence, may change with new evidence
   LOW:       Limited confidence, likely to change with new evidence
   VERY LOW:  Very little confidence, any estimate is highly uncertain
   ```

4. **Generate Assessment Report**
   - Document baseline quality
   - List all downgrade/upgrade factors with justification
   - Calculate final GRADE level
   - Provide hedging language recommendations
   - Assess AIWG applicability

5. **Save Assessment**
   - Save to `.aiwg/research/quality-assessments/REF-XXX-assessment.yaml`
   - Update frontmatter in finding document if --update-frontmatter
   - Log in quality assessment index

6. **Check Existing Citations**
   - If --check-citations, scan corpus for citations of this source
   - Flag any violations (overclaiming beyond GRADE level)
   - Generate remediation suggestions
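
Steps 2 and 3 above can be sketched in Python as follows. This is a minimal illustration, not the implementation in `quality-service.ts`: the dictionary keys, factor identifiers, and the `final_grade` function are hypothetical names, and clamping the result to the four-level scale is an assumption, since the formula does not say what happens below VERY LOW or above HIGH.

```python
# Ordered from weakest to strongest, so a level's index doubles as its score.
LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]

# Baseline by source type, copied from the table in step 2.
# The keys are hypothetical identifiers chosen for this sketch.
BASELINE = {
    "systematic_review": "HIGH",
    "meta_analysis": "HIGH",
    "rct": "HIGH",
    "cohort_study": "MODERATE",
    "case_control_study": "MODERATE",
    "case_series": "LOW",
    "expert_opinion": "LOW",
    "preprint": "VERY LOW",
    "blog_post": "VERY LOW",
}

DOWNGRADE_FACTORS = {
    "risk_of_bias", "inconsistency", "indirectness",
    "imprecision", "publication_bias",
}
UPGRADE_FACTORS = {"large_effect", "confounding_against", "dose_response"}


def final_grade(source_type, downgrades=(), upgrades=()):
    """Return Baseline + Upgrades - Downgrades as a GRADE level name."""
    unknown = (set(downgrades) - DOWNGRADE_FACTORS) | (set(upgrades) - UPGRADE_FACTORS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    score = LEVELS.index(BASELINE[source_type])
    # Each distinct factor moves the grade by exactly one level.
    score += len(set(upgrades)) - len(set(downgrades))
    # Assumption: clamp to the four-level scale.
    score = max(0, min(score, len(LEVELS) - 1))
    return LEVELS[score]


print(final_grade("cohort_study", downgrades=["imprecision"]))  # → LOW
```

The example call mirrors the arithmetic of the REF-022 walkthrough (MODERATE baseline, one downgrade for imprecision), using `cohort_study` only because it shares the MODERATE baseline.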

## Arguments

- `[ref-id or file-path]` - Source to assess (required)
- `--output [yaml|markdown]` - Output format (default: yaml)
- `--update-frontmatter` - Update finding document frontmatter with assessment
- `--check-citations` - Scan corpus for citation policy violations
- `--interactive` - Interactive assessment with prompts for each factor
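
The skill is interpreted by the agent, so no parser is required. Purely as an illustration of the argument shapes listed above, they could be modelled with Python's standard `argparse` module. The `build_parser` function and the `source` destination name are hypothetical.

```python
import argparse


def build_parser():
    """Model the documented arguments; flag names and defaults come from the list above."""
    parser = argparse.ArgumentParser(prog="research-quality")
    parser.add_argument("source", help="REF-XXX identifier or file path (required)")
    parser.add_argument("--output", choices=["yaml", "markdown"], default="yaml")
    parser.add_argument("--update-frontmatter", action="store_true")
    parser.add_argument("--check-citations", action="store_true")
    parser.add_argument("--interactive", action="store_true")
    return parser


args = build_parser().parse_args(
    ["REF-022", "--check-citations", "--output", "markdown"]
)
print(args.source, args.output, args.check_citations)  # → REF-022 markdown True
```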

## Examples

```bash
# Basic quality assessment
/research-quality REF-022

# Assessment with frontmatter update
/research-quality REF-022 --update-frontmatter

# Interactive assessment
/research-quality REF-022 --interactive

# Assessment with citation check
/research-quality REF-022 --check-citations --output markdown
```

## Expected Output

```
Assessing Quality: REF-022 - AutoGen
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 1: Determining baseline
  Source Type: arXiv preprint (later published in conference)
  Baseline Quality: MODERATE (conference paper)
  Note: Upgraded from VERY LOW due to peer review

Step 2: Applying GRADE framework

  Downgrade Factors:
    [✓] Risk of Bias: -0 (no significant bias detected)
        - Study design appropriate
        - No apparent conflicts of interest
        - Methodology clearly described

    [✓] Inconsistency: -0 (single study, no comparison)
        - No conflicting results to evaluate

    [✓] Indirectness: -0 (directly applicable)
        - Population: Software development teams
        - Intervention: Multi-agent conversation framework
        - Direct relevance to AIWG agent orchestration

    [!] Imprecision: -1 (limited evaluation scope)
        - Small benchmark dataset
        - Limited real-world validation
        - No confidence intervals reported

    [✓] Publication Bias: -0 (mitigated)
        - Open preprint, full methodology disclosed
        - Negative results discussed

  Upgrade Factors:
    [!] Large Effect: +0 (moderate effect size)
        - Improvements shown but not exceptionally large
        - Effect sizes: 10-30% improvement range

    [✓] Dose-Response: +0 (not applicable)
        - No dose-response relationship to evaluate

    [✓] Confounding: +0 (no clear confounding against)

Step 3: Calculating final GRADE
  Baseline: MODERATE
  Downgrades: -1 (imprecision)
  Upgrades: +0
  ─────────────────────────
  Final GRADE: LOW

Step 4: Generating assessment report
  ✓ Assessment saved: .aiwg/research/quality-assessments/REF-022-assessment.yaml
  ✓ Frontmatter updated in finding document
  ✓ Quality index updated

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

GRADE Assessment: LOW
Confidence: Limited confidence in effect estimates

Appropriate Hedging Language:
  ✓ USE: "Limited evidence suggests...", "Preliminary findings indicate..."
  ✗ AVOID: "Research demonstrates...", "Evidence proves..."

Rationale:
  While AutoGen shows promising multi-agent collaboration patterns,
  the evidence base is limited to a single study with small-scale
  evaluation. Real-world effectiveness at scale requires further
  investigation.

AIWG Applicability:
  - Patterns are directly applicable to agent orchestration (HIGH)
  - Implementation risk is moderate due to limited production validation
  - Recommend: Pilot implementation with monitoring

Next Steps:
  1. Monitor for follow-up studies strengthening evidence base
  2. Plan validation studies within AIWG context
  3. Review citations of REF-022 in corpus: /research-quality REF-022 --check-citations
```

## Assessment YAML Output

```yaml
# .aiwg/research/quality-assessments/REF-022-assessment.yaml
ref_id: REF-022
assessment_date: "2026-02-03"
assessor: "quality-agent"

source_metadata:
  title: "AutoGen: Enabling Next-Gen LLM Applications..."
  source_type: peer_reviewed_conference
  year: 2023

grade_assessment:
  baseline: MODERATE
  baseline_rationale: "Peer-reviewed conference paper"

  downgrade_factors:
    - factor: imprecision
      severity: -1
      rationale: "Limited evaluation scope, small benchmarks"

  upgrade_factors: []

  final_grade: LOW
  confidence_statement: "Limited confidence in effect estimates"

hedging_language:
  appropriate:
    - "Limited evidence suggests"
    - "Preliminary findings indicate"
    - "Initial research shows"

  inappropriate:
    - "Research demonstrates"
    - "Evidence proves"
    - "Studies conclusively show"

aiwg_applicability:
  relevance: HIGH
  implementation_risk: MODERATE
  recommendation: "Pilot implementation with validation"

citation_guidance:
  template: |
    Research provides preliminary evidence for [claim]
    (@.aiwg/research/findings/REF-022-autogen.md), though
    broader validation is needed (GRADE: LOW).
```
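
A saved assessment can be sanity-checked against the formula from step 3. The sketch below works on an already-parsed mapping, so it needs no YAML library. `check_assessment` is a hypothetical helper, and it assumes that upgrade factors would carry a positive `severity` (the example above has none, so the file format does not confirm this).

```python
LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]


def check_assessment(doc):
    """Return a list of problems found in an assessment mapping (empty if none)."""
    problems = []
    grade = doc.get("grade_assessment", {})
    for key in ("baseline", "final_grade"):
        if grade.get(key) not in LEVELS:
            problems.append(f"grade_assessment.{key} is not a GRADE level")
    if problems:
        return problems

    # Downgrades are stored as negative severities in the example file.
    # Assumption: upgrades would be stored as positive severities.
    factors = grade.get("downgrade_factors", []) + grade.get("upgrade_factors", [])
    delta = sum(item["severity"] for item in factors)
    expected = max(0, min(LEVELS.index(grade["baseline"]) + delta, len(LEVELS) - 1))
    if LEVELS[expected] != grade["final_grade"]:
        problems.append(
            f"final_grade is {grade['final_grade']}, expected {LEVELS[expected]}"
        )

    hedging = doc.get("hedging_language", {})
    overlap = set(hedging.get("appropriate", [])) & set(hedging.get("inappropriate", []))
    if overlap:
        problems.append(f"phrases listed as both allowed and forbidden: {sorted(overlap)}")
    return problems


# A trimmed copy of the REF-022 example above, as a Python mapping.
assessment = {
    "ref_id": "REF-022",
    "grade_assessment": {
        "baseline": "MODERATE",
        "downgrade_factors": [{"factor": "imprecision", "severity": -1}],
        "upgrade_factors": [],
        "final_grade": "LOW",
    },
    "hedging_language": {
        "appropriate": ["Limited evidence suggests"],
        "inappropriate": ["Research demonstrates"],
    },
}
print(check_assessment(assessment))  # → []
```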

## Citation Policy Integration

When --check-citations is used:

```
Checking citations of REF-022 across corpus...

Found 8 citations:

✓ COMPLIANT (5):
  - .aiwg/architecture/agent-orchestration-sad.md:142
    "Research suggests flexible conversation patterns..."
    Hedging: APPROPRIATE for LOW quality

  - .aiwg/requirements/UC-174-conversable-agent.md:23
    "Evidence indicates multi-agent collaboration is feasible..."
    Hedging: APPROPRIATE for LOW quality

✗ VIOLATIONS (3):
  - docs/agent-framework.md:78
    "Research demonstrates significant improvements..."
    Hedging: TOO STRONG for LOW quality
    Suggestion: Change to "Limited evidence suggests..."

  - .aiwg/architecture/adr-012-agent-protocol.md:45
    "Studies prove that conversation patterns enable..."
    Hedging: TOO STRONG for LOW quality
    Suggestion: Change to "Preliminary findings indicate..."

Remediation script generated:
  .aiwg/research/quality-assessments/REF-022-violations.sh
```
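
The kind of check shown above can be approximated with a phrase scan. This is a rough sketch under stated assumptions, not the command's actual logic: `find_violations` is a hypothetical function, the phrase list is taken from the examples in this document, it assumes the citation and the claim sit on the same line, and it treats these phrases as overclaims only for LOW and VERY LOW sources.

```python
import re

# Phrases drawn from the "inappropriate" list and the violation examples above.
OVERCLAIM_PATTERNS = [
    r"research demonstrates",
    r"evidence proves",
    r"studies conclusively show",
    r"studies prove",
]


def find_violations(lines, ref_id, grade):
    """Return (line_number, text) pairs that cite ref_id with too-strong wording."""
    # Assumption: only LOW and VERY LOW sources forbid these phrases.
    if grade not in ("LOW", "VERY LOW"):
        return []
    pattern = re.compile("|".join(OVERCLAIM_PATTERNS), re.IGNORECASE)
    hits = []
    for number, text in enumerate(lines, start=1):
        if ref_id in text and pattern.search(text):
            hits.append((number, text.strip()))
    return hits


doc = [
    "Research suggests flexible conversation patterns help (REF-022).",
    "Research demonstrates significant improvements (REF-022).",
    "Studies prove that caching helps (REF-007).",
]
print(find_violations(doc, "REF-022", "LOW"))
# → [(2, 'Research demonstrates significant improvements (REF-022).')]
```

The third line is not reported because it cites a different source, even though its wording is strong.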

## Interactive Mode

When --interactive is used, the command prompts for each factor in turn:

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GRADE Assessment: REF-022 (Interactive Mode)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Baseline Quality: MODERATE (conference paper)

────────────────────────────────────────────────────────────────────
Factor 1: Risk of Bias
────────────────────────────────────────────────────────────────────
Evaluate study design quality, conflicts of interest, methodology clarity.

Downgrade by 1 level? [y/N]: n
Rationale: Study design appropriate, no COI detected

────────────────────────────────────────────────────────────────────
Factor 2: Inconsistency
────────────────────────────────────────────────────────────────────
Evaluate consistency across studies (if multiple).

Downgrade by 1 level? [y/N]: n
Rationale: Single study, no comparison available

[... continues for all factors ...]
```
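
The prompt loop above could be structured roughly as below. This is a sketch only: `ask_downgrades` is a hypothetical function, it covers just the five downgrade factors, and it assumes the rationale is typed by the user rather than generated by the agent. The `ask` parameter defaults to `input` but can be replaced with scripted answers, which keeps the loop testable.

```python
DOWNGRADE_FACTORS = [
    "Risk of Bias",
    "Inconsistency",
    "Indirectness",
    "Imprecision",
    "Publication Bias",
]


def ask_downgrades(ask=input):
    """Prompt once per factor; return {factor: rationale} for accepted downgrades."""
    chosen = {}
    for number, factor in enumerate(DOWNGRADE_FACTORS, start=1):
        print(f"Factor {number}: {factor}")
        # Anything other than "y" counts as the default answer, "N".
        answer = ask("Downgrade by 1 level? [y/N]: ").strip().lower()
        rationale = ask("Rationale: ").strip()
        if answer == "y":
            chosen[factor] = rationale
    return chosen


# Scripted answers standing in for a user at the keyboard.
scripted = iter([
    "n", "Study design appropriate",
    "n", "Single study",
    "n", "Directly applicable",
    "y", "Small benchmark",
    "n", "Open preprint",
])
result = ask_downgrades(ask=lambda prompt: next(scripted))
print(result)  # → {'Imprecision': 'Small benchmark'}
```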

## References

- @$AIWG_ROOT/agentic/code/frameworks/research-complete/agents/quality-agent.md - Quality Assessment Agent
- @$AIWG_ROOT/src/research/services/quality-service.ts - GRADE implementation
- @.aiwg/research/docs/grade-assessment-guide.md - GRADE methodology
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/research/quality-dimensions.yaml - Quality schema
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/citation-policy.md - Hedging language requirements

Related Skills

All of the following are from jmagly/aiwg (104 stars).

  • Quality Filtering: Accept/reject logic and quality scoring heuristics for media content
  • research-workflow (Codex): Execute multi-stage research workflows
  • research-status (Codex): Show research corpus health and statistics
  • research-query (Codex): Search the local research corpus, read matching findings, and synthesize an answer with inline citations to REF-XXX sources. The "query" operation for the research pipeline.
  • research-quality-audit (Codex): Audit research corpus for shallow stubs, incomplete sections, missing source files, and doc depth issues. Detects docs written from abstracts rather than full papers and optionally auto-dispatches expansion agents.
  • research-provenance (Codex): Query provenance chains and artifact relationships
  • research-lint (Codex): Run the research corpus lint ruleset to detect structural and referential integrity issues: orphan notes, missing frontmatter, broken references, missing GRADE assessments.
  • research-gap (Codex): Analyze gaps in research coverage
  • research-gap-detect (Codex): Build the mutual citation graph, find connected components, identify isolated clusters, and optionally search for bridge candidates and file gap issues. Automates the manual cluster analysis workflow.
  • research-document (Codex): Generate summaries and literature notes from research papers
  • research-discover (Codex): Search for research papers across academic databases
  • research-cite (Codex): Generate properly formatted citation from research corpus