Best use case
research-quality is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
It is a strong fit for teams already working in Codex.
It assesses source quality using the GRADE methodology.
Teams using research-quality should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/research-quality/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How research-quality Compares
| Feature / Agent | research-quality | Standard Approach |
|---|---|---|
| Platform Support | Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It performs a systematic GRADE evidence quality assessment on research sources.
Which AI agents support this skill?
This skill is designed for Codex.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Research Quality Command
Perform systematic GRADE evidence quality assessment on research sources.
## Instructions
When invoked, perform rigorous quality assessment:
1. **Load Source**
- Accept REF-XXX identifier or file path
- Load PDF and finding document
- Extract frontmatter metadata
- Determine source type and baseline quality
2. **Apply GRADE Framework**
**Baseline Quality** (by source type):
- Systematic review / Meta-analysis → HIGH
- Randomized controlled trial → HIGH
- Cohort study → MODERATE
- Case-control study → MODERATE
- Case series → LOW
- Expert opinion → LOW
- Preprint / Blog post → VERY LOW
**Downgrade Factors** (each -1 level):
- Risk of bias (study design flaws, conflicts of interest)
- Inconsistency (conflicting results across studies)
- Indirectness (different population, indirect comparisons)
- Imprecision (small sample size, wide confidence intervals)
- Publication bias (selective reporting)
**Upgrade Factors** (each +1 level):
- Large effect size (strong effect)
- Confounding works against finding (makes result conservative)
- Dose-response gradient present
3. **Calculate Final GRADE**
```
Final GRADE = Baseline + Upgrades - Downgrades
HIGH: Strong confidence, unlikely to change with new evidence
MODERATE: Moderate confidence, may change with new evidence
LOW: Limited confidence, likely to change with new evidence
VERY LOW: Very little confidence, any estimate is very uncertain
```
4. **Generate Assessment Report**
- Document baseline quality
- List all downgrade/upgrade factors with justification
- Calculate final GRADE level
- Provide hedging language recommendations
- Assess AIWG applicability
5. **Save Assessment**
- Save to `.aiwg/research/quality-assessments/REF-XXX-assessment.yaml`
- Update frontmatter in finding document if `--update-frontmatter` is set
- Log in quality assessment index
6. **Check Existing Citations**
- If `--check-citations` is set, scan corpus for citations of this source
- Flag any violations (overclaiming beyond GRADE level)
- Generate remediation suggestions
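
The baseline table and the formula in steps 2 and 3 can be sketched in a few lines of Python. This is an illustration only, not the skill's actual implementation: the `Grade` enum, the source-type keys, and `final_grade` are hypothetical names, and clamping the result to the VERY LOW to HIGH range is an assumption, since the formula does not say what happens when the sum falls outside the scale.

```python
from enum import IntEnum


class Grade(IntEnum):
    VERY_LOW = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


# Baseline levels copied from the "Baseline Quality" list above.
# The dictionary keys are hypothetical identifiers for those source types.
BASELINE = {
    "systematic_review": Grade.HIGH,
    "meta_analysis": Grade.HIGH,
    "rct": Grade.HIGH,
    "cohort_study": Grade.MODERATE,
    "case_control_study": Grade.MODERATE,
    "case_series": Grade.LOW,
    "expert_opinion": Grade.LOW,
    "preprint": Grade.VERY_LOW,
    "blog_post": Grade.VERY_LOW,
}


def final_grade(source_type: str, downgrades: int = 0, upgrades: int = 0) -> Grade:
    """Apply Final GRADE = Baseline + Upgrades - Downgrades."""
    if downgrades < 0 or upgrades < 0:
        raise ValueError("factor counts must be non-negative")
    raw = BASELINE[source_type] + upgrades - downgrades
    # Assumption: results outside the scale are clamped to its ends.
    return Grade(max(Grade.VERY_LOW, min(Grade.HIGH, raw)))
```

For example, `final_grade("cohort_study", downgrades=1)` yields `Grade.LOW`, which mirrors the MODERATE minus one downgrade arithmetic used in the worked output later in this document.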
## Arguments
- `[ref-id or file-path]` - Source to assess (required)
- `--output [yaml|markdown]` - Output format (default: yaml)
- `--update-frontmatter` - Update finding document frontmatter with assessment
- `--check-citations` - Scan corpus for citation policy violations
- `--interactive` - Interactive assessment with prompts for each factor
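
To make the argument semantics concrete, here is a rough sketch of how the same options could be modelled with Python's standard `argparse` module. The skill itself is invoked as a slash command, so this parser is purely illustrative; `build_parser` and the positional name `source` are assumptions, not part of the skill.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented arguments; names on the Python side are hypothetical."""
    parser = argparse.ArgumentParser(prog="research-quality")
    # Required: a REF-XXX identifier or a file path.
    parser.add_argument("source", help="REF-XXX identifier or file path")
    # Output format defaults to yaml, as documented above.
    parser.add_argument("--output", choices=["yaml", "markdown"], default="yaml")
    parser.add_argument("--update-frontmatter", action="store_true")
    parser.add_argument("--check-citations", action="store_true")
    parser.add_argument("--interactive", action="store_true")
    return parser
```

Parsing `["REF-022", "--check-citations", "--output", "markdown"]` with this parser sets `check_citations` to `True` and `output` to `"markdown"`, while the other flags keep their `False` defaults.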
## Examples
```bash
# Basic quality assessment
/research-quality REF-022
# Assessment with frontmatter update
/research-quality REF-022 --update-frontmatter
# Interactive assessment
/research-quality REF-022 --interactive
# Assessment with citation check
/research-quality REF-022 --check-citations --output markdown
```
## Expected Output
```
Assessing Quality: REF-022 - AutoGen
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: Determining baseline
Source Type: arXiv preprint (later published in conference)
Baseline Quality: MODERATE (conference paper)
Note: Upgraded from VERY LOW due to peer review
Step 2: Applying GRADE framework
Downgrade Factors:
[✓] Risk of Bias: -0 (no significant bias detected)
- Study design appropriate
- No apparent conflicts of interest
- Methodology clearly described
[✓] Inconsistency: -0 (single study, no comparison)
- No conflicting results to evaluate
[✓] Indirectness: -0 (directly applicable)
- Population: Software development teams
- Intervention: Multi-agent conversation framework
- Direct relevance to AIWG agent orchestration
[!] Imprecision: -1 (limited evaluation scope)
- Small benchmark dataset
- Limited real-world validation
- No confidence intervals reported
[✓] Publication Bias: -0 (mitigated)
- Open preprint, full methodology disclosed
- Negative results discussed
Upgrade Factors:
[!] Large Effect: +0 (moderate effect size)
- Improvements shown but not exceptionally large
- Effect sizes: 10-30% improvement range
[✓] Dose-Response: +0 (not applicable)
- No dose-response relationship to evaluate
[✓] Confounding: +0 (no clear confounding against)
Step 3: Calculating final GRADE
Baseline: MODERATE
Downgrades: -1 (imprecision)
Upgrades: +0
─────────────────────────
Final GRADE: LOW
Step 4: Generating assessment report
✓ Assessment saved: .aiwg/research/quality-assessments/REF-022-assessment.yaml
✓ Frontmatter updated in finding document
✓ Quality index updated
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GRADE Assessment: LOW
Confidence: Limited confidence in effect estimates
Appropriate Hedging Language:
✓ USE: "Limited evidence suggests...", "Preliminary findings indicate..."
✗ AVOID: "Research demonstrates...", "Evidence proves..."
Rationale:
While AutoGen shows promising multi-agent collaboration patterns,
the evidence base is limited to a single study with small-scale
evaluation. Real-world effectiveness at scale requires further
investigation.
AIWG Applicability:
- Patterns are directly applicable to agent orchestration (HIGH)
- Implementation risk is moderate due to limited production validation
- Recommend: Pilot implementation with monitoring
Next Steps:
1. Monitor for follow-up studies strengthening evidence base
2. Plan validation studies within AIWG context
3. Review citations of REF-022 in corpus: /research-quality REF-022 --check-citations
```
## Assessment YAML Output
```yaml
# .aiwg/research/quality-assessments/REF-022-assessment.yaml
ref_id: REF-022
assessment_date: "2026-02-03"
assessor: "quality-agent"
source_metadata:
  title: "AutoGen: Enabling Next-Gen LLM Applications..."
  source_type: peer_reviewed_conference
  year: 2023
grade_assessment:
  baseline: MODERATE
  baseline_rationale: "Peer-reviewed conference paper"
  downgrade_factors:
    - factor: imprecision
      severity: -1
      rationale: "Limited evaluation scope, small benchmarks"
  upgrade_factors: []
  final_grade: LOW
  confidence_statement: "Limited confidence in effect estimates"
hedging_language:
  appropriate:
    - "Limited evidence suggests"
    - "Preliminary findings indicate"
    - "Initial research shows"
  inappropriate:
    - "Research demonstrates"
    - "Evidence proves"
    - "Studies conclusively show"
aiwg_applicability:
  relevance: HIGH
  implementation_risk: MODERATE
  recommendation: "Pilot implementation with validation"
citation_guidance:
  template: |
    Research provides preliminary evidence for [claim]
    (@.aiwg/research/findings/REF-022-autogen.md), though
    broader validation is needed (GRADE: LOW).
```
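
A saved assessment can be sanity-checked by recomputing the final grade from the recorded factors. The sketch below works on a plain dictionary shaped like the `grade_assessment` section above (loading the YAML file is left out). `is_consistent` is a hypothetical helper, and it assumes that downgrade severities are stored as negative integers, upgrade severities as positive ones, and that results are clamped to the scale.

```python
LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]


def is_consistent(grade_assessment: dict) -> bool:
    """Return True if final_grade equals baseline shifted by all factor severities."""
    factors = (
        grade_assessment.get("downgrade_factors", [])
        + grade_assessment.get("upgrade_factors", [])
    )
    # Assumption: severities are signed (-1 for a downgrade, +1 for an upgrade).
    shift = sum(factor["severity"] for factor in factors)
    index = LEVELS.index(grade_assessment["baseline"]) + shift
    # Assumption: the result is clamped to the ends of the scale.
    index = max(0, min(len(LEVELS) - 1, index))
    return LEVELS[index] == grade_assessment["final_grade"]


# Values taken from the REF-022 assessment shown above.
example = {
    "baseline": "MODERATE",
    "downgrade_factors": [
        {
            "factor": "imprecision",
            "severity": -1,
            "rationale": "Limited evaluation scope, small benchmarks",
        }
    ],
    "upgrade_factors": [],
    "final_grade": "LOW",
}
```

With the REF-022 values, `is_consistent(example)` returns `True`; changing `final_grade` to `HIGH` would make it return `False`.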
## Citation Policy Integration
When --check-citations is used:
```
Checking citations of REF-022 across corpus...
Found 8 citations:
✓ COMPLIANT (5):
- .aiwg/architecture/agent-orchestration-sad.md:142
"Research suggests flexible conversation patterns..."
Hedging: APPROPRIATE for LOW quality
- .aiwg/requirements/UC-174-conversable-agent.md:23
"Evidence indicates multi-agent collaboration is feasible..."
Hedging: APPROPRIATE for LOW quality
✗ VIOLATIONS (3):
- docs/agent-framework.md:78
"Research demonstrates significant improvements..."
Hedging: TOO STRONG for LOW quality
Suggestion: Change to "Limited evidence suggests..."
- .aiwg/architecture/adr-012-agent-protocol.md:45
"Studies prove that conversation patterns enable..."
Hedging: TOO STRONG for LOW quality
Suggestion: Change to "Preliminary findings indicate..."
Remediation script generated:
.aiwg/research/quality-assessments/REF-022-violations.sh
```
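
The core of such a check can be approximated with simple phrase matching. The following is a minimal sketch, not the skill's actual scanner: `Violation` and `find_violations` are hypothetical names, the banned phrases are the `inappropriate` entries from the assessment YAML, and it assumes the claim and its `REF-XXX` citation sit on the same line (requires Python 3.9+ for the `list[...]` annotation).

```python
from dataclasses import dataclass

# Phrases the assessment YAML lists as inappropriate for a LOW-grade source.
TOO_STRONG_FOR_LOW = (
    "research demonstrates",
    "evidence proves",
    "studies conclusively show",
)


@dataclass
class Violation:
    path: str
    line_no: int
    text: str


def find_violations(path: str, text: str, ref_id: str,
                    banned=TOO_STRONG_FOR_LOW) -> list[Violation]:
    """Flag lines that cite ref_id and use hedging stronger than the grade allows."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Assumption: the citation marker appears on the same line as the claim.
        if ref_id not in line:
            continue
        lowered = line.lower()
        if any(phrase in lowered for phrase in banned):
            hits.append(Violation(path, line_no, line.strip()))
    return hits
```

A real scanner would likely need sentence- or paragraph-level matching, since a citation often trails its claim by a line; the line-based version keeps the idea visible.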
## Interactive Mode
When --interactive is used, the command prompts for each factor:
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GRADE Assessment: REF-022 (Interactive Mode)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Baseline Quality: MODERATE (conference paper)
────────────────────────────────────────────────────────────────────
Factor 1: Risk of Bias
────────────────────────────────────────────────────────────────────
Evaluate study design quality, conflicts of interest, methodology clarity.
Downgrade by 1 level? [y/N]: n
Rationale: Study design appropriate, no COI detected
────────────────────────────────────────────────────────────────────
Factor 2: Inconsistency
────────────────────────────────────────────────────────────────────
Evaluate consistency across studies (if multiple).
Downgrade by 1 level? [y/N]: n
Rationale: Single study, no comparison available
[... continues for all factors ...]
```
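
The prompt loop shown above could be sketched as follows. This is an assumption-laden illustration rather than the skill's code: `prompt_downgrades` is a hypothetical function, the factor names come from the Downgrade Factors list, and the `ask` parameter exists only so the loop can be driven without a terminal.

```python
DOWNGRADE_FACTORS = [
    "Risk of Bias",
    "Inconsistency",
    "Indirectness",
    "Imprecision",
    "Publication Bias",
]


def prompt_downgrades(ask=input) -> list:
    """Ask one y/N question per factor and return the factors the user downgraded."""
    chosen = []
    for number, factor in enumerate(DOWNGRADE_FACTORS, start=1):
        print(f"Factor {number}: {factor}")
        answer = ask("Downgrade by 1 level? [y/N]: ").strip().lower()
        # Empty input falls through to the default "N", matching the [y/N] prompt.
        if answer in ("y", "yes"):
            chosen.append(factor)
    return chosen
```

Passing a stub for `ask` makes the loop scriptable; for instance, answering "y" only to the fourth question returns `["Imprecision"]`, the single downgrade recorded for REF-022.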
## References
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/agents/quality-agent.md - Quality Assessment Agent
- @$AIWG_ROOT/src/research/services/quality-service.ts - GRADE implementation
- @.aiwg/research/docs/grade-assessment-guide.md - GRADE methodology
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/research/quality-dimensions.yaml - Quality schema
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/citation-policy.md - Hedging language requirements
Related Skills
Quality Filtering
Accept/reject logic and quality scoring heuristics for media content
research-workflow
Execute multi-stage research workflows
research-status
Show research corpus health and statistics
research-query
Search the local research corpus, read matching findings, and synthesize an answer with inline citations to REF-XXX sources. The "query" operation for the research pipeline.
research-quality-audit
Audit research corpus for shallow stubs, incomplete sections, missing source files, and doc depth issues. Detects docs written from abstracts rather than full papers and optionally auto-dispatches expansion agents.
research-provenance
Query provenance chains and artifact relationships
research-lint
Run the research corpus lint ruleset to detect structural and referential integrity issues — orphan notes, missing frontmatter, broken references, missing GRADE assessments.
research-gap
Analyze gaps in research coverage
research-gap-detect
Build the mutual citation graph, find connected components, identify isolated clusters, and optionally search for bridge candidates and file gap issues. Automates the manual cluster analysis workflow.
research-document
Generate summaries and literature notes from research papers
research-discover
Search for research papers across academic databases
research-cite
Generate properly formatted citation from research corpus