research-gap-detect

Build the mutual citation graph, find connected components, identify isolated clusters, and optionally search for bridge candidates and file gap issues. Automates the manual cluster analysis workflow.

104 stars

Best use case

research-gap-detect is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Teams using research-gap-detect should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/research-gap-detect/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/research-gap-detect/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/research-gap-detect/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How research-gap-detect Compares

| Feature / Agent | research-gap-detect | Standard Approach |
|------|----------|--------|
| Platform Support | Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

The source code is in the jmagly/aiwg repository on GitHub, under .agents/skills/research-gap-detect/.

SKILL.md Source

# Research Gap Detect

Analyze the research corpus citation graph to find disconnected clusters, isolated papers, and gap opportunities. Optionally searches for bridge paper candidates and files gap issues.

## Triggers

- "find research gaps"
- "detect clusters"
- "cluster analysis"
- "find isolated papers"
- "bridge candidate search"
- `/research-gap-detect`

## Parameters

### `--clusters-only` (optional)
Only run cluster detection — skip bridge search and issue filing.

### `--file-issues` (optional)
Auto-file gap issues for each disconnected cluster pair.

### `--search-bridges` (optional)
Search external databases for papers that could bridge disconnected clusters.

### `--min-cluster-size N` (optional)
Minimum papers in a cluster to report. Default: 2.

### `--format` (optional)
Output format: `full` (default), `summary`, or `json`.
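
The skill is invoked as a slash command by the agent, so no real command-line parser is implied by the source. Purely as an illustration of the flags and defaults listed above, here is a minimal sketch using Python's `argparse`; the function name `build_parser` and the help strings are assumptions, not part of the skill.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="research-gap-detect")
    parser.add_argument("--clusters-only", action="store_true",
                        help="only run cluster detection")
    parser.add_argument("--file-issues", action="store_true",
                        help="file gap issues for disconnected cluster pairs")
    parser.add_argument("--search-bridges", action="store_true",
                        help="search external databases for bridge papers")
    parser.add_argument("--min-cluster-size", type=int, default=2, metavar="N",
                        help="minimum papers in a cluster to report")
    parser.add_argument("--format", choices=["full", "summary", "json"],
                        default="full", help="output format")
    return parser


# Example: bridge search with a stricter cluster-size threshold.
args = build_parser().parse_args(["--search-bridges", "--min-cluster-size", "3"])
print(args.search_bridges, args.min_cluster_size, args.format)
```

With no flags given, the parser falls back to the documented defaults: a minimum cluster size of 2 and the `full` output format.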

## Execution Flow

### Phase 1: Build Citation Graph

1. Read the citation-network index (from `/corpus-index-build --graph citation-network`)
   - If stale or missing: run `/corpus-index-build --graph citation-network` first
2. Build an adjacency list from outgoing + incoming edges
3. Treat as undirected for cluster detection (A cites B ≡ A connected to B)
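
The steps above can be sketched in Python as follows. The source does not describe the format of the citation-network index, so this sketch assumes it has already been loaded into a list of paper IDs and a list of `(citing, cited)` pairs; `build_undirected_graph` is a hypothetical helper name.

```python
def build_undirected_graph(nodes, citations):
    """Return {paper_id: set(neighbor_ids)} with every edge stored in both directions.

    nodes:     iterable of paper IDs, so uncited papers still appear in the graph
    citations: iterable of (citing, cited) pairs (assumed input format)
    """
    graph = {node: set() for node in nodes}
    for citing, cited in citations:
        if citing == cited:
            continue  # a self-citation does not connect a paper to anything else
        graph.setdefault(citing, set()).add(cited)
        graph.setdefault(cited, set()).add(citing)
    return graph


nodes = ["REF-001", "REF-016", "REF-024", "REF-299"]
citations = [
    ("REF-001", "REF-016"),
    ("REF-024", "REF-016"),
    ("REF-016", "REF-001"),  # reverse citation collapses into the same undirected edge
]
graph = build_undirected_graph(nodes, citations)
print(sorted(graph["REF-016"]))
```

Seeding the map with every known paper ID keeps papers that have no citation edges visible, which matters later when isolated papers are reported.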

### Phase 2: Connected Components (BFS)

Run BFS/connected-components on the undirected citation graph:

1. Initialize: all nodes unvisited
2. For each unvisited node: BFS to find its connected component
3. Collect components sorted by size (largest first)

**Output**:
```
Connected Components: 9

Cluster 1: "Agentic Workflows" (124 papers)
  Hub: REF-016 (34 connections)
  Topics: agentic-workflows, multi-agent, orchestration
  Sample: REF-001, REF-016, REF-024, REF-121 ...

Cluster 2: "GUI Agents" (31 papers)
  Hub: REF-198 (12 connections)
  Topics: gui-agents, web-agents, screen-understanding
  Sample: REF-198, REF-201, REF-215 ...

...

Cluster 9: "Isolated" (3 papers)
  No hub (all degree 0)
  REF-299, REF-312, REF-350
```
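
A minimal sketch of the BFS pass described in this phase, assuming the graph is a dict that maps each paper ID to a set of neighbor IDs. The names `connected_components` and `hub` are illustrative, and picking the hub as the highest-degree node (ties broken by ID) is an assumption based on the "connections" counts in the sample output.

```python
from collections import deque


def connected_components(graph):
    """Return the connected components of an undirected graph, largest first."""
    visited = set()
    components = []
    for start in graph:
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in graph.get(node, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)


def hub(graph, component):
    """Return the node with the most connections; ties go to the smallest ID."""
    return min(component, key=lambda node: (-len(graph[node]), node))


graph = {
    "REF-001": {"REF-016"},
    "REF-016": {"REF-001", "REF-024"},
    "REF-024": {"REF-016"},
    "REF-198": {"REF-201"},
    "REF-201": {"REF-198"},
    "REF-299": set(),
}
components = connected_components(graph)
print([len(c) for c in components], hub(graph, components[0]))
```

Components smaller than `--min-cluster-size` could then be filtered out of the cluster list, with single-node components reported as isolated papers.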

### Phase 3: Gap Analysis

For each pair of clusters, assess the gap:

1. **Topic overlap** — do the clusters share any tags?
2. **Temporal overlap** — do they cover the same years?
3. **Author overlap** — do any authors appear in both clusters?
4. **Bridgeability** — could a single paper connect them?

Prioritize gaps by:
- **Size product** — larger clusters disconnected = higher priority
- **Topic proximity** — clusters with related but not identical topics
- **Recency** — newer clusters may simply be missing recent cross-citations

**Output**:
```
Gap Analysis: 12 cluster pairs

Priority 1: "Agentic Workflows" ↔ "GUI Agents"
  Gap: 124 × 31 = 3,844 (size product)
  Topic overlap: agent, llm (2 shared tags)
  Bridge opportunity: HIGH
  Suggested search: "LLM agent GUI interaction orchestration"

Priority 2: "Evaluation" ↔ "Reproducibility"
  Gap: 45 × 28 = 1,260
  Topic overlap: evaluation, benchmark (2 shared tags)
  Bridge opportunity: MEDIUM
  Suggested search: "reproducible LLM evaluation benchmarks"
...
```
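
The source names the prioritization signals but not how they are combined. The sketch below is one possible reading: it enumerates cluster pairs, computes the size product and shared tags, and sorts by size product with shared-tag count as a tie-breaker. The cluster dict layout, the tag sets, and the third cluster are hypothetical.

```python
from itertools import combinations


def rank_gaps(clusters):
    """Score every cluster pair and return the pairs ordered by priority.

    Ordering (an assumption): size product first, then number of shared tags.
    """
    gaps = []
    for a, b in combinations(clusters, 2):
        gaps.append({
            "pair": (a["name"], b["name"]),
            "size_product": a["size"] * b["size"],
            "shared_tags": sorted(a["tags"] & b["tags"]),
        })
    gaps.sort(key=lambda g: (g["size_product"], len(g["shared_tags"])), reverse=True)
    return gaps


clusters = [
    {"name": "Agentic Workflows", "size": 124, "tags": {"agent", "llm", "orchestration"}},
    {"name": "GUI Agents", "size": 31, "tags": {"agent", "llm", "gui-agents"}},
    {"name": "Hypothetical Cluster", "size": 10, "tags": {"benchmark"}},
]
for gap in rank_gaps(clusters):
    print(gap["pair"], gap["size_product"], gap["shared_tags"])
```

Temporal, author, and recency signals are left out here because the source gives no rule for weighting them.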

### Phase 4: Bridge Search (if --search-bridges)

For each high-priority gap:

1. Generate search queries from cluster topic overlap
2. Search external databases (Semantic Scholar, arXiv, Google Scholar)
3. Filter candidates by:
   - Cites papers from BOTH clusters
   - Published in overlapping time range
   - High citation count (likely to be connecting work)
4. Rank candidates by bridge potential

**Output**:
```
Bridge Candidates Found: 8

For gap "Agentic Workflows" ↔ "GUI Agents":
  1. "WebAgent: World-Centric Web Navigation" (2024)
     Cites: REF-016 (Cluster 1), REF-198 (Cluster 2)
     Citations: 87
     Bridge potential: HIGH

  2. "Agent-E: Vision-Language Planning for Web Tasks" (2024)
     Cites: REF-024 (Cluster 1), REF-201 (Cluster 2)
     Citations: 45
     Bridge potential: MEDIUM
```
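
The "cites papers from BOTH clusters" filter can be sketched as a set intersection. This example assumes the external search has already returned candidate records with a `cites` list of corpus IDs; the record layout, the function name `find_bridges`, and the one-sided survey entry are hypothetical, and ranking by raw citation count is a simplification of "bridge potential".

```python
def find_bridges(candidates, cluster_a, cluster_b):
    """Keep candidates that cite at least one paper in each cluster, most cited first."""
    bridges = []
    for candidate in candidates:
        cited = set(candidate["cites"])
        hits_a = sorted(cited & cluster_a)
        hits_b = sorted(cited & cluster_b)
        if hits_a and hits_b:
            bridges.append({**candidate, "cites_a": hits_a, "cites_b": hits_b})
    return sorted(bridges, key=lambda c: c["citations"], reverse=True)


cluster_a = {"REF-016", "REF-024"}
cluster_b = {"REF-198", "REF-201"}
candidates = [
    {"title": "Agent-E: Vision-Language Planning for Web Tasks",
     "citations": 45, "cites": ["REF-024", "REF-201"]},
    {"title": "WebAgent: World-Centric Web Navigation",
     "citations": 87, "cites": ["REF-016", "REF-198"]},
    # Hypothetical entry: highly cited, but only touches one cluster.
    {"title": "Hypothetical one-sided survey",
     "citations": 300, "cites": ["REF-016"]},
]
bridges = find_bridges(candidates, cluster_a, cluster_b)
print([b["title"] for b in bridges])
```

A highly cited paper that touches only one cluster is dropped, since it cannot connect the two components.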

### Phase 5: File Issues (if --file-issues)

For each gap with bridge candidates, file a research induction issue:

```markdown
## Research Gap: [Cluster A] ↔ [Cluster B]

**Gap Size**: [N × M papers disconnected]
**Bridge Candidates**: [list]
**Suggested Action**: Induct [top candidate] to connect clusters

### Bridge Papers to Induct
- [ ] "WebAgent: World-Centric Web Navigation" — arxiv:2401.XXXXX
- [ ] "Agent-E: Vision-Language Planning" — arxiv:2403.XXXXX
```
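
As an illustration of how the template above might be filled in, the sketch below renders an issue body from cluster names, sizes, and candidate titles. `render_gap_issue` is a hypothetical helper; it only builds the Markdown string and does not call any issue tracker. arXiv identifiers are omitted because the source leaves them elided.

```python
def render_gap_issue(cluster_a, size_a, cluster_b, size_b, candidate_titles):
    """Build the Markdown body for a research gap issue (no network calls)."""
    if not candidate_titles:
        raise ValueError("at least one bridge candidate is required")
    lines = [
        f"## Research Gap: {cluster_a} ↔ {cluster_b}",
        "",
        f"**Gap Size**: {size_a} × {size_b} papers disconnected",
        "**Bridge Candidates**: " + ", ".join(candidate_titles),
        f'**Suggested Action**: Induct "{candidate_titles[0]}" to connect clusters',
        "",
        "### Bridge Papers to Induct",
    ]
    # One unchecked task-list item per candidate, in ranked order.
    lines += [f'- [ ] "{title}"' for title in candidate_titles]
    return "\n".join(lines)


body = render_gap_issue(
    "Agentic Workflows", 124, "GUI Agents", 31,
    ["WebAgent: World-Centric Web Navigation",
     "Agent-E: Vision-Language Planning for Web Tasks"],
)
print(body)
```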

### Phase 6: Report

```
Research Gap Detection
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Graph: 372 nodes, 1,247 edges
Connected components: 9
Largest cluster: 124 papers ("Agentic Workflows")
Isolated papers: 3

Gap analysis: 12 cluster pairs
  HIGH priority: 4 (bridge candidates available)
  MEDIUM priority: 5
  LOW priority: 3

Bridge candidates found: 8 papers
Issues filed: 4
Papers recommended for induction: 8
```

## Distinction from research-gap

| Tool | Approach | Output |
|------|----------|--------|
| `research-gap` | **Intellectual** — topic coverage, missing areas, GRADE gaps | Gap report with search queries |
| `research-gap-detect` | **Structural** — citation graph topology, disconnected components | Cluster map, bridge candidates, filed issues |

`research-gap` answers "what topics are we missing?" while `research-gap-detect` answers "which existing papers don't cite each other but should?"

## Examples

```bash
# Full analysis with bridge search
/research-gap-detect --search-bridges

# Just show clusters
/research-gap-detect --clusters-only

# Detect and auto-file issues
/research-gap-detect --file-issues

# Combined: search + file
/research-gap-detect --search-bridges --file-issues

# JSON for visualization
/research-gap-detect --format json
```

## References

- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/corpus-index-build/SKILL.md — Builds the citation-network graph
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/citation-backfill/SKILL.md — Prerequisite: complete bidirectional edges
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-gap/SKILL.md — Complementary intellectual gap analysis
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/induct-research/SKILL.md — Inducts bridge candidates

Related Skills

research-workflow

from jmagly/aiwg

Execute multi-stage research workflows

research-status

from jmagly/aiwg

Show research corpus health and statistics

research-query

from jmagly/aiwg

Search the local research corpus, read matching findings, and synthesize an answer with inline citations to REF-XXX sources. The "query" operation for the research pipeline.

research-quality

from jmagly/aiwg

Assess source quality using GRADE methodology

research-quality-audit

from jmagly/aiwg

Audit research corpus for shallow stubs, incomplete sections, missing source files, and doc depth issues. Detects docs written from abstracts rather than full papers and optionally auto-dispatches expansion agents.

research-provenance

from jmagly/aiwg

Query provenance chains and artifact relationships

research-lint

from jmagly/aiwg

Run the research corpus lint ruleset to detect structural and referential integrity issues — orphan notes, missing frontmatter, broken references, missing GRADE assessments.

research-gap

from jmagly/aiwg

Analyze gaps in research coverage

research-document

from jmagly/aiwg

Generate summaries and literature notes from research papers

research-discover

from jmagly/aiwg

Search for research papers across academic databases

research-cite

from jmagly/aiwg

Generate properly formatted citation from research corpus

research-archive

from jmagly/aiwg

Package research artifacts for long-term archival