Codex

research-gap-detect

Build the mutual citation graph, find connected components, identify isolated clusters, and optionally search for bridge candidates and file gap issues. Automates the manual cluster analysis workflow.

104 stars

Best use case

research-gap-detect is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.


Teams using research-gap-detect should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/research-gap-detect/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/research-gap-detect/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/research-gap-detect/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How research-gap-detect Compares

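| Feature / Agent | research-gap-detect | Standard Approach |
|-----------------|---------------------|-------------------|
| Platform Support | Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |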

Frequently Asked Questions

What does this skill do?

Build the mutual citation graph, find connected components, identify isolated clusters, and optionally search for bridge candidates and file gap issues. Automates the manual cluster analysis workflow.

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

The source is in the jmagly/aiwg repository on GitHub, at .agents/skills/research-gap-detect/SKILL.md. This is the same file the installation command downloads.


SKILL.md Source

# Research Gap Detect

Analyze the research corpus citation graph to find disconnected clusters, isolated papers, and gap opportunities. Optionally searches for bridge paper candidates and files gap issues.

## Triggers

- "find research gaps"
- "detect clusters"
- "cluster analysis"
- "find isolated papers"
- "bridge candidate search"
- `/research-gap-detect`

## Parameters

### `--clusters-only` (optional)
Only run cluster detection — skip bridge search and issue filing.

### `--file-issues` (optional)
Auto-file gap issues for each disconnected cluster pair.

### `--search-bridges` (optional)
Search external databases for papers that could bridge disconnected clusters.

### `--min-cluster-size N` (optional)
Minimum papers in a cluster to report. Default: 2.

### `--format` (optional)
Output format: `full` (default), `summary`, or `json`.
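The following is a hypothetical sketch of how the documented flags could map onto execution phases. The skill is invoked as a slash command, not a Python script. The parser, the phase names, and the rule that `--clusters-only` overrides `--search-bridges` and `--file-issues` (and skips gap analysis) are assumptions read from the flag descriptions, not the skill's actual implementation.

```python
import argparse


def parse_options(argv):
    """Hypothetical parser mirroring the documented /research-gap-detect flags."""
    parser = argparse.ArgumentParser(prog="research-gap-detect")
    parser.add_argument("--clusters-only", action="store_true",
                        help="only run cluster detection")
    parser.add_argument("--file-issues", action="store_true",
                        help="file gap issues for disconnected cluster pairs")
    parser.add_argument("--search-bridges", action="store_true",
                        help="search external databases for bridge papers")
    parser.add_argument("--min-cluster-size", type=int, default=2, metavar="N",
                        help="minimum papers in a cluster to report")
    parser.add_argument("--format", choices=["full", "summary", "json"],
                        default="full", help="output format")
    return parser.parse_args(argv)


def phases_to_run(opts):
    """Return the phases enabled by the parsed options, in execution order."""
    phases = ["build-graph", "components"]
    # Assumption: --clusters-only stops after cluster detection, so it
    # overrides --search-bridges and --file-issues.
    if not opts.clusters_only:
        phases.append("gap-analysis")
        if opts.search_bridges:
            phases.append("bridge-search")
        if opts.file_issues:
            phases.append("file-issues")
    phases.append("report")
    return phases


opts = parse_options(["--search-bridges", "--file-issues"])
print(phases_to_run(opts))
```

Keeping phase selection separate from parsing makes the flag interactions testable without running any graph work.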

## Execution Flow

### Phase 1: Build Citation Graph

1. Read the citation-network index produced by `/corpus-index-build --graph citation-network`
   - If stale or missing: run `/corpus-index-build --graph citation-network` first
2. Build an adjacency list from outgoing + incoming edges
3. Treat as undirected for cluster detection (A cites B ≡ A connected to B)
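Here is a minimal sketch of the adjacency step. It assumes the citation-network index can be loaded as a dict that maps each paper ID to the IDs it cites; the real index format is not specified here, so `outgoing` is a hypothetical stand-in. Storing neighbors in sets means a mutual citation (A cites B and B cites A) collapses into a single undirected edge.

```python
def build_undirected_adjacency(outgoing):
    """Build an undirected adjacency list from a citation map.

    `outgoing` is a hypothetical stand-in for the citation-network index:
    it maps each paper ID to the list of paper IDs it cites.
    """
    adjacency = {}
    for paper, cited in outgoing.items():
        # Register every paper, even one with no edges, so it becomes a node.
        adjacency.setdefault(paper, set())
        for target in cited:
            if target == paper:
                continue  # a self-citation adds no connectivity
            # "A cites B" becomes an undirected A-B edge (both directions).
            adjacency[paper].add(target)
            adjacency.setdefault(target, set()).add(paper)
    return adjacency


graph = build_undirected_adjacency({
    "REF-001": ["REF-016"],
    "REF-016": ["REF-001", "REF-024"],  # mutual citation with REF-001
    "REF-299": [],
})
edge_count = sum(len(neighbors) for neighbors in graph.values()) // 2
print(len(graph), edge_count)  # 4 nodes, 2 undirected edges
```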

### Phase 2: Connected Components (BFS)

Run BFS/connected-components on the undirected citation graph:

1. Initialize: all nodes unvisited
2. For each unvisited node: BFS to find its connected component
3. Collect components sorted by size (largest first)
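These steps can be sketched as a standard breadth-first search over the undirected adjacency list. This is a generic illustration, not the skill's own code. Each BFS started from an unvisited node collects exactly one component, and a paper with no edges ends up as a component of size 1.

```python
from collections import deque


def connected_components(adjacency):
    """Return the connected components of an undirected graph, largest first."""
    visited = set()
    components = []
    for start in adjacency:
        if start in visited:
            continue
        # BFS from an unvisited node collects exactly one component.
        visited.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency.get(node, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    # Largest first; ties broken by the smallest paper ID for stable output.
    components.sort(key=lambda comp: (-len(comp), comp[0]))
    return components


graph = {
    "REF-001": {"REF-016"}, "REF-016": {"REF-001", "REF-024"},
    "REF-024": {"REF-016"}, "REF-198": {"REF-201"}, "REF-201": {"REF-198"},
    "REF-350": set(),
}
for comp in connected_components(graph):
    print(len(comp), comp)
```

Marking nodes visited when they are enqueued, rather than when they are dequeued, keeps each paper from entering the queue twice.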

**Output**:
```
Connected Components: 9

Cluster 1: "Agentic Workflows" (124 papers)
  Hub: REF-016 (34 connections)
  Topics: agentic-workflows, multi-agent, orchestration
  Sample: REF-001, REF-016, REF-024, REF-121 ...

Cluster 2: "GUI Agents" (31 papers)
  Hub: REF-198 (12 connections)
  Topics: gui-agents, web-agents, screen-understanding
  Sample: REF-198, REF-201, REF-215 ...

...

Cluster 9: "Isolated" (3 papers)
  No hub (all degree 1)
  REF-299, REF-312, REF-350
```
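The output above reports a hub and its connection count for each cluster. This sketch assumes that the hub is the highest-degree paper in the component and that `--min-cluster-size` drops smaller components before reporting. The "no hub when the best degree is 1 or less" rule is inferred from the "No hub (all degree 1)" line. All three are assumptions, not confirmed behavior.

```python
def summarize_clusters(adjacency, components, min_cluster_size=2):
    """Pick a hub for each component and drop clusters below the size threshold.

    Hypothetical rules: the hub is the paper with the most connections, and a
    cluster whose best-connected paper has degree 1 or less gets no hub.
    """
    summaries = []
    for component in components:
        if len(component) < min_cluster_size:
            continue  # --min-cluster-size filter
        # Highest degree wins; ties go to the smallest paper ID.
        hub = min(component, key=lambda paper: (-len(adjacency.get(paper, ())), paper))
        degree = len(adjacency.get(hub, ()))
        summaries.append({
            "size": len(component),
            "hub": hub if degree > 1 else None,
            "hub_degree": degree,
            "sample": component[:4],
        })
    return summaries


graph = {
    "REF-016": {"REF-001", "REF-024", "REF-121"}, "REF-001": {"REF-016"},
    "REF-024": {"REF-016"}, "REF-121": {"REF-016"},
    "REF-198": {"REF-201"}, "REF-201": {"REF-198"}, "REF-350": set(),
}
components = [["REF-001", "REF-016", "REF-024", "REF-121"], ["REF-198", "REF-201"], ["REF-350"]]
for summary in summarize_clusters(graph, components):
    print(summary)
```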

### Phase 3: Gap Analysis

For each pair of clusters, assess the gap:

1. **Topic overlap** — do the clusters share any tags?
2. **Temporal overlap** — do they cover the same years?
3. **Author overlap** — do any authors appear in both clusters?
4. **Bridgeability** — could a single paper connect them?
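The first three checks reduce to set intersections over per-cluster metadata. This sketch assumes that each cluster's tags, publication years, and authors have already been aggregated into sets; `ClusterProfile` is a hypothetical container. It treats temporal overlap as intersecting year spans. Bridgeability is a judgment call and is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterProfile:
    """Hypothetical per-cluster metadata aggregated from paper records."""
    name: str
    size: int
    tags: set = field(default_factory=set)
    years: set = field(default_factory=set)
    authors: set = field(default_factory=set)


def assess_pair(a, b):
    """Compute the topic, temporal, and author overlap signals for one pair."""
    # Year spans overlap when each cluster starts no later than the other ends.
    temporal = (bool(a.years and b.years)
                and min(a.years) <= max(b.years)
                and min(b.years) <= max(a.years))
    return {
        "shared_tags": sorted(a.tags & b.tags),
        "temporal_overlap": temporal,
        "shared_authors": sorted(a.authors & b.authors),
    }


agentic = ClusterProfile("Agentic Workflows", 124, {"agent", "llm", "orchestration"}, {2022, 2023, 2024})
gui = ClusterProfile("GUI Agents", 31, {"agent", "llm", "gui-agents"}, {2024, 2025})
print(assess_pair(agentic, gui))
```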

Prioritize gaps by:
- **Size product**: the larger the two disconnected clusters, the higher the priority
- **Topic proximity**: clusters with related but not identical topics rank higher
- **Recency**: newer clusters may simply be missing recent cross-citations
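One possible way to combine two of these criteria is sketched below. Pairs whose topics are related but not identical rank first, and size product orders pairs within each group. This weighting is an assumption, recency is not modeled, and `clusters` is a hypothetical mapping from cluster name to `(size, tags)`.

```python
from itertools import combinations


def topic_proximity(tags_a, tags_b):
    """1 if the clusters share some tags but not all (related, not identical)."""
    shared = tags_a & tags_b
    return 1 if shared and tags_a != tags_b else 0


def rank_gaps(clusters):
    """Rank every cluster pair: related topics first, then by size product."""
    gaps = []
    for (name_a, (size_a, tags_a)), (name_b, (size_b, tags_b)) in combinations(clusters.items(), 2):
        gaps.append({
            "pair": (name_a, name_b),
            "size_product": size_a * size_b,
            "proximity": topic_proximity(tags_a, tags_b),
        })
    gaps.sort(key=lambda gap: (-gap["proximity"], -gap["size_product"]))
    return gaps


clusters = {
    "Agentic Workflows": (124, {"agent", "llm", "orchestration"}),
    "GUI Agents": (31, {"agent", "llm", "gui-agents"}),
    "Evaluation": (45, {"evaluation", "benchmark", "metrics"}),
    "Reproducibility": (28, {"evaluation", "benchmark", "replication"}),
}
for gap in rank_gaps(clusters)[:2]:
    print(gap["pair"], gap["size_product"])
```

Ranking on size product alone would put "Agentic Workflows" and "Evaluation" first (124 × 45 = 5,580) even though those clusters share no tags. Gating on proximity keeps the bridgeable pairs at the top.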

**Output**:
```
Gap Analysis: 12 cluster pairs

Priority 1: "Agentic Workflows" ↔ "GUI Agents"
  Gap: 124 × 31 = 3,844 (size product)
  Topic overlap: agent, llm (2 shared tags)
  Bridge opportunity: HIGH
  Suggested search: "LLM agent GUI interaction orchestration"

Priority 2: "Evaluation" ↔ "Reproducibility"
  Gap: 45 × 28 = 1,260
  Topic overlap: evaluation, benchmark (2 shared tags)
  Bridge opportunity: MEDIUM
  Suggested search: "reproducible LLM evaluation benchmarks"
...
```

### Phase 4: Bridge Search (if --search-bridges)

For each high-priority gap:

1. Generate search queries from cluster topic overlap
2. Search external databases (Semantic Scholar, arXiv, Google Scholar)
3. Filter candidates by:
   - Cites papers from BOTH clusters
   - Published in overlapping time range
   - High citation count (likely to be connecting work)
4. Rank candidates by bridge potential
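The filtering and ranking steps can be sketched as shown below. This assumes the candidates have already been fetched and normalized into plain dicts. The keys `title`, `year`, `citations`, and `references` are hypothetical, and no real Semantic Scholar or arXiv API is called. `cluster_of` maps corpus REF IDs to the component numbers from Phase 2. `year_range` is whatever window the caller treats as the overlapping time range.

```python
def find_bridges(candidates, cluster_of, pair, year_range, min_citations=0):
    """Keep candidates that cite both clusters in `pair`, most-cited first.

    candidates: dicts with hypothetical keys "title", "year", "citations",
    and "references" (corpus REF IDs the candidate cites).
    cluster_of: maps a REF ID to its cluster number from Phase 2.
    """
    first, second = pair
    start, end = year_range
    results = []
    for cand in candidates:
        hit_clusters = {cluster_of.get(ref) for ref in cand["references"]}
        if first not in hit_clusters or second not in hit_clusters:
            continue  # must cite papers from BOTH clusters
        if not start <= cand["year"] <= end:
            continue  # outside the overlapping time range
        if cand["citations"] < min_citations:
            continue
        results.append(cand)
    # Citation count stands in for "bridge potential" in this sketch.
    results.sort(key=lambda cand: -cand["citations"])
    return results


cluster_of = {"REF-016": 1, "REF-024": 1, "REF-198": 2, "REF-201": 2}
candidates = [
    {"title": "Agent-E: Vision-Language Planning for Web Tasks", "year": 2024,
     "citations": 45, "references": ["REF-024", "REF-201"]},
    {"title": "WebAgent: World-Centric Web Navigation", "year": 2024,
     "citations": 87, "references": ["REF-016", "REF-198"]},
    {"title": "One-sided paper", "year": 2024, "citations": 300,
     "references": ["REF-016", "REF-024"]},
]
for cand in find_bridges(candidates, cluster_of, (1, 2), (2022, 2025)):
    print(cand["title"], cand["citations"])
```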

**Output**:
```
Bridge Candidates Found: 8

For gap "Agentic Workflows" ↔ "GUI Agents":
  1. "WebAgent: World-Centric Web Navigation" (2024)
     Cites: REF-016 (Cluster 1), REF-198 (Cluster 2)
     Citations: 87
     Bridge potential: HIGH

  2. "Agent-E: Vision-Language Planning for Web Tasks" (2024)
     Cites: REF-024 (Cluster 1), REF-201 (Cluster 2)
     Citations: 45
     Bridge potential: MEDIUM
```

### Phase 5: File Issues (if --file-issues)

For each gap with bridge candidates, file a research induction issue:

```markdown
## Research Gap: [Cluster A] ↔ [Cluster B]

**Gap Size**: [N × M papers disconnected]
**Bridge Candidates**: [list]
**Suggested Action**: Induct [top candidate] to connect clusters

### Bridge Papers to Induct
- [ ] "WebAgent: World-Centric Web Navigation" — arxiv:2401.XXXXX
- [ ] "Agent-E: Vision-Language Planning" — arxiv:2403.XXXXX
```
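A small sketch that fills this template for one gap is shown below. It assumes that bridge candidates arrive as `(title, identifier)` pairs with the best candidate first. How the issue is actually filed (which tracker, which API) is not specified here, so the sketch only builds the body text.

```python
SEPARATOR = "\u2014"  # em dash, as used in the issue template


def render_gap_issue(cluster_a, cluster_b, size_a, size_b, bridges):
    """Fill the Phase 5 issue template for one gap.

    bridges: list of (title, identifier) tuples, best candidate first.
    """
    if not bridges:
        raise ValueError("issues are only filed for gaps with bridge candidates")
    titles = ", ".join(f'"{title}"' for title, _ in bridges)
    lines = [
        f"## Research Gap: {cluster_a} \u2194 {cluster_b}",
        "",
        f"**Gap Size**: {size_a} \u00d7 {size_b} papers disconnected",
        f"**Bridge Candidates**: {titles}",
        f'**Suggested Action**: Induct "{bridges[0][0]}" to connect clusters',
        "",
        "### Bridge Papers to Induct",
    ]
    # One unchecked task item per candidate, in ranked order.
    lines += [f'- [ ] "{title}" {SEPARATOR} {ident}' for title, ident in bridges]
    return "\n".join(lines)


print(render_gap_issue(
    "Agentic Workflows", "GUI Agents", 124, 31,
    [("WebAgent: World-Centric Web Navigation", "arxiv:2401.XXXXX"),
     ("Agent-E: Vision-Language Planning", "arxiv:2403.XXXXX")],
))
```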

### Phase 6: Report

```
Research Gap Detection
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Graph: 372 nodes, 1,247 edges
Connected components: 9
Largest cluster: 124 papers ("Agentic Workflows")
Isolated papers: 3

Gap analysis: 12 cluster pairs
  HIGH priority: 4 (bridge candidates available)
  MEDIUM priority: 5
  LOW priority: 3

Bridge candidates found: 8 papers
Issues filed: 4
Papers recommended for induction: 8
```

## Distinction from research-gap

| Tool | Approach | Output |
|------|----------|--------|
| `research-gap` | **Intellectual** — topic coverage, missing areas, GRADE gaps | Gap report with search queries |
| `research-gap-detect` | **Structural** — citation graph topology, disconnected components | Cluster map, bridge candidates, filed issues |

`research-gap` answers "what topics are we missing?" while `research-gap-detect` answers "which existing papers don't cite each other but should?"

## Examples

```bash
# Full analysis with bridge search
/research-gap-detect --search-bridges

# Just show clusters
/research-gap-detect --clusters-only

# Detect and auto-file issues
/research-gap-detect --file-issues

# Combined: search + file
/research-gap-detect --search-bridges --file-issues

# JSON for visualization
/research-gap-detect --format json
```

## References

- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/corpus-index-build/SKILL.md — Builds the citation-network graph
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/citation-backfill/SKILL.md — Prerequisite: complete bidirectional edges
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-gap/SKILL.md — Complementary intellectual gap analysis
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/induct-research/SKILL.md — Inducts bridge candidates