open-researcher-guide

Open pipeline for generating deep research trajectories with LLMs

191 stars

Best use case

open-researcher-guide is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using open-researcher-guide should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/open-researcher-guide/SKILL.md --create-dirs "https://raw.githubusercontent.com/wentorai/research-plugins/main/skills/research/deep-research/open-researcher-guide/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/open-researcher-guide/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How open-researcher-guide Compares

Feature / Agent            open-researcher-guide    Standard Approach
Platform Support           Not specified            Limited / Varies
Context Awareness          High                     Baseline
Installation Complexity    Unknown                  N/A

Frequently Asked Questions

What does this skill do?

It documents OpenResearcher, an open pipeline for generating deep research trajectories with LLMs.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# OpenResearcher Guide

## Overview

OpenResearcher is a fully open pipeline for long-horizon deep research trajectory synthesis. It breaks complex research questions into sub-questions, iteratively searches and reads literature, builds internal knowledge representations, and synthesizes comprehensive answers. Unlike single-shot approaches, it models the researcher's thought process — reading, questioning, connecting, and refining understanding over multiple rounds.

## Pipeline Stages

### 1. Question Decomposition

```python
from open_researcher import OpenResearcher

researcher = OpenResearcher(llm_provider="anthropic")

# Complex research question
result = researcher.research(
    "How do retrieval-augmented generation systems handle "
    "knowledge conflicts between parametric and retrieved knowledge, "
    "and what are the current mitigation strategies?"
)

# Automatically decomposes into sub-questions:
# SQ1: What types of knowledge conflicts occur in RAG?
# SQ2: How are conflicts detected?
# SQ3: What resolution strategies exist?
# SQ4: How effective are these strategies?
```
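
The `SQ1:` to `SQ4:` comments above suggest a numbered list of sub-questions. As a minimal sketch, assuming the model returns one `SQn: ...` line per sub-question (an assumption for illustration; the actual output format of OpenResearcher is not specified here), a hypothetical `parse_sub_questions` helper could recover that list from raw text:

```python
import re

# Hypothetical helper, not part of the OpenResearcher API. The "SQn: text"
# line format mirrors the comments above and is an assumption.
SUB_QUESTION_PATTERN = re.compile(r"^\s*SQ(\d+)\s*:\s*(.+?)\s*$")


def parse_sub_questions(llm_output: str) -> list[str]:
    """Return sub-questions in numeric order, skipping lines that do not match."""
    numbered = []
    for line in llm_output.splitlines():
        match = SUB_QUESTION_PATTERN.match(line)
        if match:
            numbered.append((int(match.group(1)), match.group(2)))
    # Sort by the SQ number so out-of-order model output is normalized.
    return [text for _, text in sorted(numbered)]


raw = """Here is the decomposition:
SQ2: How are conflicts detected?
SQ1: What types of knowledge conflicts occur in RAG?
"""
for question in parse_sub_questions(raw):
    print(question)
```

Parsing into a plain list keeps the later stages independent of how the decomposition prompt is worded.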

### 2. Iterative Search and Reading

```python
# Each sub-question triggers:
# - Academic search (OpenAlex, arXiv)
# - Paper reading (abstract + key sections)
# - Evidence extraction
# - Follow-up question generation

# Configuration
researcher = OpenResearcher(
    search_backends=["openalex", "arxiv"],
    max_iterations=5,           # Research rounds per sub-question
    papers_per_iteration=10,    # Papers to read per round
    follow_up_questions=True,   # Generate follow-up questions
)
```
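
To make the roles of `max_iterations`, `papers_per_iteration`, and `follow_up_questions` concrete, the loop below is a rough sketch of how such a round-based search could work. It is not the library's implementation: `run_rounds`, `fake_search`, and `fake_follow_ups` are hypothetical names, and the search and follow-up steps are stubbed so the example runs offline.

```python
from collections import deque
from typing import Callable


def run_rounds(
    sub_question: str,
    search: Callable[[str, int], list[str]],
    propose_follow_ups: Callable[[str, list[str]], list[str]],
    max_iterations: int = 5,
    papers_per_iteration: int = 10,
) -> list[dict]:
    """Run up to max_iterations rounds, queueing unseen follow-up queries."""
    queue = deque([sub_question])
    seen_queries = {sub_question}
    steps = []
    while queue and len(steps) < max_iterations:
        query = queue.popleft()
        papers = search(query, papers_per_iteration)
        follow_ups = []
        for candidate in propose_follow_ups(query, papers):
            if candidate not in seen_queries:  # avoid re-running a query
                seen_queries.add(candidate)
                follow_ups.append(candidate)
                queue.append(candidate)
        steps.append({"round": len(steps) + 1, "query": query,
                      "papers": papers, "follow_ups": follow_ups})
    return steps


# Stubs standing in for a real search backend and an LLM call.
def fake_search(query: str, limit: int) -> list[str]:
    return [f"{query} / paper {i}" for i in range(1, limit + 1)]


FOLLOW_UPS = {"conflict detection": ["detection benchmarks", "conflict detection"]}


def fake_follow_ups(query: str, papers: list[str]) -> list[str]:
    return FOLLOW_UPS.get(query, [])


steps = run_rounds("conflict detection", fake_search, fake_follow_ups,
                   max_iterations=5, papers_per_iteration=2)
for step in steps:
    print(step["round"], step["query"], len(step["papers"]), step["follow_ups"])
```

The loop stops when either the iteration budget is spent or no unseen follow-up queries remain, which is one plausible reading of "research rounds per sub-question".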

### 3. Knowledge Graph Building

```python
# Internally builds a knowledge representation:
# - Claims linked to source papers
# - Relationships between concepts
# - Contradictions flagged

# Access the knowledge graph
kg = result.knowledge_graph
print(f"Concepts: {len(kg.nodes)}")
print(f"Relations: {len(kg.edges)}")
print(f"Contradictions: {len(kg.contradictions)}")
```
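
The attributes used above (`nodes`, `edges`, `contradictions`) hint at a simple graph structure. The following is an illustrative sketch only, assuming claims carry a stance toward a statement and that two claims contradict when they take opposite stances on the same statement. The `Claim` and `KnowledgeGraph` classes here are hypothetical stand-ins, not the library's internal types, and the sample data is placeholder text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    statement: str
    source: str      # identifier of the paper the claim came from
    supports: bool   # True if the source supports the statement


@dataclass
class KnowledgeGraph:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)
    claims: list = field(default_factory=list)

    def add_relation(self, subject: str, relation: str, obj: str) -> None:
        """Record a concept-to-concept relation and register both concepts."""
        self.nodes.update((subject, obj))
        self.edges.append((subject, relation, obj))

    def add_claim(self, claim: Claim) -> None:
        self.claims.append(claim)

    @property
    def contradictions(self) -> list:
        """Pairs of claims with the same statement but opposite stances."""
        pairs = []
        for i, first in enumerate(self.claims):
            for second in self.claims[i + 1:]:
                if (first.statement == second.statement
                        and first.supports != second.supports):
                    pairs.append((first, second))
        return pairs


kg = KnowledgeGraph()
kg.add_relation("RAG", "can exhibit", "knowledge conflict")
kg.add_claim(Claim("placeholder statement", "placeholder-paper-A", True))
kg.add_claim(Claim("placeholder statement", "placeholder-paper-B", False))
print(len(kg.nodes), len(kg.edges), len(kg.contradictions))
```

Exact string matching on statements is a deliberate simplification; a real system would likely need semantic matching to detect conflicting claims.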

### 4. Synthesis and Report

```python
# Multi-section synthesis
report = result.report

# Sections:
# 1. Introduction and scope
# 2. Sub-question answers with evidence
# 3. Cross-cutting themes
# 4. Open questions and future directions
# 5. Full bibliography

report.save("research_report.md")
report.export_bibliography("refs.bib")
```
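
For readers unfamiliar with what a `.bib` export contains, here is a small sketch of serializing one bibliography entry to BibTeX. The `to_bibtex` function and the entry dictionary layout are assumptions made for illustration; how `export_bibliography` formats its output is not documented here, and the sample entry is a placeholder, not a real paper.

```python
def to_bibtex(entry: dict) -> str:
    """Serialize a single entry dict as a BibTeX @article record."""
    fields = [
        ("title", entry["title"]),
        ("author", " and ".join(entry["authors"])),  # BibTeX joins authors with "and"
        ("year", str(entry["year"])),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@article{{{entry['key']},\n{body}\n}}"


# Placeholder entry for demonstration only.
example = {
    "key": "doe2024",
    "title": "Example Title",
    "authors": ["Jane Doe", "John Roe"],
    "year": 2024,
}
print(to_bibtex(example))
```

This sketch does not escape special characters such as `{`, `}`, or `&` in titles, which a production exporter would need to handle.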

## Configuration

```python
researcher = OpenResearcher(
    llm_provider="anthropic",
    model="claude-sonnet-4-20250514",
    search_config={
        "backends": ["openalex", "arxiv"],
        "max_results_per_query": 20,
    },
    reading_config={
        "sections": ["abstract", "introduction", "methods", "conclusion"],
        "max_tokens_per_paper": 3000,
    },
    synthesis_config={
        "style": "academic",           # academic, technical, accessible
        "include_contradictions": True,
        "cite_inline": True,
    },
)
```
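
Because `style` accepts only a few values (academic, technical, accessible, per the comment above), it can help to check a config dictionary before constructing the researcher. The validator below is a hypothetical convenience, not part of the library, and the default values it fills in are assumptions.

```python
ALLOWED_STYLES = {"academic", "technical", "accessible"}


def validate_synthesis_config(config: dict) -> dict:
    """Return a normalized synthesis config or raise ValueError on a bad style."""
    style = config.get("style", "academic")  # assumed default
    if style not in ALLOWED_STYLES:
        allowed = ", ".join(sorted(ALLOWED_STYLES))
        raise ValueError(f"Unknown style {style!r}; expected one of: {allowed}")
    return {
        "style": style,
        "include_contradictions": bool(config.get("include_contradictions", True)),
        "cite_inline": bool(config.get("cite_inline", True)),
    }


print(validate_synthesis_config({"style": "technical", "cite_inline": False}))
```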

## Trajectory Inspection

```python
# Inspect the research trajectory
trajectory = result.trajectory

for step in trajectory:
    print(f"Round {step.round}: {step.action}")
    print(f"  Query: {step.query}")
    print(f"  Papers read: {step.papers_read}")
    print(f"  Key findings: {step.findings[:100]}...")
    print(f"  Follow-ups: {step.follow_up_questions}")
```
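
Beyond printing each step, a trajectory can be condensed into a few summary numbers. The sketch below assumes each step exposes the attributes used in the loop above and that `papers_read` is an integer count (the source does not say whether it is a count or a list). `Step` and `summarize` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    round: int
    action: str
    query: str
    papers_read: int  # assumed to be a count
    findings: str
    follow_up_questions: list = field(default_factory=list)


def summarize(trajectory: list) -> dict:
    """Aggregate round count, papers read, and follow-ups never pursued."""
    asked = {step.query for step in trajectory}
    proposed = {q for step in trajectory for q in step.follow_up_questions}
    return {
        "rounds": len({step.round for step in trajectory}),
        "papers_read": sum(step.papers_read for step in trajectory),
        "open_follow_ups": sorted(proposed - asked),
    }


demo = [
    Step(1, "search", "a", 3, "placeholder finding", ["b", "c"]),
    Step(2, "search", "b", 2, "placeholder finding"),
]
print(summarize(demo))
```

Listing follow-ups that were proposed but never queried gives a quick view of where the research stopped short of its own questions.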

## Use Cases

1. **Literature surveys**: Comprehensive multi-round research
2. **Research proposals**: Evidence gathering for grant applications
3. **State-of-the-art reports**: Current landscape analysis
4. **Tutorial generation**: Deep topic explanations with citations

## References

- [OpenResearcher GitHub](https://github.com/GAIR-NLP/OpenResearcher)
- [GAIR-NLP Lab](https://github.com/GAIR-NLP)