emerging-topic-scout

Monitor bioRxiv/medRxiv preprints and academic discussions to identify emerging research hotspots before they appear in mainstream journals

3,891 stars

Best use case

emerging-topic-scout is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using emerging-topic-scout should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/emerging-topic-scout/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/aipoch-ai/emerging-topic-scout/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/emerging-topic-scout/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How emerging-topic-scout Compares

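| Feature / Agent | emerging-topic-scout | Standard Approach |
|-----------------|----------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |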

Frequently Asked Questions

What does this skill do?

Monitor bioRxiv/medRxiv preprints and academic discussions to identify emerging research hotspots before they appear in mainstream journals

SKILL.md Source

# Emerging Topic Scout

A real-time monitoring system for identifying "incubation period" research hotspots in biological and medical sciences before they are defined by mainstream journals.

## Overview

This skill continuously monitors:
- **bioRxiv**: Biology preprints via RSS/API ⚠️ *Currently blocked by Cloudflare*
- **medRxiv**: Medicine preprints via RSS/API ⚠️ *Currently blocked by Cloudflare*
- **arXiv**: Quantitative Biology preprints via RSS ✅ *Recommended alternative*
- **Academic discussions**: Social media and forum mentions

It uses trend analysis algorithms to detect sudden spikes in topic frequency, cross-platform mentions, and emerging keyword clusters.

### ⚠️ Network Access Notice

**bioRxiv and medRxiv** are currently protected by a Cloudflare JavaScript challenge, which prevents programmatic RSS access. As a workaround, this skill supports **arXiv q-bio** (Quantitative Biology) as an alternative data source.

**Recommended usage:**
```bash
# Use arXiv for reliable data fetching
python scripts/main.py --sources arxiv --days 30

# bioRxiv/medRxiv may return 0 results due to Cloudflare protection
python scripts/main.py --sources biorxiv medrxiv --days 30  # May not work
```

## Installation

```bash
cd ~/.openclaw/workspace/skills/emerging-topic-scout
pip install -r scripts/requirements.txt
```

## Usage

### Basic Scan (Recommended: Use arXiv)

```bash
python scripts/main.py --sources arxiv --days 7 --output json
```

### Legacy bioRxiv/medRxiv (May not work due to Cloudflare)

```bash
python scripts/main.py --sources biorxiv medrxiv --days 7 --output json
```

### Advanced Configuration (arXiv Recommended)

```bash
python scripts/main.py \
  --sources arxiv \
  --keywords "CRISPR,gene editing,machine learning" \
  --days 14 \
  --min-score 0.7 \
  --output markdown \
  --notify
```

### Legacy Configuration (bioRxiv/medRxiv - May not work)

```bash
python scripts/main.py \
  --sources biorxiv medrxiv \
  --keywords "CRISPR,gene editing,long COVID" \
  --days 14 \
  --min-score 0.7 \
  --output markdown \
  --notify
# Note: bioRxiv/medRxiv may return 0 results due to Cloudflare protection
```

## Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `--sources` | list | `arxiv` | Data sources to monitor (arxiv recommended due to Cloudflare issues with biorxiv/medrxiv) |
| `--keywords` | string | (auto-detect) | Comma-separated keywords to track |
| `--days` | int | `7` | Lookback period in days |
| `--min-score` | float | `0.6` | Minimum trending score (0-1) |
| `--max-topics` | int | `20` | Maximum topics to return |
| `--output` | string | `markdown` | Output format: `json`, `markdown`, `csv` |
| `--notify` | flag | `false` | Send notification for high-priority topics |
| `--config` | path | `config.yaml` | Path to configuration file |
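
The table above can be read as an `argparse` definition. The following is a minimal sketch, not the actual contents of `scripts/main.py`: the function names `build_parser` and `split_keywords` are hypothetical, and restricting `--sources` to three choices is an assumption based on the sources named in this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the Parameters table; scripts/main.py may differ."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Assumption: only the three sources named in this document are accepted.
    parser.add_argument("--sources", nargs="+", default=["arxiv"],
                        choices=["arxiv", "biorxiv", "medrxiv"])
    parser.add_argument("--keywords", default=None,
                        help="Comma-separated keywords; None means auto-detect")
    parser.add_argument("--days", type=int, default=7)
    parser.add_argument("--min-score", type=float, default=0.6)
    parser.add_argument("--max-topics", type=int, default=20)
    parser.add_argument("--output", choices=["json", "markdown", "csv"],
                        default="markdown")
    parser.add_argument("--notify", action="store_true")
    parser.add_argument("--config", default="config.yaml")
    return parser


def split_keywords(raw):
    """Turn "a, b,c" into ["a", "b", "c"]; None or an empty string gives []."""
    if not raw:
        return []
    return [k.strip() for k in raw.split(",") if k.strip()]


args = build_parser().parse_args(
    ["--sources", "biorxiv", "medrxiv", "--keywords", "CRISPR,gene editing", "--days", "14"]
)
print(args.sources, split_keywords(args.keywords), args.days, args.min_score)
```

Flags that are not passed fall back to the defaults in the table, so `args.min_score` is `0.6` in the call above.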

## Output Format

### JSON Output

```json
{
  "scan_date": "2026-02-06T05:57:00Z",
  "sources": ["biorxiv", "medrxiv"],
  "hot_topics": [
    {
      "topic": "gene editing therapy",
      "keywords": ["CRISPR", "base editing", "prime editing"],
      "trending_score": 0.89,
      "velocity": "rapid",
      "preprint_count": 34,
      "cross_platform_mentions": 127,
      "related_papers": [
        {
          "title": "New CRISPR variant shows promise",
          "authors": ["Smith J.", "Lee K."],
          "doi": "10.1101/2026.01.15.xxxxx",
          "source": "biorxiv",
          "published": "2026-01-15",
          "abstract_summary": "..."
        }
      ],
      "emerging_since": "2026-01-20"
    }
  ],
  "summary": {
    "total_papers_analyzed": 1247,
    "new_topics_detected": 8,
    "high_priority_alerts": 2
  }
}
```
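
A report in this shape is easy to post-process. The sketch below assumes only the `hot_topics`, `topic`, and `trending_score` fields shown above; the function name `high_priority_topics` is hypothetical, the second sample topic is made-up illustration data, and the default threshold of 0.8 is borrowed from `high_score_threshold` in the sample `config.yaml`.

```python
import json


def high_priority_topics(report: dict, threshold: float = 0.8) -> list:
    """Return (topic, score) pairs at or above threshold, highest score first."""
    topics = [
        (t["topic"], t["trending_score"])
        for t in report.get("hot_topics", [])
        if t.get("trending_score", 0.0) >= threshold
    ]
    return sorted(topics, key=lambda pair: pair[1], reverse=True)


# Trimmed-down sample; the second topic is illustrative, not real output.
sample = json.loads("""
{
  "hot_topics": [
    {"topic": "gene editing therapy", "trending_score": 0.89},
    {"topic": "spatial transcriptomics", "trending_score": 0.72}
  ]
}
""")
print(high_priority_topics(sample))  # [('gene editing therapy', 0.89)]
```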

### Markdown Output

```markdown
# Emerging Topics Report - 2026-02-06

## 🔥 High Priority Topics

### 1. Gene Editing Therapy (Score: 0.89)
- **Keywords**: CRISPR, base editing, prime editing
- **Growth Rate**: Rapid (+145% vs last week)
- **Preprints**: 34 papers
- **Cross-platform mentions**: 127

#### Key Papers
1. "New CRISPR variant shows promise" - Smith J. et al.
   - DOI: 10.1101/2026.01.15.xxxxx
   - Source: bioRxiv
```

## Configuration File

Create `config.yaml` for persistent settings:

```yaml
sources:
  arxiv:
    enabled: true
    rss_url: "https://export.arxiv.org/rss/q-bio"
    description: "arXiv Quantitative Biology - Recommended (no Cloudflare)"
  biorxiv:
    enabled: false  # Disabled due to Cloudflare protection
    rss_url: "https://www.biorxiv.org/rss/recent.rss"
    api_endpoint: "https://api.biorxiv.org/details/"
    note: "Currently blocked by Cloudflare JavaScript Challenge"
  medrxiv:
    enabled: false  # Disabled due to Cloudflare protection
    rss_url: "https://www.medrxiv.org/rss/recent.rss"
    api_endpoint: "https://api.medrxiv.org/details/"
    note: "Currently blocked by Cloudflare JavaScript Challenge"

trending:
  min_papers_threshold: 5
  velocity_window_days: 3
  novelty_weight: 0.4
  momentum_weight: 0.6

keywords:
  auto_detect: true
  custom_trackers:
    - "artificial intelligence"
    - "machine learning"
    - "single cell"
    - "spatial transcriptomics"

output:
  default_format: markdown
  save_history: true
  history_path: "./data/history.json"

notifications:
  enabled: false
  high_score_threshold: 0.8
```

## Trending Score Algorithm

The trending score (0-1) is calculated using:

```
Score = (Novelty × 0.4) + (Momentum × 0.4) + (CrossRef × 0.2)

Where:
- Novelty: Inverse frequency of topic in historical data
- Momentum: Rate of increase in mentions over velocity window
- CrossRef: Mentions across multiple platforms
```
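
As a worked example, the formula can be written as a small Python function. This is a sketch under stated assumptions rather than the code in `scripts/main.py`: the name `trending_score` is hypothetical, and it assumes each component has already been normalized to the range 0-1, which the definitions above do not spell out. The weights follow the formula in this section; note that the sample `config.yaml` lists `momentum_weight: 0.6`, so the shipped defaults may differ.

```python
def trending_score(novelty: float, momentum: float, crossref: float,
                   weights=(0.4, 0.4, 0.2)) -> float:
    """Weighted sum of three components, each expected in [0, 1]."""
    for name, value in (("novelty", novelty), ("momentum", momentum),
                        ("crossref", crossref)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    w_nov, w_mom, w_ref = weights
    # Weights that sum to 1 keep the result inside [0, 1].
    if abs(w_nov + w_mom + w_ref - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_nov * novelty + w_mom * momentum + w_ref * crossref


# 0.4 * 0.9 + 0.4 * 0.8 + 0.2 * 0.5 = 0.36 + 0.32 + 0.10
score = trending_score(novelty=0.9, momentum=0.8, crossref=0.5)
print(round(score, 2))  # 0.78
```

With the default `--min-score` of `0.6`, a topic scoring 0.78 would be reported; with `--min-score 0.8` it would not.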

## API Endpoints

### bioRxiv API
- Base: `https://api.biorxiv.org/`
- Details: `/details/[server]/[DOI]/[format]`
- Publication: `/pub/[DOI]/[format]`

### medRxiv API
- Same structure as bioRxiv

## Data Storage

Historical data is stored in `data/history.json` for:
- Trend comparison
- Velocity calculation
- Duplicate detection
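
Duplicate detection against the history file could look like the sketch below. The layout of `history.json` is not documented here, so the code assumes a JSON list of objects that each carry a `doi` key; that schema and the function names `load_seen_dois` and `drop_duplicates` are hypothetical.

```python
import json
from pathlib import Path


def load_seen_dois(history_path: Path) -> set:
    """Read previously stored DOIs; a missing file means an empty history."""
    if not history_path.exists():
        return set()
    # Assumed schema: a JSON list of objects, each with a "doi" key.
    records = json.loads(history_path.read_text(encoding="utf-8"))
    return {r["doi"] for r in records if "doi" in r}


def drop_duplicates(papers: list, seen: set) -> list:
    """Keep papers whose DOI is new; papers without a DOI are skipped."""
    fresh = []
    seen = set(seen)  # copy so the caller's set is not mutated
    for paper in papers:
        doi = paper.get("doi")
        if doi and doi not in seen:
            seen.add(doi)
            fresh.append(paper)
    return fresh


batch = [{"doi": "10.1/a"}, {"doi": "10.1/a"}, {"doi": "10.1/b"}]
print(drop_duplicates(batch, seen={"10.1/b"}))  # [{'doi': '10.1/a'}]
```

Entries without a DOI would need a different key, such as a source-specific identifier, which this sketch does not handle.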

## Examples

### Example 1: Quick Daily Scan (arXiv - Recommended)

```bash
python scripts/main.py --sources arxiv --days 1 --output markdown
```

### Example 2: Daily Scan with bioRxiv (May not work)

```bash
python scripts/main.py --sources biorxiv --days 1 --output markdown
# Note: May return 0 results due to Cloudflare protection
```

### Example 3: Weekly Deep Analysis

```bash
python scripts/main.py \
  --days 7 \
  --min-score 0.7 \
  --max-topics 50 \
  --output json \
  > weekly_report.json
```

### Example 4: Track Specific Research Area

```bash
python scripts/main.py \
  --keywords "Alzheimer,neurodegeneration,amyloid" \
  --days 30 \
  --min-score 0.5
```

## Known Issues

### bioRxiv/medRxiv Cloudflare Protection
**Status:** ❌ Blocked  
**Issue:** bioRxiv and medRxiv RSS feeds are protected by Cloudflare JavaScript Challenge, which prevents programmatic access. The site returns an HTML page requiring JavaScript execution and cookie validation.

**Attempted Solutions:**
1. ✅ Added browser User-Agent headers → **Failed** (Cloudflare detects bot)
2. ✅ Added complete browser headers (Accept, Accept-Language, etc.) → **Failed** 
3. ❌ Browser automation (Selenium/Playwright) → **Not implemented** (complex, heavy dependency)

**Workaround:** ✅ **Use arXiv instead**
- arXiv q-bio (Quantitative Biology) RSS is accessible without protection
- Contains computational biology, bioinformatics, and quantitative biology papers
- Successfully tested: 35+ papers fetched in 30-day window

**Usage:**
```bash
# Recommended: Use arXiv
python scripts/main.py --sources arxiv --days 30

# Not working: bioRxiv/medRxiv
python scripts/main.py --sources biorxiv medrxiv --days 30  # Returns 0 papers
```

## Troubleshooting

### Rate Limiting
If you encounter rate limits, increase the `--delay` parameter (default: 1s between requests).
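
For context, a fixed delay between requests can be sketched as follows. This illustrates the general idea only; how `scripts/main.py` actually applies `--delay` is not shown in this document, and the names `fetch_all`, `fetch`, and `sleep` are hypothetical. The `fetch` and `sleep` callables are injected so the example runs without network access.

```python
import time


def fetch_all(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing delay_seconds between calls."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:  # no pause before the first request
            sleep(delay_seconds)
        results.append(fetch(url))
    return results


# Stand-in fetcher and sleeper so the example runs offline and instantly.
pauses = []
pages = fetch_all(["feed-1", "feed-2", "feed-3"],
                  fetch=lambda url: f"body of {url}",
                  delay_seconds=2.0,
                  sleep=pauses.append)
print(len(pages), pauses)  # 3 [2.0, 2.0]
```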

### Missing Papers (0 results from bioRxiv/medRxiv)
This is expected due to Cloudflare protection. **Use `--sources arxiv` instead.**

### RSS Feed Access Denied
Some institutional firewalls may block preprint servers. Ensure you can access:
- ✅ `https://export.arxiv.org/rss/q-bio` (should work)
- ❌ `https://www.biorxiv.org/rss/recent.rss` (Cloudflare blocked)
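
To tell a real feed from a challenge page, one option is to check whether the response body parses as XML with a feed-like root element. The sketch below uses only the standard library; the function name `looks_like_feed` is hypothetical, and the two byte strings are made-up stand-ins for a feed and an HTML challenge page, not captured responses.

```python
import xml.etree.ElementTree as ET


def looks_like_feed(body: bytes) -> bool:
    """True if body parses as XML with an RSS, RDF, or Atom root element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    # Strip any "{namespace}" prefix from the root tag before comparing.
    tag = root.tag.rsplit("}", 1)[-1].lower()
    return tag in {"rss", "rdf", "feed"}


rss = b'<?xml version="1.0"?><rss version="2.0"><channel><title>q-bio</title></channel></rss>'
challenge = b"<!DOCTYPE html><html><head><title>Just a moment...</title></head><body></body></html>"
print(looks_like_feed(rss), looks_like_feed(challenge))  # True False
```

Running a check like this on the fetched bytes makes it possible to report "blocked" rather than silently returning 0 papers.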

### Low Trending Scores
For niche topics, lower the `--min-score` threshold or increase `--days` to gather more data.

## References

See `references/README.md` for:
- API documentation links
- Research papers on trend detection
- Related tools and resources

## License

MIT License - Part of OpenClaw Skills Collection

## Risk Assessment

| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python scripts with tools | High |
| Network Access | External API calls | High |
| File System Access | Read/write data | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Data handled securely | Medium |

## Security Checklist

- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] API requests use HTTPS only
- [ ] Input validated against allowed patterns
- [ ] API timeout and retry mechanisms implemented
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no internal paths exposed)
- [ ] Dependencies audited
- [ ] No exposure of internal service architecture

## Prerequisites

```bash
# Python dependencies
pip install -r scripts/requirements.txt
```

## Evaluation Criteria

### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable

### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time

## Lifecycle Status

- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**:
  - ⚠️ **bioRxiv/medRxiv blocked by Cloudflare** (use arXiv as workaround)
  - Network access limitations for some RSS feeds
- **Planned Improvements**: 
  - Investigate bioRxiv/medRxiv API alternatives
  - Consider browser automation for Cloudflare bypass
  - Add more arXiv categories (q-bio subcategories)
  - Performance optimization
