arxiv-batch-reporting

Batch search and report generation from the arXiv preprint repository

Best use case

arxiv-batch-reporting is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using arxiv-batch-reporting should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/arxiv-batch-reporting/SKILL.md --create-dirs "https://raw.githubusercontent.com/wentorai/research-plugins/main/skills/literature/search/arxiv-batch-reporting/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/arxiv-batch-reporting/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How arxiv-batch-reporting Compares

Feature / Agent	arxiv-batch-reporting	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

It runs batch searches against the arXiv preprint repository and generates structured reports from the results.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# arXiv Batch Reporting

## Overview

Keeping up with the flood of new preprints on arXiv is one of the most persistent challenges in fast-moving fields like machine learning, physics, mathematics, and computer science. The arXiv Batch Reporting skill provides a systematic approach to searching, filtering, and generating structured reports from arXiv at scale.

Unlike ad-hoc manual searches, this skill enables researchers to define persistent query profiles, run batch searches across date ranges, and produce formatted reports that highlight the most relevant papers. It is particularly useful for weekly or monthly literature surveillance, lab meeting preparation, and trend analysis across subfields.

The skill leverages the arXiv API and supports advanced query syntax, category filtering, and result ranking by relevance or recency. Reports can be generated in Markdown, HTML, or CSV formats for integration into existing workflows.

## Setting Up Batch Queries

### Query Profile Definition

Define your search profiles as structured configurations. Each profile specifies the search terms, category filters, date range, and output preferences:

```yaml
profile_name: "transformer-architectures-weekly"
queries:
  - 'ti:transformer AND abs:"attention mechanism"'
  - 'ti:"vision transformer"'
  - 'abs:"efficient transformer" AND cat:cs.LG'
categories:
  - cs.LG
  - cs.CL
  - cs.CV
date_range: "last_7_days"
max_results: 100
sort_by: "submittedDate"
sort_order: "descending"
```
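One way to turn such a profile into a single `search_query` string is sketched below. The helper name `build_search_query` is hypothetical, and the combination rule (OR across queries, then AND with an OR of the categories) is an assumption about how a profile should map onto the API, not something the skill prescribes. The profile is written as a plain dict to avoid a YAML dependency, and `date_range` is left out of the sketch.

```python
def build_search_query(profile: dict) -> str:
    """Combine a profile's queries and categories into one search_query string."""
    # OR the individual queries together, each wrapped in parentheses.
    query_part = " OR ".join(f"({q})" for q in profile["queries"])
    categories = profile.get("categories", [])
    if not categories:
        return query_part
    # Restrict the whole expression to the listed categories (assumed rule).
    cat_part = " OR ".join(f"cat:{c}" for c in categories)
    return f"({query_part}) AND ({cat_part})"


# Shortened, illustrative version of the profile above.
profile = {
    "profile_name": "transformer-architectures-weekly",
    "queries": ['ti:"vision transformer"', "ti:transformer AND abs:attention"],
    "categories": ["cs.LG", "cs.CV"],
}
search_query = build_search_query(profile)
print(search_query)
```

Running each query separately instead of merging them is equally reasonable, and it keeps track of which query matched which paper.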

### arXiv API Query Syntax

The arXiv API supports field-specific searches:

- `ti:` — Search in title
- `abs:` — Search in abstract
- `au:` — Search by author
- `cat:` — Filter by category (e.g., `cs.AI`, `math.PR`, `physics.comp-ph`)
- Boolean operators: `AND`, `OR`, `ANDNOT`
- Group with parentheses for complex queries
- Wrap multi-word phrases in double quotes to search them as a phrase (e.g., `ti:"deep learning"`)

**Example queries:**
- Find recent GAN papers in computer vision: `abs:"generative adversarial" AND cat:cs.CV`
- Find a specific author's work: `au:bengio AND ti:"deep learning"`
- Exclude survey papers: `abs:"reinforcement learning" ANDNOT ti:survey`
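As a minimal sketch, a query string like the ones above can be turned into a request URL with the standard library. The endpoint and the `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters come from the arXiv API documentation; the helper name `build_api_url` and its defaults are illustrative choices.

```python
from urllib.parse import urlencode

ARXIV_API_URL = "http://export.arxiv.org/api/query"


def build_api_url(search_query, start=0, max_results=100,
                  sort_by="submittedDate", sort_order="descending"):
    """Build a query URL; urlencode escapes quotes, colons, and spaces."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    return f"{ARXIV_API_URL}?{urlencode(params)}"


url = build_api_url('abs:"reinforcement learning" ANDNOT ti:survey', max_results=50)
print(url)
```

Leaving the escaping to `urlencode` avoids hand-built URLs that break on quoted phrases.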

### Rate Limiting and Pagination

The arXiv API enforces rate limits. Follow these guidelines:

- Wait at least 3 seconds between API requests
- Use pagination with `start` and `max_results` parameters (max 2000 per request)
- For large batch jobs, implement exponential backoff on HTTP 503 responses
- Cache results locally to avoid redundant API calls
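The guidelines above could be combined into a pagination loop along the following lines. This is a sketch under assumptions: `fetch_page` is a hypothetical callable that performs one API request and returns a list of parsed entries, and the retry count and backoff schedule are illustrative. The `sleep` function is injectable so the loop can be exercised without real waiting.

```python
import time
import urllib.error


def fetch_all(fetch_page, total_wanted, page_size=100, delay=3.0,
              max_retries=5, sleep=time.sleep):
    """Collect up to total_wanted entries, one page at a time."""
    page_size = min(page_size, 2000)  # per-request cap noted above
    results = []
    start = 0
    while start < total_wanted:
        size = min(page_size, total_wanted - start)
        for attempt in range(max_retries):
            try:
                page = fetch_page(start, size)
                break
            except urllib.error.HTTPError as err:
                # Retry only on 503, doubling the wait each time.
                if err.code != 503 or attempt == max_retries - 1:
                    raise
                sleep(delay * 2 ** attempt)
        results.extend(page)
        if len(page) < size:
            break  # a short page means there are no more results
        start += size
        sleep(delay)  # pause between consecutive requests
    return results


# Stand-in for a real fetcher: pretends the query has 250 results.
def fake_fetch(start, max_results):
    return list(range(250))[start:start + max_results]


waits = []
papers = fetch_all(fake_fetch, total_wanted=1000, page_size=100, sleep=waits.append)
print(len(papers), waits)
```

A real `fetch_page` would request the URL for the given `start` and `max_results`, parse the Atom feed, and ideally consult a local cache first.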

## Report Generation

### Standard Report Template

After collecting batch results, generate a report with the following structure:

```markdown
# arXiv Batch Report: [Profile Name]
**Date range:** [start] to [end]
**Total results:** [N] papers
**Generated:** [timestamp]

## Highlights (Top 10 by Relevance)
| # | Title | Authors | Category | Date |
|---|-------|---------|----------|------|
| 1 | [Title](arxiv-link) | First Author et al. | cs.LG | 2026-03-08 |

## Category Breakdown
- cs.LG: 45 papers
- cs.CL: 23 papers
- cs.CV: 18 papers

## Keyword Frequency
- "transformer": 38 mentions
- "attention": 29 mentions
- "efficient": 15 mentions

## Full Results
[Expandable table with all papers]
```
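A renderer for the first part of this template might look like the sketch below. The record fields (`title`, `url`, `first_author`, `category`, `date`, `score`) are assumed names for already-parsed results, and the sample records are placeholders rather than real papers. The keyword-frequency and full-results sections are omitted for brevity.

```python
from collections import Counter
from datetime import datetime, timezone


def render_report(profile_name, papers, date_start, date_end, top_n=10):
    """Render the header, highlights table, and category breakdown as Markdown."""
    generated = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [
        f"# arXiv Batch Report: {profile_name}",
        f"**Date range:** {date_start} to {date_end}",
        f"**Total results:** {len(papers)} papers",
        f"**Generated:** {generated}",
        "",
        f"## Highlights (Top {top_n} by Relevance)",
        "| # | Title | Authors | Category | Date |",
        "|---|-------|---------|----------|------|",
    ]
    # Highest score first; "score" is an assumed, precomputed field.
    ranked = sorted(papers, key=lambda p: p["score"], reverse=True)[:top_n]
    for rank, p in enumerate(ranked, start=1):
        lines.append(
            f"| {rank} | [{p['title']}]({p['url']}) | {p['first_author']} et al. "
            f"| {p['category']} | {p['date']} |"
        )
    lines += ["", "## Category Breakdown"]
    for category, count in Counter(p["category"] for p in papers).most_common():
        lines.append(f"- {category}: {count} papers")
    return "\n".join(lines)


# Placeholder records, not real papers.
papers = [
    {"title": "Placeholder A", "url": "https://arxiv.org/abs/0000.00001",
     "first_author": "Author A", "category": "cs.LG", "date": "2026-03-08", "score": 0.4},
    {"title": "Placeholder B", "url": "https://arxiv.org/abs/0000.00002",
     "first_author": "Author B", "category": "cs.CL", "date": "2026-03-09", "score": 0.9},
    {"title": "Placeholder C", "url": "https://arxiv.org/abs/0000.00003",
     "first_author": "Author C", "category": "cs.LG", "date": "2026-03-07", "score": 0.1},
]
report = render_report("demo", papers, "2026-03-03", "2026-03-10")
print(report)
```

Titles containing `|` would need escaping before they go into the table; the sketch does not handle that.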

### Filtering and Ranking

After retrieving raw results, apply post-processing filters to surface the most relevant papers:

1. **Relevance scoring**: Score each paper based on keyword density in the title and abstract relative to your query terms.
2. **Author filtering**: Boost papers from authors on your watch list (key researchers in your field).
3. **Citation proxy**: Papers that appear in multiple query results likely sit at the intersection of your interests—rank them higher.
4. **Novelty detection**: Flag papers whose abstracts contain terms not seen in your previous reports, indicating potentially new directions.
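The four steps above can be sketched as one scoring function. The weights (0.5 for a watch-list author, 0.25 per extra matching query) are arbitrary illustrative values, and the field names `title`, `abstract`, and `authors` are assumptions about the parsed records; none of this is prescribed by the skill.

```python
import re


def score_paper(paper, query_terms, watch_list=(), query_hits=1, seen_terms=()):
    """Return a relevance score and the terms not seen in earlier reports."""
    text = f"{paper['title']} {paper['abstract']}".lower()
    words = re.findall(r"[a-z0-9]+", text)
    terms = {t.lower() for t in query_terms}

    # 1. Keyword density: share of words that are query terms.
    density = sum(w in terms for w in words) / max(len(words), 1)
    # 2. Watch-list boost (0.5 is an arbitrary illustrative weight).
    author_boost = 0.5 if set(paper["authors"]) & set(watch_list) else 0.0
    # 3. Overlap boost: each additional matching query adds 0.25 (also arbitrary).
    overlap_boost = 0.25 * max(query_hits - 1, 0)
    # 4. Novelty: words absent from the vocabulary of previous reports.
    novel_terms = sorted(set(words) - set(seen_terms))

    return {"score": density + author_boost + overlap_boost,
            "novel_terms": novel_terms}


# Placeholder record, not a real paper.
paper = {
    "title": "Efficient Transformer",
    "abstract": "An efficient attention variant",
    "authors": ["A. Author"],
}
result = score_paper(
    paper,
    query_terms=["transformer", "attention"],
    watch_list=["A. Author"],
    query_hits=2,
    seen_terms={"an", "efficient", "transformer", "attention"},
)
print(result)
```

In practice the novelty step would need stop-word removal to be useful, since otherwise common words dominate the first few reports.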

## Automation and Scheduling

For ongoing literature surveillance, automate your batch reports:

- **Cron scheduling**: Run batch queries weekly (e.g., every Monday at 8 AM) using a scheduled task or CI pipeline.
- **Diff reports**: Compare the current week's results against the previous week to highlight only new papers.
- **Alert thresholds**: Set alerts when a report contains more than N papers matching a high-priority query, indicating a burst of activity in that area.
- **Email or Slack delivery**: Route generated reports to your inbox or lab Slack channel for team-wide awareness.
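The diff and alert steps reduce to a set comparison on paper identifiers, sketched below. The `id` field name and the helper names are assumptions; the identifiers shown are placeholders. Version suffixes such as `v2` are stripped so that a revised paper is not reported as new, which is a design choice rather than a requirement.

```python
import re


def base_id(arxiv_id):
    """Drop a trailing version suffix such as 'v2'."""
    return re.sub(r"v\d+$", "", arxiv_id)


def diff_reports(current, previous):
    """Return the papers in current whose base ID is absent from previous."""
    seen = {base_id(p["id"]) for p in previous}
    return [p for p in current if base_id(p["id"]) not in seen]


def should_alert(papers, threshold):
    """True when a report holds more than threshold papers."""
    return len(papers) > threshold


# Placeholder identifiers, not real papers.
last_week = [{"id": "0000.00001v1"}, {"id": "0000.00002v1"}]
this_week = [{"id": "0000.00002v2"}, {"id": "0000.00003v1"}]
new_papers = diff_reports(this_week, last_week)
print([p["id"] for p in new_papers])  # ['0000.00003v1']
print(should_alert(this_week, threshold=1))  # True
```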

Store all generated reports in a versioned directory structure for longitudinal trend analysis:

```
reports/
  transformer-architectures-weekly/
    2026-03-03.md
    2026-03-10.md
    ...
```
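Writing a report into this layout takes only a few lines. The sketch below assumes one Markdown file per run, named by ISO date; `save_report` is a hypothetical helper, and a temporary directory stands in for the real `reports/` root.

```python
import tempfile
from datetime import date
from pathlib import Path


def save_report(root, profile_name, text, run_date=None):
    """Write text to <root>/<profile_name>/<YYYY-MM-DD>.md and return the path."""
    run_date = run_date or date.today()
    path = Path(root) / profile_name / f"{run_date.isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as root:
    saved = save_report(root, "transformer-architectures-weekly",
                        "# report\n", date(2026, 3, 10))
    relative = saved.relative_to(root).as_posix()
    content = saved.read_text(encoding="utf-8")
print(relative)  # transformer-architectures-weekly/2026-03-10.md
```

ISO dates sort lexicographically, so a plain directory listing already gives the reports in chronological order.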

## References

- arXiv API documentation: https://info.arxiv.org/help/api/index.html
- arXiv category taxonomy: https://arxiv.org/category_taxonomy
- arXiv Batch Search: wentor-research-plugins