biorxiv-api

Preprint server API for biology and medicine papers

by wentorai

Best use case

biorxiv-api is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using biorxiv-api should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/biorxiv-api/SKILL.md --create-dirs "https://raw.githubusercontent.com/wentorai/research-plugins/main/skills/literature/search/biorxiv-api/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/biorxiv-api/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How biorxiv-api Compares

Feature / Agent	biorxiv-api	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Preprint server API for biology and medicine papers

Where can I find the source code?

The source code is on GitHub in the wentorai/research-plugins repository, under skills/literature/search/biorxiv-api/.

SKILL.md Source

# bioRxiv API Guide

## Overview

bioRxiv (pronounced "bio-archive") is a free online archive and distribution service for unpublished preprints in the life sciences. Operated by Cold Spring Harbor Laboratory, it provides researchers with immediate access to the latest findings before formal peer review. The bioRxiv API enables programmatic access to preprint metadata, content details, and publication linkage data across biology and medical sciences.

The API serves researchers who need to track emerging research trends, monitor preprint activity in specific subfields, or build automated literature surveillance pipelines. It is particularly valuable for systematic reviewers who want to capture the latest evidence before journal publication, and for bibliometric analysts studying the preprint-to-publication pipeline.

bioRxiv hosts preprints across more than 25 subject areas including neuroscience, genomics, bioinformatics, cell biology, and many more. The API returns structured metadata including titles, authors, abstracts, DOIs, publication dates, and links to corresponding published journal articles when available.

## Authentication

No authentication is required. The bioRxiv API is open and does not need an API key, token, or registration. All endpoints are publicly accessible, and no formal rate limits are documented (see Rate Limits).

## Core Endpoints

### details: Retrieve Preprint Metadata

Fetch detailed metadata for preprints posted to a given server (bioRxiv or medRxiv) within a specified date range.

- **URL**: `GET https://api.biorxiv.org/details/{server}/{interval}/{cursor}`
- **Parameters**:

| Parameter  | Type   | Required | Description                                      |
|------------|--------|----------|--------------------------------------------------|
| server     | string | Yes      | Server name: `biorxiv` or `medrxiv`              |
| interval   | string | Yes      | Date range in `YYYY-MM-DD/YYYY-MM-DD` format     |
| cursor     | int    | No       | Pagination cursor (default 0, increments of 100) |

- **Example**:

```bash
curl "https://api.biorxiv.org/details/biorxiv/2024-01-01/2024-01-31/0"
```

- **Response**: Returns a JSON object whose `collection` array holds one record per preprint. Each record includes `doi`, `title`, `authors`, `author_corresponding`, `date`, `category`, `abstract`, `published` (journal DOI if available), and a `jatsxml` link.
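
The URL pattern above can also be assembled in code. The following is a minimal sketch, not an official client: `build_details_url` is a hypothetical helper name, and its checks (known server names, ISO dates, start not after end, non-negative cursor) are assumptions drawn from the parameter table.

```python
from datetime import date

BASE_URL = "https://api.biorxiv.org"
SERVERS = {"biorxiv", "medrxiv"}  # server names listed in the parameter table


def build_details_url(server: str, start: str, end: str, cursor: int = 0) -> str:
    """Build a details URL (hypothetical helper, not part of the API)."""
    if server not in SERVERS:
        raise ValueError(f"unknown server: {server!r}")
    # fromisoformat rejects malformed dates; isoformat() re-emits YYYY-MM-DD.
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    if start_date > end_date:
        raise ValueError("start date must not be after end date")
    if cursor < 0:
        raise ValueError("cursor must be non-negative")
    return (
        f"{BASE_URL}/details/{server}/"
        f"{start_date.isoformat()}/{end_date.isoformat()}/{cursor}"
    )
```

Calling `build_details_url("biorxiv", "2024-01-01", "2024-01-31")` produces the same URL as the curl example above.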

### pubs: Published Article Linkage

Look up which preprints have been published in peer-reviewed journals, providing the mapping between preprint DOIs and journal article DOIs.

- **URL**: `GET https://api.biorxiv.org/pubs/{server}/{interval}/{cursor}`
- **Parameters**:

| Parameter  | Type   | Required | Description                                      |
|------------|--------|----------|--------------------------------------------------|
| server     | string | Yes      | Server name: `biorxiv` or `medrxiv`              |
| interval   | string | Yes      | Date range in `YYYY-MM-DD/YYYY-MM-DD` format     |
| cursor     | int    | No       | Pagination cursor (default 0, increments of 100) |

- **Example**:

```bash
curl "https://api.biorxiv.org/pubs/biorxiv/2024-01-01/2024-06-30/0"
```

- **Response**: Returns a `collection` array whose records include `preprint_doi`, `published_doi`, `preprint_title`, `published_journal`, `published_date`, and `preprint_date`.
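
The preprint-to-journal mapping described here can be reduced to a dictionary once a response has been parsed. This is a rough sketch that assumes the field names listed above; `map_preprints_to_publications` is a hypothetical helper, and the sample records are made up for illustration rather than taken from real API output.

```python
def map_preprints_to_publications(collection):
    """Return {preprint_doi: published_doi} for records that carry both DOIs."""
    mapping = {}
    for record in collection:
        preprint = record.get("preprint_doi")
        published = record.get("published_doi")
        # Skip records where either DOI is missing or empty.
        if preprint and published:
            mapping[preprint] = published
    return mapping


# Hypothetical sample shaped like the fields listed above; not real API output.
sample = [
    {"preprint_doi": "10.1101/0000.00.00.000001", "published_doi": "10.0000/example.1"},
    {"preprint_doi": "10.1101/0000.00.00.000002", "published_doi": ""},
]
links = map_preprints_to_publications(sample)
```

Here `links` contains only the first record, because the second has an empty `published_doi`.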

## Rate Limits

No formal rate limits are documented for the bioRxiv API. However, responsible use is expected. Results are paginated at 100 records per request, and the cursor parameter should be incremented to retrieve additional pages. Avoid excessive concurrent requests to ensure availability for all users.
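
One way to combine cursor pagination with responsible use is to pause between page requests. The sketch below is an illustration under stated assumptions: `paginate` and `fetch_page` are hypothetical names, the one-second delay is an arbitrary courtesy value rather than a documented limit, and treating a short page as the last page is an assumption about the API's behavior.

```python
import time
from typing import Callable, Iterator


def paginate(
    fetch_page: Callable[[int], list],
    page_size: int = 100,
    delay_seconds: float = 1.0,  # assumed courtesy pause, not a documented limit
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield records page by page, pausing between requests."""
    cursor = 0
    while True:
        records = fetch_page(cursor)  # caller supplies the actual HTTP call
        if not records:
            break
        yield from records
        if len(records) < page_size:
            break  # assumption: a short page means there is nothing further
        cursor += page_size
        sleep(delay_seconds)
```

Passing the fetcher and the sleep function in as arguments keeps the loop independent of any HTTP library and lets it be exercised against canned data.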

## Common Patterns

### Monitor New Preprints in a Subject Area

Retrieve the latest preprints and filter by category to track new submissions in your field:

```bash
# Fetch the first page (up to 100 records) and keep the neuroscience preprints
curl -s "https://api.biorxiv.org/details/biorxiv/2024-06-01/2024-06-07/0" \
  | jq '.collection[] | select(.category == "neuroscience")'
```
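
The same category filter can be applied in Python after the JSON has been parsed. This is a small sketch, assuming each record carries a `category` string as described in the details response; `filter_by_category` and `count_by_category` are hypothetical helpers, the case-insensitive match is a convenience choice, and the sample records are invented.

```python
from collections import Counter


def filter_by_category(records, category):
    """Keep records whose category matches, ignoring case and padding."""
    wanted = category.strip().lower()
    return [r for r in records if r.get("category", "").strip().lower() == wanted]


def count_by_category(records):
    """Count records per category to see where activity is concentrated."""
    return Counter(r.get("category", "").strip().lower() for r in records)


# Hypothetical records for illustration; not real API output.
sample = [
    {"doi": "10.1101/a", "category": "neuroscience"},
    {"doi": "10.1101/b", "category": "genomics"},
    {"doi": "10.1101/c", "category": "Neuroscience"},
]
neuro = filter_by_category(sample, "neuroscience")
counts = count_by_category(sample)
```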

### Track Preprint-to-Publication Conversion

Monitor which preprints in your area have been formally published:

```bash
# List first-page records from the date range that have a published journal DOI
curl -s "https://api.biorxiv.org/pubs/biorxiv/2024-01-01/2024-06-30/0" \
  | jq '.collection[] | select(.published_doi != "")'
```
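
Conversion tracking often also asks how long publication took. The sketch below computes the lag from the `preprint_date` and `published_date` fields named in the pubs response. It assumes both are `YYYY-MM-DD` strings, which the source does not confirm; `days_to_publication` is a hypothetical helper and the sample records are invented.

```python
from datetime import date
from statistics import median


def days_to_publication(record):
    """Days between preprint posting and journal publication, or None."""
    try:
        posted = date.fromisoformat(record["preprint_date"])
        published = date.fromisoformat(record["published_date"])
    except (KeyError, ValueError, TypeError):
        return None  # a date is missing or not in the assumed format
    return (published - posted).days


# Hypothetical records for illustration; not real API output.
sample = [
    {"preprint_date": "2024-01-10", "published_date": "2024-04-09"},
    {"preprint_date": "2024-02-01", "published_date": "2024-03-02"},
    {"preprint_date": "2024-02-15"},  # not yet published
]
lags = [d for d in map(days_to_publication, sample) if d is not None]
median_lag = median(lags)
```

Records with a missing or unparseable date are skipped rather than raising, so one malformed entry does not stop a longer run.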

### Build a Preprint Alert System

Paginate through all results for a given date range to build a comprehensive alert feed:

```python
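import requests

# The date range is part of the path; the cursor is appended per request.
base = "https://api.biorxiv.org/details/biorxiv/2024-06-01/2024-06-07"
cursor = 0
all_preprints = []

while True:
    resp = requests.get(f"{base}/{cursor}", timeout=30)
    resp.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    records = resp.json().get("collection", [])
    if not records:
        break  # an empty page means every record has been retrieved
    all_preprints.extend(records)
    cursor += 100  # results are paginated at 100 records per request

print(f"Total preprints retrieved: {len(all_preprints)}")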
```
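
An alert feed also needs to know which records are new since the last run. The following sketch shows one possible approach, not the skill's own method: it assumes each record has a `doi` field, as listed in the details response, and `find_new_preprints` is a hypothetical helper that compares incoming records against a set of DOIs already seen.

```python
def find_new_preprints(records, seen_dois):
    """Return records with unseen DOIs and add those DOIs to seen_dois."""
    new_records = []
    for record in records:
        doi = record.get("doi")
        # Ignore records without a DOI and DOIs already alerted on.
        if doi and doi not in seen_dois:
            seen_dois.add(doi)
            new_records.append(record)
    return new_records


# Hypothetical state and batch for illustration; not real API output.
seen = {"10.1101/a"}
batch = [{"doi": "10.1101/a"}, {"doi": "10.1101/b"}, {"doi": "10.1101/b"}]
fresh = find_new_preprints(batch, seen)
```

Because `seen_dois` is updated in place, a DOI that appears more than once in a batch is reported only once. Persisting the set between runs, for example in a JSON file, is left to the caller.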

## References

- Official documentation: https://api.biorxiv.org/
- bioRxiv homepage: https://www.biorxiv.org/
- medRxiv API (same structure): https://api.biorxiv.org/ (use `medrxiv` as server parameter)