paper2code-arxiv-implementation

Agent skill to convert any arxiv paper into a citation-anchored, working Python implementation with ambiguity auditing

22 stars

Best use case

paper2code-arxiv-implementation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Agent skill to convert any arxiv paper into a citation-anchored, working Python implementation with ambiguity auditing

Teams using paper2code-arxiv-implementation should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/paper2code-arxiv-implementation/SKILL.md --create-dirs "https://raw.githubusercontent.com/Aradotso/trending-skills/main/skills/paper2code-arxiv-implementation/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/paper2code-arxiv-implementation/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How paper2code-arxiv-implementation Compares

Feature / Agent	paper2code-arxiv-implementation	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Agent skill to convert any arxiv paper into a citation-anchored, working Python implementation with ambiguity auditing

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

SKILL.md Source

# paper2code — Arxiv Paper to Working Implementation

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

paper2code is a Claude Code agent skill that converts any arxiv paper URL into a citation-anchored Python implementation. Every code decision references the exact paper section and equation it implements, and all gaps/ambiguities are explicitly flagged rather than silently filled in.

---

## Install

```bash
npx skills add PrathamLearnsToCode/paper2code/skills/paper2code
```

During install you'll choose:
- **Agents**: which coding agents get the skill (e.g., Claude Code)
- **Scope**: Global (recommended) or project-level
- **Method**: Symlink (recommended) or copy

Then launch your agent:

```bash
claude
```

---

## Core Commands

### Basic usage

```
/paper2code https://arxiv.org/abs/1706.03762
```

### With framework override

```
/paper2code https://arxiv.org/abs/2006.11239 --framework jax
/paper2code https://arxiv.org/abs/2006.11239 --framework pytorch   # default
/paper2code https://arxiv.org/abs/2006.11239 --framework tensorflow
```

### With mode flag

```
/paper2code 1706.03762 --mode minimal       # architecture only (default)
/paper2code 1706.03762 --mode full          # includes training loop + data pipeline
/paper2code 1706.03762 --mode educational   # extra comments + pedagogical notebook
```

### Bare arxiv ID (no URL required)

```
/paper2code 1706.03762
/paper2code 2106.09685
```

---

## Output Structure

Every run produces a directory named after the paper slug:

```
attention_is_all_you_need/
├── README.md                  # Paper summary + quick-start
├── REPRODUCTION_NOTES.md      # Ambiguity audit, unspecified choices, known deviations
├── requirements.txt           # Pinned dependencies
├── src/
│   ├── model.py               # Architecture — every layer cited to paper section
│   ├── loss.py                # Loss functions with equation references
│   ├── data.py                # Dataset skeleton with preprocessing TODOs
│   ├── train.py               # Training loop (full/educational mode)
│   ├── evaluate.py            # Metric computation
│   └── utils.py               # Shared utilities
├── configs/
│   └── base.yaml              # All hyperparams — each cited or flagged [UNSPECIFIED]
└── notebooks/
    └── walkthrough.ipynb      # Paper section → code → shape checks
```

---

## Citation Anchoring Convention

The core value of paper2code is traceability. Every non-trivial decision is tagged:

| Tag | Meaning |
|-----|---------|
| `§X.Y` | Directly specified in section X.Y |
| `§X.Y, Eq. N` | Implements equation N from section X.Y |
| `[UNSPECIFIED]` | Paper doesn't state this — choice made with alternatives listed |
| `[PARTIALLY_SPECIFIED]` | Paper mentions it but is ambiguous — quote included |
| `[ASSUMPTION]` | Reasonable inference — reasoning explained |
| `[FROM_OFFICIAL_CODE]` | Taken from authors' official implementation |

### Example — model.py with citation anchors

```python
import torch
import torch.nn as nn
import math


class MultiHeadAttention(nn.Module):
    """§3.2 — Multi-Head Attention
    
    Implements Eq. 4: MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O
    where head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
    """

    def __init__(self, d_model: int, num_heads: int, dropout: float = 0.1):
        super().__init__()
        # §3.2 — d_model = 512, h = 8 stated in Table 1
        assert d_model % num_heads == 0
        self.d_k = d_model // num_heads  # §3.2 — d_k = d_v = d_model / h = 64
        self.num_heads = num_heads

        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)  # §3.2, Eq. 4 — W^O projection

        # [UNSPECIFIED] Dropout rate for attention weights not stated in §3.2
        # Using 0.1 matching the model-wide dropout (§5.4, Table 3)
        self.dropout = nn.Dropout(dropout)

    def forward(self, q, k, v, mask=None):
        batch_size = q.size(0)

        # §3.2, Eq. 4 — project into h heads
        Q = self.W_q(q).view(batch_size, -1, self.num_heads, self.d_k).transpose(1, 2)
        K = self.W_k(k).view(batch_size, -1, self.num_heads, self.d_k).transpose(1, 2)
        V = self.W_v(v).view(batch_size, -1, self.num_heads, self.d_k).transpose(1, 2)

        # §3.2.1, Eq. 1 — Attention(Q,K,V) = softmax(QK^T / sqrt(d_k)) V
        scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.d_k)

        if mask is not None:
            # §3.2.3 — decoder masks future positions with -inf before softmax
            scores = scores.masked_fill(mask == 0, float('-inf'))

        attn_weights = torch.softmax(scores, dim=-1)
        attn_weights = self.dropout(attn_weights)

        out = torch.matmul(attn_weights, V)  # (batch, heads, seq, d_k)
        out = out.transpose(1, 2).contiguous().view(batch_size, -1, self.num_heads * self.d_k)
        return self.W_o(out)  # §3.2, Eq. 4 — W^O output projection


class TransformerBlock(nn.Module):
    """§3.1 — Encoder/Decoder layer structure"""

    def __init__(self, d_model: int, num_heads: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.attention = MultiHeadAttention(d_model, num_heads, dropout)

        # [ASSUMPTION] Using pre-norm based on stability; paper Figure 1 shows post-norm
        # Post-norm: x = LayerNorm(x + sublayer(x)) — §3.1
        # [PARTIALLY_SPECIFIED] "We apply layer normalization" — position ambiguous
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

        # §3.3 — FFN(x) = max(0, xW_1 + b_1)W_2 + b_2
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),  # §3.3 — "ReLU activation"
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # §3.1 — residual connection around each sub-layer
        attn_out = self.attention(self.norm1(x), self.norm1(x), self.norm1(x), mask)
        x = x + self.dropout(attn_out)
        x = x + self.dropout(self.ff(self.norm2(x)))
        return x
```

### Example — configs/base.yaml with citations

```yaml
# base.yaml — All hyperparameters for attention_is_all_you_need
# Each value is either cited from the paper or flagged [UNSPECIFIED]

model:
  d_model: 512          # §3, Table 1 — "d_model = 512"
  num_heads: 8          # §3.2, Table 1 — "h = 8"
  d_ff: 2048            # §3.3, Table 1 — "d_ff = 2048"
  num_encoder_layers: 6 # §3, Table 1 — "N = 6"
  num_decoder_layers: 6 # §3, Table 1 — "N = 6"
  dropout: 0.1          # §5.4, Table 3 — "P_drop = 0.1"
  max_seq_len: 512      # [UNSPECIFIED] not stated; using 512 (common default)
                        # Alternatives: 256, 1024

training:
  batch_size: 25000     # §5.1 — "each batch ~25,000 source + target tokens"
  optimizer: adam       # §5.3 — "Adam optimizer"
  beta1: 0.9            # §5.3 — "β1 = 0.9"
  beta2: 0.98           # §5.3 — "β2 = 0.98"
  epsilon: 1.0e-9       # §5.3 — "ε = 10^-9"
  warmup_steps: 4000    # §5.3 — "warmup_steps = 4000"
  label_smoothing: 0.1  # §5.4 — "ε_ls = 0.1"
```

### Example — REPRODUCTION_NOTES.md structure

```markdown
# Reproduction Notes — Attention Is All You Need

## Ambiguity Audit

### SPECIFIED (high confidence)
| Choice | Value | Source |
|--------|-------|--------|
| d_model | 512 | §3, Table 1 |
| num_heads | 8 | §3.2, Table 1 |
| optimizer | Adam β1=0.9, β2=0.98 | §5.3 |

### PARTIALLY_SPECIFIED (judgment call made)
| Choice | Our Decision | Paper Quote | Alternatives |
|--------|-------------|-------------|--------------|
| Norm position | pre-norm | "layer norm before each sub-layer" (§3.1) conflicts with Figure 1 | post-norm |

### UNSPECIFIED (our defaults)
| Choice | Our Default | Rationale | Alternatives |
|--------|-------------|-----------|--------------|
| LayerNorm epsilon | 1e-6 | common default | 1e-5, 1e-8 |
| max_seq_len | 512 | common for WMT | 256, 1024 |

## Known Deviations
- data.py provides skeleton only; WMT14 preprocessing not implemented
- No beam search decoding (§5 mentions beam size 4, not fully implemented)
```

---

## What paper2code Will NOT Do

Understanding limits prevents wasted debugging time:

- **Won't guarantee correctness** — matches what the paper describes; if the paper is wrong, the code is wrong
- **Won't invent details silently** — gaps are always `[UNSPECIFIED]`, never filled confidently
- **Won't download datasets** — `data.py` gives a `Dataset` skeleton with instructions
- **Won't set up training infrastructure** — no distributed training, no experiment tracking
- **Won't implement baselines** — only the paper's core contribution
- **Won't reimplement standard components** — imports them or notes the dependency

---

## Common Patterns

### Pattern 1 — Implement a new architecture paper

```
/paper2code https://arxiv.org/abs/2010.11929 --mode minimal
```

Focus: `src/model.py` will contain the full architecture. Review `REPRODUCTION_NOTES.md` to understand every ambiguous choice before running.

### Pattern 2 — Reproduce a training method

```
/paper2code https://arxiv.org/abs/2006.11239 --mode full --framework pytorch
```

Focus: `src/train.py` will contain the full training loop. `configs/base.yaml` will list every hyperparameter with paper citations.

### Pattern 3 — Educational deep-dive

```
/paper2code 1706.03762 --mode educational
```

Focus: `notebooks/walkthrough.ipynb` walks through each paper section, shows corresponding code, and runs CPU-safe shape checks.

### Pattern 4 — Quick architecture prototype

```
/paper2code 2106.09685  # ViT
```

Then inspect and run:

```bash
cd vision_transformer/
pip install -r requirements.txt
python -c "
from src.model import VisionTransformer
import torch
model = VisionTransformer()  # toy config
x = torch.randn(2, 3, 224, 224)
print(model(x).shape)
"
```

---

## Troubleshooting

### Skill not triggering
- Confirm install completed: `npx skills list` should show `paper2code-arxiv-implementation`
- Use the explicit trigger: `/paper2code <url>`
- Try bare arxiv ID format: `/paper2code 1706.03762`

### Generated code has import errors
- Run `pip install -r requirements.txt` first
- Check `REPRODUCTION_NOTES.md` for noted dependencies
- Standard components (e.g., HuggingFace transformers) are imported, not reimplemented — install them separately

### "Paper not found" or fetch errors
- Confirm the arxiv ID exists: `https://arxiv.org/abs/<ID>`
- Try the full URL instead of bare ID
- Some very new papers (hours old) may not be indexed yet

### Silent assumptions in generated code
- This should not happen by design — if you find one, it's a bug
- Check `REPRODUCTION_NOTES.md` first; the assumption may be documented there
- Report via the repo issues if a gap was genuinely filled silently

### Framework-specific issues
- Default framework is PyTorch — omitting `--framework` gives PyTorch output
- JAX output requires `jax`, `flax`, `optax` — listed in `requirements.txt`
- TensorFlow output requires `tensorflow>=2.x`

---

## Contributing

### Add a worked example
1. Run: `/paper2code https://arxiv.org/abs/XXXX.XXXXX`
2. Save output to `skills/paper2code/worked/{paper_slug}/`
3. Write `review.md` evaluating correctness, flagged ambiguities, and any mistakes
4. Submit PR

### Improve guardrails
Add patterns where the skill makes silent assumptions to `guardrails/`.

### Add domain knowledge
Papers in your subfield reference common components? Add a knowledge file to `knowledge/` (e.g., `knowledge/graph_neural_networks.md`).

---

## Resources

- **Repo**: https://github.com/PrathamLearnsToCode/paper2code
- **Worked examples**: `skills/paper2code/worked/` in the repo
- **Issues**: https://github.com/PrathamLearnsToCode/paper2code/issues
- **License**: MIT

Related Skills

```markdown

from Aradotso/trending-skills

---

zeroboot-vm-sandbox

from Aradotso/trending-skills

Sub-millisecond VM sandboxes for AI agents using copy-on-write KVM forking via Zeroboot

yourvpndead-vpn-detection

from Aradotso/trending-skills

Android app that detects VPN/proxy servers (VLESS/xray/sing-box) via local SOCKS5 vulnerability, exposing exit IPs and server configs without root

xata-postgres-platform

from Aradotso/trending-skills

Expert skill for Xata open-source cloud-native Postgres platform with copy-on-write branching, scale-to-zero, and Kubernetes deployment

x-mentor-skill-nuwa

from Aradotso/trending-skills

AI-powered X (Twitter) content strategy skill that distills methodologies from 6 top creators + open-source algorithm data into actionable writing, growth, and monetization guidance.

wx-favorites-report

from Aradotso/trending-skills

End-to-end pipeline to extract, decrypt, and visualize WeChat Mac favorites from encrypted SQLite DB into an interactive HTML report.

wterm-web-terminal

from Aradotso/trending-skills

Web terminal emulator with Zig/WASM core, DOM rendering, and React/vanilla JS bindings

worldmonitor-intelligence-dashboard

from Aradotso/trending-skills

Real-time global intelligence dashboard with AI-powered news aggregation, geopolitical monitoring, and infrastructure tracking

witr-process-inspector

from Aradotso/trending-skills

CLI and TUI tool that explains why processes, services, and ports are running by tracing causality chains across supervisors, containers, and shells.

wildworld-dataset

from Aradotso/trending-skills

WildWorld large-scale action-conditioned world modeling dataset with 108M+ frames from a photorealistic ARPG game, featuring per-frame annotations, 450+ actions, and explicit state information for generative world modeling research.

whatcable-macos-usb-inspector

from Aradotso/trending-skills

macOS menu bar app that identifies USB-C cable capabilities and charging diagnostics using IOKit

wewrite-wechat-ai-publishing

from Aradotso/trending-skills

Full-pipeline AI skill for WeChat Official Account articles — hotspot fetching, topic selection, writing, SEO, image generation, formatting, and draft box publishing.