report-research

Write a complete Numerai experiment report in experiment.md (abstract, methods, results tables, decisions, next steps) and generate/link the standard show_experiment plot(s). Use after running any Numerai research experiments, or when a user asks for a “full report”, “write up”, “experiment.md update”, or “generate the standard plot”.

1,123 stars

Best use case

report-research is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using report-research can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/report-research/SKILL.md --create-dirs "https://raw.githubusercontent.com/numerai/example-scripts/main/numerai/agents/skills/report-research/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/report-research/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How report-research Compares

| Feature / Agent | report-research | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Write a complete Numerai experiment report in experiment.md (abstract, methods, results tables, decisions, next steps) and generate/link the standard show_experiment plot(s). Use after running any Numerai research experiments, or when a user asks for a “full report”, “write up”, “experiment.md update”, or “generate the standard plot”.

Where can I find the source code?

The source code is on GitHub in the numerai/example-scripts repository, at numerai/agents/skills/report-research/SKILL.md.

SKILL.md Source

# Report Research

## Overview

This skill turns an experiment run into a durable write-up: a full `experiment.md` plus the standard `show_experiment` plot(s) linked from the report.

## Workflow (do all steps)

### 1) Locate the experiment folder

Use the folder that contains:
- `configs/` (the configs you ran)
- `results/` (JSON metrics output)
- `predictions/` (OOF parquet output)
- `experiment.md` (the report you will write/update)

### 2) Inventory what was actually run

- List configs that exist.
- Determine which ones were executed by checking for matching `results/*.json` and `predictions/*.parquet`.
- Identify the “best” model(s) using `bmc_mean` and `bmc_last_200_eras.mean` (primary), with `corr_mean` as a sanity check.
- If experiments were run in rounds, summarize **each round’s intent** (what changed) and whether it improved the current best.
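
The inventory check above can be sketched in Python. This is a minimal sketch, not part of the repository: the helper `inventory_runs` is hypothetical, and it assumes each config `configs/<name>.*` writes its outputs to `results/<name>.json` and `predictions/<name>.parquet`. Adjust the name matching if your runs use a different convention.

```python
from pathlib import Path


def inventory_runs(experiment_dir):
    """Map each config name to the output artifacts that exist for it.

    Assumption (not confirmed by the skill): a config named
    configs/<name>.* produces results/<name>.json and
    predictions/<name>.parquet.
    """
    root = Path(experiment_dir)
    inventory = {}
    for config in sorted((root / "configs").glob("*")):
        if not config.is_file():
            continue
        name = config.stem
        has_results = (root / "results" / f"{name}.json").is_file()
        has_predictions = (root / "predictions" / f"{name}.parquet").is_file()
        inventory[name] = {
            "results": has_results,
            "predictions": has_predictions,
            # Treat a run as executed only when both artifacts are present.
            "executed": has_results and has_predictions,
        }
    return inventory
```

Configs whose `executed` flag is false belong in the report as planned or configured only, not as results.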

### 3) Extract metrics for the report

For each run you report, include at least:
- `corr_mean`
- `bmc_mean`
- `bmc_last_200_eras.mean`
- `avg_corr_with_benchmark` (from the BMC summary)

Prefer a single markdown table with one row per model.
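
One way to build that table is sketched below. The layout of the results JSON is an assumption: the hypothetical `lookup` helper tries each metric as a flat key first and then as a nested path split on `.` (so `bmc_last_200_eras.mean` may be stored either way). If your files nest metrics differently, for example under a BMC summary object, adapt the key list.

```python
import json
from pathlib import Path

# Metrics named in this step; the key paths are assumptions about the JSON layout.
METRICS = [
    "corr_mean",
    "bmc_mean",
    "bmc_last_200_eras.mean",
    "avg_corr_with_benchmark",
]


def lookup(payload, dotted_key):
    """Return a metric stored under a flat key or a nested dotted path, else None."""
    if dotted_key in payload:
        return payload[dotted_key]
    node = payload
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def metrics_table(results_dir):
    """Render one markdown table row per results/*.json file."""
    header = "| model | " + " | ".join(METRICS) + " |"
    divider = "|" + " --- |" * (len(METRICS) + 1)
    rows = [header, divider]
    for path in sorted(Path(results_dir).glob("*.json")):
        payload = json.loads(path.read_text(encoding="utf-8"))
        cells = []
        for key in METRICS:
            value = lookup(payload, key)
            # Missing or non-numeric metrics are shown as n/a rather than guessed.
            cells.append(f"{value:.5f}" if isinstance(value, (int, float)) else "n/a")
        rows.append(f"| {path.stem} | " + " | ".join(cells) + " |")
    return "\n".join(rows)
```

Generating the table from the JSON files, rather than typing values by hand, also makes it easier to keep the report consistent with `results/*.json`.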

### 4) Write a full report in experiment.md

Update/create `experiment.md` with these sections (keep it crisp but complete):
- **Title + Date**
- **Abstract** (what was tested + headline result)
- **Hypothesis / Motivation** (why this should help BMC)
- **Method** (data split, CV, feature set, model type/hparams, any transforms)
- **Experiments run** (one subsection per config that actually ran; include output artifacts)
- **Results** (the metrics table; mention best run + trade-offs)
- **Standard plot** (embed the PNG and include the generating command)
- **Decisions made** (what you chose and why; e.g., per-era vs global, feature set choice, sweep choices)
- **Stopping rationale** (why you stopped iterating; e.g., plateau after N rounds, confirmatory scale step, diminishing returns)
- **Findings** (what worked / didn’t; interpret the plot)
- **Next experiments** (2–5 concrete follow-ups)
- **Repro commands** (train + plot commands from repo root)
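
As a starting point, the section list above can be turned into an empty report skeleton. This is only a sketch: the helper `report_skeleton`, the `##` heading level, and the `TODO` placeholders are illustrative choices, not a format the skill prescribes.

```python
from datetime import date

# Section names copied from the list in this step.
SECTIONS = [
    "Abstract",
    "Hypothesis / Motivation",
    "Method",
    "Experiments run",
    "Results",
    "Standard plot",
    "Decisions made",
    "Stopping rationale",
    "Findings",
    "Next experiments",
    "Repro commands",
]


def report_skeleton(title, today=None):
    """Return markdown text with a title, a date, and one empty section per heading."""
    today = today or date.today()
    lines = [f"# {title}", "", f"Date: {today.isoformat()}", ""]
    for section in SECTIONS:
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)
```

Writing the result to `experiment.md` only when the file does not yet exist avoids overwriting an earlier report.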

### 5) Generate the standard plot(s) and link them

Default standard plot (baseline = benchmark predictions):

```bash
PYTHONPATH=numerai python3 -m agents.code.analysis.show_experiment benchmark <best_model_results_name> \
  --base-benchmark-model v52_lgbm_ender20 \
  --benchmark-data-path numerai/v5.2/full_benchmark_models.parquet \
  --start-era 575 --dark \
  --output-dir numerai/agents/experiments/<experiment_name> \
  --baselines-dir numerai/agents/baselines
```

Then embed it in `experiment.md` with a relative link:

```md
![benchmark vs best model](plots/<generated_plot_name>.png)
```

If you have multiple candidate models, either:
- generate one plot with multiple experiment models, or
- generate one plot per candidate (and link all of them).

### 6) Final checks

- Plot files exist under `plots/`.
- `experiment.md` links resolve (use relative paths).
- Metrics table matches `results/*.json`.
- Report clearly states what was run vs what is only planned/configured.
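
The link check can be automated with a short script. The sketch below assumes links in `experiment.md` use standard markdown `[text](target)` or `![alt](target)` syntax. The helper `broken_links` is hypothetical, and it skips URLs and in-page anchors because only relative file paths can be verified on disk.

```python
import re
from pathlib import Path

# Matches markdown links and images and captures the target path.
LINK_PATTERN = re.compile(r"!?\[[^\]]*\]\(([^)\s]+)[^)]*\)")
# Matches targets that start with a URL scheme such as https: or mailto:.
SCHEME_PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_links(report_path):
    """Return relative link targets in the report that do not exist on disk."""
    report = Path(report_path)
    missing = []
    for target in LINK_PATTERN.findall(report.read_text(encoding="utf-8")):
        if SCHEME_PATTERN.match(target) or target.startswith("#"):
            continue  # skip external URLs and in-page anchors
        relative = target.split("#", 1)[0]
        # Resolve relative to the report's own folder, as a markdown viewer would.
        if not (report.parent / relative).exists():
            missing.append(target)
    return missing
```

An empty list means every relative link, including the embedded plot under `plots/`, resolves from the report's folder.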

Related Skills

numerai-research

1123
from numerai/example-scripts

End-to-end Numerai research workflow for trying a new idea: design experiments, implement new model types if needed, run scout→scale experiments, write a full experiment.md report with standard plots, and optionally package/upload a Numerai pickle. Use when a user asks to “try/test a new idea”, “run an experiment”, “sweep configs”, “compare model variants”, or otherwise do new Numerai research.

numerai-model-upload

1123
from numerai/example-scripts

Create Numerai Tournament model upload pickles (.pkl) with a self-contained predict() function. Use when preparing upload artifacts, debugging numerai_predict import errors, or documenting model-upload requirements and testing steps.

numerai-model-implementation

1123
from numerai/example-scripts

Add a new Numerai model type to the agents training pipeline. Use when you need to register a model in `agents/code/modeling/utils/model_factory.py`, handle fit/predict quirks in `agents/code/modeling/utils/numerai_cv.py`, and update configs so the model can run via `python -m agents.code.modeling`.

numerai-experiment-design

1123
from numerai/example-scripts

Design and manage Numerai experiments in this repo for any model idea.

ESG & Sustainability Reporting Framework

3891
from openclaw/skills

You are an ESG reporting specialist. Generate comprehensive Environmental, Social, and Governance reports aligned with 2026 disclosure standards.

Workflow & Productivity

Board Reporting Framework

3891
from openclaw/skills

Generate investor-ready board decks and reporting packages. Covers monthly board updates, quarterly deep dives, and annual reviews with the metrics that actually matter.

Workflow & Productivity

Annual Report Generator

3891
from openclaw/skills

Build a complete annual business report from raw data. Covers financial performance, operational metrics, strategic highlights, and forward-looking guidance.

Workflow & Productivity

daily-report-generator

3891
from openclaw/skills

Automatically generate daily/weekly work reports from git commits, calendar events, and task lists. Use when you need to quickly create professional work reports without manual effort.

Workflow & Productivity

autoresearch-pro

3891
from openclaw/skills

Automatically improve OpenClaw skills, prompts, or articles through iterative mutation-testing loops. Inspired by Karpathy's autoresearch. Use when user says 'optimize [skill]', 'autoresearch [skill]', 'improve my skill', 'optimize this prompt', 'improve my prompt', 'polish this article', 'improve this article', or explicitly requests quality improvement for any text-based content. Supports three modes: skill (SKILL.md files), prompt (any prompt text), and article (any document).

Workflow & Productivity

X/Twitter Research Skill

3891
from openclaw/skills

Research trending topics, ideas, and conversations on X (Twitter) using twitterapi.io.

Data & Research

token-research

3891
from openclaw/skills

Comprehensive token research for EVM chains (Base, ETH, Arbitrum) and Solana. Use this skill when you want to research crypto tokens, deep-dive projects or monitor tokens.

Data & Research

xvary-stock-research

31392
from sickn33/antigravity-awesome-skills

Thesis-driven equity analysis from public SEC EDGAR and market data; /analyze, /score, /compare workflows with bundled Python tools (Claude Code, Cursor, Codex).