AI Agent Skill HUB

report-research

Write a complete Numerai experiment report in experiment.md (abstract, methods, results tables, decisions, next steps) and generate/link the standard show_experiment plot(s). Use after running any Numerai research experiments, or when a user asks for a “full report”, “write up”, “experiment.md update”, or “generate the standard plot”.

1,123 stars

View on GitHub Installation ↓

Best use case

report-research is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using report-research can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/report-research/SKILL.md --create-dirs "https://raw.githubusercontent.com/numerai/example-scripts/main/numerai/agents/skills/report-research/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/report-research/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How report-research Compares

Feature / Agent	report-research	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Write a complete Numerai experiment report in experiment.md (abstract, methods, results tables, decisions, next steps) and generate/link the standard show_experiment plot(s). Use after running any Numerai research experiments, or when a user asks for a “full report”, “write up”, “experiment.md update”, or “generate the standard plot”.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Report Research

## Overview

This skill turns an experiment run into a durable write-up: a full `experiment.md` plus the standard `show_experiment` plot(s) linked from the report.

## Workflow (do all steps)

### 1) Locate the experiment folder

Use the folder that contains:
- `configs/` (the configs you ran)
- `results/` (JSON metrics output)
- `predictions/` (OOF parquet output)
- `experiment.md` (the report you will write/update)
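
A minimal sketch of this check, assuming the four entries sit directly under the experiment folder. The helper name `missing_experiment_parts` is hypothetical and not part of the repo:

```python
from pathlib import Path

# Entries named in step 1. The helper below is illustrative only.
REQUIRED_DIRS = ("configs", "results", "predictions")
REPORT_NAME = "experiment.md"


def missing_experiment_parts(folder):
    """Return the expected entries that are absent from `folder`."""
    folder = Path(folder)
    missing = [name for name in REQUIRED_DIRS if not (folder / name).is_dir()]
    # experiment.md may legitimately be absent when the report is being
    # created for the first time, so the caller decides how to treat it.
    if not (folder / REPORT_NAME).is_file():
        missing.append(REPORT_NAME)
    return missing
```

An empty list means the folder has everything step 1 asks for.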

### 2) Inventory what was actually run

- List configs that exist.
- Determine which ones were executed by checking for matching `results/*.json` and `predictions/*.parquet`.
- Identify the “best” model(s) using `bmc_mean` and `bmc_last_200_eras.mean` (primary), with `corr_mean` as a sanity check.
- If experiments were run in rounds, summarize **each round’s intent** (what changed) and whether it improved the current best.
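
The inventory step can be sketched as below. It assumes each config file shares its base name with its outputs (`configs/<name>.*`, `results/<name>.json`, `predictions/<name>.parquet`); that naming convention is an assumption, and both function names are hypothetical:

```python
from pathlib import Path


def inventory_runs(folder):
    """Map each config name to which output artifacts exist for it."""
    folder = Path(folder)
    status = {}
    for config in sorted((folder / "configs").iterdir()):
        if not config.is_file():
            continue
        # Assumption: outputs reuse the config's base name.
        name = config.stem
        status[name] = {
            "results": (folder / "results" / f"{name}.json").is_file(),
            "predictions": (folder / "predictions" / f"{name}.parquet").is_file(),
        }
    return status


def executed_runs(status):
    """Names of configs that have both a results file and predictions."""
    return [n for n, s in status.items() if s["results"] and s["predictions"]]
```

Configs missing either artifact are the ones to describe in the report as configured but not run.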

### 3) Extract metrics for the report

For each run you report, include at least:
- `corr_mean`
- `bmc_mean`
- `bmc_last_200_eras.mean`
- `avg_corr_with_benchmark` (from the BMC summary)

Prefer a single markdown table with one row per model.
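
One way to build that table is sketched here. The layout of the results JSON is an assumption: dotted names such as `bmc_last_200_eras.mean` are read as nested keys, and `avg_corr_with_benchmark` is assumed to be a top-level key (adjust the path if it lives inside the BMC summary). The function names are hypothetical:

```python
import json
from pathlib import Path

# Metric names from step 3; a dot denotes a nested key (assumed layout).
METRICS = (
    "corr_mean",
    "bmc_mean",
    "bmc_last_200_eras.mean",
    "avg_corr_with_benchmark",
)


def lookup(data, dotted):
    """Resolve 'a.b' against nested dicts; return None if any key is missing."""
    node = data
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def metrics_table(results_dir):
    """Render one markdown row per results/*.json file."""
    rows = [
        "| model | " + " | ".join(METRICS) + " |",
        "|" + " --- |" * (len(METRICS) + 1),
    ]
    for path in sorted(Path(results_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        cells = []
        for metric in METRICS:
            value = lookup(data, metric)
            # Missing metrics are shown as n/a rather than guessed.
            cells.append(f"{value:.4f}" if isinstance(value, (int, float)) else "n/a")
        rows.append(f"| {path.stem} | " + " | ".join(cells) + " |")
    return "\n".join(rows)
```

Generating the table from the JSON files, rather than typing values by hand, also makes the later check that the table matches `results/*.json` easier to satisfy.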

### 4) Write a full report in experiment.md

Update/create `experiment.md` with these sections (keep it crisp but complete):
- **Title + Date**
- **Abstract** (what was tested + headline result)
- **Hypothesis / Motivation** (why this should help BMC)
- **Method** (data split, CV, feature set, model type/hparams, any transforms)
- **Experiments run** (one subsection per config that actually ran; include output artifacts)
- **Results** (the metrics table; mention best run + trade-offs)
- **Standard plot** (embed the PNG and include the generating command)
- **Decisions made** (what you chose and why; e.g., per-era vs global, feature set choice, sweep choices)
- **Stopping rationale** (why you stopped iterating; e.g., plateau after N rounds, confirmatory scale step, diminishing returns)
- **Findings** (what worked / didn’t; interpret the plot)
- **Next experiments** (2–5 concrete follow-ups)
- **Repro commands** (train + plot commands from repo root)
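
A skeleton with these sections could be generated as follows. This is only a sketch: the heading levels, the `Date:` line format, and the `TODO` placeholders are assumptions, and `report_skeleton` is a hypothetical helper:

```python
from datetime import date

# Section names taken from the list in step 4 (Title + Date handled separately).
SECTIONS = [
    "Abstract",
    "Hypothesis / Motivation",
    "Method",
    "Experiments run",
    "Results",
    "Standard plot",
    "Decisions made",
    "Stopping rationale",
    "Findings",
    "Next experiments",
    "Repro commands",
]


def report_skeleton(title, today=None):
    """Return markdown text with a title, a date, and one heading per section."""
    today = today or date.today()
    lines = [f"# {title}", "", f"Date: {today.isoformat()}", ""]
    for section in SECTIONS:
        # Placeholder body so unfinished sections are easy to spot.
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)
```

Writing the result to `experiment.md` only when the file does not already exist avoids overwriting an earlier report.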

### 5) Generate the standard plot(s) and link them

Default standard plot (baseline = benchmark predictions):

```bash
PYTHONPATH=numerai python3 -m agents.code.analysis.show_experiment benchmark <best_model_results_name> \
  --base-benchmark-model v52_lgbm_ender20 \
  --benchmark-data-path numerai/v5.2/full_benchmark_models.parquet \
  --start-era 575 --dark \
  --output-dir numerai/agents/experiments/<experiment_name> \
  --baselines-dir numerai/agents/baselines
```

Then embed it in `experiment.md` with a relative link:

```md
![benchmark vs best model](plots/<generated_plot_name>.png)
```

If you have multiple candidate models, either:
- generate one plot with multiple experiment models, or
- generate one plot per candidate (and link all of them).

### 6) Final checks

- Plot files exist under `plots/`.
- `experiment.md` links resolve (use relative paths).
- Metrics table matches `results/*.json`.
- Report clearly states what was run vs what is only planned/configured.
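
The link check can be automated with a short script. The sketch below assumes images are embedded with standard markdown image syntax and that relative paths are resolved against the folder containing `experiment.md`; `broken_image_links` is a hypothetical helper:

```python
import re
from pathlib import Path

# Matches markdown images such as ![alt](plots/name.png) and captures the target.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")
# Targets with a URL scheme (https://...) are not local files and are skipped.
URL_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def broken_image_links(report_path):
    """Return relative image targets in the report that do not exist on disk."""
    report = Path(report_path)
    text = report.read_text(encoding="utf-8")
    broken = []
    for target in IMAGE_LINK.findall(text):
        if URL_SCHEME.match(target):
            continue
        if not (report.parent / target).is_file():
            broken.append(target)
    return broken
```

An empty result covers the first two checks for image links; the metrics table and the run-versus-planned wording still need a manual read.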

Related Skills

numerai-research

from numerai/example-scripts

End-to-end Numerai research workflow for trying a new idea: design experiments, implement new model types if needed, run scout→scale experiments, write a full experiment.md report with standard plots, and optionally package/upload a Numerai pickle. Use when a user asks to “try/test a new idea”, “run an experiment”, “sweep configs”, “compare model variants”, or otherwise do new Numerai research.

numerai-model-upload

from numerai/example-scripts

Create Numerai Tournament model upload pickles (.pkl) with a self-contained predict() function. Use when preparing upload artifacts, debugging numerai_predict import errors, or documenting model-upload requirements and testing steps.

numerai-model-implementation

from numerai/example-scripts

Add a new Numerai model type to the agents training pipeline. Use when you need to register a model in `agents/code/modeling/utils/model_factory.py`, handle fit/predict quirks in `agents/code/modeling/utils/numerai_cv.py`, and update configs so the model can run via `python -m agents.code.modeling`.

numerai-experiment-design

from numerai/example-scripts

Design and manage Numerai experiments in this repo for any model idea.