report-research
Write a complete Numerai experiment report in experiment.md (abstract, methods, results tables, decisions, next steps) and generate/link the standard show_experiment plot(s). Use after running any Numerai research experiments, or when a user asks for a “full report”, “write up”, “experiment.md update”, or “generate the standard plot”.
Best use case
report-research is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using report-research should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/report-research/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
Frequently Asked Questions
What does this skill do?
Write a complete Numerai experiment report in experiment.md (abstract, methods, results tables, decisions, next steps) and generate/link the standard show_experiment plot(s). Use after running any Numerai research experiments, or when a user asks for a “full report”, “write up”, “experiment.md update”, or “generate the standard plot”.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Report Research

## Overview

This skill turns an experiment run into a durable write-up: a full `experiment.md` plus the standard `show_experiment` plot(s) linked from the report.

## Workflow (do all steps)

### 1) Locate the experiment folder

Use the folder that contains:

- `configs/` (the configs you ran)
- `results/` (JSON metrics output)
- `predictions/` (OOF parquet output)
- `experiment.md` (the report you will write/update)

### 2) Inventory what was actually run

- List configs that exist.
- Determine which ones were executed by checking for matching `results/*.json` and `predictions/*.parquet`.
- Identify the “best” model(s) using `bmc_mean` and `bmc_last_200_eras.mean` (primary), with `corr_mean` as a sanity check.
- If experiments were run in rounds, summarize **each round’s intent** (what changed) and whether it improved the current best.

### 3) Extract metrics for the report

For each run you report, include at least:

- `corr_mean`
- `bmc_mean`
- `bmc_last_200_eras.mean`
- `avg_corr_with_benchmark` (from the BMC summary)

Prefer a single markdown table with one row per model.
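Steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration, not the repo's own tooling. It assumes that a config, its results JSON, and its predictions parquet share the same file stem, that dotted metric names such as `bmc_last_200_eras.mean` map to nested JSON keys, and that `avg_corr_with_benchmark` sits at the top level of the results file. The helper names `inventory`, `get_dotted`, and `metrics_table` are hypothetical.

```python
import json
from pathlib import Path

# Metrics named in step 3; dotted names are ASSUMED to be nested JSON keys.
METRICS = ["corr_mean", "bmc_mean", "bmc_last_200_eras.mean", "avg_corr_with_benchmark"]


def get_dotted(data, dotted_key):
    """Look up 'a.b' in nested dicts; return None if any part is missing."""
    current = data
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def inventory(experiment_dir):
    """Split config names into executed and pending runs.

    ASSUMPTION: a run counts as executed when results/<stem>.json and
    predictions/<stem>.parquet both exist for configs/<stem>.*.
    """
    root = Path(experiment_dir)
    configs = sorted(p.stem for p in (root / "configs").glob("*") if p.is_file())
    results = {p.stem for p in (root / "results").glob("*.json")}
    predictions = {p.stem for p in (root / "predictions").glob("*.parquet")}
    executed = [c for c in configs if c in results and c in predictions]
    pending = [c for c in configs if c not in executed]
    return executed, pending


def metrics_table(experiment_dir, run_names):
    """Build one markdown table with one row per run, best bmc_mean first."""
    root = Path(experiment_dir)
    rows = []
    for name in run_names:
        data = json.loads((root / "results" / f"{name}.json").read_text())
        values = [get_dotted(data, metric) for metric in METRICS]
        # Treat anything non-numeric as missing so formatting cannot fail.
        values = [v if isinstance(v, (int, float)) else None for v in values]
        rows.append((name, values))
    # Index 1 is bmc_mean; runs without it sort last.
    rows.sort(key=lambda row: (row[1][1] is None, -(row[1][1] or 0.0)))
    lines = [
        "| model | " + " | ".join(f"`{m}`" for m in METRICS) + " |",
        "|---|" + "---:|" * len(METRICS),
    ]
    for name, values in rows:
        cells = ["n/a" if v is None else f"{v:.4f}" for v in values]
        lines.append(f"| {name} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Calling `inventory(<experiment folder>)` and passing the executed names to `metrics_table` yields text that can be pasted into the Results section. Missing metrics render as `n/a` rather than being guessed, which keeps the table honest about what the JSON actually contains.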
### 4) Write a full report in experiment.md

Update/create `experiment.md` with these sections (keep it crisp but complete):

- **Title + Date**
- **Abstract** (what was tested + headline result)
- **Hypothesis / Motivation** (why this should help BMC)
- **Method** (data split, CV, feature set, model type/hparams, any transforms)
- **Experiments run** (one subsection per config that actually ran; include output artifacts)
- **Results** (the metrics table; mention best run + trade-offs)
- **Standard plot** (embed the PNG and include the generating command)
- **Decisions made** (what you chose and why; e.g., per-era vs global, feature set choice, sweep choices)
- **Stopping rationale** (why you stopped iterating; e.g., plateau after N rounds, confirmatory scale step, diminishing returns)
- **Findings** (what worked / didn’t; interpret the plot)
- **Next experiments** (2–5 concrete follow-ups)
- **Repro commands** (train + plot commands from repo root)

### 5) Generate the standard plot(s) and link them

Default standard plot (baseline = benchmark predictions):

```bash
PYTHONPATH=numerai python3 -m agents.code.analysis.show_experiment benchmark <best_model_results_name> \
  --base-benchmark-model v52_lgbm_ender20 \
  --benchmark-data-path numerai/v5.2/full_benchmark_models.parquet \
  --start-era 575 --dark \
  --output-dir numerai/agents/experiments/<experiment_name> \
  --baselines-dir numerai/agents/baselines
```

Then embed it in `experiment.md` with a relative link:

```md
```

If you have multiple candidate models, either:

- generate one plot with multiple experiment models, or
- generate one plot per candidate (and link all of them).

### 6) Final checks

- Plot files exist under `plots/`.
- `experiment.md` links resolve (use relative paths).
- Metrics table matches `results/*.json`.
- Report clearly states what was run vs what is only planned/configured.
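The link check in step 6 can be automated with a short Python sketch. This is an illustrative helper under stated assumptions, not part of the skill or the repo: the function name `broken_relative_links` is hypothetical, and the regular expression only covers inline markdown links and images of the form `[text](target)`, not reference-style links or raw HTML.

```python
import re
from pathlib import Path

# Matches inline markdown links and images: [text](target) or ![alt](target).
# Captures the target up to the first whitespace, so an optional title is ignored.
LINK_PATTERN = re.compile(r"!?\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def broken_relative_links(report_path):
    """Return relative link targets in the report that do not exist on disk."""
    report = Path(report_path)
    missing = []
    for target in LINK_PATTERN.findall(report.read_text(encoding="utf-8")):
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue  # only relative file paths are checked
        path_part = target.split("#", 1)[0]  # drop any in-page anchor
        # Relative links resolve against the folder containing the report.
        if not (report.parent / path_part).exists():
            missing.append(target)
    return missing
```

An empty return value means every relative link in `experiment.md` points at a file that exists. Any entry in the list is a plot or artifact that was linked but never generated, or was written to a different folder than the report expects.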
Related Skills
numerai-research
End-to-end Numerai research workflow for trying a new idea: design experiments, implement new model types if needed, run scout→scale experiments, write a full experiment.md report with standard plots, and optionally package/upload a Numerai pickle. Use when a user asks to “try/test a new idea”, “run an experiment”, “sweep configs”, “compare model variants”, or otherwise do new Numerai research.
numerai-model-upload
Create Numerai Tournament model upload pickles (.pkl) with a self-contained predict() function. Use when preparing upload artifacts, debugging numerai_predict import errors, or documenting model-upload requirements and testing steps.
numerai-model-implementation
Add a new Numerai model type to the agents training pipeline. Use when you need to register a model in `agents/code/modeling/utils/model_factory.py`, handle fit/predict quirks in `agents/code/modeling/utils/numerai_cv.py`, and update configs so the model can run via `python -m agents.code.modeling`.
numerai-experiment-design
Design and manage Numerai experiments in this repo for any model idea.