numerai-research

End-to-end Numerai research workflow for trying a new idea: design experiments, implement new model types if needed, run scout→scale experiments, write a full experiment.md report with standard plots, and optionally package/upload a Numerai pickle. Use when a user asks to “try/test a new idea”, “run an experiment”, “sweep configs”, “compare model variants”, or otherwise do new Numerai research.

1,123 stars

Best use case

numerai-research is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using numerai-research should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/numerai-research/SKILL.md --create-dirs "https://raw.githubusercontent.com/numerai/example-scripts/main/numerai/agents/skills/numerai-research/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/numerai-research/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How numerai-research Compares

| Feature / Agent | numerai-research | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

Where can I find the source code?

The source is in the numerai/example-scripts repository on GitHub, at numerai/agents/skills/numerai-research/SKILL.md.

SKILL.md Source

# Numerai Research

## Overview

This skill is a “meta-workflow” that sequences existing Numerai skills so research requests reliably produce: (1) runnable configs, (2) executed experiments, (3) a full written report + plots, and (4) a deployable pickle when requested.

## Workflow (always follow this order)

### 1) Design the experiment (use numerai-experiment-design)

- Follow the `numerai-experiment-design` skill to:
  - clarify the idea (or, if it is ambiguous, run quick scout experiments for each plausible interpretation)
  - choose baseline + feature set alignment (default ender20 baseline)
  - create an experiment folder under `numerai/agents/experiments/<experiment_name>/`
  - write configs in `configs/`
  - run training via `PYTHONPATH=numerai python3 -m agents.code.modeling --config <config> --output-dir <experiment_dir>`
  - track metrics with BMC as primary (`bmc_mean`, `bmc_last_200_eras`)
  - **iterate in rounds** (typically 4–5 configs per round), and keep going until you hit a plateau (per the experiment-design skill)
  - **scale winners** (bigger feature set and/or full data) before finalizing the best model
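
The training step above can be scripted so that every config in a round is launched the same way. The following is a minimal sketch, not part of the skill itself: the helper names `build_command` and `run_round` are hypothetical, and it assumes that every file directly under `<experiment_dir>/configs/` is a config and that the script is started from the repository root (so that `PYTHONPATH=numerai` resolves). Only the command line itself is taken from the workflow.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path


def build_command(config: Path, experiment_dir: Path) -> list[str]:
    """Return the training command from the workflow for a single config."""
    return [
        "python3", "-m", "agents.code.modeling",
        "--config", str(config),
        "--output-dir", str(experiment_dir),
    ]


def run_round(experiment_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Build one command per config in configs/ and optionally execute them.

    Assumption: every regular file in <experiment_dir>/configs/ is a config.
    """
    config_dir = experiment_dir / "configs"
    configs = sorted(p for p in config_dir.iterdir() if p.is_file())
    commands = [build_command(c, experiment_dir) for c in configs]
    if not dry_run:
        # Mirrors the `PYTHONPATH=numerai` prefix used in the workflow command.
        env = {**os.environ, "PYTHONPATH": "numerai"}
        for cmd in commands:
            subprocess.run(cmd, env=env, check=True)
    return commands
```

With `dry_run=True` (the default) nothing is executed, which makes it easy to review the commands for a round before spending compute on them.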

### 2) Implement new model types if needed (use numerai-model-implementation)

Only if the idea requires new code (new model wrapper, new fit/predict behavior, etc.):
- Follow the `numerai-model-implementation` skill to add the model type and register it.
- Add at least one smoke-test config and verify the pipeline runs.

### 3) Report the research (use report-research)

After you have iterated through multiple rounds **and** stopped finding improvements (plateau), and after any confirmatory scale runs:
- Follow the `report-research` skill to:
  - write a full `experiment.md` (abstract + methods + results + decisions + next steps)
  - generate the standard `show_experiment` plot(s)
  - link plots and artifacts in the report
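
As an illustration of the report structure listed above, the sketch below renders an `experiment.md` skeleton with the five sections, a results table keyed on the two BMC metrics named in this workflow, and a link to a plot. It is an assumption about layout only: the `report-research` skill defines the real format, and `render_report`, `results_table`, the `config` key, and the plot path argument are hypothetical names introduced here.

```python
from __future__ import annotations

# Section names follow the list in step 3 of the workflow.
SECTIONS = ["Abstract", "Methods", "Results", "Decisions", "Next steps"]


def results_table(rows: list[dict]) -> str:
    """Render one Markdown table row per config, with the BMC metrics."""
    lines = [
        "| config | bmc_mean | bmc_last_200_eras |",
        "|---|---|---|",
    ]
    for row in rows:
        lines.append(
            f"| {row['config']} | {row['bmc_mean']:.4f} "
            f"| {row['bmc_last_200_eras']:.4f} |"
        )
    return "\n".join(lines)


def render_report(title: str, rows: list[dict], plot_path: str) -> str:
    """Return the text of an experiment.md skeleton (hypothetical layout)."""
    parts = [f"# {title}", ""]
    for section in SECTIONS:
        parts += [f"## {section}", ""]
        if section == "Results":
            # The table and the plot link both live under Results.
            parts += [results_table(rows), "", f"![show_experiment]({plot_path})", ""]
        else:
            parts += ["TODO", ""]
    return "\n".join(parts)
```

The returned string can be written to `experiment.md` in the experiment folder; the `TODO` placeholders mark the sections that still need prose.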

### 4) Package and upload (use numerai-model-upload)

If (and only if) the user wants deployment:
- Follow the `numerai-model-upload` skill to create a Numerai-compatible pickle and upload it via the Numerai MCP.
- Remember: only Classic (tournament 8) supports pickle uploads.

## Defaults (unless user specifies otherwise)

- Scout first on downsampled data; scale only winners.
- Run experiments in rounds (4–5 configs per round) and stop only after a plateau + confirmatory scale step.
- Benchmark reference: `v52_lgbm_ender20`.
- Always record corr + BMC metrics and include the standard plot in the report.
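
The "scale only winners" and "stop only after a plateau" defaults can be made concrete with a small amount of code. This is a sketch under stated assumptions, not the rule used by the `numerai-experiment-design` skill: the function names, the `top_k`, `min_gain`, and `patience` parameters, and their default values are all hypothetical, and `bmc_mean` is used as the ranking metric because the workflow names BMC as primary.

```python
from __future__ import annotations


def pick_winners(bmc_by_config: dict[str, float], top_k: int = 2) -> list[str]:
    """Return the top_k config names ranked by bmc_mean, best first."""
    ranked = sorted(bmc_by_config, key=bmc_by_config.get, reverse=True)
    return ranked[:top_k]


def has_plateaued(
    best_per_round: list[float],
    min_gain: float = 0.0005,  # hypothetical threshold, tune per experiment
    patience: int = 2,         # hypothetical number of rounds without gain
) -> bool:
    """True when none of the last `patience` rounds beat the earlier best
    round score by at least `min_gain`."""
    if len(best_per_round) <= patience:
        return False  # not enough rounds yet to judge
    best_before = max(best_per_round[:-patience])
    recent = best_per_round[-patience:]
    return all(score < best_before + min_gain for score in recent)
```

Here `best_per_round` holds the best `bmc_mean` seen in each round, in order. Once `has_plateaued` returns `True`, the configs returned by `pick_winners` are the candidates for the confirmatory scale step.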

Related Skills

report-research

1123
from numerai/example-scripts

Write a complete Numerai experiment report in experiment.md (abstract, methods, results tables, decisions, next steps) and generate/link the standard show_experiment plot(s). Use after running any Numerai research experiments, or when a user asks for a “full report”, “write up”, “experiment.md update”, or “generate the standard plot”.

numerai-model-upload

1123
from numerai/example-scripts

Create Numerai Tournament model upload pickles (.pkl) with a self-contained predict() function. Use when preparing upload artifacts, debugging numerai_predict import errors, or documenting model-upload requirements and testing steps.

numerai-model-implementation

1123
from numerai/example-scripts

Add a new Numerai model type to the agents training pipeline. Use when you need to register a model in `agents/code/modeling/utils/model_factory.py`, handle fit/predict quirks in `agents/code/modeling/utils/numerai_cv.py`, and update configs so the model can run via `python -m agents.code.modeling`.

numerai-experiment-design

1123
from numerai/example-scripts

Design and manage Numerai experiments in this repo for any model idea.

autoresearch-pro

3891
from openclaw/skills

Automatically improve OpenClaw skills, prompts, or articles through iterative mutation-testing loops. Inspired by Karpathy's autoresearch. Use when user says 'optimize [skill]', 'autoresearch [skill]', 'improve my skill', 'optimize this prompt', 'improve my prompt', 'polish this article', 'improve this article', or explicitly requests quality improvement for any text-based content. Supports three modes: skill (SKILL.md files), prompt (any prompt text), and article (any document).

Workflow & Productivity

X/Twitter Research Skill

3891
from openclaw/skills

Research trending topics, ideas, and conversations on X (Twitter) using twitterapi.io.

Data & Research

token-research

3891
from openclaw/skills

Comprehensive token research for EVM chains (Base, ETH, Arbitrum) and Solana. Use this skill when you want to research crypto tokens, deep-dive projects or monitor tokens.

Data & Research

xvary-stock-research

31392
from sickn33/antigravity-awesome-skills

Thesis-driven equity analysis from public SEC EDGAR and market data; /analyze, /score, /compare workflows with bundled Python tools (Claude Code, Cursor, Codex).

wiki-researcher

31392
from sickn33/antigravity-awesome-skills

You are an expert software engineer and systems analyst. Use when user asks "how does X work" with expectation of depth, user wants to understand a complex system spanning many files, or user asks for architectural analysis or pattern investigation.

apify-market-research

31392
from sickn33/antigravity-awesome-skills

Analyze market conditions, geographic opportunities, pricing, consumer behavior, and product validation across Google Maps, Facebook, Instagram, Booking.com, and TripAdvisor.

autoresearch

28865
from github/awesome-copilot

Autonomous iterative experimentation loop for any programming task. Guides the user through defining goals, measurable metrics, and scope constraints, then runs an autonomous loop of code changes, testing, measuring, and keeping/discarding results. Inspired by Karpathy's autoresearch. USE FOR: autonomous improvement, iterative optimization, experiment loop, auto research, performance tuning, automated experimentation, hill climbing, try things automatically, optimize code, run experiments, autonomous coding loop. DO NOT USE FOR: one-shot tasks, simple bug fixes, code review, or tasks without a measurable metric.

research

11146
from danielmiessler/Personal_AI_Infrastructure

Comprehensive research, analysis, and content extraction system. USE WHEN user says 'do research', 'do extensive research', 'quick research', 'minor research', 'research this', 'find information', 'investigate', 'extract wisdom', 'extract alpha', 'analyze content', 'can't get this content', 'use fabric', OR requests any web/content research. Supports three research modes (quick/standard/extensive), deep content analysis, intelligent retrieval, and 242+ Fabric patterns. NOTE: For due diligence, OSINT, or background checks, use OSINT skill instead.