numerai-research

End-to-end Numerai research workflow for trying a new idea: design experiments, implement new model types if needed, run scout→scale experiments, write a full experiment.md report with standard plots, and optionally package/upload a Numerai pickle. Use when a user asks to “try/test a new idea”, “run an experiment”, “sweep configs”, “compare model variants”, or otherwise do new Numerai research.

1,123 stars

by numerai

View on GitHub Installation ↓

Best use case

numerai-research is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using numerai-research should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want to try or test a new Numerai idea (run an experiment, sweep configs, compare model variants) with a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/numerai-research/SKILL.md --create-dirs "https://raw.githubusercontent.com/numerai/example-scripts/main/numerai/agents/skills/numerai-research/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/numerai-research/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How numerai-research Compares

Feature / Agent	numerai-research	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

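What does this skill do?

It sequences existing Numerai skills (experiment design, model implementation, reporting, and upload) so that a research request produces runnable configs, executed scout and scale experiments, a full experiment.md report with standard plots, and, when requested, a deployable Numerai pickle.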

Where can I find the source code?

You can find the source code on GitHub in the numerai/example-scripts repository, at numerai/agents/skills/numerai-research/SKILL.md, via the link at the top of the page.

Related Guides

AI Agents for Startups

Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

AI Agent for YouTube Script Writing

Find AI agent skills for YouTube script writing, video research, content outlining, and repeatable channel production workflows.

SKILL.md Source

# Numerai Research

## Overview

This skill is a “meta-workflow” that sequences existing Numerai skills so research requests reliably produce: (1) runnable configs, (2) executed experiments, (3) a full written report + plots, and (4) a deployable pickle when requested.

## Workflow (always follow this order)

### 1) Design the experiment (use numerai-experiment-design)

- Follow the `numerai-experiment-design` skill to:
  - clarify the idea (or run quick scout interpretations if ambiguous)
  - choose baseline + feature set alignment (default ender20 baseline)
  - create an experiment folder under `numerai/agents/experiments/<experiment_name>/`
  - write configs in `configs/`
  - run training via `PYTHONPATH=numerai python3 -m agents.code.modeling --config <config> --output-dir <experiment_dir>`
  - track metrics with BMC as primary (`bmc_mean`, `bmc_last_200_eras`)
  - **iterate in rounds** (typically 4–5 configs per round), and keep going until you hit a plateau (per the experiment-design skill)
  - **scale winners** (bigger feature set and/or full data) before finalizing the best model
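A minimal sketch of one round, assuming each config is a file under `configs/` and that per-era BMC values are available oldest-first as a plain list. The helper names (`training_command`, `run_round`, `summarize_bmc`) are hypothetical; the real pipeline computes its own metrics, so this only illustrates how the training command above and the two headline BMC metrics relate.

```python
import os
import statistics
import subprocess
from pathlib import Path


def training_command(config: Path, experiment_dir: Path) -> tuple[dict[str, str], list[str]]:
    """Return (extra env vars, argv) for one run, mirroring the documented command."""
    env = {"PYTHONPATH": "numerai"}
    argv = [
        "python3", "-m", "agents.code.modeling",
        "--config", str(config),
        "--output-dir", str(experiment_dir),
    ]
    return env, argv


def run_round(configs: list[Path], experiment_dir: Path) -> None:
    """Run every config in a round sequentially; stop on the first failure."""
    for config in configs:
        extra_env, argv = training_command(config, experiment_dir)
        subprocess.run(argv, env={**os.environ, **extra_env}, check=True)


def summarize_bmc(per_era_bmc: list[float]) -> dict[str, float]:
    """Hypothetical helper: collapse per-era BMC (oldest first) into headline metrics."""
    if not per_era_bmc:
        raise ValueError("need at least one era")
    return {
        "bmc_mean": statistics.fmean(per_era_bmc),
        # Mean over the most recent 200 eras (or all eras if fewer exist).
        "bmc_last_200_eras": statistics.fmean(per_era_bmc[-200:]),
    }
```

Comparing `bmc_last_200_eras` against `bmc_mean` is a quick check on whether an idea's edge holds up in recent eras rather than only in older data.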

### 2) Implement new model types if needed (use numerai-model-implementation)

Only if the idea requires new code (new model wrapper, new fit/predict behavior, etc.):
- Follow the `numerai-model-implementation` skill to add the model type and register it.
- Add at least one smoke-test config and verify the pipeline runs.
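The real registration lives in `agents/code/modeling/utils/model_factory.py`, whose API is not shown here. As a hedged illustration only, the sketch below uses a hypothetical `MODEL_REGISTRY` dict and `register_model` decorator to show the general pattern: a model type exposes `fit`/`predict`, is registered under a name, and is built from a config's `type` and `params` keys (both key names are assumptions).

```python
from __future__ import annotations

from typing import Callable, Protocol, Sequence


class NumeraiModel(Protocol):
    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "NumeraiModel": ...
    def predict(self, X: Sequence[Sequence[float]]) -> list[float]: ...


# Hypothetical registry: maps a config "type" string to a model factory.
MODEL_REGISTRY: dict[str, Callable[..., NumeraiModel]] = {}


def register_model(name: str):
    """Decorator that registers a factory (e.g. a class) under `name`."""
    def decorator(factory):
        if name in MODEL_REGISTRY:
            raise ValueError(f"model type {name!r} already registered")
        MODEL_REGISTRY[name] = factory
        return factory
    return decorator


@register_model("mean_baseline")
class MeanBaseline:
    """Toy smoke-test model: predicts the training-target mean for every row."""

    def __init__(self) -> None:
        self.mean_: float | None = None

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        if self.mean_ is None:
            raise RuntimeError("call fit() before predict()")
        return [self.mean_] * len(X)


def build_model(config: dict) -> NumeraiModel:
    """Look up config['type'] and pass config['params'] (if any) to its factory."""
    return MODEL_REGISTRY[config["type"]](**config.get("params", {}))
```

A trivial model like `MeanBaseline` is useful only as the smoke-test config: it proves the registration and fit/predict wiring work before a real model is plugged in.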

### 3) Report the research (use report-research)

After you have iterated through multiple rounds **and** stopped finding improvements (plateau), and after any confirmatory scale runs:
- Follow the `report-research` skill to:
  - write a full `experiment.md` (abstract + methods + results + decisions + next steps)
  - generate the standard `show_experiment` plot(s)
  - link plots and artifacts in the report

### 4) Package and upload (use numerai-model-upload)

If (and only if) the user wants deployment:
- Follow the `numerai-model-upload` skill to create a Numerai-compatible pickle and upload it via the Numerai MCP.
- Remember: only Classic (tournament 8) supports pickle uploads.
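Numerai's example scripts typically serialize a `predict` function with `cloudpickle` so it carries its own dependencies. The stdlib-only sketch below is not that upload path; it is a hypothetical pre-upload smoke test that round-trips a predictor through `pickle` and checks an assumed output contract (one prediction per row, every value in [0, 1]). `RankPredictor` is an illustrative stand-in, and real predict functions receive pandas DataFrames rather than lists.

```python
import pickle


class RankPredictor:
    """Hypothetical predictor: linear scores converted to ranks scaled to (0, 1]."""

    def __init__(self, weights):
        self.weights = list(weights)

    def __call__(self, rows):
        scores = [sum(w * x for w, x in zip(self.weights, row)) for row in rows]
        # Ties keep input order because sorted() is stable.
        order = sorted(range(len(scores)), key=scores.__getitem__)
        ranks = [0.0] * len(scores)
        for position, idx in enumerate(order, start=1):
            ranks[idx] = position / len(scores)
        return ranks


def smoke_test_pickle(predictor, sample_rows):
    """Round-trip the predictor through pickle, then validate its outputs."""
    restored = pickle.loads(pickle.dumps(predictor))
    preds = restored(sample_rows)
    if len(preds) != len(sample_rows):
        raise ValueError("expected one prediction per row")
    if not all(0.0 <= p <= 1.0 for p in preds):
        raise ValueError("predictions must lie in [0, 1]")
    return preds
```

Because stdlib `pickle` stores classes by reference, passing this check does not prove the artifact is self-contained on Numerai's side; that is why the upload skill's own testing steps still apply.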

## Defaults (unless user specifies otherwise)

- Scout first on downsampled data; scale only winners.
- Run experiments in rounds (4–5 configs per round) and stop only after a plateau + confirmatory scale step.
- Benchmark reference: `v52_lgbm_ender20`.
- Always record corr + BMC metrics and include the standard plot in the report.
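A hedged sketch of the round-based stopping rule and winner selection. The `patience` and `min_gain` thresholds and the helper names are hypothetical; the `numerai-experiment-design` skill defines the actual plateau criterion. Here `round_bests` holds the best `bmc_mean` from each completed round, oldest first.

```python
def hit_plateau(round_bests: list[float], patience: int = 2, min_gain: float = 1e-4) -> bool:
    """True when the last `patience` rounds failed to beat the earlier best by `min_gain`."""
    if len(round_bests) <= patience:
        return False  # not enough rounds yet to judge
    best_before = max(round_bests[:-patience])
    return max(round_bests[-patience:]) < best_before + min_gain


def select_winners(results: list[dict], k: int = 2) -> list[str]:
    """Pick the top-k configs by bmc_mean to carry into the scale step."""
    ranked = sorted(results, key=lambda r: r["bmc_mean"], reverse=True)
    return [r["config"] for r in ranked[:k]]
```

Requiring a minimum gain rather than any improvement keeps noise-level BMC differences from extending the search indefinitely.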

Related Skills

report-research

1,123 stars

from numerai/example-scripts

Write a complete Numerai experiment report in experiment.md (abstract, methods, results tables, decisions, next steps) and generate/link the standard show_experiment plot(s). Use after running any Numerai research experiments, or when a user asks for a “full report”, “write up”, “experiment.md update”, or “generate the standard plot”.

numerai-model-upload

1,123 stars

from numerai/example-scripts

Create Numerai Tournament model upload pickles (.pkl) with a self-contained predict() function. Use when preparing upload artifacts, debugging numerai_predict import errors, or documenting model-upload requirements and testing steps.

numerai-model-implementation

1,123 stars

from numerai/example-scripts

Add a new Numerai model type to the agents training pipeline. Use when you need to register a model in `agents/code/modeling/utils/model_factory.py`, handle fit/predict quirks in `agents/code/modeling/utils/numerai_cv.py`, and update configs so the model can run via `python -m agents.code.modeling`.

numerai-experiment-design

1,123 stars

from numerai/example-scripts

Design and manage Numerai experiments in this repo for any model idea.

autoresearch-pro

3,891 stars

from openclaw/skills

Automatically improve OpenClaw skills, prompts, or articles through iterative mutation-testing loops. Inspired by Karpathy's autoresearch. Use when user says 'optimize [skill]', 'autoresearch [skill]', 'improve my skill', 'optimize this prompt', 'improve my prompt', 'polish this article', 'improve this article', or explicitly requests quality improvement for any text-based content. Supports three modes: skill (SKILL.md files), prompt (any prompt text), and article (any document).

Workflow & Productivity

X/Twitter Research Skill

3,891 stars

from openclaw/skills

Research trending topics, ideas, and conversations on X (Twitter) using twitterapi.io.

Data & Research

token-research

3,891 stars

from openclaw/skills

Comprehensive token research for EVM chains (Base, ETH, Arbitrum) and Solana. Use this skill when you want to research crypto tokens, deep-dive projects or monitor tokens.

Data & Research

xvary-stock-research

31,392 stars

from sickn33/antigravity-awesome-skills

Thesis-driven equity analysis from public SEC EDGAR and market data; /analyze, /score, /compare workflows with bundled Python tools (Claude Code, Cursor, Codex).

wiki-researcher

31,392 stars

from sickn33/antigravity-awesome-skills

You are an expert software engineer and systems analyst. Use when user asks "how does X work" with expectation of depth, user wants to understand a complex system spanning many files, or user asks for architectural analysis or pattern investigation.

apify-market-research

31,392 stars

from sickn33/antigravity-awesome-skills

Analyze market conditions, geographic opportunities, pricing, consumer behavior, and product validation across Google Maps, Facebook, Instagram, Booking.com, and TripAdvisor.

autoresearch

28,865 stars

from github/awesome-copilot

Autonomous iterative experimentation loop for any programming task. Guides the user through defining goals, measurable metrics, and scope constraints, then runs an autonomous loop of code changes, testing, measuring, and keeping/discarding results. Inspired by Karpathy's autoresearch. USE FOR: autonomous improvement, iterative optimization, experiment loop, auto research, performance tuning, automated experimentation, hill climbing, try things automatically, optimize code, run experiments, autonomous coding loop. DO NOT USE FOR: one-shot tasks, simple bug fixes, code review, or tasks without a measurable metric.

research

11,146 stars

from danielmiessler/Personal_AI_Infrastructure

Comprehensive research, analysis, and content extraction system. USE WHEN user says 'do research', 'do extensive research', 'quick research', 'minor research', 'research this', 'find information', 'investigate', 'extract wisdom', 'extract alpha', 'analyze content', 'can't get this content', 'use fabric', OR requests any web/content research. Supports three research modes (quick/standard/extensive), deep content analysis, intelligent retrieval, and 242+ Fabric patterns. NOTE: For due diligence, OSINT, or background checks, use OSINT skill instead.