numerai-research
End-to-end Numerai research workflow for trying a new idea: design experiments, implement new model types if needed, run scout→scale experiments, write a full experiment.md report with standard plots, and optionally package/upload a Numerai pickle. Use when a user asks to “try/test a new idea”, “run an experiment”, “sweep configs”, “compare model variants”, or otherwise do new Numerai research.
Best use case
numerai-research is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using numerai-research should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/numerai-research/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How numerai-research Compares
| Feature / Agent | numerai-research | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Numerai Research

## Overview

This skill is a “meta-workflow” that sequences existing Numerai skills so research requests reliably produce: (1) runnable configs, (2) executed experiments, (3) a full written report + plots, and (4) a deployable pickle when requested.

## Workflow (always follow this order)

### 1) Design the experiment (use numerai-experiment-design)

- Follow the `numerai-experiment-design` skill to:
  - clarify the idea (or run quick scout interpretations if ambiguous)
  - choose baseline + feature set alignment (default ender20 baseline)
  - create an experiment folder under `numerai/agents/experiments/<experiment_name>/`
  - write configs in `configs/`
  - run training via `PYTHONPATH=numerai python3 -m agents.code.modeling --config <config> --output-dir <experiment_dir>`
  - track metrics with BMC as primary (`bmc_mean`, `bmc_last_200_eras`)
  - **iterate in rounds** (typically 4–5 configs per round), and keep going until you hit a plateau (per the experiment-design skill)
  - **scale winners** (bigger feature set and/or full data) before finalizing the best model

### 2) Implement new model types if needed (use numerai-model-implementation)

Only if the idea requires new code (new model wrapper, new fit/predict behavior, etc.):

- Follow the `numerai-model-implementation` skill to add the model type and register it.
- Add at least one smoke-test config and verify the pipeline runs.

### 3) Report the research (use report-research)

After you have iterated through multiple rounds **and** stopped finding improvements (plateau), and after any confirmatory scale runs:

- Follow the `report-research` skill to:
  - write a full `experiment.md` (abstract + methods + results + decisions + next steps)
  - generate the standard `show_experiment` plot(s)
  - link plots and artifacts in the report

### 4) Package and upload (use numerai-model-upload)

If (and only if) the user wants deployment:

- Follow the `numerai-model-upload` skill to create a Numerai-compatible pickle and upload it via the Numerai MCP.
- Remember: only Classic (tournament 8) supports pickle uploads.

## Defaults (unless user specifies otherwise)

- Scout first on downsampled data; scale only winners.
- Run experiments in rounds (4–5 configs per round) and stop only after a plateau + confirmatory scale step.
- Benchmark reference: `v52_lgbm_ender20`.
- Always record corr + BMC metrics and include the standard plot in the report.
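
The round-based loop described in the workflow can be sketched in Python. This is an illustration only and not part of the skill: the helper names (`build_train_command`, `training_env`, `run_round`, `has_plateaued`), the `min_gain` and `patience` values, and the idea of comparing the best `bmc_mean` per round are all assumptions. The skill defers the actual plateau rule to `numerai-experiment-design`. Only the training command itself is taken from the source.

```python
import os
import subprocess
from pathlib import Path


def build_train_command(config: Path, experiment_dir: Path) -> list[str]:
    """Return the argv for one training run, mirroring the command in step 1."""
    return [
        "python3", "-m", "agents.code.modeling",
        "--config", str(config),
        "--output-dir", str(experiment_dir),
    ]


def training_env(pythonpath: str = "numerai") -> dict[str, str]:
    """Copy the current environment and set PYTHONPATH as the skill's command does.

    The relative value assumes the command is launched from the repository root.
    """
    env = dict(os.environ)
    env["PYTHONPATH"] = pythonpath
    return env


def run_round(configs: list[Path], experiment_dir: Path) -> None:
    """Run every config of one round sequentially (hypothetical helper)."""
    env = training_env()
    for config in configs:
        # check=True stops the round on the first failing run.
        subprocess.run(build_train_command(config, experiment_dir), env=env, check=True)


def has_plateaued(best_bmc_per_round: list[float], min_gain: float, patience: int) -> bool:
    """Illustrative plateau rule (an assumption, not the skill's definition).

    Returns True when each of the last `patience` rounds failed to beat the
    best earlier round by at least `min_gain`.
    """
    if len(best_bmc_per_round) <= patience:
        return False  # not enough rounds yet to judge
    prior_best = max(best_bmc_per_round[:-patience])
    recent = best_bmc_per_round[-patience:]
    return all(score < prior_best + min_gain for score in recent)


if __name__ == "__main__":
    # Hypothetical experiment and config names, used only to print the commands.
    experiment_dir = Path("numerai/agents/experiments/example_idea")
    for name in ["round1_a", "round1_b", "round1_c", "round1_d"]:
        config = experiment_dir / "configs" / name
        print(" ".join(build_train_command(config, experiment_dir)))
```

The plateau check is kept separate from the runner so the threshold can be replaced with whatever rule the experiment-design skill prescribes, without touching how runs are launched.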
Related Skills
report-research
Write a complete Numerai experiment report in experiment.md (abstract, methods, results tables, decisions, next steps) and generate/link the standard show_experiment plot(s). Use after running any Numerai research experiments, or when a user asks for a “full report”, “write up”, “experiment.md update”, or “generate the standard plot”.
numerai-model-upload
Create Numerai Tournament model upload pickles (.pkl) with a self-contained predict() function. Use when preparing upload artifacts, debugging numerai_predict import errors, or documenting model-upload requirements and testing steps.
numerai-model-implementation
Add a new Numerai model type to the agents training pipeline. Use when you need to register a model in `agents/code/modeling/utils/model_factory.py`, handle fit/predict quirks in `agents/code/modeling/utils/numerai_cv.py`, and update configs so the model can run via `python -m agents.code.modeling`.
numerai-experiment-design
Design and manage Numerai experiments in this repo for any model idea.
autoresearch-pro
Automatically improve OpenClaw skills, prompts, or articles through iterative mutation-testing loops. Inspired by Karpathy's autoresearch. Use when user says 'optimize [skill]', 'autoresearch [skill]', 'improve my skill', 'optimize this prompt', 'improve my prompt', 'polish this article', 'improve this article', or explicitly requests quality improvement for any text-based content. Supports three modes: skill (SKILL.md files), prompt (any prompt text), and article (any document).
X/Twitter Research Skill
Research trending topics, ideas, and conversations on X (Twitter) using twitterapi.io.
token-research
Comprehensive token research for EVM chains (Base, ETH, Arbitrum) and Solana. Use this skill when you want to research crypto tokens, deep-dive projects or monitor tokens.
xvary-stock-research
Thesis-driven equity analysis from public SEC EDGAR and market data; /analyze, /score, /compare workflows with bundled Python tools (Claude Code, Cursor, Codex).
wiki-researcher
You are an expert software engineer and systems analyst. Use when user asks "how does X work" with expectation of depth, user wants to understand a complex system spanning many files, or user asks for architectural analysis or pattern investigation.
apify-market-research
Analyze market conditions, geographic opportunities, pricing, consumer behavior, and product validation across Google Maps, Facebook, Instagram, Booking.com, and TripAdvisor.
autoresearch
Autonomous iterative experimentation loop for any programming task. Guides the user through defining goals, measurable metrics, and scope constraints, then runs an autonomous loop of code changes, testing, measuring, and keeping/discarding results. Inspired by Karpathy's autoresearch. USE FOR: autonomous improvement, iterative optimization, experiment loop, auto research, performance tuning, automated experimentation, hill climbing, try things automatically, optimize code, run experiments, autonomous coding loop. DO NOT USE FOR: one-shot tasks, simple bug fixes, code review, or tasks without a measurable metric.
research
Comprehensive research, analysis, and content extraction system. USE WHEN user says 'do research', 'do extensive research', 'quick research', 'minor research', 'research this', 'find information', 'investigate', 'extract wisdom', 'extract alpha', 'analyze content', 'can't get this content', 'use fabric', OR requests any web/content research. Supports three research modes (quick/standard/extensive), deep content analysis, intelligent retrieval, and 242+ Fabric patterns. NOTE: For due diligence, OSINT, or background checks, use OSINT skill instead.