backtest-expert
Expert guidance for systematic backtesting of trading strategies. Use when developing, testing, stress-testing, or validating quantitative trading strategies. Covers "beating ideas to death" methodology, parameter robustness testing, slippage modeling, bias prevention, and interpreting backtest results. Applicable when user asks about backtesting, strategy validation, robustness testing, avoiding overfitting, or systematic trading development.
Best use case
backtest-expert is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using backtest-expert should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/backtest-expert/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How backtest-expert Compares
| Feature / Agent | backtest-expert | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Expert guidance for systematic backtesting of trading strategies. Use when developing, testing, stress-testing, or validating quantitative trading strategies. Covers "beating ideas to death" methodology, parameter robustness testing, slippage modeling, bias prevention, and interpreting backtest results. Applicable when user asks about backtesting, strategy validation, robustness testing, avoiding overfitting, or systematic trading development.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Backtest Expert

Systematic approach to backtesting trading strategies based on professional methodology that prioritizes robustness over optimistic results.

## Core Philosophy

**Goal**: Find strategies that "break the least", not strategies that "profit the most" on paper.

**Principle**: Add friction, stress test assumptions, and see what survives. If a strategy holds up under pessimistic conditions, it's more likely to work in live trading.

## When to Use This Skill

Use this skill when:

- Developing or validating systematic trading strategies
- Evaluating whether a trading idea is robust enough for live implementation
- Troubleshooting why a backtest might be misleading
- Learning proper backtesting methodology
- Avoiding common pitfalls (curve-fitting, look-ahead bias, survivorship bias)
- Assessing parameter sensitivity and regime dependence
- Setting realistic expectations for slippage and execution costs

## Backtesting Workflow

### 1. State the Hypothesis

Define the edge in one sentence.

**Example**: "Stocks that gap up >3% on earnings and pull back to previous day's close within first hour provide mean-reversion opportunity."

If you can't articulate the edge clearly, don't proceed to testing.

### 2. Codify Rules with Zero Discretion

Define with complete specificity:

- **Entry**: Exact conditions, timing, price type
- **Exit**: Stop loss, profit target, time-based exit
- **Position sizing**: Fixed $, % of portfolio, volatility-adjusted
- **Filters**: Market cap, volume, sector, volatility conditions
- **Universe**: What instruments are eligible

**Critical**: No subjective judgment allowed. Every decision must be rule-based and unambiguous.

### 3. Run Initial Backtest

Test over:

- **Minimum 5 years** (preferably 10+)
- **Multiple market regimes** (bull, bear, high/low volatility)
- **Realistic costs**: Commissions + conservative slippage

Examine initial results for basic viability. If fundamentally broken, iterate on hypothesis.

### 4. Stress Test the Strategy

This is where 80% of testing time should be spent.

**Parameter sensitivity**:

- Test stop loss at 50%, 75%, 100%, 125%, 150% of baseline
- Test profit target at 80%, 90%, 100%, 110%, 120% of baseline
- Vary entry/exit timing by ±15-30 minutes
- Look for "plateaus" of stable performance, not narrow spikes

**Execution friction**:

- Increase slippage to 1.5-2x typical estimates
- Model worst-case fills (buy at ask+1 tick, sell at bid-1 tick)
- Add realistic order rejection scenarios
- Test with pessimistic commission structures

**Time robustness**:

- Analyze year-by-year performance
- Require positive expectancy in majority of years
- Ensure strategy doesn't rely on 1-2 exceptional periods
- Test in different market regimes separately

**Sample size**:

- Absolute minimum: 30 trades
- Preferred: 100+ trades
- High confidence: 200+ trades

### 5. Out-of-Sample Validation

**Walk-forward analysis**:

1. Optimize on training period (e.g., Year 1-3)
2. Test on validation period (Year 4)
3. Roll forward and repeat
4. Compare in-sample vs out-of-sample performance

**Warning signs**:

- Out-of-sample <50% of in-sample performance
- Need frequent parameter re-optimization
- Parameters change dramatically between periods

### 6. Evaluate Results

**Questions to answer**:

- Does edge survive pessimistic assumptions?
- Is performance stable across parameter variations?
- Does strategy work in multiple market regimes?
- Is sample size sufficient for statistical confidence?
- Are results realistic, not "too good to be true"?
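As a rough illustration of the walk-forward analysis in step 5, the sketch below generates rolling train/validation windows and applies the "<50% of in-sample" warning sign. The function names `walk_forward_windows` and `oos_ratio`, the year list, and the performance figures are all hypothetical and chosen only for illustration; the skill itself does not prescribe an implementation.

```python
from typing import Iterator, List, Tuple


def walk_forward_windows(
    periods: List[int], train_len: int = 3, test_len: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) slices that roll forward by test_len each step."""
    start = 0
    while start + train_len + test_len <= len(periods):
        train = periods[start : start + train_len]
        test = periods[start + train_len : start + train_len + test_len]
        yield train, test
        start += test_len


def oos_ratio(in_sample: float, out_of_sample: float) -> float:
    """Out-of-sample performance as a fraction of in-sample performance."""
    if in_sample <= 0:
        raise ValueError("in-sample performance must be positive to compare")
    return out_of_sample / in_sample


# Hypothetical six-year history: optimize on 3 years, validate on the next 1.
years = [2015, 2016, 2017, 2018, 2019, 2020]
windows = list(walk_forward_windows(years))
for train, test in windows:
    print(f"optimize on {train}, validate on {test}")

# Hypothetical in-sample / out-of-sample figures for a single window.
ratio = oos_ratio(in_sample=1.2, out_of_sample=0.5)
degraded = ratio < 0.5  # the "<50% of in-sample" warning sign from step 5
print(f"out-of-sample / in-sample = {ratio:.2f}, warning: {degraded}")
```

The performance metric is left abstract here (it could be expectancy, Sharpe ratio, or profit factor); the only assumption is that larger is better and that the in-sample value is positive.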
**Decision criteria**:

- ✅ **Deploy**: Survives all stress tests with acceptable performance
- 🔄 **Refine**: Core logic sound but needs parameter adjustment
- ❌ **Abandon**: Fails stress tests or relies on fragile assumptions

## Key Testing Principles

### Punish the Strategy

Add friction everywhere:

- Commissions higher than reality
- Slippage 1.5-2x typical
- Worst-case fills
- Order rejections
- Partial fills

**Rationale**: Strategies that survive pessimistic assumptions often outperform in live trading.

### Seek Plateaus, Not Peaks

Look for parameter ranges where performance is stable, not optimal values that create performance spikes.

**Good**: Strategy profitable with stop loss anywhere from 1.5% to 3.0%

**Bad**: Strategy only works with stop loss at exactly 2.13%

Stable performance indicates genuine edge; narrow optima suggest curve-fitting.

### Test All Cases, Not Cherry-Picked Examples

**Wrong approach**: Study hand-picked "market leaders" that worked

**Right approach**: Test every stock that met criteria, including those that failed

Selective examples create survivorship bias and overestimate strategy quality.

### Separate Idea Generation from Validation

**Intuition**: Useful for generating hypotheses

**Validation**: Must be purely data-driven

Never let attachment to an idea influence interpretation of test results.

## Common Failure Patterns

Recognize these patterns early to save time:

1. **Parameter sensitivity**: Only works with exact parameter values
2. **Regime-specific**: Great in some years, terrible in others
3. **Slippage sensitivity**: Unprofitable when realistic costs added
4. **Small sample**: Too few trades for statistical confidence
5. **Look-ahead bias**: "Too good to be true" results
6. **Over-optimization**: Many parameters, poor out-of-sample results

See `references/failed_tests.md` for detailed examples and diagnostic framework.
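One way to make the "plateaus, not peaks" idea and the first failure pattern concrete is to scan a parameter sweep for a contiguous run of profitable settings. The sketch below is a minimal version that checks only the sign of performance; the helper `profitable_plateau`, the `min_width` threshold, and both result sets are hypothetical illustrations, not part of the skill.

```python
from typing import Dict, List


def profitable_plateau(results: Dict[float, float], min_width: int = 3) -> List[float]:
    """Return the longest run of adjacent parameter values with positive
    performance, or [] if that run is shorter than min_width."""
    best: List[float] = []
    run: List[float] = []
    for param in sorted(results):
        if results[param] > 0:
            run.append(param)
            if len(run) > len(best):
                best = list(run)
        else:
            run = []  # a losing setting breaks the plateau
    return best if len(best) >= min_width else []


# Hypothetical net performance per stop-loss setting (stop loss in percent).
robust = {1.0: -0.2, 1.5: 0.8, 2.0: 1.1, 2.5: 1.0, 3.0: 0.7, 3.5: -0.1}
fragile = {1.0: -0.5, 1.5: -0.3, 2.0: -0.1, 2.13: 2.4, 2.5: -0.2, 3.0: -0.4}

print(profitable_plateau(robust))   # [1.5, 2.0, 2.5, 3.0]
print(profitable_plateau(fragile))  # []
```

A stricter variant might also require that performance inside the plateau varies by no more than some tolerance; that choice is left open here.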
## Available Reference Documentation

### Methodology Reference

**File**: `references/methodology.md`

**When to read**: For detailed guidance on specific testing techniques.

**Contents**:

- Stress testing methods
- Parameter sensitivity analysis
- Slippage and friction modeling
- Sample size requirements
- Market regime classification
- Common biases and pitfalls (survivorship, look-ahead, curve-fitting, etc.)

### Failed Tests Reference

**File**: `references/failed_tests.md`

**When to read**: When strategy fails tests, or learning from past mistakes.

**Contents**:

- Why failures are valuable
- Common failure patterns with examples
- Case study documentation framework
- Red flags checklist for evaluating backtests

## Critical Reminders

**Time allocation**: Spend 20% generating ideas, 80% trying to break them.

**Context-free requirement**: If strategy requires "perfect context" to work, it's not robust enough for systematic trading.

**Red flag**: If backtest results look too good (>90% win rate, minimal drawdowns, perfect timing), audit carefully for look-ahead bias or data issues.

**Tool limitations**: Understand your backtesting platform's quirks (interpolation methods, handling of low liquidity, data alignment issues).

**Statistical significance**: Small edges require large sample sizes to prove. 5% edge per trade needs 100+ trades to distinguish from luck.

## Discretionary vs Systematic Differences

This skill focuses on **systematic/quantitative** backtesting where:

- All rules are codified in advance
- No discretion or "feel" in execution
- Testing happens on all historical examples, not cherry-picked cases
- Context (news, macro) is deliberately stripped out

Discretionary traders study differently; this skill may not apply to setups requiring subjective judgment.
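To put the statistical-significance reminder under Critical Reminders in rough numbers, the sketch below computes a one-sample t-statistic on per-trade returns and estimates how many trades are needed before the average return stands out from noise. It assumes trades are roughly independent and uses a t-statistic of about 2 as a rule-of-thumb threshold; both are assumptions of this illustration, not criteria stated by the skill. The helper names and the sample figures are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def t_statistic(trade_returns: Sequence[float]) -> float:
    """One-sample t-statistic of the mean trade return against zero."""
    n = len(trade_returns)
    return mean(trade_returns) / (stdev(trade_returns) / math.sqrt(n))


def trades_needed(avg_return: float, return_std: float, target_t: float = 2.0) -> int:
    """Approximate trade count at which mean / (std / sqrt(n)) reaches target_t."""
    if avg_return <= 0:
        raise ValueError("average return must be positive")
    return math.ceil((target_t * return_std / avg_return) ** 2)


# Hypothetical per-trade figures in percent: halving the edge quadruples the sample.
print(trades_needed(avg_return=0.5, return_std=2.5))   # 100
print(trades_needed(avg_return=0.25, return_std=2.5))  # 400

# Hypothetical four-trade sample, far too small to conclude anything.
print(round(t_statistic([2.0, 0.0, 2.0, 0.0]), 3))
```

Because the required sample grows with the square of the noise-to-edge ratio, a strategy with a thin average edge relative to its trade-to-trade variability can need several hundred trades before a backtest says much about it.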