cointegration-analysis

Cointegration testing for pairs trading using Engle-Granger, Johansen, and rolling stability analysis

7 stars

Best use case

cointegration-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using cointegration-analysis should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/cointegration-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/agiprolabs/claude-trading-skills/main/skills/cointegration-analysis/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/cointegration-analysis/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How cointegration-analysis Compares

| Feature / Agent | cointegration-analysis | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Cointegration testing for pairs trading using Engle-Granger, Johansen, and rolling stability analysis

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Cointegration Analysis

Cointegration testing identifies pairs of assets that share a long-run equilibrium
relationship, enabling statistical arbitrage and pairs trading strategies.

## What Is Cointegration?

Two price series are **cointegrated** when they are individually non-stationary
(random walks) but a linear combination of them is stationary (mean-reverting).
Intuitively, the prices may wander apart temporarily but are pulled back to an
equilibrium spread over time.

### Cointegration vs Correlation

| Property | Correlation | Cointegration |
|---|---|---|
| Measures | Short-term co-movement | Long-run equilibrium |
| Stationarity | Requires stationary returns | Works with non-stationary prices |
| Time horizon | Can change rapidly | Stable over months/years |
| Trading use | Momentum/trend signals | Mean-reversion pairs trades |
| Failure mode | Breaks in regime changes | Breaks on structural shifts |

Two assets can be highly correlated but not cointegrated (e.g., two unrelated
uptrends). Conversely, cointegrated assets may have low short-term correlation
during temporary divergences — which is exactly when pairs trades are entered.
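
The distinction can be illustrated with simulated data. The sketch below is an illustration under stated assumptions, not part of the skill's scripts: the series are synthetic, the helper names are hypothetical, and the AR(1) coefficient of the regression residual is used as an informal stand-in for a formal cointegration test. A coefficient near 1 suggests the residual wanders like a random walk; a coefficient well below 1 suggests it reverts to its mean.

```python
import numpy as np

def ols_residuals(y, x):
    """Residuals of the OLS regression y = a + b*x."""
    b, a = np.polyfit(x, y, 1)
    return y - (a + b * x)

def ar1_coefficient(series):
    """Slope from regressing series[t] on series[t-1] (with intercept)."""
    slope, _ = np.polyfit(series[:-1], series[1:], 1)
    return slope

rng = np.random.default_rng(0)
n = 2000
trend = 0.2 * np.arange(n)

# Case 1: two independent random walks that share only an upward drift.
walk_a = trend + np.cumsum(rng.normal(size=n))
walk_b = trend + np.cumsum(rng.normal(size=n))
corr_trending = np.corrcoef(walk_a, walk_b)[0, 1]
phi_trending = ar1_coefficient(ols_residuals(walk_a, walk_b))

# Case 2: y tracks x up to stationary noise, so the pair is cointegrated.
x = np.cumsum(rng.normal(size=n))
y = 1.5 * x + 3.0 + rng.normal(size=n)
phi_cointegrated = ar1_coefficient(ols_residuals(y, x))

print(f"trending pair:     level corr={corr_trending:.2f}, residual AR(1)={phi_trending:.2f}")
print(f"cointegrated pair: residual AR(1)={phi_cointegrated:.2f}")
```

The trending pair should show a high level correlation together with a residual that behaves like a random walk, which is the "correlated but not cointegrated" case described above.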

### Why It Matters

- **Pairs trading**: Long the underperformer, short the outperformer, profit on convergence
- **Statistical arbitrage**: Systematic mean-reversion on spread z-scores
- **Spread trading**: Trade the spread directly as a synthetic instrument
- **Risk hedging**: Cointegrated hedge ratios minimize tracking error over time

## Methods

### 1. Engle-Granger Two-Step

The most common approach for two series.

**Step 1** — Regress Y on X using OLS:

```
Y_t = α + β * X_t + ε_t
```

**Step 2** — Test the residuals ε_t for stationarity using the ADF test.

- If residuals are stationary (ADF statistic below the Engle-Granger critical value, roughly p < 0.05) → Y and X are cointegrated
- β is the **hedge ratio** for the pairs trade
- α is the long-run mean of the spread

**Important**: Engle-Granger critical values differ from standard ADF critical
values. For n=2 series: 1% = -3.90, 5% = -3.34, 10% = -3.04.

**Asymmetry warning**: Testing Y~X can give a different result than X~Y. Always
test both directions and use the stronger result.

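```python
from scipy import stats
from statsmodels.tsa.stattools import adfuller

# x_prices, y_prices: equal-length 1-D arrays of prices (assumed already loaded)

# Step 1: OLS regression of Y on X
slope, intercept, _, _, _ = stats.linregress(x_prices, y_prices)
hedge_ratio = slope

# Step 2: ADF test on the residuals
residuals = y_prices - hedge_ratio * x_prices - intercept
adf_stat = adfuller(residuals, maxlag=None, autolag="AIC")[0]

# adfuller's own p-value and critical values assume an observed series, not
# estimated residuals, so compare against the Engle-Granger 5% value instead.
EG_CRIT_5PCT = -3.34
cointegrated = adf_stat < EG_CRIT_5PCT
```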

### 2. Johansen Test

Tests multiple series simultaneously and returns the number of cointegrating
relationships. More powerful than Engle-Granger for >2 series.

- Based on a VAR model written in error-correction (VECM) form: ΔY_t = Π·Y_{t-1} + Σ Γ_i·ΔY_{t-i} + ε_t
- Tests the rank of the Π matrix, which equals the number of cointegrating relationships
- Uses trace test and maximum eigenvalue test
- Returns: number of cointegrating vectors and the vectors themselves

```python
from statsmodels.tsa.vector_ar.vecm import coint_johansen

# data: T×N array of price series
# det_order=0 includes a constant term; k_ar_diff=1 uses one lagged difference
result = coint_johansen(data, det_order=0, k_ar_diff=1)

# Test statistics, one entry per null hypothesis (rank = 0, rank <= 1, ...)
trace_stats = result.lr1          # Trace statistics
trace_crit = result.cvt           # Critical values, columns = 90%, 95%, 99%
max_eigen_stats = result.lr2      # Max eigenvalue statistics
max_eigen_crit = result.cvm       # Critical values, columns = 90%, 95%, 99%

# Cointegrating vectors
coint_vectors = result.evec
```
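
The number of cointegrating relationships is usually read off the trace statistics sequentially: test rank = 0 first, and keep increasing the rank for as long as the statistic exceeds its critical value. A minimal sketch of that counting step follows. The helper name `johansen_rank` and the example numbers are hypothetical; the inputs are assumed to be shaped like `result.lr1` and `result.cvt` above.

```python
def johansen_rank(trace_stats, trace_crit, level=1):
    """Count cointegrating relations by sequential trace tests.

    level selects the critical-value column: 0 = 90%, 1 = 95%, 2 = 99%.
    """
    rank = 0
    for stat, crit_row in zip(trace_stats, trace_crit):
        if stat <= crit_row[level]:
            break  # cannot reject this null hypothesis, so stop counting
        rank += 1
    return rank

# Hypothetical output for two series, shaped like result.lr1 and result.cvt
example_trace = [25.1, 2.0]
example_crit = [
    [13.43, 15.49, 19.93],  # H0: rank = 0
    [2.71, 3.84, 6.63],     # H0: rank <= 1
]
print(johansen_rank(example_trace, example_crit))  # prints 1
```

With a rank of 1, the first column of `result.evec` would typically be taken as the cointegrating vector, and dividing it by its first element gives weights relative to the first series.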

### 3. Phillips-Ouliaris

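Similar to Engle-Granger but uses Phillips-Perron style test statistics
instead of ADF. More robust to heteroskedasticity and serial correlation in
the residuals. Note that `statsmodels.tsa.stattools.coint` is not a
Phillips-Ouliaris implementation: it runs the augmented Engle-Granger test and
reports MacKinnon p-values suited to residual-based testing. A
Phillips-Ouliaris implementation is available in the `arch` package as
`arch.unitroot.cointegration.phillips_ouliaris`. The one-call statsmodels test
is shown here because it is a convenient cross-check on the manual two-step
procedure:

```python
from statsmodels.tsa.stattools import coint

# Augmented Engle-Granger test
# Returns: test statistic, p-value, critical values (1%, 5%, 10%)
t_stat, p_value, crit_values = coint(y_prices, x_prices)
cointegrated = p_value < 0.05
```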

## Practical Workflow

### Step 1: Screen Pairs by Correlation

Pre-filter using Pearson correlation of returns > 0.7 to reduce the number of
cointegration tests (which are more expensive).
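
A minimal sketch of this screening step, assuming prices are held in a T×N NumPy array and that correlation is computed on log returns. The function name `screen_pairs`, the asset labels, and the synthetic data are illustrative assumptions.

```python
import itertools
import numpy as np

def screen_pairs(prices, names, threshold=0.7):
    """Return (name_a, name_b, corr) for pairs whose log-return correlation exceeds threshold.

    prices: T x N array of positive prices, one column per asset.
    """
    log_returns = np.diff(np.log(prices), axis=0)
    corr = np.corrcoef(log_returns, rowvar=False)
    pairs = []
    for i, j in itertools.combinations(range(len(names)), 2):
        if corr[i, j] > threshold:
            pairs.append((names[i], names[j], float(corr[i, j])))
    # Strongest candidates first
    return sorted(pairs, key=lambda p: -p[2])

# Synthetic example: A and B share a common factor, C is independent
rng = np.random.default_rng(7)
factor = rng.normal(scale=0.02, size=500)
returns = np.column_stack([
    factor + rng.normal(scale=0.005, size=500),
    factor + rng.normal(scale=0.005, size=500),
    rng.normal(scale=0.02, size=500),
])
demo_prices = 100 * np.exp(np.cumsum(returns, axis=0))

candidates = screen_pairs(demo_prices, ["A", "B", "C"])
print(candidates)
```

Because cointegrated pairs can show low short-term correlation, a screen like this trades some recall for speed; lowering the threshold is one way to keep more candidates.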

### Step 2: Test Cointegration

Run Engle-Granger in both directions. Use p < 0.05 threshold.

### Step 3: Estimate Hedge Ratio

Use OLS for simplicity. For production, consider Total Least Squares or
Dynamic OLS (see `references/methodology.md`).

### Step 4: Compute Spread

```python
spread = y_prices - hedge_ratio * x_prices - intercept
z_score = (spread - spread.mean()) / spread.std()
```
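
The z-score above standardizes with the full-sample mean and standard deviation, which includes information from the future at every point. For backtests, a rolling version that only looks backward may be preferable. The following is a sketch under that assumption; `rolling_zscore` and the window length are hypothetical choices, not taken from the skill's scripts.

```python
import numpy as np

def rolling_zscore(spread, window):
    """Z-score of each point against the preceding `window` observations only."""
    spread = np.asarray(spread, dtype=float)
    z = np.full(spread.shape, np.nan)
    for t in range(window, len(spread)):
        history = spread[t - window:t]  # excludes spread[t] itself
        sd = history.std(ddof=1)
        if sd > 0:
            z[t] = (spread[t] - history.mean()) / sd
    return z

demo_spread = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
demo_z = rolling_zscore(demo_spread, window=5)
print(demo_z)  # first 5 entries are NaN, the last is about 4.43
```

The first `window` entries stay NaN because there is not enough history to standardize them.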

### Step 5: Test Spread for Mean Reversion

- **ADF test**: p < 0.05 confirms stationarity
- **Hurst exponent**: H < 0.5 indicates mean reversion (H ≈ 0.5 = random walk)
- **Half-life**: λ from AR(1) on spread; half-life = -ln(2)/ln(λ)
  - Viable pairs: half-life between 5 and 60 days
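
The half-life and Hurst checks can be sketched in a few lines of NumPy. Both helpers below are illustrative assumptions rather than the skill's own implementation: `half_life` fits the AR(1) model by least squares, and `hurst_exponent` uses the common estimator based on how the standard deviation of lagged differences scales with the lag.

```python
import numpy as np

def half_life(spread):
    """Half-life of mean reversion from an AR(1) fit: s_t = c + lam * s_{t-1} + e_t."""
    spread = np.asarray(spread, dtype=float)
    lam, _ = np.polyfit(spread[:-1], spread[1:], 1)
    if not 0.0 < lam < 1.0:
        return float("nan")  # the formula does not apply outside (0, 1)
    return -np.log(2.0) / np.log(lam)

def hurst_exponent(series, max_lag=20):
    """Estimate H as the log-log slope of std(series[t+lag] - series[t]) against lag."""
    series = np.asarray(series, dtype=float)
    lags = np.arange(2, max_lag + 1)
    tau = [np.std(series[lag:] - series[:-lag]) for lag in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(tau), 1)
    return slope

# A noiseless decay with lam = 0.9 has half-life -ln(2)/ln(0.9), about 6.58 periods
decay = 0.9 ** np.arange(50)
print(f"half-life: {half_life(decay):.2f}")

rng = np.random.default_rng(3)
random_walk = np.cumsum(rng.normal(size=5000))
white_noise = rng.normal(size=5000)
print(f"Hurst (random walk): {hurst_exponent(random_walk):.2f}")
print(f"Hurst (white noise): {hurst_exponent(white_noise):.2f}")
```

The half-life is expressed in the same units as the sampling interval, so daily bars give days and hourly bars give hours.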

### Step 6: Trade the Spread

If the spread is mean-reverting, it is a viable pairs trade candidate.
See `references/pairs_trading.md` for entry/exit rules and risk management.

## Rolling Cointegration

Cointegration relationships can break down over time due to structural changes,
regime shifts, or evolving market dynamics.

### Rolling Window Approach

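Test cointegration on rolling 60–90 day windows. With daily bars such a window holds fewer than the 100 observations recommended under Common Pitfalls, so treat the rolling p-value as a noisy stability signal rather than a formal test, or use intraday bars to get more observations per window: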

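```python
from scipy import stats
from statsmodels.tsa.stattools import coint

window = 60  # number of observations per window
rolling_pvalues = []
rolling_hedges = []

# Slide the window over the series; `end` runs up to and including len(y_prices)
# so that the most recent window is tested as well
for end in range(window, len(y_prices) + 1):
    y_win = y_prices[end - window:end]
    x_win = x_prices[end - window:end]
    _, p_val, _ = coint(y_win, x_win)                   # p-value in this window
    slope, _, _, _, _ = stats.linregress(x_win, y_win)  # hedge ratio in this window
    rolling_pvalues.append(p_val)
    rolling_hedges.append(slope)
```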

### Monitoring Signals

| Signal | Healthy | Warning | Stop Trading |
|---|---|---|---|
| Rolling p-value | < 0.05 | 0.05–0.10 | > 0.10 |
| Hedge ratio drift | < 10% change | 10–25% change | > 25% change |
| Spread half-life | 5–60 days | 60–120 days | > 120 days or < 5 |
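
The table can be turned into a simple status check. This is a sketch with two assumptions that the table does not state: the worst of the three signals determines the overall status, and hedge ratio drift is expressed as an absolute fractional change against some baseline estimate. The function name `classify_pair_health` is hypothetical.

```python
def classify_pair_health(p_value, hedge_drift, half_life_days):
    """Map the three monitoring signals to 'healthy', 'warning', or 'stop'.

    hedge_drift is the absolute fractional change in hedge ratio (0.10 = 10%).
    """
    levels = ["healthy", "warning", "stop"]

    p_level = 0 if p_value < 0.05 else 1 if p_value <= 0.10 else 2
    drift_level = 0 if hedge_drift < 0.10 else 1 if hedge_drift <= 0.25 else 2
    if half_life_days < 5 or half_life_days > 120:
        hl_level = 2
    elif half_life_days > 60:
        hl_level = 1
    else:
        hl_level = 0

    # Assumption: the worst individual signal sets the overall status
    return levels[max(p_level, drift_level, hl_level)]

print(classify_pair_health(p_value=0.01, hedge_drift=0.05, half_life_days=20))  # healthy
print(classify_pair_health(p_value=0.07, hedge_drift=0.05, half_life_days=20))  # warning
print(classify_pair_health(p_value=0.01, hedge_drift=0.30, half_life_days=20))  # stop
```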

## Crypto Pairs Candidates

### Layer-1 Correlation
- SOL vs ETH — L1 sector beta, often cointegrated during trending markets
- SOL vs AVAX — alternative L1 correlation

### Stablecoins
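- USDC vs USDT — both are pegged to $1, so their spread should be strongly mean-reverting (peg arbitrage)
- Useful as a sanity check for your cointegration pipeline, with the caveat that each series is itself close to stationary rather than a random walk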

### Liquid Staking Derivatives
- mSOL vs jitoSOL — both track SOL staking yield
- stSOL vs mSOL — Lido vs Marinade staking

### Same-Sector Tokens
- DEX tokens: RAY vs ORCA
- Lending tokens: cross-protocol comparison
- Meme tokens: rarely cointegrated, high risk

## Common Pitfalls

1. **Spurious cointegration** — Two trending series (both up in a bull market) may
   appear cointegrated. Always test on sufficient data (>200 observations) and
   check out-of-sample stability.

2. **Structural breaks** — A fundamental change (protocol upgrade, tokenomics
   change) can permanently break cointegration. Monitor rolling p-values.

3. **Look-ahead bias** — Estimating the hedge ratio on the full sample and then
   backtesting on the same sample inflates results. Always use walk-forward
   estimation.

4. **Too-short sample** — Cointegration tests need >100 observations minimum,
   ideally >200, to have reasonable power.

5. **Ignoring transaction costs** — Pairs trades involve 4 transactions per
   round trip (open and close each of the two legs). At 0.3% per transaction,
   that is 4 × 0.3% = 1.2% in costs that the spread must overcome.

6. **Asymmetric cointegration** — The relationship may only hold in one
   direction or one regime. Consider threshold cointegration models for
   production use.
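
The look-ahead pitfall (item 3) can be avoided by re-estimating the hedge ratio on trailing data only. The sketch below shows one way to do that, assuming a fixed lookback and plain OLS; `walk_forward_spread` is a hypothetical helper and is not the implementation in `scripts/pairs_backtest.py`.

```python
import numpy as np

def walk_forward_spread(y, x, lookback):
    """Spread at time t using a hedge ratio fitted only on the prior `lookback` points."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    spread = np.full(len(y), np.nan)
    hedge = np.full(len(y), np.nan)
    for t in range(lookback, len(y)):
        # Fit y = alpha + beta * x on observations strictly before t
        beta, alpha = np.polyfit(x[t - lookback:t], y[t - lookback:t], 1)
        hedge[t] = beta
        spread[t] = y[t] - beta * x[t] - alpha  # out-of-sample residual
    return spread, hedge

# Exact linear relationship: the out-of-sample spread should be ~0 and beta ~2
x_line = np.arange(1.0, 51.0)
y_line = 2.0 * x_line + 1.0
wf_spread, wf_hedge = walk_forward_spread(y_line, x_line, lookback=20)
print(wf_hedge[20:25], wf_spread[20:25])
```

Each spread value depends only on data available before that point, so a backtest built on it does not benefit from hindsight in the hedge ratio.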

## Integration with Other Skills

- **`correlation-analysis`** — Pre-screening pairs by correlation before cointegration testing
- **`mean-reversion`** — Trading the cointegrated spread using mean-reversion entry/exit rules
- **`vectorbt`** — Backtesting pairs strategies with walk-forward validation
- **`regime-detection`** — Identifying when cointegration regimes shift
- **`volatility-modeling`** — Spread volatility forecasting for dynamic position sizing

## Files

### References
- `references/methodology.md` — Engle-Granger details, Johansen derivation, hedge ratio estimation methods, spread construction
- `references/pairs_trading.md` — Entry/exit rules, risk management, performance metrics, crypto-specific considerations

### Scripts
- `scripts/test_cointegration.py` — Full cointegration test pipeline with ADF, Hurst, half-life, rolling stability, and demo mode
- `scripts/pairs_backtest.py` — Walk-forward pairs trading backtest with synthetic data and performance reporting
