account-aware-training
Add account state (P&L, win rate, drawdown) to RL observations + drawdown penalty in rewards. Trigger when: (1) model needs account awareness, (2) training should penalize drawdowns, (3) upgrading obs_dim 5300→5600.
Best use case
account-aware-training is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using account-aware-training can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/account-aware-training/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
# Account-Aware RL Training (v2.4)
## Experiment Overview
| Item | Details |
|------|---------|
| **Date** | 2024-12-26 |
| **Goal** | Make RL model learn from account state (P&L, win rate, drawdown) |
| **Environment** | vectorized_env.py, inference_obs_builder.py, training notebook |
| **Status** | Success |
## Context
Prior to v2.4, the RL model was "blind" to account performance. It received:
- 53 features: price action, technicals, regime probabilities, calendar effects
- No information about cumulative P&L, win rate, or drawdown
**Problem**: The model could generate signals that were individually good but led to excessive drawdowns at the account level. It had no incentive to trade conservatively after losses.
**Solution**: Add 3 account-level features + drawdown penalty in rewards.
## Verified Workflow
### 1. Config Parameters (GPUEnvConfig)
```python
# In vectorized_env.py GPUEnvConfig dataclass (~line 405)
# Account-aware training (v2.4)
drawdown_penalty_threshold: float = 0.15 # Penalize when drawdown > 15%
drawdown_penalty_weight: float = 0.10 # Weight in reward function
```
### 2. Equity Tracking Tensors
```python
# In _init_state_tensors() after line 712
# Account-level equity tracking (v2.4)
self.initial_equity = torch.ones(self.n_envs, dtype=self.dtype, device=self.device)
self.peak_equity = torch.ones(self.n_envs, dtype=self.dtype, device=self.device)
self.current_equity = torch.ones(self.n_envs, dtype=self.dtype, device=self.device)
```
### 3. Reset Equity Tensors
```python
# In reset() after line 850
# Reset account-level equity tracking
self.initial_equity[env_ids] = 1.0
self.peak_equity[env_ids] = 1.0
self.current_equity[env_ids] = 1.0
```
### 4. Update Equity in step()
```python
# In step() after line 926
# Update account-level equity tracking (v2.4)
self.current_equity = self.initial_equity + self.total_pnl / (current_prices + 1e-8)
self.peak_equity = torch.maximum(self.peak_equity, self.current_equity)
```
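The high-water-mark logic above can be illustrated without tensors. The following is a minimal plain-Python sketch, not the environment's code: `track_drawdown` and the equity path are hypothetical, and equity is assumed to be normalized so that it starts at 1.0, as in the reset step.

```python
def track_drawdown(equity_path):
    """Return a (peak, drawdown) pair for each equity value in the path."""
    peak = equity_path[0]
    history = []
    for equity in equity_path:
        # The high-water mark only ever moves up (scalar analogue of torch.maximum)
        peak = max(peak, equity)
        # Fractional distance below the peak, clamped to [0, 1]
        drawdown = (peak - equity) / (peak + 1e-8)
        drawdown = min(max(drawdown, 0.0), 1.0)
        history.append((peak, drawdown))
    return history


# Hypothetical equity path, starting at the normalized value 1.0
path = [1.0, 1.1, 0.99, 1.05, 1.2]
for equity, (peak, dd) in zip(path, track_drawdown(path)):
    print(f"equity={equity:.2f} peak={peak:.2f} drawdown={dd:.3f}")
```

In this path the peak stays at 1.1 while equity dips to 0.99 (a drawdown of about 10%), and the drawdown returns to zero only once equity makes a new high at 1.2.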
### 5. Feature Count Update
```python
# In _calculate_obs_features() line 682
# Add account features
account = 3 # total_pnl_pct, rolling_win_rate, current_drawdown_pct
return base + technical + intraday + temporal + markov + extended + multi_window + account
# Result: 53 + 3 = 56 features
```
### 6. Account Features in Observations
```python
# In _get_observations() after line 1258, before sanitization
# === ACCOUNT-LEVEL FEATURES (3) - v2.4 ===
# Feature 1: Total P&L % (normalized to [-1, 1])
total_pnl_pct = self.total_pnl / (self.initial_equity + 1e-8)
total_pnl_pct_norm = torch.tanh(total_pnl_pct * 10)
obs[:, :, feat_idx] = total_pnl_pct_norm[env_ids].unsqueeze(1).expand(-1, self.config.window)
feat_idx += 1
# Feature 2: Rolling win rate (0.5 if no trades)
win_rate = torch.where(
    self.n_trades[env_ids] > 0,
    self.n_wins[env_ids].float() / self.n_trades[env_ids].float(),
    torch.full((n_envs,), 0.5, dtype=self.dtype, device=self.device)
)
obs[:, :, feat_idx] = win_rate.unsqueeze(1).expand(-1, self.config.window)
feat_idx += 1
# Feature 3: Current drawdown % [0, 1]
drawdown = (self.peak_equity[env_ids] - self.current_equity[env_ids]) / (self.peak_equity[env_ids] + 1e-8)
drawdown = torch.clamp(drawdown, 0.0, 1.0)
obs[:, :, feat_idx] = drawdown.unsqueeze(1).expand(-1, self.config.window)
feat_idx += 1
```
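To make the three feature formulas easier to check by hand, here is a scalar restatement for a single environment. It is a sketch under assumptions: `account_features` is a hypothetical helper, not a function in `vectorized_env.py`, and it uses `math.tanh` in place of `torch.tanh`.

```python
import math


def account_features(total_pnl, initial_equity, n_wins, n_trades,
                     peak_equity, current_equity):
    """Scalar version of the three account-level features for one environment."""
    # Feature 1: P&L as a fraction of initial equity, squashed with tanh
    pnl_pct = total_pnl / (initial_equity + 1e-8)
    pnl_norm = math.tanh(pnl_pct * 10)

    # Feature 2: win rate, falling back to the neutral 0.5 prior with no trades
    win_rate = n_wins / n_trades if n_trades > 0 else 0.5

    # Feature 3: drawdown from the high-water mark, clamped to [0, 1]
    drawdown = (peak_equity - current_equity) / (peak_equity + 1e-8)
    drawdown = min(max(drawdown, 0.0), 1.0)

    return pnl_norm, win_rate, drawdown


# A fresh account produces the same neutral values used at inference time
print(account_features(0.0, 1.0, 0, 0, 1.0, 1.0))
# A hypothetical account: +5% P&L, 3 wins in 4 trades, slightly below its peak
print(account_features(0.05, 1.0, 3, 4, 1.1, 1.05))
```

Because of the factor of 10 inside the `tanh`, a 5% gain already maps to roughly 0.46, so the feature saturates toward ±1 for P&L swings of a few tens of percent.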
### 7. Drawdown Penalty in Rewards
```python
# In _calculate_rewards() after line 1618
# COMPONENT 7: Drawdown penalty (v2.4)
current_drawdown = (self.peak_equity - self.current_equity) / (self.peak_equity + 1e-8)
current_drawdown = torch.clamp(current_drawdown, 0.0, 1.0)
# Quadratic penalty when over threshold
drawdown_over_threshold = torch.clamp(current_drawdown - self.config.drawdown_penalty_threshold, min=0.0)
drawdown_penalty = -drawdown_over_threshold ** 2 * 10
# Add to reward combination:
reward = (
    self.config.direction_weight * direction_reward +
    self.config.magnitude_weight * magnitude_reward +
    self.config.pnl_weight * pnl_reward +
    self.config.stop_tp_weight * stop_tp_reward +
    self.config.exploration_weight * exploration_bonus +
    self.config.slippage_weight * slippage_penalty +
    self.config.drawdown_penalty_weight * drawdown_penalty  # NEW
) * risk_adjustment
```
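The size of the drawdown term is easier to judge with concrete numbers. The sketch below restates the penalty as a scalar function using the v2.4 defaults (threshold 0.15, weight 0.10, scale factor 10). The name `drawdown_penalty` as a function and the `scale` parameter are illustrative assumptions, and the result is the weighted term before it is multiplied by `risk_adjustment`.

```python
def drawdown_penalty(drawdown, threshold=0.15, weight=0.10, scale=10.0):
    """Weighted quadratic drawdown penalty (before risk_adjustment is applied)."""
    drawdown = min(max(drawdown, 0.0), 1.0)
    # Only the part of the drawdown above the threshold is penalized
    excess = max(drawdown - threshold, 0.0)
    return weight * -(excess ** 2) * scale


for dd in (0.10, 0.16, 0.25, 0.40):
    print(f"drawdown={dd:.0%} weighted_penalty={drawdown_penalty(dd):.6f}")
```

With these defaults the weighted term is zero at or below 15% drawdown, about -0.0001 at 16%, and about -0.01 at 25%, so a drawdown ten points over the threshold costs roughly 100 times more than one that is a single point over.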
### 8. Inference Observation Builder
```python
# In inference_obs_builder.py get_target_features_from_obs_dim()
if features == 56:
    return 56  # v2.4 with account awareness
elif features == 53:
    return 53  # v2.3
# ... legacy support

# In build_inference_observation() after line 624
# === ACCOUNT-LEVEL FEATURES (3) - v2.4 ===
# Use neutral defaults during inference
if target_features >= 56:
    obs[:, feat_idx] = 0.0  # total_pnl_pct (no prior trades)
    feat_idx += 1
    obs[:, feat_idx] = 0.5  # win_rate (neutral prior)
    feat_idx += 1
    obs[:, feat_idx] = 0.0  # drawdown (no drawdown)
    feat_idx += 1
```
```
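The feature-count detection above can be pictured as a mapping from a model's flattened `obs_dim` back to features per bar. The following is a minimal sketch, not the contents of `inference_obs_builder.py`: `features_from_obs_dim` and `SUPPORTED_FEATURE_COUNTS` are hypothetical names, a window of 100 bars is assumed, and the legacy feature counts the real function supports are omitted.

```python
# Hypothetical lookup; the real builder also supports older (legacy) layouts
SUPPORTED_FEATURE_COUNTS = {56: "v2.4 (account-aware)", 53: "v2.3"}


def features_from_obs_dim(obs_dim, window=100):
    """Infer features per bar from a flattened observation size."""
    if obs_dim % window != 0:
        raise ValueError(f"obs_dim {obs_dim} is not a multiple of window {window}")
    features = obs_dim // window
    if features not in SUPPORTED_FEATURE_COUNTS:
        raise ValueError(f"unsupported feature count: {features}")
    return features


for dim in (5300, 5600):
    n = features_from_obs_dim(dim)
    print(dim, "->", n, SUPPORTED_FEATURE_COUNTS[n])
```

Failing loudly on an unrecognized size is one reasonable design choice here, since feeding a 5300-dimensional model a 5600-dimensional observation would otherwise surface later as a shape error.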
## Failed Attempts (Critical)
| Attempt | Why it Failed | Lesson Learned |
|---------|---------------|----------------|
| Account features with raw P&L values | P&L scale varies by price level | Use P&L percentage normalized with tanh |
| Win rate = 0 when no trades | Invalid input during initial episodes | Default to 0.5 (neutral prior) |
| Peak equity never decreasing | Logical error in update | Use torch.maximum() to track high-water mark |
| Drawdown penalty linear | Too harsh at moderate levels | Quadratic scaling is gentler just above the threshold |
| Live inference with account state | Would need real account connection | Use neutral defaults (0, 0.5, 0) for inference |
## Final Parameters
```yaml
# GPUEnvConfig (v2.4)
n_features: 56 # Was 53 in v2.3
drawdown_penalty_threshold: 0.15 # 15% drawdown starts penalty
drawdown_penalty_weight: 0.10 # Moderate weight in reward
# Feature breakdown (56 total)
base_features: 7 # price action basics
technical_features: 4 # intraday technicals
temporal_features: 7 # calendar features
markov_features: 12 # 4-chain regime probabilities
extended_features: 14 # extended technicals
multi_window_features: 9 # 20/50/100 bar windows
account_features: 3 # P&L %, win rate, drawdown %
# obs_dim = n_features * window = 56 * 100 = 5600
```
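As a quick arithmetic check on the breakdown above, the group sizes can be summed and multiplied by the window length. This is an illustrative sketch only: the dictionary keys mirror the YAML names, and `obs_dim` is a hypothetical helper rather than a function from the codebase.

```python
# Feature groups as listed in the Final Parameters block
FEATURE_GROUPS = {
    "base": 7, "technical": 4, "temporal": 7, "markov": 12,
    "extended": 14, "multi_window": 9, "account": 3,
}
WINDOW = 100  # bars per observation, per the obs_dim comment


def obs_dim(groups=FEATURE_GROUPS, window=WINDOW):
    """Flattened observation size: features per bar times window length."""
    return sum(groups.values()) * window


# Dropping the account group recovers the v2.3 layout
v23_groups = {k: v for k, v in FEATURE_GROUPS.items() if k != "account"}
print(obs_dim(v23_groups), obs_dim())  # prints: 5300 5600
```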
## Key Insights
- **Breaking Change**: obs_dim 5300 → 5600 means v2.3 models CANNOT be used with v2.4 environments
- **Neutral Inference**: Live trading uses neutral defaults (0, 0.5, 0) since account state isn't tracked per-prediction
- **Quadratic Penalty**: The `** 2` keeps the penalty gentle at 16% drawdown (excess 0.01, unweighted penalty 0.001) but harsh at 25%+ (excess 0.10, unweighted penalty 0.1, a 100x increase)
- **Normalized P&L**: `tanh(pnl * 10)` keeps values in [-1, 1] even for large P&L swings
- **0.5 Win Rate Prior**: Prevents model confusion during initial trades with no history
## Model Behavior Expected
With account awareness, the model should learn:
1. **Reduce position sizing after losses** (sees drawdown feature)
2. **Be more selective after poor win rate** (sees win rate feature)
3. **Avoid compounding losses** (drawdown penalty kicks in at 15%)
4. **Trade more aggressively when profitable** (sees positive P&L)
## References
- `alpaca_trading/gpu/vectorized_env.py`: Lines 405 (config), 682 (feature count), 712 (tensors), 850 (reset), 926 (step), 1258 (obs), 1618 (rewards)
- `alpaca_trading/gpu/inference_obs_builder.py`: Lines 61-108 (feature detection), 624+ (account features)
- `notebooks/VSCode_Colab_Training_NATIVE.ipynb`: Training notebook with v2.4 settings