entropy-sim2real
Entropy-driven sim2real transfer. Uses maximum entropy RL, domain randomization, and information-theoretic bridging to close the reality gap.
Best use case
entropy-sim2real is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using entropy-sim2real should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/entropy-sim2real/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How entropy-sim2real Compares
| Feature / Agent | entropy-sim2real | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
# Entropy-Driven Sim2Real Transfer
**Trit**: -1 (MINUS - analysis/verification)
**Color**: #E85B8E (Rose Pink)
**URI**: skill://entropy-sim2real#E85B8E
## Core Insight
**Entropy bridges the sim-real gap by:**
1. **Maximizing entropy in simulation** → Policy sees diverse conditions
2. **Minimizing entropy at deployment** → Uncertainty collapses to reality
3. **Information-theoretic alignment** → Match distributions, not parameters
```
SIMULATION                                      REALITY
High Entropy ─────────────────────────────▶ Low Entropy
H(params) = max    ══════════▶    H(params) ≈ 0
H(π|s) = high      ══════════▶    H(π|s) = focused
p(sim) = broad     ══════════▶    p(real) = delta
┌─────────────────┐               ┌─────────────────┐
│  MANY POSSIBLE  │    BRIDGE     │   ONE ACTUAL    │
│     WORLDS      │───────────────│      WORLD      │
│   (superpos.)   │               │   (collapsed)   │
└─────────────────┘               └─────────────────┘
```
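The collapse pictured above can be made concrete with a discrete toy case. The sketch below is an illustration only, assuming a finite set of eight candidate worlds; `shannon_entropy` and the variable names are hypothetical and not part of the skill.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability outcomes contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Simulation: eight equally likely candidate worlds (maximum entropy).
n_worlds = 8
sim_belief = [1.0 / n_worlds] * n_worlds

# Deployment: all probability mass sits on the one world that exists.
real_belief = [1.0] + [0.0] * (n_worlds - 1)

h_sim = shannon_entropy(sim_belief)    # equals log(8)
h_real = shannon_entropy(real_belief)  # equals 0
```

The uniform belief attains the largest entropy available on eight outcomes, while the one-hot belief has none, which is the "many possible worlds" to "one actual world" transition in the diagram.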
## Three Entropy Mechanisms
### 1. Domain Randomization Entropy
Maximize entropy over simulation parameters:
```python
import jax
import jax.numpy as jnp
from typing import Dict


class EntropyMaximizingRandomizer:
    """Domain randomization that maximizes parameter entropy."""

    def __init__(self, param_ranges: Dict[str, tuple]):
        self.param_ranges = param_ranges

    def entropy(self, distribution: str = "uniform") -> float:
        """Compute entropy of parameter distributions."""
        H = 0.0
        for name, (low, high) in self.param_ranges.items():
            if distribution == "uniform":
                # H(Uniform) = log(b - a)
                H += jnp.log(high - low)
            elif distribution == "gaussian":
                # H(Gaussian) = 0.5 * log(2πeσ²)
                sigma = (high - low) / 4  # ±2σ spans the range (~95% of mass)
                H += 0.5 * jnp.log(2 * jnp.pi * jnp.e * sigma**2)
        return H

    def sample(self, key: jax.Array) -> Dict[str, float]:
        """Sample parameters to maximize coverage."""
        params = {}
        for i, (name, (low, high)) in enumerate(self.param_ranges.items()):
            k = jax.random.fold_in(key, i)
            # Uniform maximizes entropy for bounded support
            params[name] = jax.random.uniform(k, minval=low, maxval=high)
        return params

    def adaptive_entropy(
        self,
        key: jax.Array,
        real_samples: jnp.ndarray,
        temperature: float = 1.0
    ) -> Dict[str, float]:
        """
        Adapt randomization to maximize coverage of real distribution.

        Uses maximum entropy principle: find distribution with highest
        entropy subject to matching observed moments.

        real_samples has shape (n_samples, n_params), with columns in the
        same order as param_ranges.
        """
        # Estimate real distribution moments
        real_mean = jnp.mean(real_samples, axis=0)
        real_var = jnp.var(real_samples, axis=0)
        # Maximum entropy distribution matching mean and variance = Gaussian
        params = {}
        for i, name in enumerate(self.param_ranges):
            k = jax.random.fold_in(key, i)
            # Sample from Gaussian matching real moments (max entropy).
            # Assumption: temperature scales the sampling spread
            # (1.0 = the fitted Gaussian).
            std = temperature * jnp.sqrt(real_var[i])
            params[name] = jax.random.normal(k) * std + real_mean[i]
        return params
```
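As a worked instance of the uniform-entropy formula `H = log(b - a)`, the sketch below sums the per-parameter terms for the friction and mass ranges that appear later in the pipeline diagram. It uses only the standard library; `uniform_entropy` is a hypothetical helper, and the widened mass range is an arbitrary illustration.

```python
import math
from typing import Dict, Tuple


def uniform_entropy(ranges: Dict[str, Tuple[float, float]]) -> float:
    """Joint differential entropy (nats) of independent uniform parameters."""
    return sum(math.log(high - low) for low, high in ranges.values())


ranges = {"friction": (0.3, 1.5), "mass": (0.8, 1.2)}
h_base = uniform_entropy(ranges)  # log(1.2) + log(0.4) = log(0.48)

# Doubling the width of one range adds log(2) nats.
wider = dict(ranges, mass=(0.6, 1.4))
h_wide = uniform_entropy(wider)
```

Note that `h_base` is negative here: differential entropy depends on the units of the parameters and is not bounded below by zero, so only differences between configurations are meaningful.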
### 2. Maximum Entropy RL
Policy optimization with entropy regularization:
```python
import jax
import jax.numpy as jnp


class MaxEntropyPPO:
    """
    PPO with entropy bonus for robust sim2real.

    Objective: max E[Σ γᵗ(rₜ + α·H(π(·|sₜ)))]

    High entropy → diverse actions → robust to perturbations
    """

    def __init__(
        self,
        entropy_coef: float = 0.01,
        target_entropy: float = -1.0,
        auto_tune: bool = True
    ):
        self.alpha = entropy_coef
        self.target_entropy = target_entropy
        self.auto_tune = auto_tune
        if auto_tune:
            # Learnable temperature (SAC-style)
            self.log_alpha = jnp.log(entropy_coef)

    def policy_entropy(self, logits: jnp.ndarray) -> float:
        """Compute policy entropy H(π) = -Σ π(a)log(π(a))."""
        probs = jax.nn.softmax(logits)
        log_probs = jax.nn.log_softmax(logits)
        return -jnp.sum(probs * log_probs, axis=-1).mean()

    def gaussian_entropy(self, std: jnp.ndarray) -> float:
        """Entropy of Gaussian policy: H = 0.5 * log(2πeσ²)."""
        return 0.5 * jnp.log(2 * jnp.pi * jnp.e * std**2).sum(axis=-1).mean()

    def entropy_loss(
        self,
        policy_entropy: float,
        update_alpha: bool = True
    ) -> tuple:
        """
        Compute entropy loss and optionally update temperature.

        We want: H(π) ≥ H_target
        Loss: α * (H(π) - H_target)
        """
        entropy_bonus = self.alpha * policy_entropy
        if self.auto_tune and update_alpha:
            # Dual gradient descent on temperature: minimizing
            # log_alpha * (H - H_target) raises alpha when H < H_target
            # and lowers it when H > H_target.
            alpha_loss = self.log_alpha * (policy_entropy - self.target_entropy)
            return entropy_bonus, alpha_loss
        return entropy_bonus, 0.0

    def robust_policy_loss(
        self,
        advantages: jnp.ndarray,
        log_probs: jnp.ndarray,
        old_log_probs: jnp.ndarray,
        policy_entropy: float,
        clip_ratio: float = 0.2
    ) -> float:
        """
        PPO loss with entropy regularization.

        L = L_clip + α·H(π)

        High entropy prevents overconfident policies that
        fail on real hardware.
        """
        # Standard PPO clipped objective
        ratio = jnp.exp(log_probs - old_log_probs)
        clipped = jnp.clip(ratio, 1 - clip_ratio, 1 + clip_ratio)
        policy_loss = -jnp.minimum(ratio * advantages, clipped * advantages).mean()
        # Entropy bonus (negative because we minimize loss)
        entropy_bonus = -self.alpha * policy_entropy
        return policy_loss + entropy_bonus
```
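The "dual gradient descent on temperature" step can be sketched in isolation. This is a minimal illustration of a SAC-style rule, assuming the temperature loss `log_alpha * (H - H_target)` with the policy entropy treated as a constant; `update_log_alpha` and the learning rate `lr` are hypothetical names, not part of the skill.

```python
import math


def update_log_alpha(log_alpha: float, policy_entropy: float,
                     target_entropy: float, lr: float = 0.1) -> float:
    """One gradient-descent step on J(log_alpha) = log_alpha * (H - H_target)."""
    grad = policy_entropy - target_entropy  # dJ / d(log_alpha)
    return log_alpha - lr * grad


log_alpha = math.log(0.01)

# Entropy below target: temperature rises to encourage exploration.
raised = update_log_alpha(log_alpha, policy_entropy=-2.0, target_entropy=-1.0)

# Entropy above target: temperature falls.
lowered = update_log_alpha(log_alpha, policy_entropy=0.5, target_entropy=-1.0)

alpha_raised, alpha_lowered = math.exp(raised), math.exp(lowered)
```

Parameterizing by `log_alpha` keeps the temperature positive without an explicit constraint, which is why the update is applied in log space.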
### 3. Information-Theoretic Bridging
Minimize information gap between sim and real:
```python
import jax
import jax.numpy as jnp


class InformationTheoreticBridge:
    """
    Bridge sim and real via information-theoretic measures.

    Key insight: We can't match physics exactly, but we can
    match the *information content* of observations.
    """

    def mutual_information(
        self,
        sim_obs: jnp.ndarray,
        real_obs: jnp.ndarray
    ) -> float:
        """
        Estimate I(sim; real) - how much sim tells us about real.

        High MI = sim is predictive of real (good!)
        Low MI = sim and real are independent (bad!)
        """
        # Gaussian closed form from the joint covariance (a MINE estimator
        # is an alternative). Requires paired samples: row i of sim_obs
        # must correspond to row i of real_obs.
        joint_cov = jnp.cov(sim_obs.T, real_obs.T)
        n = sim_obs.shape[1]
        cov_sim = joint_cov[:n, :n]
        cov_real = joint_cov[n:, n:]
        cov_joint = joint_cov
        # MI = 0.5 * log(|Σ_sim||Σ_real| / |Σ_joint|)
        mi = 0.5 * (
            jnp.linalg.slogdet(cov_sim)[1] +
            jnp.linalg.slogdet(cov_real)[1] -
            jnp.linalg.slogdet(cov_joint)[1]
        )
        return mi

    def domain_divergence(
        self,
        sim_obs: jnp.ndarray,
        real_obs: jnp.ndarray,
        method: str = "wasserstein"
    ) -> float:
        """
        Measure divergence between sim and real distributions.

        Lower divergence = better sim2real transfer.
        """
        if method == "kl":
            # KL(real || sim) - how surprised is sim by real?
            # Requires density estimation, which is not implemented here.
            raise NotImplementedError("KL divergence requires density estimation")
        elif method == "wasserstein":
            # W_2 distance (optimal transport) under a Gaussian fit
            mu_sim = jnp.mean(sim_obs, axis=0)
            mu_real = jnp.mean(real_obs, axis=0)
            cov_sim = jnp.cov(sim_obs.T)
            cov_real = jnp.cov(real_obs.T)
            # W_2² = ||μ_sim - μ_real||² + Tr(Σ_sim + Σ_real - 2(Σ_sim^½ Σ_real Σ_sim^½)^½)
            mean_diff = jnp.sum((mu_sim - mu_real)**2)
            # Simplified: the squared Frobenius norm of the covariance
            # difference stands in for the trace term, so this is a proxy
            # rather than the exact W_2.
            cov_diff = jnp.sum((cov_sim - cov_real)**2)
            return jnp.sqrt(mean_diff + cov_diff)
        elif method == "mmd":
            # Maximum Mean Discrepancy (returns the squared, biased estimate)
            def rbf_kernel(x, y, sigma=1.0):
                return jnp.exp(-jnp.sum((x - y)**2) / (2 * sigma**2))

            # MMD² = E[k(x,x')] + E[k(y,y')] - 2E[k(x,y)]
            xx = jnp.mean(jax.vmap(lambda x: jax.vmap(lambda x2: rbf_kernel(x, x2))(sim_obs))(sim_obs))
            yy = jnp.mean(jax.vmap(lambda y: jax.vmap(lambda y2: rbf_kernel(y, y2))(real_obs))(real_obs))
            xy = jnp.mean(jax.vmap(lambda x: jax.vmap(lambda y: rbf_kernel(x, y))(real_obs))(sim_obs))
            return xx + yy - 2 * xy
        else:
            raise ValueError(f"Unknown divergence method: {method}")

    def entropy_matching_loss(
        self,
        sim_obs: jnp.ndarray,
        real_obs: jnp.ndarray
    ) -> float:
        """
        Match entropy profiles between sim and real.

        If H(sim) >> H(real): sim too noisy, reduce randomization
        If H(sim) << H(real): sim too deterministic, increase randomization
        """
        def estimate_entropy(obs):
            # Estimate via covariance determinant (Gaussian assumption).
            # The constant 0.5 * d * log(2πe) is omitted; it cancels in
            # the difference when both inputs have the same dimension.
            cov = jnp.cov(obs.T)
            return 0.5 * jnp.linalg.slogdet(cov)[1]

        H_sim = estimate_entropy(sim_obs)
        H_real = estimate_entropy(real_obs)
        return (H_sim - H_real)**2
```
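The `wasserstein` branch above replaces the trace term with a Frobenius-norm proxy. When the covariances are diagonal, the closed form quoted in its comment reduces exactly to a sum of squared mean gaps and squared standard-deviation gaps per dimension. The sketch below shows that special case with the standard library only; `gaussian_w2_diagonal` is a hypothetical helper, and the diagonal-covariance assumption and population standard deviation are choices made for illustration.

```python
import math
from statistics import fmean, pstdev
from typing import Sequence


def gaussian_w2_diagonal(sim: Sequence[Sequence[float]],
                         real: Sequence[Sequence[float]]) -> float:
    """W2 distance between per-dimension Gaussian fits (diagonal covariance)."""
    total = 0.0
    # zip(*rows) transposes a list of samples into a list of dimensions.
    for sim_dim, real_dim in zip(zip(*sim), zip(*real)):
        mean_gap = fmean(sim_dim) - fmean(real_dim)
        std_gap = pstdev(sim_dim) - pstdev(real_dim)
        total += mean_gap ** 2 + std_gap ** 2
    return math.sqrt(total)


sim_obs = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
real_obs = [[1.0, 1.0], [2.0, 3.0], [3.0, 5.0]]

# Means agree in both dimensions; only the spread of dimension 0 differs.
w2 = gaussian_w2_diagonal(sim_obs, real_obs)
w2_self = gaussian_w2_diagonal(sim_obs, sim_obs)
```

For correlated observations the full matrix square root is needed, so this sketch is best read as a sanity check on the formula rather than a replacement for a general estimator.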
## The Entropy Bridge Pipeline
```
┌────────────────────────────────────────────────────────────────────┐
│                      ENTROPY-DRIVEN SIM2REAL                       │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  PHASE 1: Maximum Entropy Simulation                               │
│  ───────────────────────────────────                               │
│                                                                    │
│   Domain Params           Policy            Observations           │
│  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐         │
│  │ H(θ) = max  │ ───▶ │ H(π|s) = αT │ ───▶ │ H(o) = high │         │
│  │ friction ∈  │      │ explore all │      │ diverse     │         │
│  │ [0.3, 1.5]  │      │ actions     │      │ experiences │         │
│  │ mass ∈      │      └─────────────┘      └─────────────┘         │
│  │ [0.8, 1.2]  │                                                   │
│  └─────────────┘                                                   │
│                                                                    │
│  PHASE 2: Information Bridge                                       │
│  ───────────────────────────                                       │
│                                                                    │
│  Sim Distribution       Divergence        Real Distribution        │
│  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐         │
│  │ p(o|sim)    │ ───▶ │ W(sim,real) │ ◀─── │ p(o|real)   │         │
│  │ (broad)     │      │ minimize    │      │ (narrow)    │         │
│  └─────────────┘      └─────────────┘      └─────────────┘         │
│                              │                                     │
│                     Adapt randomization                            │
│                    to match real entropy                           │
│                                                                    │
│  PHASE 3: Entropy Collapse at Deployment                           │
│  ───────────────────────────────────────                           │
│                                                                    │
│  Policy trained on      Deployed on            Result              │
│  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐         │
│  │ ALL possible│ ───▶ │ ONE actual  │ ───▶ │ ROBUST to   │         │
│  │ worlds      │      │ world       │      │ any world   │         │
│  │ (superpos.) │      │ (collapsed) │      │ in support  │         │
│  └─────────────┘      └─────────────┘      └─────────────┘         │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
```
## Integration with K-Scale Stack
```python
from ksim import PPOTask, PhysicsRandomizer
from ksim.randomizers import (
    StaticFrictionRandomizer,
    MassMultiplicationRandomizer,
    JointDampingRandomizer,
)


class EntropyBridgedKBotTask(PPOTask):
    """K-Bot training with entropy-driven sim2real."""

    # NOTE: helper methods used below (estimate_entropy,
    # estimate_obs_entropy, collect_sim_observations,
    # reduce_randomization_scale, increase_randomization_scale,
    # wasserstein_distance) are not defined in this snippet and are
    # assumed to be supplied by the task.

    # High-entropy domain randomization
    physics_randomizers = [
        StaticFrictionRandomizer(scale=0.5),  # Wide friction range
        MassMultiplicationRandomizer(         # Body mass variation
            body_name="torso",
            scale=0.2
        ),
        JointDampingRandomizer(scale=0.3),    # Damping variation
        # ... more randomizers for max entropy
    ]

    # Max-entropy RL config
    entropy_coef = 0.02     # High entropy bonus
    target_entropy = -4.0   # Automatic temperature tuning

    def compute_entropy_metrics(self, trajectory):
        """Track entropy throughout training."""
        policy_entropy = self.policy.entropy(trajectory.obs)
        obs_entropy = self.estimate_obs_entropy(trajectory.obs)
        return {
            "policy_entropy": policy_entropy,
            "observation_entropy": obs_entropy,
            "entropy_ratio": policy_entropy / obs_entropy,
        }

    def adapt_randomization(self, real_data):
        """
        Adapt domain randomization to match real robot entropy.

        This is the key insight: we don't try to match exact
        parameters, we match the *entropy profile*.
        """
        sim_obs = self.collect_sim_observations()
        real_obs = real_data.observations
        # Compute entropy gap
        H_sim = self.estimate_entropy(sim_obs)
        H_real = self.estimate_entropy(real_obs)
        # The ratio thresholds below assume both entropy estimates are positive
        if H_sim > H_real * 1.5:
            # Sim too noisy, reduce randomization
            self.reduce_randomization_scale(0.9)
        elif H_sim < H_real * 0.7:
            # Sim too deterministic, increase randomization
            self.increase_randomization_scale(1.1)
        # Match distribution via Wasserstein
        W = self.wasserstein_distance(sim_obs, real_obs)
        self.log("wasserstein_distance", W)
```
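The scale-adjustment rule in `adapt_randomization` can be isolated as a pure function. Gaussian differential entropy estimates can be negative, in which case ratio thresholds such as `H_real * 1.5` can trigger even when the two entropies are equal, so the sketch below compares an additive gap in nats instead. That substitution is an assumption made for illustration; `adapt_scale`, `tolerance`, `shrink`, and `grow` are hypothetical names, and the 0.9 and 1.1 factors are taken from the snippet above.

```python
def adapt_scale(scale: float, h_sim: float, h_real: float,
                tolerance: float = 0.5,
                shrink: float = 0.9, grow: float = 1.1) -> float:
    """Return an updated randomization scale from an entropy gap in nats."""
    gap = h_sim - h_real
    if gap > tolerance:     # sim noisier than real: narrow the ranges
        return scale * shrink
    if gap < -tolerance:    # sim more deterministic than real: widen them
        return scale * grow
    return scale            # entropies roughly match: leave unchanged


scale = 0.5
scale_noisy = adapt_scale(scale, h_sim=-1.0, h_real=-3.0)  # gap = +2.0
scale_tight = adapt_scale(scale, h_sim=-3.0, h_real=-1.0)  # gap = -2.0
scale_same = adapt_scale(scale, h_sim=-2.0, h_real=-2.1)   # gap = +0.1
```

Because the function has no side effects, it can be unit-tested separately from the training task and called once per adaptation round.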
## Why Entropy Works for Sim2Real
### 1. Coverage Guarantee
```
If policy π is optimal for ALL sims in support of p(sim),
and real world ∈ support of p(sim),
then π works in real world.
Key: Entropy maximization → widest possible support
```
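The coverage condition only helps if the real system actually lies inside the randomized support. A minimal check, assuming per-parameter estimates of the real robot are available (for example from system identification), might look like the following; `uncovered_params` and the measured values are hypothetical.

```python
from typing import Dict, List, Tuple


def uncovered_params(ranges: Dict[str, Tuple[float, float]],
                     real_estimates: Dict[str, float]) -> List[str]:
    """Names of real-world parameter estimates outside the randomization range."""
    return [
        name
        for name, value in real_estimates.items()
        if not (ranges[name][0] <= value <= ranges[name][1])
    ]


ranges = {"friction": (0.3, 1.5), "mass": (0.8, 1.2)}
measured = {"friction": 0.9, "mass": 1.35}  # hypothetical system-ID values
missing = uncovered_params(ranges, measured)
```

Any name returned here marks a parameter whose range should be widened before relying on the coverage argument.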
### 2. Robustness via Exploration
```
High H(π|s) → policy doesn't overfit to single solution
            → maintains multiple viable strategies
            → can adapt when reality differs
```
### 3. Information Bottleneck
```
Sim and real share mutual information I(sim; real)
Maximize I → sim captures what matters about real
Ignore I → overfit to sim-specific artifacts
```
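For a single paired scalar signal, the Gaussian mutual information used earlier reduces to `I = -0.5 * log(1 - ρ²)`, where ρ is the Pearson correlation between the sim and real readings. The sketch below illustrates that reduction with the standard library; `gaussian_mi` and the sample values are hypothetical, and the Gaussian assumption is a simplification.

```python
import math
from statistics import fmean
from typing import Sequence


def gaussian_mi(x: Sequence[float], y: Sequence[float]) -> float:
    """Mutual information (nats) of paired scalars under a Gaussian assumption."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    rho_sq = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return -0.5 * math.log(1.0 - rho_sq)


sim = [0.0, 1.0, 2.0, 3.0, 4.0]
real_close = [0.1, 0.9, 2.2, 2.9, 4.1]      # tracks sim closely
real_unrelated = [2.0, 0.0, 3.0, 0.0, 2.0]  # zero correlation with sim

mi_close = gaussian_mi(sim, real_close)
mi_unrelated = gaussian_mi(sim, real_unrelated)
```

The estimate diverges as the correlation approaches ±1, so perfectly correlated inputs would need a small regularizer before taking the logarithm.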
## GF(3) Triads
```
entropy-sim2real (-1) ⊗ kos-firmware (+1) ⊗ mujoco-scenes (0) = 0 ✓
entropy-sim2real (-1) ⊗ jaxlife-open-ended (+1) ⊗ wobble-dynamics (0) = 0 ✓
ksim-rl (-1) ⊗ kos-firmware (+1) ⊗ entropy-sim2real (-1) = needs +1
```
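The triad lines above are consistent with reading `⊗` as addition of trits modulo 3, with a triad balanced when the result is 0. Under that assumption (the source does not define the operator), the check can be sketched as follows; `TRITS` and `triad_residue` are hypothetical names, and the trit values are copied from the triads above.

```python
from typing import Dict, Sequence

TRITS: Dict[str, int] = {
    "entropy-sim2real": -1,
    "kos-firmware": +1,
    "mujoco-scenes": 0,
    "ksim-rl": -1,
}


def triad_residue(skills: Sequence[str]) -> int:
    """Sum of trits reduced mod 3 into the balanced range {-1, 0, +1}."""
    total = sum(TRITS[name] for name in skills) % 3
    return total - 3 if total == 2 else total


balanced = triad_residue(["entropy-sim2real", "kos-firmware", "mujoco-scenes"])
unbalanced = triad_residue(["ksim-rl", "kos-firmware", "entropy-sim2real"])
```

A residue of -1 for the third triad matches its "needs +1" annotation: adding one more +1 skill would bring the sum back to 0.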
## Related Skills
- `ksim-rl` (-1): Base RL training
- `kos-firmware` (+1): Deployment target
- `ergodicity` (0): Ergodic theory foundations
- `birkhoff-average` (-1): Time averages
- `fokker-planck-analyzer` (-1): Distribution dynamics
## References
```bibtex
@inproceedings{haarnoja2018sac,
  title={Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor},
  author={Haarnoja, Tuomas and others},
  booktitle={ICML},
  year={2018}
}
@inproceedings{tobin2017domain,
  title={Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World},
  author={Tobin, Josh and others},
  booktitle={IROS},
  year={2017}
}
@inproceedings{zhao2020sim,
  title={Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey},
  author={Zhao, Wenshuai and others},
  booktitle={IEEE SSCI},
  year={2020}
}
```