wildworld-dataset
WildWorld large-scale action-conditioned world modeling dataset with 108M+ frames from a photorealistic ARPG game, featuring per-frame annotations, 450+ actions, and explicit state information for generative world modeling research.
Best use case
wildworld-dataset is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
WildWorld large-scale action-conditioned world modeling dataset with 108M+ frames from a photorealistic ARPG game, featuring per-frame annotations, 450+ actions, and explicit state information for generative world modeling research.
Teams using wildworld-dataset should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/wildworld-dataset/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How wildworld-dataset Compares
| Feature / Agent | wildworld-dataset | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
WildWorld large-scale action-conditioned world modeling dataset with 108M+ frames from a photorealistic ARPG game, featuring per-frame annotations, 450+ actions, and explicit state information for generative world modeling research.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# WildWorld Dataset Skill
> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.
## What WildWorld Is
**WildWorld** is a large-scale action-conditioned world modeling dataset automatically collected from a photorealistic AAA action role-playing game (ARPG). It is designed for training and evaluating **dynamic world models** — generative models that predict future game states given past observations and player actions.
### Key Statistics
| Property | Value |
|---|---|
| Total frames | 108M+ |
| Actions | 450+ semantically meaningful |
| Monster species | 29 |
| Player characters | 4 |
| Weapon types | 4 |
| Distinct stages | 5 |
| Max clip length | 30+ minutes continuous |
### Per-Frame Annotations
Every frame includes:
- **Character skeletons** — joint positions for player and monsters
- **Actions & states** — HP, animation state, stamina, etc.
- **Camera poses** — position, rotation, field of view
- **Depth maps** — monocular depth for each frame
- **Hierarchical captions** — action-level and sample-level natural language descriptions
---
## Project Status
> ⚠️ As of March 2026, the dataset and WildBench benchmark have **not yet been released**. Monitor the repository for updates.
```bash
# Watch the repository for dataset release
# https://github.com/ShandaAI/WildWorld
```
---
## Repository Setup
```bash
# Clone the repository
git clone https://github.com/ShandaAI/WildWorld.git
cd WildWorld
# Install dependencies (when benchmark code is released)
pip install -r requirements.txt
```
---
## Expected Dataset Structure
Based on the paper and framework description, the dataset is expected to follow this structure:
```
WildWorld/
├── data/
│ ├── sequences/
│ │ ├── stage_01/
│ │ │ ├── clip_000001/
│ │ │ │ ├── frames/ # RGB frames (e.g., PNG)
│ │ │ │ ├── depth/ # Depth maps
│ │ │ │ ├── skeleton/ # Per-frame skeleton JSON
│ │ │ │ ├── states/ # HP, animation, stamina JSON
│ │ │ │ ├── camera/ # Camera pose JSON
│ │ │ │ └── actions/ # Action label files
│ │ │ └── clip_000002/
│ │ └── stage_02/
│ └── captions/
│ ├── action_level/ # Per-action descriptions
│ └── sample_level/ # Clip-level descriptions
├── benchmark/
│ └── wildbench/ # WildBench evaluation code
├── assets/
│ └── framework-arxiv.png
├── LICENSE
└── README.md
```
---
## Working with the Dataset (Anticipated API)
### Loading Frame Annotations
```python
import json
import os
from pathlib import Path
from PIL import Image
import numpy as np
class WildWorldClip:
"""Helper class to load a WildWorld clip and its annotations."""
def __init__(self, clip_dir: str):
self.clip_dir = Path(clip_dir)
self.frames_dir = self.clip_dir / "frames"
self.depth_dir = self.clip_dir / "depth"
self.skeleton_dir = self.clip_dir / "skeleton"
self.states_dir = self.clip_dir / "states"
self.camera_dir = self.clip_dir / "camera"
self.actions_dir = self.clip_dir / "actions"
def get_frame(self, frame_id: int) -> Image.Image:
frame_path = self.frames_dir / f"{frame_id:06d}.png"
return Image.open(frame_path)
def get_depth(self, frame_id: int) -> np.ndarray:
depth_path = self.depth_dir / f"{frame_id:06d}.npy"
return np.load(depth_path)
def get_skeleton(self, frame_id: int) -> dict:
skeleton_path = self.skeleton_dir / f"{frame_id:06d}.json"
with open(skeleton_path) as f:
return json.load(f)
def get_state(self, frame_id: int) -> dict:
"""Returns HP, animation state, stamina, etc."""
state_path = self.states_dir / f"{frame_id:06d}.json"
with open(state_path) as f:
return json.load(f)
def get_camera(self, frame_id: int) -> dict:
"""Returns camera position, rotation, and FOV."""
camera_path = self.camera_dir / f"{frame_id:06d}.json"
with open(camera_path) as f:
return json.load(f)
def get_action(self, frame_id: int) -> dict:
action_path = self.actions_dir / f"{frame_id:06d}.json"
with open(action_path) as f:
return json.load(f)
def iter_frames(self, start: int = 0, end: int = None):
"""Iterate over all frames in the clip."""
frame_files = sorted(self.frames_dir.glob("*.png"))
for frame_path in frame_files[start:end]:
frame_id = int(frame_path.stem)
yield {
"frame_id": frame_id,
"frame": self.get_frame(frame_id),
"depth": self.get_depth(frame_id),
"skeleton": self.get_skeleton(frame_id),
"state": self.get_state(frame_id),
"camera": self.get_camera(frame_id),
"action": self.get_action(frame_id),
}
# Usage
clip = WildWorldClip("data/sequences/stage_01/clip_000001")
for sample in clip.iter_frames(start=0, end=100):
frame_id = sample["frame_id"]
state = sample["state"]
action = sample["action"]
print(f"Frame {frame_id}: HP={state.get('hp')}, Action={action.get('name')}")
```
### PyTorch Dataset
```python
import torch
from torch.utils.data import Dataset, DataLoader
from pathlib import Path
import json
import numpy as np
from PIL import Image
import torchvision.transforms as T
class WildWorldDataset(Dataset):
"""
PyTorch Dataset for WildWorld action-conditioned world modeling.
Returns sequences of (frames, actions, states) for next-frame prediction.
"""
def __init__(
self,
root_dir: str,
sequence_length: int = 16,
image_size: tuple = (256, 256),
stage: str = None,
split: str = "train",
):
self.root_dir = Path(root_dir)
self.sequence_length = sequence_length
self.image_size = image_size
self.transform = T.Compose([
T.Resize(image_size),
T.ToTensor(),
T.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
])
# Discover all clips
self.clips = self._discover_clips(stage, split)
self.samples = self._build_sample_index()
def _discover_clips(self, stage, split):
clips = []
stage_dirs = (
[self.root_dir / "data" / "sequences" / stage]
if stage
else sorted((self.root_dir / "data" / "sequences").iterdir())
)
for stage_dir in stage_dirs:
if stage_dir.is_dir():
for clip_dir in sorted(stage_dir.iterdir()):
if clip_dir.is_dir():
clips.append(clip_dir)
# Simple train/val split
split_idx = int(len(clips) * 0.9)
return clips[:split_idx] if split == "train" else clips[split_idx:]
def _build_sample_index(self):
"""Build index of (clip_dir, start_frame) pairs."""
samples = []
for clip_dir in self.clips:
frames = sorted((clip_dir / "frames").glob("*.png"))
n_frames = len(frames)
for start in range(0, n_frames - self.sequence_length, self.sequence_length // 2):
samples.append((clip_dir, start))
return samples
def __len__(self):
return len(self.samples)
def __getitem__(self, idx):
clip_dir, start = self.samples[idx]
frames_dir = clip_dir / "frames"
frame_files = sorted(frames_dir.glob("*.png"))[start:start + self.sequence_length]
frames, actions, states = [], [], []
for frame_path in frame_files:
frame_id = int(frame_path.stem)
# Load RGB frame
img = Image.open(frame_path).convert("RGB")
frames.append(self.transform(img))
# Load action
action_path = clip_dir / "actions" / f"{frame_id:06d}.json"
with open(action_path) as f:
action_data = json.load(f)
actions.append(action_data.get("action_id", 0))
# Load state
state_path = clip_dir / "states" / f"{frame_id:06d}.json"
with open(state_path) as f:
state_data = json.load(f)
states.append([
state_data.get("hp", 1.0),
state_data.get("stamina", 1.0),
state_data.get("animation_id", 0),
])
return {
"frames": torch.stack(frames), # (T, C, H, W)
"actions": torch.tensor(actions, dtype=torch.long), # (T,)
"states": torch.tensor(states, dtype=torch.float32), # (T, S)
}
# Usage
dataset = WildWorldDataset(
root_dir="/path/to/WildWorld",
sequence_length=16,
image_size=(256, 256),
split="train",
)
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=4)
for batch in loader:
frames = batch["frames"] # (B, T, C, H, W)
actions = batch["actions"] # (B, T)
states = batch["states"] # (B, T, S)
print(f"Frames: {frames.shape}, Actions: {actions.shape}")
break
```
### Filtering by Action Type
```python
# Action categories in WildWorld
ACTION_CATEGORIES = {
"movement": ["walk", "run", "sprint", "dodge", "jump"],
"attack": ["light_attack", "heavy_attack", "combo_finisher"],
"skill": ["skill_cast_1", "skill_cast_2", "skill_cast_3", "skill_cast_4"],
"defense": ["block", "parry", "guard"],
"idle": ["idle", "idle_combat"],
}
def filter_clips_by_action(dataset_root: str, action_category: str) -> list:
"""Find all frame indices that contain a specific action category."""
root = Path(dataset_root)
results = []
target_actions = ACTION_CATEGORIES.get(action_category, [])
for clip_dir in root.glob("data/sequences/**"):
if not clip_dir.is_dir():
continue
for action_file in sorted((clip_dir / "actions").glob("*.json")):
with open(action_file) as f:
data = json.load(f)
if data.get("action_name") in target_actions:
results.append({
"clip": str(clip_dir),
"frame_id": int(action_file.stem),
"action": data.get("action_name"),
})
return results
# Find all skill cast frames
skill_frames = filter_clips_by_action("/path/to/WildWorld", "skill")
print(f"Found {len(skill_frames)} skill cast frames")
```
---
## WildBench Evaluation
```python
# WildBench evaluates world models on next-frame prediction quality.
# Expected metrics: FVD, PSNR, SSIM, action accuracy
class WildBenchEvaluator:
"""Evaluator for world model predictions on WildBench."""
def __init__(self, benchmark_dir: str):
self.benchmark_dir = Path(benchmark_dir)
self.metrics = {}
def evaluate(self, model, dataloader):
from torchmetrics.image import StructuralSimilarityIndexMeasure, PeakSignalNoiseRatio
ssim = StructuralSimilarityIndexMeasure()
psnr = PeakSignalNoiseRatio()
all_psnr, all_ssim = [], []
for batch in dataloader:
frames = batch["frames"] # (B, T, C, H, W)
actions = batch["actions"] # (B, T)
states = batch["states"] # (B, T, S)
# Use first T-1 frames to predict the T-th frame
context_frames = frames[:, :-1]
context_actions = actions[:, :-1]
target_frame = frames[:, -1]
with torch.no_grad():
predicted_frame = model(context_frames, context_actions, states[:, :-1])
all_psnr.append(psnr(predicted_frame, target_frame).item())
all_ssim.append(ssim(predicted_frame, target_frame).item())
return {
"PSNR": np.mean(all_psnr),
"SSIM": np.mean(all_ssim),
}
```
---
## Citation
```bibtex
@misc{li2026wildworldlargescaledatasetdynamic,
title={WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG},
author={Zhen Li and Zian Meng and Shuwei Shi and Wenshuo Peng and Yuwei Wu and Bo Zheng and Chuanhao Li and Kaipeng Zhang},
year={2026},
eprint={2603.23497},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.23497},
}
```
---
## Resources
- **Project Page**: https://shandaai.github.io/wildworld-project/
- **arXiv Paper**: https://arxiv.org/abs/2603.23497
- **YouTube Demo**: https://www.youtube.com/watch?v=9vcSg553r2g
- **GitHub**: https://github.com/ShandaAI/WildWorld
---
## Troubleshooting
| Issue | Solution |
|---|---|
| Dataset not yet available | Monitor the repo; dataset release is pending as of March 2026 |
| Frame loading OOM | Reduce `sequence_length` or `image_size` in the Dataset |
| Missing annotation files | Check that all subdirs (frames, depth, skeleton, states, camera, actions) are fully downloaded |
| Slow DataLoader | Increase `num_workers`, use SSD storage, or preprocess to HDF5 |
| Benchmark code not found | The `benchmark/wildbench` directory will be released separately — watch the repo |Related Skills
```markdown
---
zeroboot-vm-sandbox
Sub-millisecond VM sandboxes for AI agents using copy-on-write KVM forking via Zeroboot
yourvpndead-vpn-detection
Android app that detects VPN/proxy servers (VLESS/xray/sing-box) via local SOCKS5 vulnerability, exposing exit IPs and server configs without root
xata-postgres-platform
Expert skill for Xata open-source cloud-native Postgres platform with copy-on-write branching, scale-to-zero, and Kubernetes deployment
x-mentor-skill-nuwa
AI-powered X (Twitter) content strategy skill that distills methodologies from 6 top creators + open-source algorithm data into actionable writing, growth, and monetization guidance.
wx-favorites-report
End-to-end pipeline to extract, decrypt, and visualize WeChat Mac favorites from encrypted SQLite DB into an interactive HTML report.
wterm-web-terminal
Web terminal emulator with Zig/WASM core, DOM rendering, and React/vanilla JS bindings
worldmonitor-intelligence-dashboard
Real-time global intelligence dashboard with AI-powered news aggregation, geopolitical monitoring, and infrastructure tracking
witr-process-inspector
CLI and TUI tool that explains why processes, services, and ports are running by tracing causality chains across supervisors, containers, and shells.
whatcable-macos-usb-inspector
macOS menu bar app that identifies USB-C cable capabilities and charging diagnostics using IOKit
wewrite-wechat-ai-publishing
Full-pipeline AI skill for WeChat Official Account articles — hotspot fetching, topic selection, writing, SEO, image generation, formatting, and draft box publishing.
weixin-agent-sdk
Bridge any AI agent backend to WeChat using the weixin-agent-sdk framework with simple Agent interface, login, and message loop.