experiment-runner-run

Run survival arena experiments

33 stars

Best use case

experiment-runner-run is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using experiment-runner-run should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/experiment-runner-run/SKILL.md --create-dirs "https://raw.githubusercontent.com/aAAaqwq/AGI-Super-Team/main/skills/experiment-runner-run/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/experiment-runner-run/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How experiment-runner-run Compares

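| Feature / Agent | experiment-runner-run | Standard Approach |
|-----------------|-----------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |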

Frequently Asked Questions

What does this skill do?

It runs survival arena experiments one at a time, analyzes the results, and commits them to git.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Experiment Runner (proj-012)

> Adaptive experiment runner for survival arena. Runs experiments one by one, analyzes results, commits. Between experiments -- human/AI decides what to change next.

## Workflow (adaptive)

```
1. --status        → view what's been done
2. --next          → run the next pending experiment
3. Analysis        → view analysis.json, understand the result
4. Decision        → what to change? (only environment conditions, not behavior)
5. Edit YAML       → modify the next experiment or add a new one
6. Repeat from #2
```

**Rule:** we only change conditions (pressure, resources, architecture). Never hardcode agent behavior.

## When to use

- "run experiment" / "next experiment"
- "what is the experiment status"
- "run EXP-011c"
- "run the next 2 experiments"
- proj-012 experiment pipeline

## Dependencies

- Python 3, PyYAML (`pip install pyyaml`)
- Claude CLI (for LLM queries in the arena)
- Git (for committing results)

## Paths

| What | Path |
|------|------|
| Arena | `$AGENTS_PATH/survival-arena/arena.py` |
| Orchestrator | `$AGENTS_PATH/survival-arena/run_experiments.py` |
| Plan (YAML) | `$AGENTS_PATH/survival-arena/experiments.yaml` |
| Results | `$AGENTS_PATH/survival-arena/experiment_results.json` |
| Logs | `$AGENTS_PATH/survival-arena/logs/experiments/` |
| Documentation | `$PROJECT_ROOT/projects/docs/proj-012-agi-consciousness/` |

## How to execute

### Next experiment (main mode)

```bash
cd "$AGENTS_PATH/survival-arena"
python3 run_experiments.py --next
```

### Next N experiments

```bash
python3 run_experiments.py --next 2
```

### View status

```bash
python3 run_experiments.py --status
```

### Run a specific experiment

```bash
python3 run_experiments.py --experiment EXP-011c
```

### Run all pending (batch mode)

```bash
python3 run_experiments.py --resume
```

### View commands without running

```bash
python3 run_experiments.py --dry-run --next 3
```

### Check arena config

```bash
python3 arena.py --config-dump --upkeep-base 0 --architecture single
```

## arena.py parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--upkeep-base N` | Pressure: maintenance cost per turn | 2 |
| `--regen-rate N` | Resource regeneration rate per turn | 3 |
| `--num-nodes N` | Number of resource nodes (distributed across clusters) | 12 |
| `--child-ratio F` | Child token share | 0.35 |
| `--repro-threshold N` | Reproduction threshold | 120 |
| `--repro-cost N` | Reproduction cost | 70 |
| `--architecture TYPE` | single / dual-same / dual-split / dual-kahneman | dual-kahneman |
| `--experiment-id ID` | Identifier for logs | - |
| `--config-dump` | Show config as JSON and exit | - |
| `--model MODEL` | haiku / sonnet | sonnet |
| `--turns N` | Number of turns | 50 |
| `--seed N` | Random seed | - |
| `--parallel N` | Parallel LLM calls | 4 |

## run_experiments.py parameters

| Parameter | Description |
|-----------|-------------|
| `--next [N]` | Run next N pending (default: 1) |
| `--status` | Show status and exit |
| `--phase P2` | Run a specific phase |
| `--experiment EXP-011c` | Run a single experiment |
| `--resume` | Run all pending (batch) |
| `--dry-run` | Show commands without executing |
| `--plan FILE` | Path to experiments.yaml |
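
The `--next [N]` flag takes an optional value that defaults to 1. The sketch below shows one way to get that behaviour with `argparse`. It is not the actual `run_experiments.py` parser; `build_parser` and the `next_n` attribute name are hypothetical, and only the flag names come from the table above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the flags listed in the table."""
    parser = argparse.ArgumentParser(prog="run_experiments.py")
    # nargs="?" lets --next appear with or without a value:
    #   absent      -> default (None)
    #   bare --next -> const (1)
    #   --next 3    -> 3
    parser.add_argument("--next", nargs="?", type=int, const=1,
                        default=None, metavar="N", dest="next_n")
    parser.add_argument("--status", action="store_true")
    parser.add_argument("--phase", metavar="PHASE")
    parser.add_argument("--experiment", metavar="ID")
    parser.add_argument("--resume", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--plan", metavar="FILE")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--dry-run", "--next", "3"])
    print(args.next_n, args.dry_run)
```

With `const=1`, a bare `--next` and `--next 1` are equivalent, which matches the "default: 1" note in the table.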

## Phases (roadmap, adapts as we go)

| Phase | What we test | Initial experiments |
|-------|-------------|---------------------|
| P1 | Validation of v4.2c (map + clusters) | 3 |
| P2 | Yerkes-Dodson (pressure) | 5 |
| P3 | Architecture (phase transition) | 4 |
| P4 | Emergent parenting | 3 |
| P5 | Model phenotypes | 4 |
| P6 | Long evolution (200t) | 2 |

## What the orchestrator does for each experiment

1. Reads config from experiments.yaml (merge: defaults < phase < experiment)
2. Builds CLI command for arena.py
3. Runs subprocess, timeout 2 hours
4. Analyzes JSONL
5. Saves to `logs/experiments/EXP-XXX/` (config.json, analysis.json, console.txt)
6. Updates experiment_results.json
7. Commits to git
8. On error -- retries 2 times, 30s backoff
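
Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, not the orchestrator's real code. It assumes the three config layers are flat dictionaries, and that YAML keys use underscores and map one-to-one onto `arena.py` flags. `merge_config` and `build_command` are hypothetical names.

```python
def merge_config(defaults: dict, phase: dict, experiment: dict) -> dict:
    """Merge layers so that later ones win: defaults < phase < experiment."""
    return {**defaults, **phase, **experiment}


def build_command(experiment_id: str, config: dict) -> list:
    """Turn a merged config into an argument list for arena.py."""
    cmd = ["python3", "arena.py", "--experiment-id", experiment_id]
    for key, value in config.items():
        # Assumption: a key such as "upkeep_base" maps to "--upkeep-base".
        cmd += ["--" + key.replace("_", "-"), str(value)]
    return cmd


if __name__ == "__main__":
    merged = merge_config(
        {"turns": 50, "upkeep_base": 2, "architecture": "dual-kahneman"},
        {"upkeep_base": 4},          # phase-level override
        {"architecture": "single"},  # experiment-level override
    )
    print(build_command("EXP-011c", merged))
```

Building the command as a list rather than a single string lets it be passed to `subprocess.run` without shell quoting issues.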

## Analysis results

For each experiment computes:
- Shannon entropy (action distribution diversity)
- Social action % (TRADE + COMMUNICATE + REPRODUCE)
- MOVE+GATHER % (survival focus)
- GATHER success rate (v4.2c: do agents understand the map)
- MOVE % (migration to clusters)
- Dual-system distribution (panic/normal/strategic %)
- Parent-child trades
- NAP detection (alliance/pact/peace keywords)
- Population dynamics (start/end/max/min)
- Reproductions count
- Max generation reached
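
As an illustration of the first two metrics, the sketch below computes Shannon entropy and the social action percentage from a flat list of action names. The input format, the function names, and the use of base-2 logarithms (bits) are assumptions; the actual analysis works on the arena's JSONL logs, whose schema is not shown here.

```python
import math
from collections import Counter

# Social actions as listed above.
SOCIAL_ACTIONS = {"TRADE", "COMMUNICATE", "REPRODUCE"}


def shannon_entropy(actions) -> float:
    """Entropy (in bits) of the empirical action distribution."""
    counts = Counter(actions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def action_share(actions, subset) -> float:
    """Percentage of actions that belong to `subset`."""
    if not actions:
        return 0.0
    return 100.0 * sum(a in subset for a in actions) / len(actions)


if __name__ == "__main__":
    log = ["MOVE", "GATHER", "TRADE", "COMMUNICATE"]
    print(shannon_entropy(log))               # 2.0 (four equally likely actions)
    print(action_share(log, SOCIAL_ACTIONS))  # 50.0
```

Entropy is 0 when every agent takes the same action and reaches its maximum when all actions are equally frequent, which is why it serves as a diversity measure.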

## Troubleshooting

| Problem | Solution |
|---------|----------|
| `pyyaml not found` | `pip3 install pyyaml` |
| Timeout on 200t experiment | Increase timeout in run_experiments.py (7200 -> 14400) |
| Rate limit from API | Decrease `--parallel` (4 -> 2) |
| Cannot find log | Check that arena.py creates a file in logs/ |
| Git commit failed | Check that you're on main, no conflicts |
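
The timeout row refers to the per-run limit the orchestrator applies (2 hours, with 2 retries and a 30 s backoff on error). The sketch below shows how such limits might be exposed as parameters instead of being edited in place. It is an assumption-laden illustration, not the code in `run_experiments.py`. `run_with_retries` is a hypothetical name, and "retries 2 times" is read here as two extra attempts after the first.

```python
import subprocess
import time


def run_with_retries(cmd, timeout_s=7200, retries=2, backoff_s=30,
                     sleep=time.sleep):
    """Run cmd; on a non-zero exit or a timeout, wait and try again."""
    attempts = retries + 1
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True,
                                    timeout=timeout_s)
            if result.returncode == 0:
                return result
        except subprocess.TimeoutExpired:
            pass  # a timeout is treated like any other failed attempt
        if attempt < attempts:
            sleep(backoff_s)  # fixed backoff between attempts
    raise RuntimeError(f"command failed after {attempts} attempts: {cmd}")
```

For a 200-turn experiment, a caller could pass `timeout_s=14400` without touching the default. The `sleep` parameter exists only so the backoff can be stubbed out in tests.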

## Related skills

- `git-workflow` -- commit procedure

Related Skills

claude-code-runner

33
from aAAaqwq/AGI-Super-Team

Execute programming tasks via Claude Code using PTY-based invocation. Handles non-TTY environments, auto-responds to prompts, and manages file synchronization.

wemp-operator

33
from aAAaqwq/AGI-Super-Team

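Full-featured WeChat Official Account operations: API wrappers for drafts, publishing, comments, users, media assets, mass messaging, statistics, menus, and QR codes.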

Content & Documentation

zsxq-smart-publish

33
from aAAaqwq/AGI-Super-Team

Publish and manage content on 知识星球 (zsxq.com). Supports talk posts, Q&A, long articles, file sharing, digest/bookmark, homework tasks, and tag management. Use when publishing content to 知识星球, creating/editing posts, uploading files/images/audio, managing digests, batch publishing, or formatting content for 知识星球.

zoom-automation

33
from aAAaqwq/AGI-Super-Team

Automate Zoom meeting creation, management, recordings, webinars, and participant tracking via Rube MCP (Composio). Always search tools first for current schemas.

zoho-crm-automation

33
from aAAaqwq/AGI-Super-Team

Automate Zoho CRM tasks via Rube MCP (Composio): create/update records, search contacts, manage leads, and convert leads. Always search tools first for current schemas.

ziliu-publisher

33
from aAAaqwq/AGI-Super-Team

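Ziliu (字流): an AI-driven multi-platform content distribution tool. Create once, adapt the layout automatically, and distribute with one click to 16+ platforms (WeChat Official Accounts, Zhihu, Xiaohongshu, Bilibili, Douyin, Weibo, X, and others). Use when the user needs multi-platform publishing, content layout, or format adaptation. Trigger words: 字流, ziliu, 多平台发布 (multi-platform publishing), 一键分发 (one-click distribution), 内容分发 (content distribution), 排版发布 (layout and publish).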

zhihu-post-skill

33
from aAAaqwq/AGI-Super-Team

Zhihu article publishing: automated content creation and publishing on the Zhihu platform.

zendesk-automation

33
from aAAaqwq/AGI-Super-Team

Automate Zendesk tasks via Rube MCP (Composio): tickets, users, organizations, replies. Always search tools first for current schemas.

youtube-knowledge-extractor

33
from aAAaqwq/AGI-Super-Team

Multimodal YouTube video analysis through both information channels: audio (transcript) and visual (frame extraction + image analysis). Especially powerful for HowTo videos, tutorials, demos, and explainer videos where what is SHOWN (screenshots, UI demos, diagrams, code, physical actions) is just as important as what is SAID. Use this skill whenever a user wants to analyze, summarize, or create step-by-step guides from YouTube videos, or when they share a YouTube URL and want to understand what happens in the video. Triggers on requests like "Analyze this YouTube video", "Create a step-by-step guide from this video", "What does this video show?", "Summarize this tutorial", or any YouTube URL shared with analysis intent.

youtube-factory

33
from aAAaqwq/AGI-Super-Team

Generate complete YouTube videos from a single prompt - script, voiceover, stock footage, captions, thumbnail. Self-contained, no external modules. 100% free tools.

youtube-automation

33
from aAAaqwq/AGI-Super-Team

Automate YouTube tasks via Rube MCP (Composio): upload videos, manage playlists, search content, get analytics, and handle comments. Always search tools first for current schemas.

xlsx

33
from aAAaqwq/AGI-Super-Team

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. Use when Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc.) for: (1) creating new spreadsheets with formulas and formatting, (2) reading or analyzing data, (3) modifying existing spreadsheets while preserving formulas, (4) data analysis and visualization in spreadsheets, or (5) recalculating formulas.