cli-e2e-testing

CLI E2E testing patterns with BATS - parallelization, state sharing, and timeout management

16 stars

bydiegosouzapw

View on GitHub Installation ↓

Best use case

cli-e2e-testing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

CLI E2E testing patterns with BATS - parallelization, state sharing, and timeout management

Teams using cli-e2e-testing should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/cli-e2e-testing/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/cli-automation/cli-e2e-testing/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/cli-e2e-testing/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How cli-e2e-testing Compares

Feature / Agent	cli-e2e-testing	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

CLI E2E testing patterns with BATS - parallelization, state sharing, and timeout management

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# CLI E2E Testing Skill

## When to Use This Skill

Use this skill when:
- Writing new CLI E2E tests in `e2e/tests/`
- Reviewing E2E test code
- Debugging slow or timing out E2E tests
- Restructuring tests for better parallelization

---

## Core Principles

### 1. Happy Path Only

E2E tests verify the system works end-to-end. Error cases belong in unit tests.

```bash
# ✅ E2E: Test that the feature works
@test "vm0 run executes agent successfully" {
    run vm0 run "$AGENT" "echo hello"
    assert_success
}

# ❌ Don't test error cases in E2E - use unit tests instead
@test "vm0 run fails with invalid agent" { ... }  # Move to unit test
```

### 2. `vm0 run` is Expensive (~15s)

Each `vm0 run` call takes ~15 seconds due to:
- API call to platform
- E2B sandbox creation
- Volume/artifact mounting
- Mock Claude execution
- Checkpoint creation

**Minimize unnecessary `vm0 run` calls.**

### 3. Parallelization Model

```
Files run in PARALLEL (up to -j 10)
├── file-a.bats ──► case1 → case2 → case3  (SERIAL within file)
├── file-b.bats ──► case1 → case2          (SERIAL within file)
└── file-c.bats ──► case1                  (SERIAL within file)
```

- **Between files**: PARALLEL
- **Within file**: SERIAL
- **`$BATS_FILE_TMPDIR`**: Isolated per file (safe for parallel)

### 4. State Sharing Strategy

| Scenario | Strategy |
|----------|----------|
| Tests share state (session ID, checkpoint ID) | Same file, separate cases |
| Tests are independent | Separate files (parallel) |

### 5. Timeout Management

Each test case has a timeout: **30s for serial**, **60s for parallel/runner tests**.

**Don't stack multiple `vm0 run` in one case - will timeout!**

```bash
# ❌ BAD: 2 vm0 runs = 30s+ (timeout risk)
@test "session test" {
    run vm0 run "$AGENT" ...           # ~15s
    run vm0 run continue "$SESSION_ID" # ~15s
    # Total: ~30s+ in one case
}

# ✅ GOOD: Split into separate cases
@test "step 1: create session" {
    run vm0 run "$AGENT" ...           # ~15s
    echo "$output" | grep -oP 'Session:\s*\K[a-f0-9-]+' > "$BATS_FILE_TMPDIR/session_id"
}

@test "step 2: continue from session" {
    SESSION_ID=$(cat "$BATS_FILE_TMPDIR/session_id")
    run vm0 run continue "$SESSION_ID" # ~15s
}
```

---

## File Organization

### Directory Structure

```
e2e/tests/
├── 01-serial/              # Tests that MUST run serially (scope setup)
├── 02-parallel/            # Tests that CAN run in parallel
│   ├── t03-*.bats          # Independent tests (fast)
│   ├── t06-session.bats    # State-sharing tests (slow, serial within)
│   └── t07-checkpoint.bats # State-sharing tests (slow, serial within)
└── 03-experimental-runner/ # Runner-specific tests
```

### When to Create Separate Files

| Condition | Action |
|-----------|--------|
| Tests share state | Same file |
| Tests are independent | Separate files |
| Test is slow (>15s) but independent | Own file |

---

## State Sharing with `$BATS_FILE_TMPDIR`

`$BATS_FILE_TMPDIR` is a temporary directory:
- **Shared** by all tests within the same file
- **Isolated** between different files (parallel-safe)
- **Automatically cleaned** after file completes

### Pattern: Pass State Between Cases

```bash
setup_file() {
    # One-time setup: compose agent (runs once per file)
    export AGENT_NAME="e2e-session-$(date +%s%3N)"
    vm0 compose "$CONFIG"
}

@test "step 1: create session" {
    run vm0 run "$AGENT_NAME" --artifact-name "$ARTIFACT" "echo test"
    assert_success

    # Save state for next test
    echo "$output" | grep -oP 'Session:\s*\K[a-f0-9-]+' > "$BATS_FILE_TMPDIR/session_id"
}

@test "step 2: continue from session" {
    # Load state from previous test
    SESSION_ID=$(cat "$BATS_FILE_TMPDIR/session_id")

    run vm0 run continue "$SESSION_ID" "echo continue"
    assert_success
}

teardown_file() {
    # One-time cleanup (runs once per file)
}
```

### Pattern: Share Multiple Values

```bash
@test "step 1: create resources" {
    # ... create session and checkpoint

    # Save multiple values
    cat > "$BATS_FILE_TMPDIR/state.env" <<EOF
SESSION_ID=$session_id
CHECKPOINT_ID=$checkpoint_id
ARTIFACT_VERSION=$version
EOF
}

@test "step 2: use resources" {
    # Load all values
    source "$BATS_FILE_TMPDIR/state.env"

    run vm0 run continue "$SESSION_ID" ...
}
```

---

## Test Structure Template

### For State-Sharing Tests (Multiple `vm0 run`)

```bash
#!/usr/bin/env bats

load '../../helpers/setup'

# File-level constants
AGENT_NAME="e2e-feature-$(date +%s%3N)"

setup_file() {
    # Create config and compose agent ONCE
    export TEST_DIR="$(mktemp -d)"
    export TEST_CONFIG="$TEST_DIR/vm0.yaml"

    cat > "$TEST_CONFIG" <<EOF
version: "1.0"
agents:
  ${AGENT_NAME}:
    description: "Test agent"
    framework: claude-code
    image: "vm0/claude-code:dev"
EOF

    vm0 compose "$TEST_CONFIG"
}

setup() {
    # Per-test setup: unique resources
    export ARTIFACT_NAME="art-$(date +%s%3N)-$RANDOM"
}

teardown() {
    # Per-test cleanup (if needed)
}

teardown_file() {
    # File cleanup
    rm -rf "$TEST_DIR"
}

@test "step 1: create session with vm0 run" {
    # Create artifact
    mkdir -p "/tmp/$ARTIFACT_NAME"
    cd "/tmp/$ARTIFACT_NAME"
    vm0 artifact init --name "$ARTIFACT_NAME"
    vm0 artifact push

    # Run agent (~15s)
    run vm0 run "$AGENT_NAME" --artifact-name "$ARTIFACT_NAME" "echo hello"
    assert_success

    # Save session ID for next test
    echo "$output" | grep -oP 'Session:\s*\K[a-f0-9-]+' > "$BATS_FILE_TMPDIR/session_id"
}

@test "step 2: continue from session" {
    SESSION_ID=$(cat "$BATS_FILE_TMPDIR/session_id")

    # Continue session (~15s)
    run vm0 run continue "$SESSION_ID" "echo world"
    assert_success
}
```

### For Independent Tests (Single `vm0 run` or no run)

```bash
#!/usr/bin/env bats

load '../../helpers/setup'

setup() {
    export UNIQUE_ID="$(date +%s%3N)-$RANDOM"
}

@test "vm0 artifact push creates new version" {
    # Independent test - can be in separate file for parallelization
    mkdir -p "/tmp/art-$UNIQUE_ID"
    cd "/tmp/art-$UNIQUE_ID"

    vm0 artifact init --name "test-$UNIQUE_ID"
    echo "content" > file.txt

    run vm0 artifact push
    assert_success
    assert_output --partial "Version:"
}
```

---

## Anti-Patterns

### AP-1: Multiple `vm0 run` in One Case

```bash
# ❌ BAD: Will likely timeout (30s+)
@test "full session workflow" {
    run vm0 run "$AGENT" "create file"     # ~15s
    run vm0 run continue "$SESSION" "read" # ~15s
}

# ✅ GOOD: Split into cases
@test "step 1: create session" { ... }
@test "step 2: continue session" { ... }
```

### AP-2: Independent Tests in Same File

```bash
# ❌ BAD: These run serially but don't need to
# file: t10-mixed.bats
@test "artifact push works" { ... }      # Independent
@test "volume push works" { ... }        # Independent
@test "compose validates config" { ... } # Independent

# ✅ GOOD: Separate files for parallelization
# file: t10a-artifact.bats
@test "artifact push works" { ... }

# file: t10b-volume.bats
@test "volume push works" { ... }
```

### AP-3: Not Using `setup_file()` for Expensive Setup

```bash
# ❌ BAD: Composes agent for EVERY test
setup() {
    vm0 compose "$CONFIG"  # Runs before each test!
}

# ✅ GOOD: Compose once per file
setup_file() {
    vm0 compose "$CONFIG"  # Runs once before all tests
}
```

### AP-4: Testing Error Cases in E2E

```bash
# ❌ BAD: Error cases belong in unit tests
@test "vm0 run fails with missing artifact" {
    run vm0 run "$AGENT" --artifact-name "nonexistent"
    assert_failure
}

# ✅ GOOD: E2E tests happy paths only
@test "vm0 run succeeds with valid artifact" {
    run vm0 run "$AGENT" --artifact-name "$VALID_ARTIFACT"
    assert_success
}
```

### AP-5: Hardcoded Resource Names

```bash
# ❌ BAD: Will conflict in parallel runs
ARTIFACT_NAME="test-artifact"

# ✅ GOOD: Unique names with timestamp + random
ARTIFACT_NAME="test-artifact-$(date +%s%3N)-$RANDOM"
```

---

## Quick Checklist

Before committing E2E tests:

- [ ] Happy path only (error cases → unit tests)
- [ ] Max ONE `vm0 run` per test case (timeout safety)
- [ ] State-sharing tests in same file, independent tests in separate files
- [ ] Use `setup_file()` for expensive one-time setup (compose)
- [ ] Use `$BATS_FILE_TMPDIR` for state between cases
- [ ] Unique resource names (timestamp + random)
- [ ] Cleanup in `teardown()` or `teardown_file()`

---

## Reference

- BATS documentation: https://bats-core.readthedocs.io/en/stable/writing-tests.html
- Test timeout: `BATS_TEST_TIMEOUT=30` (serial) / `BATS_TEST_TIMEOUT=60` (parallel/runner)
- Parallelization: `-j 10 --no-parallelize-within-files`

Related Skills

ai-powered-pentesting

from diegosouzapw/awesome-omni-skill

Guide for AI-powered penetration testing tools, red teaming frameworks, and autonomous security agents.

ab-testing-analyzer

from diegosouzapw/awesome-omni-skill

全面的AB测试分析工具，支持实验设计、统计检验、用户分群分析和可视化报告生成。用于分析产品改版、营销活动、功能优化等AB测试结果，提供统计显著性检验和深度洞察。

bats-testing-patterns

from diegosouzapw/awesome-omni-skill

Comprehensive guide for writing shell script tests using Bats (Bash Automated Testing System). Use when writing or improving tests for Bash/shell scripts, creating test fixtures, mocking commands, or setting up CI/CD for shell script testing. Includes patterns for assertions, setup/teardown, mocking, fixtures, and integration with GitHub Actions.

adb-device-testing

from diegosouzapw/awesome-omni-skill

Use when testing Android apps on ADB-connected devices/emulators - UI automation, screenshots, location spoofing, navigation, app management. Triggers on ADB, emulator, Android testing, location mock, UI test, screenshot walkthrough.

sqlmap-database-pentesting

from diegosouzapw/awesome-omni-skill

This skill should be used when the user asks to "automate SQL injection testing," "enumerate database structure," "extract database credentials using sqlmap," "dump tables and columns...

sql-injection-testing

from diegosouzapw/awesome-omni-skill

This skill should be used when the user asks to "test for SQL injection vulnerabilities", "perform SQLi attacks", "bypass authentication using SQL injection", "extract database inform...

Contract Testing Pact

from diegosouzapw/awesome-omni-skill

Contract testing validates that service consumers and providers agree on request/response expectations. Pact implements consumer-driven contracts (CDC) with shareable pact files and provider verificat

Async Testing Expert

from diegosouzapw/awesome-omni-skill

Comprehensive pytest skill for async Python testing with proper mocking, fixtures, and patterns from production-ready test suites. Use when writing or improving async tests for Python applications, especially FastAPI backends with database interactions.

api-testing

from diegosouzapw/awesome-omni-skill

Test FastAPI endpoints with pytest and generate API documentation. Use when creating new APIs or verifying existing endpoints work correctly.

api-testing-suite

from diegosouzapw/awesome-omni-skill

Validate API responses and schemas.

api-testing-patterns

from diegosouzapw/awesome-omni-skill

Comprehensive API testing patterns including contract testing, REST/GraphQL testing, and integration testing. Use when testing APIs or designing API test strategies.

api-testing-observability-api-mock

from diegosouzapw/awesome-omni-skill

You are an API mocking expert specializing in realistic mock services for development, testing, and demos. Design mocks that simulate real API behavior and enable parallel development.