live-tests

Writes live integration tests that hit the real Copilot API and record responses as replayable fixtures. Use this skill when adding new agent behaviors, provider integrations, or tool interactions that need real-world API coverage.

16 stars

by diegosouzapw

View on GitHub Installation ↓

Best use case

live-tests is best used when you need a repeatable AI agent workflow for recording real Copilot API responses as test fixtures, instead of a one-off prompt.

Teams using live-tests should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/live-tests/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/tools/live-tests/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/live-tests/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How live-tests Compares

Feature / Agent	live-tests	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

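What does this skill do?

It guides an AI agent in writing live integration tests that call the real GitHub Copilot API, and in recording the responses as JSON fixtures that deterministic tests can replay.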

Where can I find the source code?

The source code is on GitHub in the diegosouzapw/awesome-omni-skill repository, at skills/tools/live-tests/SKILL.md.

SKILL.md Source

# Live Test & Fixture Recording Skill

You write live integration tests that exercise the real GitHub Copilot API and capture responses as JSON fixtures for deterministic replay. This is the project's VCR-like system for ensuring tests stay grounded in real API behavior.

## Architecture overview

```
Live test (mix test --include live --include save_fixtures)
  │
  ├─ RecordingProvider  ← wraps real Copilot provider, captures SSE events
  │     │
  │     └─ persistent_term storage ← events buffered here during stream
  │
  └─ FixtureHelper.save_fixture/2  ← writes events to JSON file
        │
        └─ test/support/fixtures/<name>.json

Fixture-based test (mix test)
  │
  ├─ FixtureProvider  ← reads fixture JSON, replays events via Req.Response.Async
  │     │
  │     └─ FixtureHelper.build_fixture_response/1 ← spawns process to send SSE events
  │
  └─ Assertions on agent behavior, events, tool calls, etc.
```
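The recording half of the diagram comes down to a buffer in `:persistent_term`. The provider appends to this buffer while the SSE stream is consumed. The module below is a minimal, hypothetical sketch of that buffering. `RecordingBufferSketch` and `record/1` are illustrative names, not the project's `RecordingProvider` API.

```elixir
defmodule RecordingBufferSketch do
  # Hypothetical sketch: buffers recorded SSE events under one
  # :persistent_term key, mirroring the start/stop flow in the diagram.
  @key {__MODULE__, :events}

  # Begin a recording with an empty buffer.
  def start_recording, do: :persistent_term.put(@key, [])

  # Prepend each event; arrival order is restored in stop_recording/0.
  def record(event) do
    :persistent_term.put(@key, [event | :persistent_term.get(@key, [])])
  end

  # Return the events in arrival order and clear the buffer.
  def stop_recording do
    events = @key |> :persistent_term.get([]) |> Enum.reverse()
    :persistent_term.erase(@key)
    events
  end
end
```

`:persistent_term` is optimized for reads, and each `put` is comparatively expensive. That cost is acceptable for the handful of events in one test response. The get-then-put in `record/1` is also not atomic, so concurrent writers could lose events. This is one reason to keep recording tests non-concurrent.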

## When to act

- Adding a new agent behavior that needs a new fixture (e.g., new tool call pattern, multi-turn conversation, error handling).
- Adding or changing provider integration logic.
- When the user asks to "record a fixture", "add a live test", or "capture API responses".
- When an existing fixture is stale and needs re-recording.

## Writing a live test

Live tests go in `test/opal/live_test.exs`. They require `@moduletag :live` (excluded by default) and use the `RecordingProvider` to capture real SSE events.
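Excluding `:live` by default is normally configured in `test/test_helper.exs`, and the module tag goes at the top of the live test file. Below is a minimal sketch, assuming a standard ExUnit setup. The project's actual helper file isn't shown here, and the `Opal.LiveTest` module name is an assumption.

```elixir
# test/test_helper.exs (assumed): skip :live tests unless a run opts in
# with `mix test --include live`.
ExUnit.start(exclude: [:live])

# test/opal/live_test.exs (module name assumed): @moduletag applies :live
# to every test defined in this module.
defmodule Opal.LiveTest do
  # Recording buffers events in global :persistent_term storage,
  # so live tests should not run concurrently.
  use ExUnit.Case, async: false

  @moduletag :live
end
```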

### Template

```elixir
describe "live API — <scenario description>" do
  @tag :save_fixtures
  @tag timeout: 30_000
  test "records <what this captures>" do
    RecordingProvider.start_recording()

    {:ok, pid} =
      Opal.start_session(%{
        model: {:copilot, "claude-sonnet-4"},
        system_prompt: "<constrained prompt that produces deterministic output>",
        # Add tool modules here if the scenario needs tool calls
        tools: [],
        working_dir: System.tmp_dir!(),
        provider: RecordingProvider
      })

    {:ok, response} = Opal.prompt_sync(pid, "<user message>", 25_000)
    Opal.stop_session(pid)

    events = RecordingProvider.stop_recording()
    assert events != []

    # Save as a fixture for replay
    path = FixtureHelper.save_fixture("<descriptive_name>.json", events)
    assert File.exists?(path)

    # Assertions on the live response
    assert is_binary(response)
    assert String.contains?(response, "<expected content>")
  end
end
```

### Running

```bash
# Run all live tests (requires valid Copilot auth)
mix test --include live

# Run live tests AND save recorded fixtures to disk
mix test --include live --include save_fixtures

# Run a specific live test
mix test --include live test/opal/live_test.exs:<line_number>
```

## Writing a fixture-based integration test

Once you have a fixture, write a deterministic integration test that replays it. These go in `test/opal/integration_test.exs` or a new file under `test/opal/`.

### Template

```elixir
describe "<feature under test>" do
  test "<what it verifies>" do
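    # Configure the FixtureProvider with your fixture
    :persistent_term.put({FixtureProvider, :fixture}, "<your_fixture>.json")

    # For multi-turn tests (tool call → second response):
    # :persistent_term.put({FixtureProvider, :second_fixture}, "<second_turn>.json")

    # :persistent_term is node-global, so clear both keys when the test exits
    on_exit(fn ->
      :persistent_term.erase({FixtureProvider, :fixture})
      :persistent_term.erase({FixtureProvider, :second_fixture})
    end)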

    session_id = "test-#{System.unique_integer([:positive])}"
    {:ok, tool_sup} = Task.Supervisor.start_link()

    agent_opts = [
      session_id: session_id,
      model: Model.new(:copilot, "test-model"),
      system_prompt: "<system prompt>",
      # List the tool modules the fixture's tool calls expect
      tools: [],
      working_dir: System.tmp_dir!(),
      provider: FixtureProvider,
      tool_supervisor: tool_sup
    ]

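    # Assumes the test module aliases the project's Agent, Events and Model
    # modules; unaliased, `Agent` resolves to Elixir's built-in Agent.
    {:ok, pid} = Agent.start_link(agent_opts)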
    Events.subscribe(session_id)

    Agent.prompt(pid, "<user message>")

    # Assert on events received
    assert_receive {:event, %{type: :response_start}}, 5_000
    assert_receive {:event, %{type: :text_delta, data: %{delta: delta}}}, 5_000
    assert is_binary(delta)
    assert_receive {:event, %{type: :response_end}}, 5_000
  end
end
```
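On the replay side, the architecture overview has `FixtureHelper.build_fixture_response/1` spawn a process that feeds the recorded events back as a stream. The sketch below approximates that idea with plain message passing. The module, its functions, and the `{:sse_chunk, ...}` / `:sse_done` messages are hypothetical. They do not reproduce the message protocol of Req's `Req.Response.Async`.

```elixir
defmodule FixtureReplaySketch do
  # Hypothetical sketch: stream recorded SSE payloads to `owner` from a
  # separate process, one message per event, then signal completion.
  def replay(events, owner \\ self()) do
    ref = make_ref()

    spawn(fn ->
      for %{"data" => data} <- events do
        # Re-frame each payload the way it arrived on the wire.
        send(owner, {ref, {:sse_chunk, "data: " <> data <> "\n\n"}})
      end

      send(owner, {ref, :sse_done})
    end)

    ref
  end

  # Collect chunks tagged with `ref` until the done message or a timeout.
  def collect(ref, acc \\ [], timeout \\ 1_000) do
    receive do
      {^ref, {:sse_chunk, chunk}} -> collect(ref, [chunk | acc], timeout)
      {^ref, :sse_done} -> {:ok, Enum.reverse(acc)}
    after
      timeout -> {:error, :timeout}
    end
  end
end
```

Messages between two processes arrive in the order they were sent, so the replayed chunks keep the recorded event order.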

## Fixture file format

Fixtures live in `test/support/fixtures/` as JSON:

```json
{
  "description": "Recorded live fixture: responses_api_text.json",
  "recorded_at": "2025-02-13T...",
  "events": [
    {"data": "{\"type\":\"response.created\",\"response\":{...}}"},
    {"data": "{\"type\":\"response.output_item.added\",...}"},
    {"data": "{\"type\":\"response.completed\",...}"}
  ]
}
```

Each `data` field contains one SSE event payload as a JSON string.
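The `events` list maps directly onto the raw SSE stream: each `data:` line becomes one entry. Below is a hypothetical sketch of turning a raw SSE body into the fixture map above. The module and function names are illustrative. JSON encoding is left out because it depends on the JSON library in use (for example Jason, or the `JSON` module in Elixir 1.18+).

```elixir
defmodule FixtureFormatSketch do
  # Hypothetical sketch of the fixture shape shown above.

  # One %{"data" => payload} entry per "data: " line; other SSE fields
  # (event:, id:) and blank separator lines are dropped.
  def events_from_sse(body) when is_binary(body) do
    for "data: " <> payload <- String.split(body, "\n"), do: %{"data" => payload}
  end

  # Wrap the recorded events in the top-level fixture map (not yet encoded).
  def fixture_map(name, events) do
    %{
      "description" => "Recorded live fixture: " <> name,
      "recorded_at" => DateTime.utc_now() |> DateTime.to_iso8601(),
      "events" => events
    }
  end
end
```

The parser is deliberately simplified and assumes `\n` line endings. It does not join multi-line `data` fields, and it ignores `data:` lines without a following space, both of which the SSE format allows.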

## Rules

1. **System prompts in live tests must be highly constrained** to produce deterministic output. Use prompts like "Respond with exactly the word 'pong'" rather than open-ended prompts.
2. **Name fixtures descriptively**: `responses_api_tool_call.json`, not `test1.json`.
3. **Never hand-edit fixture JSON.** Always re-record from a live test.
4. **Tag live tests correctly**: `@moduletag :live` on the module, `@tag :save_fixtures` on recording tests, `@tag timeout: 30_000` for API calls.
5. **Clean up after recording tests** if the fixture is only meant for one-time capture — or keep it permanently if it will be used by replay tests.
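6. **Fixture-based tests should be fast**: they replay from disk, with no network needed. Async safety needs care. The fixture name is stored in node-global `:persistent_term` keys, so async tests that configure different fixtures can race. Run such tests with `async: false` unless the provider isolates fixtures per test.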
7. When adding a fixture for a new scenario (e.g., error response, high token usage), also add a corresponding integration test that replays it.
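Even with a tightly constrained prompt (rule 1), a model may wrap the expected word in whitespace, quotes, or a trailing period. A small normalizing helper keeps the live assertion strict about the word itself without failing on that noise. `LiveAssertSketch.normalize/1` is a hypothetical helper, not part of the project.

```elixir
defmodule LiveAssertSketch do
  # Hypothetical helper: drop leading whitespace/quotes and trailing
  # whitespace, quotes, '.' or '!', then downcase.
  def normalize(response) when is_binary(response) do
    response
    |> String.replace(~r/^["'\s]+|["'.!\s]+$/, "")
    |> String.downcase()
  end
end
```

In a live test this would read `assert LiveAssertSketch.normalize(response) == "pong"`. It replaces a loose `String.contains?/2` check, which would also pass on a rambling answer.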

## Key files

- `test/opal/live_test.exs` — Live API tests with recording
- `test/opal/integration_test.exs` — Fixture-based integration tests
- `test/support/fixture_helper.ex` — Fixture load/save/replay helpers
- `test/support/fixtures/` — Recorded fixture JSON files

Related Skills

android-additional-tests

from diegosouzapw/awesome-omni-skill

Optional - Add comprehensive tests beyond the basic smoke test

analyzing-backtests

from diegosouzapw/awesome-omni-skill

Analyzes algorithmic trading backtest results from Jupyter notebooks and generates summary reports. Use when the user wants to analyze or summarize backtest notebooks.

add-unit-tests

from diegosouzapw/awesome-omni-skill

Guide for adding unit tests to AReaL. Use when user wants to add tests for new functionality or increase test coverage.

60-validate-tests-150

from diegosouzapw/awesome-omni-skill

[60] VALIDATE. Ensure new (staged and unstaged) changes are covered by tests at >70% and the full test suite is green. Use when asked to validate coverage for recent changes, add tests for modified code, or verify nothing else broke.

harness-delivery-iac

from diegosouzapw/awesome-omni-skill

Use this skill for Harness CI/CD, IaCM, templates, connectors, integrations, pipelines, pull requests, APIs, feature flags, internal developer portal, and governance workflows.

android-ci-tests

from diegosouzapw/awesome-omni-skill

Setup GitHub Actions workflow for running Android tests in CI

using-live-documentation

from diegosouzapw/awesome-omni-skill

Use BEFORE implementing, writing, configuring, or setting up ANY feature involving libraries, frameworks, or complex APIs - even before reading existing code. Fetches current documentation to ensure correct usage. Triggers on third-party libraries (such as react-query, FastAPI, Django, pytest), complex standard library modules (such as subprocess, streams, pathlib, logging), and "how to" questions about library usage. Do NOT use for trivial built-ins (such as dict.get, Array.map) or pure algorithms. Load this skill first to receive guidance on finding current documentation when implementing features, exploring code, or answering library-related questions.

testcontainers-integration-tests

from diegosouzapw/awesome-omni-skill

Use when integration tests require real infrastructure (database, message queue, cache) or when mocking infrastructure is insufficient. Defines container lifecycle, test isolation, and performance optimization for Testcontainers-based testing.

pubnub-live-betting-casino

from diegosouzapw/awesome-omni-skill

Build real-time betting and casino game platforms with PubNub

livekit-nextjs-frontend

from diegosouzapw/awesome-omni-skill

Build and review production-grade web and mobile frontends using LiveKit with Next.js. Covers real-time video/audio/data communication, WebRTC connections, track management, and best practices for LiveKit React components.

blog-smoke-tests

from diegosouzapw/awesome-omni-skill

Run Playwright smoke tests for Denser blog application. Executes 15 tests (SMOKE-01 to SMOKE-15) against configurable environment (production, dev, or localhost) with retry support (max 3 attempts per failing test). Supports headed (visible browser) and headless modes. Collects artifacts (screenshots, trace.zip) on failures and generates HTML report. Use when testing blog functionality, verifying deployments, checking UI/API consistency, or when user requests smoke tests, playwright tests, or blog testing.

inference-smoke-tests

from diegosouzapw/awesome-omni-skill

Run repeatable inference smoke tests using geppetto/pinocchio example binaries (single-pass, streaming, tool-loop, OpenAI Responses thinking) including tmux-driven TUI tests. Use when refactors touch InferenceState/Session/EngineBuilder, tool calling loop, event sinks, provider request formatting, or when you need a quick 'does inference still work?' checklist.