testing-strategy

A comprehensive guide to implementing AIDB tests that follow the E2E-first philosophy, the DebugInterface abstraction, and MCP response health standards.


by diegosouzapw


Best use case

testing-strategy is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using testing-strategy can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/testing-strategy/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/testing-security/testing-strategy/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub
2. Place it in .claude/skills/testing-strategy/SKILL.md inside your project
3. Restart your AI agent; it will auto-discover the skill

How testing-strategy Compares

Feature / Agent	testing-strategy	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A


SKILL.md Source

# AIDB Testing Strategy

**Priority:** E2E → Integration → Unit (Highest ROI First)

______________________________________________________________________

## CRITICAL: Test Execution Command

**All tests MUST be run via `./dev-cli test run`:**

```bash
./dev-cli test run -s {suite} [-k 'pattern'] [-l {lang}]
```

- Multiple `-k` and `-l` flags supported
- **NEVER use `--local`** - suites know their natural execution environment; forcing local causes unexpected behavior
- Direct `pytest` invocation is NOT supported
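
If you script test runs, the flags above can be assembled programmatically. The following is a minimal sketch, not part of `dev-cli`: the helper name `build_test_command` is hypothetical, and the language values passed to `-l` are assumed to be lowercase language names.

```python
import shlex
from typing import List, Sequence


def build_test_command(
    suite: str,
    patterns: Sequence[str] = (),
    languages: Sequence[str] = (),
) -> List[str]:
    """Compose an argv list for `./dev-cli test run` (hypothetical helper)."""
    if not suite:
        raise ValueError("a suite name is required")
    argv = ["./dev-cli", "test", "run", "-s", suite]
    for pattern in patterns:
        argv += ["-k", pattern]  # -k may be repeated
    for lang in languages:
        argv += ["-l", lang]  # -l may be repeated
    # Deliberately no --local: suites pick their own execution environment.
    return argv


command = build_test_command(
    "shared", patterns=["breakpoint"], languages=["python", "java"]
)
print(shlex.join(command))
# ./dev-cli test run -s shared -k breakpoint -l python -l java
```

Returning an argv list rather than a single string avoids shell quoting problems when a `-k` pattern contains spaces.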

______________________________________________________________________

This skill guides you through creating and modifying tests for the AIDB project. The test infrastructure is complete - your job is to implement tests using proven patterns.

## Related Skills

When implementing tests, you may also need:

- **adapter-development** - Tests validate adapter behavior across languages
- **mcp-tools-development** - MCP tools are tested through the DebugInterface abstraction
- **dap-protocol-guide** - Tests exercise DAP protocol flows end-to-end
- **ci-cd-workflows** - For testing CI/CD workflows themselves (not application code)

## Core Philosophy

**For the complete test architecture**, see the test infrastructure in `src/tests/`.

### 1. E2E First, Unit Last

**Why E2E First?**

- Validates real user workflows
- Exercises the full stack (catches integration bugs early)
- Most bugs are discovered when testing real components working together
- Higher ROI than unit tests initially

**Testing Priority:**

1. **E2E Tests** - Complete workflows, real programs, full integration
1. **Integration Tests** - Component interactions, adapter lifecycle, state management
1. **Unit Tests** - Edge cases, specific logic, error handling

### 2. Framework Tests: Keep It Simple

**Goal:** Verify we can launch and debug programs using various frameworks

**Pattern:**

1. Connect to framework app (Django, Express, Spring Boot, etc.)
1. Quick smoke test: set breakpoint → inspect → step
1. That's it. Don't test framework internals.

**We're testing that AIDB works WITH frameworks, not testing the frameworks themselves.**

### 3. MCP Responses: Structure + Content + Efficiency

**Don't just validate structure** - validate content accuracy and payload efficiency.

**Bad test:**

```python
assert "locals" in response["data"]  # Structure only
```

**Good test:**

```python
# Structure
assert "locals" in response["data"]

# Content accuracy
assert response["data"]["locals"]["x"]["value"] == 10
assert response["data"]["locals"]["x"]["line"] == 5

# Efficiency (no junk)
assert len(response["data"]) <= 3  # No bloated payloads
assert len(response["summary"]) <= 200  # Concise summaries
```
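
The three kinds of check above (structure, content, efficiency) can also be read as one reusable health check. The sketch below is illustrative only: `response_health_problems` is a hypothetical name, the thresholds simply mirror the numbers in the example above, and in the real test suite you should prefer the existing `MCPAssertions` helpers over writing your own.

```python
from typing import Any, Dict, List


def response_health_problems(
    response: Dict[str, Any],
    max_data_keys: int = 3,
    max_summary_chars: int = 200,
) -> List[str]:
    """Return a list of health problems; an empty list means healthy."""
    data = response.get("data")
    if not isinstance(data, dict):
        return ["'data' is missing or not a dict"]

    problems: List[str] = []
    if "locals" not in data:  # structure
        problems.append("'locals' missing from data")
    if len(data) > max_data_keys:  # efficiency: payload size
        problems.append(f"data has {len(data)} keys (max {max_data_keys})")
    summary = response.get("summary", "")
    if len(summary) > max_summary_chars:  # efficiency: summary length
        problems.append(f"summary has {len(summary)} chars (max {max_summary_chars})")
    return problems


healthy = {
    "data": {"locals": {"x": {"value": 10, "line": 5}}},
    "summary": "Stopped at line 5",
}
assert response_health_problems(healthy) == []
# Content accuracy still needs scenario-specific assertions:
assert healthy["data"]["locals"]["x"]["value"] == 10
```

Collecting problems into a list, instead of failing on the first one, makes a failing test report every issue at once.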

### 4. VS Code Launch Configs: Critical for Agent Workflows

**Why critical:**

- Primary entry point for agents using framework debugging
- Enables use of existing workspace launch configs
- Complex variable substitution must work (`${workspaceFolder}`, etc.)

**Test thoroughly:**

- Core launch.json parsing (language-independent)
- Per-language config translation (Python/JavaScript/Java)
- Variable substitution and resolution
- Framework-specific launch configs
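
To make the variable-substitution item concrete, here is a minimal sketch of resolving `${...}` placeholders in a launch configuration. It is not AIDB's resolver: the function name `resolve_variables`, the sample configuration, and the `/repo` path are all assumptions for illustration.

```python
import re
from typing import Any, Dict

_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def resolve_variables(value: Any, variables: Dict[str, str]) -> Any:
    """Recursively replace ${name} placeholders in strings, lists, and dicts."""
    if isinstance(value, str):
        def replace(match: "re.Match[str]") -> str:
            name = match.group(1)
            if name not in variables:
                # Fail loudly: a silently unresolved path is hard to debug.
                raise KeyError(f"unresolved launch variable: ${{{name}}}")
            return variables[name]

        return _VARIABLE.sub(replace, value)
    if isinstance(value, list):
        return [resolve_variables(item, variables) for item in value]
    if isinstance(value, dict):
        return {key: resolve_variables(item, variables) for key, item in value.items()}
    return value  # numbers, booleans, None pass through unchanged


launch_config = {
    "name": "Sample app",
    "program": "${workspaceFolder}/app.py",
    "cwd": "${workspaceFolder}",
    "args": ["--config", "${workspaceFolder}/settings.toml"],
}
resolved = resolve_variables(launch_config, {"workspaceFolder": "/repo"})
print(resolved["program"])  # /repo/app.py
```

A test for the real resolver would assert on exact resolved values like this, plus the failure mode for unknown variables.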

**Critical Breakpoint Timing:**

Set breakpoints when STARTING sessions (not after) to avoid race conditions with fast-executing programs:

```python
# ✅ CORRECT
await debug_interface.start_session(program=prog, breakpoints=[{"line": 10}])

# ❌ WRONG: Race condition
await debug_interface.start_session(program=prog)
await debug_interface.set_breakpoint(file=prog, line=10)  # May be too late!
```

Exception: Long-running processes (servers) where you attach. Reference: `src/tests/aidb_shared/e2e/test_complex_workflows.py`
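
The race is easy to reproduce with a toy model. The sketch below does not use AIDB at all: `ToySession` is a hypothetical stand-in where a "program" runs 20 lines and a late `set_breakpoint` call has simulated request latency.

```python
import asyncio
from typing import Iterable, Optional, Set


class ToySession:
    """Toy stand-in for a debug session; NOT the AIDB DebugInterface."""

    def __init__(self, total_lines: int) -> None:
        self.total_lines = total_lines
        self.breakpoints: Set[int] = set()
        self.stopped_at: Optional[int] = None
        self._task: Optional[asyncio.Task] = None

    async def start(self, breakpoints: Iterable[int] = ()) -> None:
        # Breakpoints given here are registered before the program runs.
        self.breakpoints.update(breakpoints)
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        for line in range(1, self.total_lines + 1):
            if line in self.breakpoints:
                self.stopped_at = line
                return
            await asyncio.sleep(0)  # yield between "lines"

    async def set_breakpoint(self, line: int) -> None:
        await asyncio.sleep(0.05)  # simulated request latency
        self.breakpoints.add(line)

    async def wait(self) -> None:
        assert self._task is not None
        await self._task


async def stop_line(breakpoint_at_start: bool) -> Optional[int]:
    session = ToySession(total_lines=20)
    if breakpoint_at_start:
        await session.start(breakpoints=[10])
    else:
        await session.start()
        await session.set_breakpoint(10)  # program has already finished
    await session.wait()
    return session.stopped_at


print(asyncio.run(stop_line(breakpoint_at_start=True)))   # 10
print(asyncio.run(stop_line(breakpoint_at_start=False)))  # None
```

In the second run the program completes while the breakpoint request is still in flight, so the session never stops at line 10.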

## DebugInterface Abstraction

The cornerstone of our test strategy is the **DebugInterface abstraction** - a unified API that works with both MCP tools and the direct API.

**For implementation details**, see:

- [DebugInterface](resources/debug-interface.md) - Skill resource file
- `src/tests/_helpers/debug_interface/` - Debug interface source and docstrings

**Why?** One test validates both entry points.

**Hypothetical Example** (illustrates pattern, not a real test file):

```python
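import pytest

from tests._helpers.parametrization import parametrize_interfaces

# BaseE2ETest is one of the project's test base classes; import it from
# wherever your test package exposes it (path not shown in this example).


class TestBreakpoints(BaseE2ETest):
    @parametrize_interfaces  # Runs twice: MCP and API
    @pytest.mark.asyncio
    async def test_set_breakpoint(self, debug_interface, simple_program):
        """Test works with BOTH MCP and API."""
        # For fast-exiting programs, pass breakpoints to start_session()
        # instead (see "Critical Breakpoint Timing").
        await debug_interface.start_session(program=simple_program)
        try:
            bp = await debug_interface.set_breakpoint(file=simple_program, line=5)
            self.verify_bp.verify_breakpoint_verified(bp)
        finally:
            # Always end the session, even if an assertion fails.
            await debug_interface.stop_session()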
```

**Key Points:**

- `@parametrize_interfaces` runs test with both MCP and API
- Same test logic validates both entry points
- No duplication, no drift
- See [DebugInterface](resources/debug-interface.md) for details
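
**Sketch of the idea** (a toy model, not the real DebugInterface): two backends with different native return shapes are normalized behind one method, so a single check runs against both. All class names and the response envelope shape here are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BreakpointSetter(ABC):
    """Hypothetical slice of a unified debug interface."""

    @abstractmethod
    def set_breakpoint(self, file: str, line: int) -> Dict[str, Any]:
        """Return a normalized breakpoint record."""


class ToyApiBackend(BreakpointSetter):
    def set_breakpoint(self, file: str, line: int) -> Dict[str, Any]:
        # A direct API call already yields the record itself.
        return {"file": file, "line": line, "verified": True}


class ToyMcpBackend(BreakpointSetter):
    def set_breakpoint(self, file: str, line: int) -> Dict[str, Any]:
        # A tool call returns an envelope; unwrap it to the same record shape.
        envelope = {"data": {"breakpoint": {"file": file, "line": line, "verified": True}}}
        return envelope["data"]["breakpoint"]


def check_breakpoint(backend: BreakpointSetter) -> Dict[str, Any]:
    """One check, written once, valid for every backend."""
    bp = backend.set_breakpoint(file="prog.py", line=5)
    assert bp["verified"] and bp["line"] == 5
    return bp


results = [check_breakpoint(backend) for backend in (ToyApiBackend(), ToyMcpBackend())]
assert results[0] == results[1]  # both entry points agree
```

The normalization step is what prevents drift: tests assert on one record shape regardless of which entry point produced it.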

## The Shared Suite: Testing Debug Fundamentals

The **shared suite** is AIDB's language-agnostic test foundation that validates core debugging capabilities across all supported languages using normalized, programmatically generated test programs.

**Key Innovation:** Semantic markers that map identical logic to language-specific line numbers.

**Location:** `src/tests/aidb_shared/` (integration/ + e2e/)
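
The semantic-marker idea mentioned above can be sketched as follows. The `@marker:` comment syntax and the `marker_lines` function are assumptions made for illustration; the real marker format is documented in the resource files.

```python
import re
from typing import Dict

# Matches a marker inside a '#' or '//' comment (hypothetical syntax).
_MARKER = re.compile(r"(?:#|//)\s*@marker:(\w+)")


def marker_lines(source: str) -> Dict[str, int]:
    """Map each marker name to the 1-based line number it appears on."""
    lines: Dict[str, int] = {}
    for number, text in enumerate(source.splitlines(), start=1):
        match = _MARKER.search(text)
        if match:
            lines[match.group(1)] = number
    return lines


PYTHON_PROGRAM = """\
def main():
    x = 10  # @marker:assign_x
    print(x)  # @marker:print_x

main()
"""

JAVA_PROGRAM = """\
public class Main {
    public static void main(String[] args) {
        int x = 10; // @marker:assign_x
        System.out.println(x); // @marker:print_x
    }
}
"""

print(marker_lines(PYTHON_PROGRAM))  # {'assign_x': 2, 'print_x': 3}
print(marker_lines(JAVA_PROGRAM))    # {'assign_x': 3, 'print_x': 4}
```

A test can then ask for "the line where `x` is assigned" and get the right line number in each language without hard-coding it.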

**What it tests:**

- Debug primitives (breakpoints, stepping, variables)
- Control flow across all 3 languages (Python, JavaScript, Java)
- Zero duplication: One test → 6 execution paths (2 interfaces × 3 languages)
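
The 2 × 3 arithmetic above is a plain Cartesian product. A minimal sketch follows; the identifier strings and the ID format are illustrative, not the suite's actual parametrization IDs.

```python
from itertools import product
from typing import List

INTERFACES = ("mcp", "api")
LANGUAGES = ("python", "javascript", "java")


def execution_paths(test_name: str) -> List[str]:
    """List every interface/language combination one test expands into."""
    return [
        f"{test_name}[{interface}-{language}]"
        for interface, language in product(INTERFACES, LANGUAGES)
    ]


paths = execution_paths("test_set_breakpoint")
print(len(paths))  # 6
print(paths[0])    # test_set_breakpoint[mcp-python]
```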

**For complete details**, see [DebugInterface](resources/debug-interface.md).

### When to Use the Shared Suite

**Use shared suite when:**

- Testing core debug operations (breakpoint, step, inspect)
- Validating adapter behavior across languages
- Ensuring language parity (all adapters work identically)

**Use framework tests when:**

- Testing framework-specific debugging (Django ORM, Express middleware)
- Validating launch.json configurations
- Testing real-world application patterns

## Test Organization

```
src/tests/
├── aidb_shared/               # ⭐ Shared suite: language-agnostic debug fundamentals
│   ├── integration/          # Core debug operations (breakpoints, stepping, variables)
│   └── e2e/                  # Complex workflows, parallel sessions
├── aidb/                      # Core API tests - organized by component
│   ├── adapters/             # Adapter-specific tests
│   ├── audit/                # Audit logging tests
│   ├── common/               # Common utilities tests
│   ├── dap/                  # DAP client tests
│   ├── models/               # Model tests
│   ├── resources/            # Resource management tests
│   ├── service/              # Service layer tests
│   └── session/              # Session management tests
├── aidb_mcp/                  # MCP server tests - organized by component
├── frameworks/                # Framework integration tests
│   ├── python/               # Flask, FastAPI, pytest
│   ├── javascript/           # Express, Jest
│   └── java/                 # Spring Boot, JUnit
├── _helpers/                  # Test helpers and utilities
├── _fixtures/                 # Shared fixtures
│   └── unit/                 # ⭐ Unit test infrastructure (see below)
└── _assets/                   # Test programs and data
    ├── framework_apps/       # Framework test applications
    └── test_programs/        # Generated programs for shared suite
```

### Unit Test Infrastructure

The centralized unit test infrastructure at `src/tests/_fixtures/unit/` provides reusable mocks, builders, and fixtures:

```
_fixtures/unit/
├── builders/           # DAPRequestBuilder, DAPResponseBuilder, DAPEventBuilder
├── dap/               # Transport, events, receiver mocks
├── session/           # Registry, lifecycle, state, child_manager mocks
├── adapter/           # Port, process, launch_orchestrator mocks
├── mcp/               # DebugService, MCPSessionContext mocks
├── conftest.py        # Master fixture re-exports
├── context.py         # mock_ctx, null_ctx, tmp_storage
└── assertions.py      # UnitAssertions class
```

**Usage Pattern:**

```python
# In domain conftest.py (e.g., src/tests/aidb/dap/unit/conftest.py)
from tests._fixtures.unit.conftest import *  # noqa: F401, F403
from tests._fixtures.unit.builders import DAPEventBuilder, DAPResponseBuilder

# In test file
def test_something(mock_ctx, mock_transport):
    event = DAPEventBuilder.stopped_event(reason="breakpoint")
    # ...
```

**Key Components:**

- **Builders** - Fluent API for DAP protocol objects (requests, responses, events)
- **mock_ctx** - Standard logging context mock with debug/info/warning/error methods
- **UnitAssertions** - DAP-specific assertion helpers
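
**What a fluent builder looks like** (hypothetical sketch, not the project's `DAPEventBuilder`): each method sets one field and returns the builder, and `build()` emits a message shaped like a Debug Adapter Protocol `stopped` event.

```python
from typing import Any, Dict


class StoppedEventSketch:
    """Hypothetical fluent builder for a DAP 'stopped' event."""

    def __init__(self, reason: str) -> None:
        self._seq = 1
        self._body: Dict[str, Any] = {"reason": reason}

    def seq(self, value: int) -> "StoppedEventSketch":
        self._seq = value
        return self

    def thread(self, thread_id: int) -> "StoppedEventSketch":
        self._body["threadId"] = thread_id
        return self

    def all_threads_stopped(self, value: bool = True) -> "StoppedEventSketch":
        self._body["allThreadsStopped"] = value
        return self

    def build(self) -> Dict[str, Any]:
        # Copy the body so later builder calls cannot mutate a built event.
        return {
            "seq": self._seq,
            "type": "event",
            "event": "stopped",
            "body": dict(self._body),
        }


event = StoppedEventSketch("breakpoint").seq(7).thread(1).all_threads_stopped().build()
print(event["body"])
# {'reason': 'breakpoint', 'threadId': 1, 'allThreadsStopped': True}
```

Builders keep unit tests short: a test states only the fields it cares about and gets a well-formed message for the rest.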

### Test Execution Modes

Test suites run in different environments based on their requirements:

**Local-Only Suites** (no Docker):

- `cli` - CLI command tests
- `mcp` - MCP server unit/integration tests
- `core` - Core AIDB API tests
- `common` - Common utilities tests
- `logging` - Logging framework tests
- `ci_cd` - CI/CD workflow tests

**Docker Suites** (require containers):

- `shared` - Multi-language shared tests (parallel language containers)
- `frameworks` - Framework integration tests (parallel language containers)
- `launch` - Launch config tests (parallel language containers)

**Why the split?**

- **Local suites** test Python-only logic (handlers, validation, utils)
- **Docker suites** test multi-language scenarios
- Multi-language MCP functionality is tested in `shared`/`frameworks`/`launch`
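
The suite split above amounts to a lookup table. The sketch below only restates the lists in this section; the function name `execution_environment` is hypothetical and `dev-cli` makes this decision for you.

```python
LOCAL_SUITES = frozenset({"cli", "mcp", "core", "common", "logging", "ci_cd"})
DOCKER_SUITES = frozenset({"shared", "frameworks", "launch"})


def execution_environment(suite: str) -> str:
    """Return 'local' or 'docker' for a known suite name."""
    if suite in LOCAL_SUITES:
        return "local"
    if suite in DOCKER_SUITES:
        return "docker"
    raise ValueError(f"unknown suite: {suite!r}")


print(execution_environment("mcp"))     # local
print(execution_environment("shared"))  # docker
```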

**Running tests:**

```bash
./dev-cli test run -s mcp      # Local execution
./dev-cli test run -s shared   # Docker execution
```

## Code Reuse: Don't Reinvent

**Always use existing infrastructure:**

- **Test Base Classes** - `BaseE2ETest`, `BaseIntegrationTest`, `FrameworkDebugTestBase`
- **Parametrization Decorators** - `@parametrize_interfaces`, `@parametrize_languages`
- **Helper Assertions** - `self.verify_bp`, `self.verify_exec`, `MCPAssertions`
- **Constants** - `StopReason`, `TestTimeouts`, `MCPTool`

**For complete details**, see [E2E Patterns](resources/e2e-patterns.md).

## Working Examples

**Study these real tests before writing new ones:**

### Framework Tests (E2E)

- **Python:** `test_flask_debugging.py`, `test_fastapi_debugging.py`, `test_pytest_debugging.py`
- **JavaScript:** `test_express_debugging.py`, `test_jest_debugging.py`
- **Java:** `test_springboot_debugging.py`, `test_junit_debugging.py`

### Core API Tests

- **Launch Variable Resolution:** `test_launch_variable_resolution.py`
- **Session Target Handling:** `test_session_target_handling.py`

**For complete file paths and patterns**, see [E2E Patterns](resources/e2e-patterns.md).

## Common Patterns

**For hypothetical examples illustrating common patterns**, see [E2E Patterns](resources/e2e-patterns.md).

**Key patterns covered:**

1. Basic E2E Test
1. Breakpoint Test with Markers
1. Dual-Launch Equivalence Test
1. MCP Response Validation

## When Creating New Tests

### Step 1: Choose Test Type

- **E2E?** Full workflow, real program, complete integration
- **Integration?** Component interactions, lifecycle management
- **Unit?** Specific function, edge case, error handling

### Step 2: Find Similar Test

Look at [E2E Patterns](resources/e2e-patterns.md):

- Flask/Express for framework tests
- Existing tests in the same directory
- Similar test scenarios in other languages

### Step 3: Copy Pattern, Adapt

Don't start from scratch:

1. Copy a working test
1. Adapt to your scenario
1. Use same helpers and assertions
1. Follow same structure

### Step 4: Use Existing Infrastructure

**Don't create:**

- New assertion helpers (use existing)
- New fixtures (check `conftest.py` files first)
- New constants (use `constants.py`)
- New base classes (inherit from existing)

**Do create:**

- Tests using existing patterns
- Scenario-specific test data
- Framework-specific fixtures (if needed)

## Performance Testing

**Current State:** No performance baselines exist yet

**Phase 1:** Establish baselines

- Analyze existing metrics/instrumentation
- Determine "healthy" latencies
- Document target times

**Phase 2:** Regression testing

- Monitor key operations (breakpoint set, variable inspect, step)
- Alert on degradation

**For now:** Focus on functional correctness, not performance.

## Success Criteria

### Test Quality Checklist

- [ ] Test uses `@parametrize_interfaces` for MCP/API coverage
- [ ] Test inherits from appropriate base class
- [ ] Test uses helper assertions, not custom assertions
- [ ] Test validates content accuracy, not just structure
- [ ] MCP tests check efficiency (no bloated payloads)
- [ ] Test has clear docstring explaining what it validates
- [ ] Test follows working examples
- [ ] Test is in correct directory (e2e/integration/unit)

### Framework Test Checklist

- [ ] Inherits from `FrameworkDebugTestBase`
- [ ] Implements `test_launch_via_api()`
- [ ] Implements `test_launch_via_vscode_config()`
- [ ] Implements `test_dual_launch_equivalence()`
- [ ] Sets `framework_name` attribute
- [ ] Uses simple smoke tests (no deep framework testing)
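
Most of the framework checklist can be checked mechanically. The sketch below is a hypothetical helper working on a plain class; it does not verify inheritance from `FrameworkDebugTestBase`, and it would need adjusting if the base class itself defines the required methods.

```python
from typing import List

REQUIRED_METHODS = (
    "test_launch_via_api",
    "test_launch_via_vscode_config",
    "test_dual_launch_equivalence",
)


def missing_checklist_items(test_class: type) -> List[str]:
    """Return the checklist items a framework test class does not satisfy."""
    missing = [
        name
        for name in REQUIRED_METHODS
        if not callable(getattr(test_class, name, None))
    ]
    if not getattr(test_class, "framework_name", None):
        missing.append("framework_name")
    return missing


class TestFlaskSketch:  # stand-in class for illustration only
    framework_name = "flask"

    def test_launch_via_api(self) -> None: ...

    def test_launch_via_vscode_config(self) -> None: ...


print(missing_checklist_items(TestFlaskSketch))
# ['test_dual_launch_equivalence']
```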

## Investigating Test Failures

**CRITICAL:** When tests fail, check logs BEFORE attempting fixes.

**See:** **[Debugging Failures](resources/debugging-failures.md)** for log locations, investigation workflow, and common patterns.

**For CI test failures**, use the **ci-cd-workflows** skill's troubleshooting guide.

## Resources

| Resource                                              | Content                                              |
| ----------------------------------------------------- | ---------------------------------------------------- |
| [E2E Patterns](resources/e2e-patterns.md)             | Test patterns, markers, code reuse, working examples |
| [Framework Tests](resources/framework-tests.md)       | Dual-launch pattern, Flask/Express examples          |
| [DebugInterface](resources/debug-interface.md)        | Unified API abstraction, shared suite architecture   |
| [Debugging Failures](resources/debugging-failures.md) | Log locations, investigation workflow, common issues |

**Test Infrastructure:** `src/tests/` (see `_fixtures/` and `_helpers/` for core components)

## Getting Started

1. **Read CONTEXT.md:** `wip/test-implementation-backlog/CONTEXT.md`
1. **Study working examples:** Flask (`test_flask_debugging.py`) and Express (`test_express_debugging.py`)
1. **Choose a test to implement:** Start with E2E (highest ROI)
1. **Copy a working test:** Don't start from scratch
1. **Adapt to your scenario:** Use same patterns, different data
1. **Validate:** Run the test via `./dev-cli test run` and confirm it passes with both MCP and API

## Questions?

**Internal Documentation**:

- `src/tests/` - Test infrastructure (see `_fixtures/` and `_helpers/`)
- `docs/developer-guide/overview.md` - System architecture

**Code References**:

- **DAP Protocol:** See `src/aidb/dap/protocol/` (fully typed, types.py + requests.py + responses.py + events.py)
- **Test Infrastructure:** See `src/tests/_helpers/` and `src/tests/_fixtures/`
- **Working Examples:** See Flask/Express framework tests

______________________________________________________________________

**Remember:**

- E2E first, validate content accuracy
- Use shared suite for debug fundamentals, framework tests for integration
- Keep framework tests simple (no framework internals)
- Always use the DebugInterface abstraction (zero duplication)
