e2e-tests-studio

REQUIRED when modifying any file in packages/playground-ui or packages/playground. Triggers on: React component creation/modification/refactoring, UI changes, new playground features, bug fixes affecting studio UI. Generates Playwright E2E tests that validate PRODUCT BEHAVIOR, not just UI states.

22,487 stars

by mastra-ai

View on GitHub Installation ↓

Best use case

e2e-tests-studio is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using e2e-tests-studio should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/e2e-tests-studio/SKILL.md --create-dirs "https://raw.githubusercontent.com/mastra-ai/mastra/main/.claude/skills/e2e-tests-studio/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/e2e-tests-studio/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How e2e-tests-studio Compares

Feature / Agent	e2e-tests-studio	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

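What does this skill do?

It generates Playwright end-to-end tests for the Studio UI (packages/playground-ui and packages/playground) that validate product behavior rather than UI states. It is meant to run whenever files in those packages are modified.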

Where can I find the source code?

The source is in the mastra-ai/mastra repository on GitHub, at .claude/skills/e2e-tests-studio/SKILL.md (linked at the top of the page).

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

AI Agents for Startups

Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

SKILL.md Source

# E2E Behavior Validation for Frontend Modifications

## Core Principle: Test Product Behavior, Not UI States

**CRITICAL**: Tests must verify that product features WORK correctly, not just that UI elements render.

### What NOT to test (UI States):

- ❌ "Dropdown opens when clicked"
- ❌ "Modal appears after button click"
- ❌ "Loading spinner shows during request"
- ❌ "Form fields are visible"
- ❌ "Sidebar collapses"

### What TO test (Product Behavior):

- ✅ "Selecting an LLM provider configures the agent to use that provider"
- ✅ "Creating a new agent persists it and shows in the agents list"
- ✅ "Running a tool with parameters returns the expected output"
- ✅ "Chat messages stream correctly and maintain conversation context"
- ✅ "Workflow execution triggers tools in the correct order"

## Prerequisites

Requires the Playwright MCP server. If the `browser_navigate` tool is unavailable, instruct the user to add it:

```sh
claude mcp add playwright -- npx @playwright/mcp@latest
```

## Step 1: Understand the Feature Intent

Before writing ANY test, answer these questions:

1. **What user problem does this feature solve?**
2. **What is the expected outcome when the feature works correctly?**
3. **What data flows through the system?** (user input → API → state → UI)
4. **What should persist after page reload?**
5. **What downstream effects should this action have?**

Document these answers as comments in your test file.

## Step 2: Build and Start

```sh
pnpm build:cli
cd packages/playground/e2e/kitchen-sink && pnpm dev
```

Verify the server is running at http://localhost:4111.
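If a script or CI job needs to block until the kitchen-sink server is ready, a small polling helper works. This is a minimal sketch, not part of the Mastra repo: `waitForServer` is a hypothetical name, and it assumes Node 18+ for the global `fetch` and `AbortSignal.timeout`. Playwright's own `webServer` config option is an alternative that starts the dev server and waits for its URL before running tests.

```typescript
// Hypothetical helper (not in the Mastra repo): poll `url` until the server
// answers, or give up after `timeoutMs`. Assumes Node 18+ (global fetch).
async function waitForServer(
  url: string,
  timeoutMs = 60_000,
  intervalMs = 500,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      // Abort a single stalled attempt so it cannot block polling indefinitely
      const res = await fetch(url, { signal: AbortSignal.timeout(5_000) });
      // Any non-5xx status means the server is accepting requests
      if (res.status < 500) return true;
    } catch {
      // Connection refused or aborted: server not ready yet, keep polling
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

Calling `await waitForServer('http://localhost:4111')` resolves to `true` once Studio responds, or `false` after the timeout.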

## Step 3: Map Feature to Behavior Tests

### Feature-to-Test Mapping Guide

| Feature Category           | What to Test                                      | Example Assertion                                            |
| -------------------------- | ------------------------------------------------- | ------------------------------------------------------------ |
| **Agent Configuration**    | Config changes affect agent behavior              | Send message → verify response uses selected model           |
| **LLM Provider Selection** | Selected provider is used in requests             | Intercept API call → verify provider in request payload      |
| **Tool Execution**         | Tool runs with correct params & returns result    | Execute tool → verify output matches expected transformation |
| **Workflow Execution**     | Steps execute in order, data flows between steps  | Run workflow → verify each step's output feeds next step     |
| **Chat/Streaming**         | Messages persist, context maintained across turns | Multi-turn conversation → verify context awareness           |
| **MCP Server Tools**       | Server tools are callable and return data         | Call MCP tool → verify response structure and content        |
| **Memory/Persistence**     | Data survives page reload                         | Create item → reload → verify item exists                    |
| **Error Handling**         | Errors surface correctly to user                  | Trigger error condition → verify error message + recovery    |

## Step 4: Write Behavior-Focused Tests

### Test Structure Template

```ts
import { test, expect, type BrowserContext, type Page } from '@playwright/test';
import { resetStorage } from '../__utils__/reset-storage';
import { selectFixture } from '../__utils__/select-fixture';
import { nanoid } from 'nanoid';

/**
 * FEATURE: [Name of feature]
 * USER STORY: As a user, I want to [action] so that [outcome]
 * BEHAVIOR UNDER TEST: [Specific behavior being validated]
 */

test.describe('[Feature Name] - Behavior Tests', () => {
  let context: BrowserContext;
  let page: Page;

  test.beforeEach(async ({ browser }) => {
    // Fresh browser context per test so state never leaks between tests
    context = await browser.newContext();
    page = await context.newPage();
  });

  test.afterEach(async () => {
    await resetStorage(page);
    // Close the context created in beforeEach so pages are not leaked
    await context.close();
  });

  test('should [verb describing behavior] when [trigger condition]', async () => {
    // ARRANGE: Set up preconditions
    // - Navigate to the feature
    // - Configure any required state
    // ACT: Perform the user action that triggers the behavior
    // ASSERT: Verify the OUTCOME, not the UI state
    // - Check data persistence
    // - Verify downstream effects
    // - Confirm API calls made correctly
  });
});
```

### Behavior Test Patterns

#### Pattern 1: Configuration Affects Behavior

```ts
test('selecting LLM provider should use that provider for agent responses', async () => {
  // ARRANGE
  await page.goto('/agents/my-agent/chat');

  // Intercept the chat request to capture which provider it targets
  let capturedProvider: string | null = null;
  await page.route('**/api/chat', async (route) => {
    const body = JSON.parse(route.request().postData() || '{}');
    capturedProvider = body.provider;
    // Await so the request is actually forwarded before the handler returns
    await route.continue();
  });

  // ACT: Select a different provider
  await page.getByTestId('provider-selector').click();
  await page.getByRole('option', { name: 'OpenAI' }).click();

  // Send a message to trigger the agent
  await page.getByTestId('chat-input').fill('Hello');
  await page.getByTestId('send-button').click();

  // ASSERT: Verify the selected provider was used
  await expect.poll(() => capturedProvider).toBe('openai');
});
```
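The `JSON.parse(... || '{}')` call in this pattern throws if the chat request carries a body that is not JSON, and an exception inside a route handler can leave the request hanging. A tolerant parser avoids that. This is a sketch: `parseJsonBody` and `readStringField` are hypothetical helpers, and the `provider` field name is the same assumption about the request payload that the pattern already makes.

```typescript
// Hypothetical helper: parse a request body captured in a route handler
// without throwing on empty, non-JSON, or non-object bodies.
function parseJsonBody(raw: string | null): Record<string, unknown> {
  if (!raw) return {};
  try {
    const parsed: unknown = JSON.parse(raw);
    // Only plain objects are useful for reading fields like `provider`
    return typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)
      ? (parsed as Record<string, unknown>)
      : {};
  } catch {
    return {};
  }
}

// Read a string field (e.g. `provider`); any other value type yields null.
function readStringField(raw: string | null, field: string): string | null {
  const value = parseJsonBody(raw)[field];
  return typeof value === 'string' ? value : null;
}
```

Inside the handler, `capturedProvider = readStringField(route.request().postData(), 'provider');` followed by `await route.continue();` keeps the original flow while never throwing.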

#### Pattern 2: Data Persistence

```ts
test('created agent should persist after page reload', async () => {
  // ARRANGE
  await page.goto('/agents');
  const agentName = `Test Agent ${nanoid()}`;

  // ACT: Create new agent
  await page.getByTestId('create-agent-button').click();
  await page.getByTestId('agent-name-input').fill(agentName);
  await page.getByTestId('save-agent-button').click();

  // Wait for creation to complete
  await expect(page.getByText(agentName)).toBeVisible();

  // ASSERT: Verify persistence
  await page.reload();
  await expect(page.getByText(agentName)).toBeVisible({ timeout: 10000 });
});
```

#### Pattern 3: Tool Execution Produces Correct Output

```ts
test('weather tool should return formatted weather data', async () => {
  // ARRANGE
  await selectFixture(page, 'weather-success');
  await page.goto('/tools/weather-tool');

  // ACT: Execute tool with parameters
  await page.getByTestId('param-city').fill('San Francisco');
  await page.getByTestId('execute-tool-button').click();

  // ASSERT: Verify OUTPUT content, not just that output appears
  const output = page.getByTestId('tool-output');
  await expect(output).toContainText('temperature');
  await expect(output).toContainText('San Francisco');

  // Verify structured data if applicable
  const outputText = await output.textContent();
  const outputData = JSON.parse(outputText || '{}');
  expect(outputData).toHaveProperty('temperature');
  expect(outputData).toHaveProperty('conditions');
});
```
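`toHaveProperty` passes even when a field is `null` or has the wrong type. A type guard can check the output shape more strictly. This sketch assumes, without confirmation from the source, that `temperature` is numeric and `conditions` is a non-empty string; `WeatherOutput` and `isWeatherOutput` are hypothetical names.

```typescript
// Hypothetical shape of the weather tool's output. The field names follow the
// pattern above; the value types are assumptions about the fixture.
interface WeatherOutput {
  temperature: number;
  conditions: string;
}

// Type guard that checks field *types*, not just presence, so output such as
// `{ temperature: null }` or `{ temperature: "18" }` fails the test.
function isWeatherOutput(value: unknown): value is WeatherOutput {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.temperature === 'number' &&
    Number.isFinite(record.temperature) &&
    typeof record.conditions === 'string' &&
    record.conditions.length > 0
  );
}
```

In the test, `expect(isWeatherOutput(outputData)).toBe(true)` would replace the two `toHaveProperty` checks.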

#### Pattern 4: Workflow Step Chaining

```ts
test('workflow should pass data between steps correctly', async () => {
  // ARRANGE
  await selectFixture(page, 'workflow-multi-step');
  const sessionId = nanoid();
  await page.goto(`/workflows/data-pipeline?session=${sessionId}`);

  // ACT: Trigger workflow execution
  await page.getByTestId('workflow-input').fill('test input data');
  await page.getByTestId('run-workflow-button').click();

  // ASSERT: Verify each step received correct input from previous step
  // Wait for completion
  await expect(page.getByTestId('workflow-status')).toHaveText('completed', { timeout: 30000 });

  // Check step outputs show data transformation chain
  const step1Output = (await page.getByTestId('step-1-output').textContent()) ?? '';
  const step2Output = (await page.getByTestId('step-2-output').textContent()) ?? '';

  // Guard against a vacuous pass: every string contains the empty string
  expect(step1Output.trim()).not.toBe('');
  // Verify step 2 received step 1's output as input
  expect(step2Output).toContain(step1Output.trim());
});
```
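The two-step check extends to workflows with more steps. The sketch below (a hypothetical `findBrokenLink`) applies the same containment heuristic as the pattern, which assumes each step's output echoes its input, and returns the index of the first broken link. It treats an empty output as a break, since every string contains the empty string.

```typescript
// Hypothetical helper: given step outputs in execution order, return the index
// of the first step whose output does NOT contain the previous step's output,
// or -1 when every link in the chain holds.
function findBrokenLink(outputs: ReadonlyArray<string | null>): number {
  for (let i = 1; i < outputs.length; i++) {
    const previous = outputs[i - 1]?.trim() ?? '';
    const current = outputs[i] ?? '';
    // An empty previous output would make `includes` trivially true
    if (previous === '' || !current.includes(previous)) return i;
  }
  return -1;
}
```

In a test, `const outputs = await page.getByTestId(/^step-\d+-output$/).allTextContents();` collects the outputs in DOM order (assumed here to match execution order), and `expect(findBrokenLink(outputs)).toBe(-1)` asserts the whole chain at once.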

#### Pattern 5: Streaming Chat with Context

```ts
test('chat should maintain conversation context across messages', async () => {
  // ARRANGE
  await selectFixture(page, 'contextual-chat');
  const chatId = nanoid();
  await page.goto(`/agents/assistant/chat/${chatId}`);

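  // ACT: Multi-turn conversation. Assumes a fresh chat has no assistant
  // messages yet and each reply renders exactly one `assistant-message`.
  const assistantMessages = page.getByTestId('assistant-message');
  await page.getByTestId('chat-input').fill('My name is Alice');
  await page.getByTestId('send-button').click();
  await expect(assistantMessages).toHaveCount(1, { timeout: 20000 });

  await page.getByTestId('chat-input').fill('What is my name?');
  await page.getByTestId('send-button').click();

  // ASSERT: Check the SECOND reply. `.last()` could still resolve to the first
  // reply (which may itself mention "Alice") before the new one renders.
  await expect(assistantMessages).toHaveCount(2, { timeout: 20000 });
  await expect(assistantMessages.nth(1)).toContainText('Alice', { timeout: 20000 });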
});
```

#### Pattern 6: Error Recovery

```ts
test('should show actionable error and allow retry when API fails', async () => {
  // ARRANGE: Set up failure fixture
  await selectFixture(page, 'api-failure');
  await page.goto('/tools/flaky-tool');

  // ACT: Trigger the error
  await page.getByTestId('execute-tool-button').click();

  // ASSERT: Error is shown with recovery option
  await expect(page.getByTestId('error-message')).toContainText('failed');
  await expect(page.getByTestId('retry-button')).toBeVisible();

  // Switch to success fixture and retry
  await selectFixture(page, 'api-success');
  await page.getByTestId('retry-button').click();

  // Verify recovery worked
  await expect(page.getByTestId('tool-output')).toBeVisible({ timeout: 10000 });
  await expect(page.getByTestId('error-message')).not.toBeVisible();
});
```

## Step 5: Update Existing Tests

When a test file already exists:

1. **Read the existing tests** to understand current coverage
2. **Identify if tests are UI-focused or behavior-focused**
3. **Refactor UI-focused tests** to verify behavior instead:

### Refactoring Example

**BEFORE (UI-focused):**

```ts
test('dropdown opens when clicked', async () => {
  await page.getByTestId('model-dropdown').click();
  await expect(page.getByRole('listbox')).toBeVisible();
});
```

**AFTER (Behavior-focused):**

```ts
test('selecting model from dropdown updates agent configuration', async () => {
  // Open dropdown and select model
  await page.getByTestId('model-dropdown').click();
  await page.getByRole('option', { name: 'GPT-4' }).click();

  // Verify the selection persists and affects behavior
  await page.reload();
  await expect(page.getByTestId('model-dropdown')).toHaveText('GPT-4');

  // Optionally: verify the model is used in actual requests
  // (via request interception or checking response metadata)
});
```

## Step 6: Kitchen-Sink Fixtures for Behavior Testing

Fixtures should represent **realistic scenarios**, not just mock data:

### Fixture Naming Convention

```
<feature>-<scenario>.fixture.ts

Examples:
- agent-with-tools.fixture.ts
- chat-multi-turn-context.fixture.ts
- workflow-parallel-execution.fixture.ts
- tool-validation-error.fixture.ts
- mcp-server-timeout.fixture.ts
```
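The naming convention can be enforced with a file-name check, for example in a lint script. This sketch assumes lowercase kebab-case segments, as in the examples; it requires at least two segments but cannot tell where the feature ends and the scenario begins. `isValidFixtureName` is a hypothetical name.

```typescript
// Hypothetical validator for `<feature>-<scenario>.fixture.ts`, assuming
// lowercase kebab-case segments and at least one hyphen before the suffix.
const FIXTURE_NAME = /^[a-z0-9]+(?:-[a-z0-9]+)+\.fixture\.ts$/;

function isValidFixtureName(fileName: string): boolean {
  return FIXTURE_NAME.test(fileName);
}
```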

### Fixture Content Requirements

Each fixture must define:

1. **Scenario description** (what behavior it enables testing)
2. **Expected outcomes** (what assertions should pass)
3. **Edge cases covered** (error states, empty states, etc.)

```ts
// fixtures/agent-provider-switch.fixture.ts
export const agentProviderSwitch = {
  name: 'agent-provider-switch',
  description: 'Tests that switching LLM providers changes agent behavior',

  // Mock responses for different providers
  responses: {
    openai: { content: 'Response from OpenAI', model: 'gpt-4' },
    anthropic: { content: 'Response from Anthropic', model: 'claude-3' },
  },

  expectedBehavior: {
    // When provider is switched, subsequent messages use new provider
    providerSwitchAffectsNextMessage: true,
    // Provider selection persists across page reload
    providerPersistsOnReload: true,
  },
};
```
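The example above covers the scenario description and expected outcomes but has no field for the third requirement, edge cases. One way to make all three mandatory is a shared type plus a completeness check. `BehaviorFixture`, `checkFixture`, the `edgeCases` field, and the `toolValidationError` fixture are all hypothetical, and `satisfies` needs TypeScript 4.9 or later.

```typescript
// Hypothetical type mirroring the three fixture requirements listed above.
interface BehaviorFixture<TResponses = Record<string, unknown>> {
  name: string;
  description: string; // 1. scenario description
  responses: TResponses;
  expectedBehavior: Record<string, boolean>; // 2. expected outcomes
  edgeCases: string[]; // 3. edge cases covered
}

// Returns a list of problems; an empty list means the fixture is complete.
function checkFixture(fixture: BehaviorFixture): string[] {
  const problems: string[] = [];
  if (fixture.description.trim() === '') problems.push('missing description');
  if (Object.keys(fixture.expectedBehavior).length === 0) problems.push('no expected outcomes');
  if (fixture.edgeCases.length === 0) problems.push('no edge cases listed');
  return problems;
}

// Hypothetical fixture, for illustration only
const toolValidationError = {
  name: 'tool-validation-error',
  description: 'Tool called with invalid params surfaces a validation error',
  responses: { error: { message: 'city is required' } },
  expectedBehavior: { errorShownToUser: true, retrySucceedsWithValidParams: true },
  edgeCases: ['empty city string', 'whitespace-only city'],
} satisfies BehaviorFixture;
```

Using `satisfies` rather than a type annotation keeps the literal types of `responses` available to tests while still rejecting a fixture that omits a required field.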

## Step 7: Run and Validate

```sh
cd packages/playground && pnpm test:e2e
```

### Test Quality Checklist

Before considering tests complete, verify:

- [ ] Each test has a clear user story comment
- [ ] Tests verify OUTCOMES, not intermediate UI states
- [ ] Tests would FAIL if the feature broke (not just if UI changed)
- [ ] Persistence is verified via `page.reload()` where applicable
- [ ] Error scenarios are covered
- [ ] Tests use appropriate timeouts for async operations
- [ ] Fixtures represent realistic usage scenarios

## Quick Reference

| Step      | Command/Action                                        |
| --------- | ----------------------------------------------------- |
| Build     | `pnpm build:cli`                                      |
| Start     | `cd packages/playground/e2e/kitchen-sink && pnpm dev` |
| App URL   | http://localhost:4111                                 |
| Routes    | `@packages/playground/src/App.tsx`                    |
| Run tests | `cd packages/playground && pnpm test:e2e`             |
| Test dir  | `packages/playground/e2e/tests/`                      |
| Fixtures  | `packages/playground/e2e/kitchen-sink/fixtures/`      |

## Anti-Patterns to Avoid

| ❌ Don't                           | ✅ Do Instead                                                |
| ---------------------------------- | ------------------------------------------------------------ |
| Test that modal opens              | Test that modal action completes and persists                |
| Test that button is clickable      | Test that clicking button produces expected result           |
| Test loading spinner appears       | Test that loaded data is correct                             |
| Test form validation message shows | Test that invalid form cannot submit AND valid form succeeds |
| Test dropdown has options          | Test that selecting option changes system behavior           |
| Test sidebar navigation works      | Test that navigated page has correct data/functionality      |
| Assert element is visible          | Assert element contains expected data/state                  |

Related Skills

security-review

from mastra-ai/mastra

Security-focused code review checklist for identifying vulnerabilities

performance-review

from mastra-ai/mastra

Performance-focused code review for identifying bottlenecks and optimization opportunities

code-standards

from mastra-ai/mastra

Code quality standards and style guide for reviewing pull requests

customer-support

from mastra-ai/mastra

Guidelines for handling customer support interactions. Use when responding to user inquiries, troubleshooting issues, or escalating problems.

code-review

from mastra-ai/mastra

Provides structured code review guidelines for TypeScript projects. Use when reviewing pull requests, analyzing code quality, or suggesting improvements.

api-design

from mastra-ai/mastra

Guidelines for designing RESTful APIs and TypeScript interfaces. Use when designing new endpoints, reviewing API contracts, or structuring data models.

brand-guidelines

from mastra-ai/mastra

Applies Mastra's brand colors, typography, and writing style to documentation, code examples, or artifacts. Use when brand colors, style guidelines, visual formatting, or company design standards apply.

tailwind-best-practices

from mastra-ai/mastra

Tailwind CSS styling guidelines for Mastra Playground UI. This skill should be used when writing, reviewing, or refactoring styling code in packages/playground-ui and packages/playground to ensure design system consistency. Triggers on tasks involving Tailwind classes, component styling, or design tokens.

smoke-test

from mastra-ai/mastra

Create a Mastra project using create-mastra and smoke test the studio in Chrome

react-best-practices

from mastra-ai/mastra

React performance optimization guidelines from Mastra Engineering. This skill should be used when writing, reviewing, or refactoring React code to ensure optimal performance patterns. Triggers on tasks involving React components, data fetching, bundle optimization, or performance improvements.

ralph-plan

from mastra-ai/mastra

Interactive planning assistant that helps create focused, well-structured ralph-loop commands through collaborative conversation

mastra-docs

from mastra-ai/mastra

Documentation guidelines for Mastra. This skill should be used when writing or editing documentation for Mastra. Triggers on tasks involving documentation creation or updates.