qe-testability-scoring

AI-powered testability assessment using 10 principles of intrinsic testability with Playwright and optional Vibium integration. Evaluates web applications against Observability, Controllability, Algorithmic Simplicity, Transparency, Stability, Explainability, Unbugginess, Smallness, Decomposability, and Similarity. Use when assessing software testability, evaluating test readiness, identifying testability improvements, or generating testability reports.

Best use case

qe-testability-scoring is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

AI-powered testability assessment using 10 principles of intrinsic testability with Playwright and optional Vibium integration. Evaluates web applications against Observability, Controllability, Algorithmic Simplicity, Transparency, Stability, Explainability, Unbugginess, Smallness, Decomposability, and Similarity. Use when assessing software testability, evaluating test readiness, identifying testability improvements, or generating testability reports.

Teams using qe-testability-scoring should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/qe-testability-scoring/SKILL.md --create-dirs "https://raw.githubusercontent.com/proffesor-for-testing/agentic-qe/main/.kiro/skills/qe-testability-scoring/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/qe-testability-scoring/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How qe-testability-scoring Compares

Feature / Agentqe-testability-scoringStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

AI-powered testability assessment using 10 principles of intrinsic testability with Playwright and optional Vibium integration. Evaluates web applications against Observability, Controllability, Algorithmic Simplicity, Transparency, Stability, Explainability, Unbugginess, Smallness, Decomposability, and Similarity. Use when assessing software testability, evaluating test readiness, identifying testability improvements, or generating testability reports.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Testability Scoring

<default_to_action>
When assessing testability:
1. RUN assessment against target URL
2. ANALYZE all 10 principles automatically
3. GENERATE HTML report with radar chart
4. PRIORITIZE improvements by impact/effort
5. INTEGRATE with QX Partner for holistic view

**Quick Assessment:**
```bash
# Run assessment on any URL
TEST_URL='https://example.com/' npx playwright test tests/testability-scoring/testability-scoring.spec.js --project=chromium --workers=1

# Or use shell script wrapper
.claude/skills/testability-scoring/scripts/run-assessment.sh https://example.com/
```

**The 10 Principles at a Glance:**
| Principle | Weight | Key Question |
|-----------|--------|--------------|
| **Observability** | 15% | Can we see what's happening? |
| **Controllability** | 15% | Can we control the application? |
| **Algorithmic Simplicity** | 10% | Are behaviors predictable? |
| **Algorithmic Transparency** | 10% | Can we understand what it does? |
| **Algorithmic Stability** | 10% | Does behavior remain consistent? |
| **Explainability** | 10% | Is the interface understandable? |
| **Unbugginess** | 10% | How error-free is it? |
| **Smallness** | 10% | Are components appropriately sized? |
| **Decomposability** | 5% | Can we test parts in isolation? |
| **Similarity** | 5% | Is the tech stack familiar? |

**Grade Scale:**
- **A (90-100)**: Excellent testability
- **B (80-89)**: Good testability
- **C (70-79)**: Adequate testability
- **D (60-69)**: Below average
- **F (0-59)**: Poor testability
</default_to_action>

## Quick Reference Card

### Running Assessments

| Method | Command | When to Use |
|--------|---------|-------------|
| Shell Script | `./scripts/run-assessment.sh URL` | One-time assessment |
| ENV Override | `TEST_URL='URL' npx playwright test...` | CI/CD integration |
| Config File | Update `tests/testability-scoring/config.js` | Repeated runs |

### Principle Details

#### High Weight (15% each)
| Principle | Measures | Indicators |
|-----------|----------|------------|
| **Observability** | State visibility, logging, monitoring | Console output, network tracking, error visibility |
| **Controllability** | Input control, state manipulation | API access, test data injection, determinism |

#### Medium Weight (10% each)
| Principle | Measures | Indicators |
|-----------|----------|------------|
| **Simplicity** | Predictable behavior | Clear I/O relationships, low complexity |
| **Transparency** | Understanding what system does | Visible processes, readable code |
| **Stability** | Consistent behavior | Change resilience, maintainability |
| **Explainability** | Interface understanding | Good docs, semantic structure, help text |
| **Unbugginess** | Error-free operation | Console errors, warnings, runtime issues |
| **Smallness** | Component size | Element count, script bloat, page complexity |

#### Low Weight (5% each)
| Principle | Measures | Indicators |
|-----------|----------|------------|
| **Decomposability** | Isolation testing | Component separation, modular design |
| **Similarity** | Technology familiarity | Standard frameworks, known patterns |

---

## Assessment Workflow

```
1. Navigate to URL → 2. Collect Metrics → 3. Score Principles
                                              ↓
4. Generate JSON ← 5. Calculate Grades ← 6. Apply Weights
         ↓
7. Generate HTML Report with Radar Chart
         ↓
8. Open in Browser (auto-opens)
```

### Output Files

```
tests/reports/
├── testability-results-<timestamp>.json  # Raw data
├── testability-report-<timestamp>.html   # Visual report
└── latest.json                           # Symlink
```

---

## Integration Examples

### CI/CD Integration
```yaml
# GitHub Actions
- name: Testability Assessment
  run: |
    timeout 180 .claude/skills/testability-scoring/scripts/run-assessment.sh ${{ env.APP_URL }}

- name: Upload Reports
  uses: actions/upload-artifact@v3
  with:
    name: testability-reports
    path: tests/reports/testability-*.html
```

### QX Partner Integration
```typescript
// Combine testability with QX analysis
const qxAnalysis = await Task("QX Analysis", {
  target: 'https://example.com',
  integrateTestability: true
}, "qx-partner");

// Returns combined insights:
// - QX Score: 78/100
// - Testability Integration: Observability 72/100
// - Combined Insight: Low observability may mask UX issues
```

### Programmatic Usage
```typescript
import { runTestabilityAssessment } from './testability';

const results = await runTestabilityAssessment('https://example.com');
console.log(`Overall: ${results.overallScore}/100 (${results.grade})`);
console.log('Recommendations:', results.recommendations);
```

---

## Agent Integration

```typescript
// Run testability assessment
const assessment = await Task("Testability Assessment", {
  url: 'https://example.com',
  generateReport: true,
  openBrowser: true
}, "qe-quality-analyzer");

// Use with QX Partner for holistic analysis
const qxReport = await Task("Full QX Analysis", {
  target: 'https://example.com',
  integrateTestability: true,
  detectOracleProblems: true
}, "qx-partner");
```

---

## Vibium Integration (Optional)

### Overview
Vibium browser automation can be used alongside Playwright for enhanced testability assessment. While **Playwright remains the primary engine**, Vibium offers complementary capabilities for certain metrics.

**Installation:**
```bash
claude mcp add vibium -- npx -y vibium
```

### Vibium-Enhanced Metrics

| Principle | Vibium Enhancement | Benefit |
|-----------|-------------------|---------|
| **Observability** | Auto-wait duration tracking | Measures DOM stability (30s timeout, 100ms polling) |
| **Controllability** | Element interaction success rate | Validates automation readiness via MCP |
| **Stability** | Screenshot consistency | Visual regression detection for layout stability |
| **Explainability** | Element attribute extraction | ARIA labels, semantic HTML validation |

### When to Use Vibium

✅ **USE Vibium for:**
- Element stability metrics (auto-wait duration analysis)
- Visual consistency checks (screenshot comparison)
- MCP-native AI agent integration
- Lightweight Docker images (400MB vs 1.2GB)

❌ **USE Playwright for:**
- Console error detection (Vibium V1 lacks console API)
- Network performance metrics (BiDi network APIs coming in V2)
- Comprehensive browser coverage (Firefox, Safari)
- Production-proven stability (Vibium V1 released Dec 2024)

### Hybrid Assessment Example

```typescript
// Testability assessment using both engines
const assessment = {
  // Playwright: Comprehensive metrics
  playwright: await runPlaywrightAssessment(url),

  // Vibium: Stability metrics
  vibium: {
    elementStability: await measureAutoWaitDuration(url),
    visualConsistency: await compareScreenshots(url),
    accessibilityAttributes: await extractARIALabels(url)
  }
};

// Enhanced Observability Score
const observability =
  (assessment.playwright.consoleErrors * 0.6) +
  (assessment.vibium.elementStability * 0.4);
```

### Vibium MCP Tools for Testability

```typescript
// 1. Element Stability Measurement
const browser = await browser_launch();
await browser_navigate({ url });
const startTime = Date.now();
const element = await browser_find({ selector: ".critical-element" });
const autoWaitDuration = Date.now() - startTime;
// Lower duration = better stability

// 2. Visual Consistency Check
const screenshot1 = await browser_screenshot();
await browser_navigate({ url }); // Reload
const screenshot2 = await browser_screenshot();
const visualDiff = compareImages(screenshot1.png, screenshot2.png);
// Lower diff = better stability

// 3. Accessibility Attribute Extraction
const elements = await browser_find({ selector: "button, a, input" });
const ariaLabels = elements.map(el => el.attributes["aria-label"]);
const semanticScore = (ariaLabels.filter(Boolean).length / elements.length) * 100;
```

### Migration Strategy

**Current (V2.2):** Hybrid approach
- Playwright: Primary engine for all 10 principles
- Vibium: Optional enhancement for stability metrics

**Future (V3.0):** When Vibium V2 ships
- Evaluate Vibium as primary engine if:
  - Console/Network APIs available
  - Production stability proven
  - Community adoption increases

## Agent Coordination Hints

### Memory Namespace
```
aqe/testability/
├── assessments/*       - Assessment results by URL
├── historical/*        - Historical scores for trend analysis
├── recommendations/*   - Improvement recommendations
├── integration/*       - QX integration data
└── vibium/*           - Vibium-specific metrics (optional)
```

### Fleet Coordination
```typescript
const testabilityFleet = await FleetManager.coordinate({
  strategy: 'testability-assessment',
  agents: [
    'qe-quality-analyzer',  // Primary assessment
    'qx-partner',           // UX integration
    'qe-visual-tester'      // Visual validation
  ],
  topology: 'sequential'
});
```

---

## Common Issues & Solutions

| Issue | Solution |
|-------|----------|
| Tests timing out | Increase timeout: `timeout 300 ./scripts/run-assessment.sh URL` |
| Partial results | Check console errors, increase network timeout |
| Report not opening | Use `AUTO_OPEN=false`, open manually |
| Config not updating | Use `TEST_URL` env var instead |
| Vibium not available | Install via `claude mcp add vibium -- npx -y vibium` (optional) |
| Hybrid mode errors | Vibium is optional; assessments work without it |

---

## Related Skills
- [accessibility-testing](../accessibility-testing/) - WCAG compliance (overlaps with Explainability)
- [visual-testing-advanced](../visual-testing-advanced/) - UI consistency
- [performance-testing](../performance-testing/) - Load time metrics

---

## Credits & References

### Framework Origin
- **Heuristics for Software Testability** by James Bach and Michael Bolton
- Available at: https://www.satisfice.com/download/heuristics-of-software-testability

### Implementation
- Based on https://github.com/fndlalit/testability-scorer (contributed by [@fndlalit](https://github.com/fndlalit))
- Playwright v1.49.0+ with AI capabilities (primary engine)
- Vibium v1.0+ with MCP integration (optional enhancement)
- Chart.js for radar visualizations

### Vibium Resources
- GitHub: https://github.com/VibiumDev/vibium
- MCP Integration: `claude mcp add vibium -- npx -y vibium`
- Created by Jason Huggins (creator of Selenium/Appium)

---

## Remember

**Testability is an investment, not an afterthought.**

Good testability:
- Reduces debugging time
- Enables faster feedback loops
- Makes defects easier to find
- Supports continuous testing

**Low scores = High risk. Prioritize improvements by weight × impact.**

Related Skills

testability-scoring

298
from proffesor-for-testing/agentic-qe

AI-powered testability assessment using 10 principles of intrinsic testability with Playwright and optional Vibium integration. Evaluates web applications against Observability, Controllability, Algorithmic Simplicity, Transparency, Stability, Explainability, Unbugginess, Smallness, Decomposability, and Similarity. Use when assessing software testability, evaluating test readiness, identifying testability improvements, or generating testability reports.

qe-visual-testing-advanced

298
from proffesor-for-testing/agentic-qe

Advanced visual regression testing with pixel-perfect comparison, AI-powered diff analysis, responsive design validation, and cross-browser visual consistency. Use when detecting UI regressions, validating designs, or ensuring visual consistency.

qe-verification-quality

298
from proffesor-for-testing/agentic-qe

Comprehensive truth scoring, code quality verification, and automatic rollback system with 0.95 accuracy threshold for ensuring high-quality agent outputs and codebase reliability.

qe-test-reporting-analytics

298
from proffesor-for-testing/agentic-qe

Advanced test reporting, quality dashboards, predictive analytics, trend analysis, and executive reporting for QE metrics. Use when communicating quality status, tracking trends, or making data-driven decisions.

qe-test-idea-rewriting

298
from proffesor-for-testing/agentic-qe

Transform passive 'Verify X' test descriptions into active, observable test actions. Use when test ideas lack specificity, use vague language, or fail quality validation. Converts to action-verb format for clearer, more testable descriptions.

qe-test-environment-management

298
from proffesor-for-testing/agentic-qe

Test environment provisioning, infrastructure as code for testing, Docker/Kubernetes for test environments, service virtualization, and cost optimization. Use when managing test infrastructure, ensuring environment parity, or optimizing testing costs.

qe-test-design-techniques

298
from proffesor-for-testing/agentic-qe

Systematic test design with boundary value analysis, equivalence partitioning, decision tables, state transition testing, and combinatorial testing. Use when designing comprehensive test cases, reducing redundant tests, or ensuring systematic coverage.

qe-test-data-management

298
from proffesor-for-testing/agentic-qe

Strategic test data generation, management, and privacy compliance. Use when creating test data, handling PII, ensuring GDPR/CCPA compliance, or scaling data generation for realistic testing scenarios.

qe-test-automation-strategy

298
from proffesor-for-testing/agentic-qe

Design and implement effective test automation with proper pyramid, patterns, and CI/CD integration. Use when building automation frameworks or improving test efficiency.

qe-technical-writing

298
from proffesor-for-testing/agentic-qe

Write clear, engaging technical content from real experience. Use when writing blog posts, documentation, tutorials, or technical articles.

qe-tdd-london-chicago

298
from proffesor-for-testing/agentic-qe

Apply London (mock-based) and Chicago (state-based) TDD schools. Use when practicing test-driven development or choosing testing style for your context.

qe-stream-chain

298
from proffesor-for-testing/agentic-qe

Stream-JSON chaining for multi-agent pipelines, data transformation, and sequential workflows