agentic-quality-engineering

AI agents as force multipliers for quality work. Core skill for all 19 QE agents using PACT principles.

Best use case

agentic-quality-engineering is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using agentic-quality-engineering should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/agentic-quality-engineering/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/ai-agents/agentic-quality-engineering/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/agentic-quality-engineering/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How agentic-quality-engineering Compares

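| Feature / Agent | agentic-quality-engineering | Standard Approach |
|-----------------|-----------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |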

Frequently Asked Questions

What does this skill do?

AI agents as force multipliers for quality work. Core skill for all 19 QE agents using PACT principles.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Agentic Quality Engineering

<default_to_action>
When implementing agentic QE or coordinating agents:
1. SPAWN appropriate agent(s) for the task using `Task` tool with agent type
2. CONFIGURE agent coordination (hierarchical/mesh/sequential)
3. EXECUTE with PACT principles: Proactive analysis, Autonomous operation, Collaborative feedback, Targeted risk focus
4. VALIDATE results through quality gates before deployment
5. LEARN from outcomes - store patterns in `aqe/learning/*` namespace

**Quick Agent Selection:**
- Test generation needed → `qe-test-generator`
- Coverage gaps → `qe-coverage-analyzer`
- Quality decision → `qe-quality-gate`
- Security scan → `qe-security-scanner`
- Performance test → `qe-performance-tester`
- Full pipeline → `qe-fleet-commander`

**Critical Success Factors:**
- Agents amplify human expertise, not replace it
- Human-in-the-loop for critical decisions
- Measure: bugs caught, time saved, coverage improved
</default_to_action>
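
The Quick Agent Selection list above can be read as a lookup table. The following is a minimal sketch of that idea, not part of the skill itself: the agent names come from the list, while the `QeNeed` labels and the `selectAgent` helper are hypothetical names introduced only for illustration.

```typescript
// Hypothetical labels for the needs in the Quick Agent Selection list.
type QeNeed =
  | "test-generation"
  | "coverage-gaps"
  | "quality-decision"
  | "security-scan"
  | "performance-test"
  | "full-pipeline";

// Agent names are copied from the list; the mapping structure is an assumption.
const AGENT_FOR_NEED: Record<QeNeed, string> = {
  "test-generation": "qe-test-generator",
  "coverage-gaps": "qe-coverage-analyzer",
  "quality-decision": "qe-quality-gate",
  "security-scan": "qe-security-scanner",
  "performance-test": "qe-performance-tester",
  "full-pipeline": "qe-fleet-commander",
};

// Returns the agent type to pass when spawning an agent for a given need.
function selectAgent(need: QeNeed): string {
  return AGENT_FOR_NEED[need];
}

const chosenAgent = selectAgent("coverage-gaps"); // "qe-coverage-analyzer"
```

Typing the keys as a union means an unsupported need is rejected at compile time rather than silently falling through to a default agent.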

## Quick Reference Card

### When to Use
- Designing autonomous testing systems
- Scaling QE with intelligent agents
- Implementing multi-agent coordination
- Building CI/CD quality pipelines

### PACT Principles
| Principle | Agent Behavior | Human Role |
|-----------|---------------|------------|
| **P**roactive | Analyze pre-merge, predict risk | Set guardrails |
| **A**utonomous | Execute tests, fix flaky tests | Review critical |
| **C**ollaborative | Multi-agent coordination | Provide context |
| **T**argeted | Risk-based prioritization | Define risk areas |

### 19-Agent Fleet
| Category | Agents | Primary Use |
|----------|--------|-------------|
| Core Testing (5) | test-generator, test-executor, coverage-analyzer, quality-gate, quality-analyzer | Daily testing |
| Performance/Security (2) | performance-tester, security-scanner | Non-functional |
| Strategic (3) | requirements-validator, production-intelligence, fleet-commander | Planning |
| Advanced (4) | regression-risk-analyzer, test-data-architect, api-contract-validator, flaky-test-hunter | Specialized |
| Visual/Chaos (2) | visual-tester, chaos-engineer | Edge cases |
| Deployment (1) | deployment-readiness | Release |
| Analysis (1) | code-complexity | Maintainability |

### Coordination Patterns
```
Hierarchical: fleet-commander → [generators] → [executors] → quality-gate
Mesh: test-gen ↔ coverage ↔ quality (peer decisions)
Sequential: risk-analyzer → test-gen → executor → coverage → gate
```
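
One way to make these patterns concrete is to model each as an adjacency list of "who hands off to whom". The sketch below is illustrative only: the `Topology` type and `visitOrder` helper are hypothetical, and the short agent labels are the ones used in the diagram above. The hierarchical pattern is omitted because the diagram names groups (`[generators]`, `[executors]`) rather than individual agents.

```typescript
// Each agent label maps to the agents it hands work to (hypothetical model).
type Topology = Record<string, string[]>;

// Sequential: a single chain ending at the gate.
const sequential: Topology = {
  "risk-analyzer": ["test-gen"],
  "test-gen": ["executor"],
  "executor": ["coverage"],
  "coverage": ["gate"],
  "gate": [],
};

// Mesh: peers linked in both directions, as the double arrows suggest.
const mesh: Topology = {
  "test-gen": ["coverage"],
  "coverage": ["test-gen", "quality"],
  "quality": ["coverage"],
};

// Breadth-first walk: the order in which agents become reachable from `start`.
function visitOrder(topology: Topology, start: string): string[] {
  const seen = new Set<string>([start]);
  const queue: string[] = [start];
  const order: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    order.push(current);
    for (const next of topology[current] ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return order;
}

const sequentialOrder = visitOrder(sequential, "risk-analyzer");
// ["risk-analyzer", "test-gen", "executor", "coverage", "gate"]
```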

### Success Criteria
- ✅ 10x deployment frequency with same/better quality
- ✅ Coverage gaps detected in real-time
- ✅ Bugs caught pre-production
- ❌ Agents acting without human oversight on critical decisions
- ❌ Deploying all 19 agents at once (start with 1-2)

---

## Core Concepts

### QE Evolution
| Stage | Approach | Limitation |
|-------|----------|------------|
| Traditional | Manual everything | Human bottleneck |
| Automation | Scripts + fixed scenarios | Needs orchestration |
| **Agentic** | AI agents + human judgment | Requires trust-building |

**Core Premise:** Agents amplify human expertise for 10x scale.

### Key Capabilities

**1. Intelligent Test Generation**
```typescript
// Agent analyzes code change, generates targeted tests
const tests = await qeTestGenerator.generate(prDiff);
// → Happy path, edge cases, error handling tests
```

**2. Pattern Detection** - Scan logs, find anomalies, correlate errors

**3. Adaptive Strategy** - Adjust test focus based on risk signals

**4. Root Cause Analysis** - Link failures to code changes, suggest fixes

---

## Agent Coordination

### Memory Namespaces
```
aqe/test-plan/*     - Test planning decisions
aqe/coverage/*      - Coverage analysis results
aqe/quality/*       - Quality metrics and gates
aqe/learning/*      - Patterns and Q-values
aqe/coordination/*  - Cross-agent state
```
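
Because every key lives under one of these five prefixes, a small helper can build keys consistently and catch typos before anything is stored. This is a sketch under that assumption; `buildKey` and `namespaceOf` are hypothetical helpers, not part of the agentic-qe tooling.

```typescript
// The five namespaces listed above.
const NAMESPACES = [
  "aqe/test-plan",
  "aqe/coverage",
  "aqe/quality",
  "aqe/learning",
  "aqe/coordination",
] as const;

type Namespace = (typeof NAMESPACES)[number];

// Joins a namespace and path segments into a key such as "aqe/test-plan/pr-123".
function buildKey(namespace: Namespace, ...parts: string[]): string {
  return [namespace, ...parts].join("/");
}

// Returns the namespace a key belongs to, or undefined if it matches none.
function namespaceOf(key: string): Namespace | undefined {
  return NAMESPACES.find((ns) => key.startsWith(ns + "/"));
}

const planKey = buildKey("aqe/test-plan", "pr-123"); // "aqe/test-plan/pr-123"
const owner = namespaceOf("aqe/coverage/auth-module"); // "aqe/coverage"
```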

### Memory Operations (MCP Tools)

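**CRITICAL**: Always use `mcp__agentic-qe__memory_store` with `persist: true` for learnings. The snippets below spell the tool name `mcp__agentic_qe__memory_store`; a hyphen would not be valid in a JavaScript identifier.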

**1. Store data to persistent memory:**
```javascript
// Store test plan decisions (persisted to .agentic-qe/memory.db)
mcp__agentic_qe__memory_store({
  key: "aqe/test-plan/pr-123",
  namespace: "aqe/test-plan",
  value: {
    prNumber: 123,
    riskLevel: "medium",
    requiredCoverage: 85,
    testTypes: ["unit", "integration"],
    estimatedTime: 1800
  },
  persist: true,  // ⚠️ REQUIRED for cross-session persistence
  ttl: 604800     // 7 days (0 = permanent)
})
```

**2. Retrieve prior learnings before task:**
```javascript
// Query patterns before starting test generation
const priorData = await mcp__agentic_qe__memory_retrieve({
  key: "aqe/learning/patterns/test-generation/*",
  namespace: "aqe/learning",
  includeMetadata: true
})

// Use patterns to guide current task
if (priorData.success) {
  console.log(`Loaded ${priorData.patterns.length} prior patterns`);
}
```

**3. Store coverage analysis results:**
```javascript
mcp__agentic_qe__memory_store({
  key: "aqe/coverage/auth-module",
  namespace: "aqe/coverage",
  value: {
    moduleId: "auth-module",
    currentCoverage: 78,
    gaps: ["error-handling", "edge-cases"],
    suggestedTests: 12,
    priority: "high"
  },
  persist: true,
  ttl: 1209600  // 14 days
})
```
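
The `ttl` values in the examples above (604800 and 1209600) are consistent with a duration in seconds: 7 × 86,400 and 14 × 86,400. A small helper makes that arithmetic explicit. `ttlDays` is a hypothetical name, and treating `ttl` as seconds is inferred from the "7 days" and "14 days" comments rather than from documented behavior.

```typescript
// 24 hours * 60 minutes * 60 seconds = 86,400 seconds per day.
const SECONDS_PER_DAY = 24 * 60 * 60;

// Converts whole days to a TTL in seconds (assumed unit; 0 means permanent per the example above).
function ttlDays(days: number): number {
  if (!Number.isInteger(days) || days < 0) {
    throw new RangeError("days must be a non-negative integer");
  }
  return days * SECONDS_PER_DAY;
}

const testPlanTtl = ttlDays(7); // 604800, matching the test-plan example
const coverageTtl = ttlDays(14); // 1209600, matching the coverage example
```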

### Three-Phase Memory Protocol

For coordinated multi-agent tasks, use the STATUS → PROGRESS → COMPLETE pattern:

```javascript
// PHASE 1: STATUS - Task starting
mcp__agentic_qe__memory_store({
  key: "aqe/coordination/task-123/status",
  namespace: "aqe/coordination",
  value: { status: "running", agent: "qe-test-generator", startTime: Date.now() },
  persist: true
})

// PHASE 2: PROGRESS - Intermediate updates
mcp__agentic_qe__memory_store({
  key: "aqe/coordination/task-123/progress",
  namespace: "aqe/coordination",
  value: { progress: 50, action: "generating-unit-tests", testsGenerated: 25 },
  persist: true
})

// PHASE 3: COMPLETE - Task finished
mcp__agentic_qe__memory_store({
  key: "aqe/coordination/task-123/complete",
  namespace: "aqe/coordination",
  value: {
    status: "complete",
    result: "success",
    testsGenerated: 47,
    coverageAchieved: 92.3,
    duration: 15000
  },
  persist: true
})
```
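
The ordering implied by STATUS → PROGRESS → COMPLETE can be enforced with a small state machine. The sketch below uses an in-memory `Map` as a stand-in for the MCP memory store so it can run on its own. `createTaskLog` and the transition table are hypothetical, and the rule that progress may repeat while nothing follows `complete` is one reasonable reading of the pattern, not something the skill specifies.

```typescript
type Phase = "status" | "progress" | "complete";

// Which phases may follow each phase (assumed rules).
const NEXT_ALLOWED: Record<Phase, Phase[]> = {
  status: ["progress", "complete"],
  progress: ["progress", "complete"],
  complete: [],
};

// In-memory stand-in for the memory store, scoped to one task.
function createTaskLog(taskId: string) {
  const entries = new Map<string, unknown>();
  let last: Phase | undefined;
  return {
    entries,
    // Stores `value` under aqe/coordination/<taskId>/<phase>, rejecting out-of-order phases.
    record(phase: Phase, value: unknown): string {
      const allowed: Phase[] = last === undefined ? ["status"] : NEXT_ALLOWED[last];
      if (allowed.indexOf(phase) === -1) {
        throw new Error(`Cannot record "${phase}" after "${last ?? "nothing"}"`);
      }
      const key = `aqe/coordination/${taskId}/${phase}`;
      entries.set(key, value);
      last = phase;
      return key;
    },
  };
}

const taskLog = createTaskLog("task-123");
taskLog.record("status", { status: "running", agent: "qe-test-generator" });
taskLog.record("progress", { progress: 50, testsGenerated: 25 });
taskLog.record("complete", { status: "complete", testsGenerated: 47 });
```

Repeated progress updates overwrite the same key here, which mirrors the single `.../progress` key used in the example above.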

### Blackboard Events
| Event | Trigger | Subscribers |
|-------|---------|-------------|
| `test:generated` | New tests created | executor, coverage |
| `coverage:gap` | Gap detected | test-generator |
| `quality:decision` | Gate evaluated | fleet-commander |
| `security:finding` | Vulnerability found | quality-gate |

### Example: PR Quality Pipeline
```typescript
// 1. Risk analysis
const risks = await Task("Analyze PR", prDiff, "qe-regression-risk-analyzer");

// 2. Generate tests for risks
const tests = await Task("Generate tests", risks, "qe-test-generator");

// 3. Execute + analyze
const results = await Task("Run tests", tests, "qe-test-executor");
const coverage = await Task("Check coverage", results, "qe-coverage-analyzer");

// 4. Quality decision
const decision = await Task("Evaluate", {results, coverage}, "qe-quality-gate");
// → GO/NO-GO with rationale
```
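
The final step returns a GO/NO-GO verdict with a rationale. The skill does not state how `qe-quality-gate` reaches that verdict, so the following is only an illustrative sketch of one possible rule: fail on any failed test or on coverage below the required threshold. The `GateInput` and `GateDecision` shapes and the `evaluateGate` function are hypothetical.

```typescript
interface GateInput {
  failedTests: number;
  coverage: number; // percent
  requiredCoverage: number; // percent, e.g. the requiredCoverage stored in a test plan
}

interface GateDecision {
  verdict: "GO" | "NO-GO";
  rationale: string[];
}

// Collects every reason to block; an empty list of blockers means GO.
function evaluateGate(input: GateInput): GateDecision {
  const blockers: string[] = [];
  if (input.failedTests > 0) {
    blockers.push(`${input.failedTests} test(s) failed`);
  }
  if (input.coverage < input.requiredCoverage) {
    blockers.push(`coverage ${input.coverage}% is below required ${input.requiredCoverage}%`);
  }
  if (blockers.length === 0) {
    return { verdict: "GO", rationale: ["all tests passed and coverage target met"] };
  }
  return { verdict: "NO-GO", rationale: blockers };
}

const passing = evaluateGate({ failedTests: 0, coverage: 92.3, requiredCoverage: 85 });
const blocked = evaluateGate({ failedTests: 2, coverage: 78, requiredCoverage: 85 });
```

Returning every blocker, rather than stopping at the first, gives the human reviewer the full picture in one pass.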

---

## Implementation Phases

| Phase | Duration | Goal | Agent(s) |
|-------|----------|------|----------|
| Experiment | Weeks 1-4 | Validate one use case | 1 agent |
| Integrate | Months 2-3 | CI/CD pipeline | 3-4 agents |
| Scale | Months 4-6 | Multiple use cases | 8+ agents |
| Evolve | Ongoing | Continuous learning | Full fleet |

### Phase 1 Example
```bash
# Week 1: Deploy single agent
aqe agent spawn qe-test-generator

# Weeks 2-3: Generate tests for 10 PRs
# Track: bugs found, test quality, review time

# Week 4: Measure impact
aqe agent metrics qe-test-generator
# → Tests: 150, Bugs: 12, Time saved: 8h
```

---

## Limitations & Strengths

### Agents Excel At
- **Volume**: Scan thousands of logs in seconds
- **Patterns**: Find correlations humans miss
- **Tireless**: 24/7 testing and monitoring
- **Speed**: Instant code change analysis

### Agents Need Humans For
- Business context and priorities
- Ethical judgment and trade-offs
- Creative exploration ("what if" scenarios)
- Domain expertise (healthcare, finance, legal)

---

## Best Practices

| Do | Don't |
|----|-------|
| Start with one agent, one use case | Deploy the whole fleet at once |
| Build feedback loops early | Deploy and forget |
| Human reviews agent output | Auto-merge without review |
| Measure bugs caught, time saved | Track vanity metrics (test count) |
| Build trust gradually | Give full autonomy immediately |

### Trust Progression
```
Month 1: Agent suggests → Human decides
Month 2: Agent acts → Human reviews after
Month 3: Agent autonomous on low-risk
Month 4: Agent handles critical with oversight
```
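
The ladder above can be expressed as a policy function that decides how much autonomy an agent gets for a given task. This sketch encodes one reading of the four steps. The `Mode` labels, the two-level `Risk` type, and the choice to keep critical work at "act then review" during month 3 are assumptions, not rules stated by the skill.

```typescript
type Risk = "low" | "critical";
type Mode = "suggest-only" | "act-then-review" | "autonomous" | "act-with-oversight";

// Maps months since adoption and task risk to an autonomy mode.
function modeFor(month: number, risk: Risk): Mode {
  if (month <= 1) return "suggest-only"; // Month 1: agent suggests, human decides
  if (month === 2) return "act-then-review"; // Month 2: agent acts, human reviews after
  if (risk === "low") return "autonomous"; // Month 3 onward: autonomous on low-risk work
  // Critical work: assumed to stay at act-then-review in month 3,
  // then move to acting under oversight from month 4.
  return month >= 4 ? "act-with-oversight" : "act-then-review";
}

const monthThreeLow = modeFor(3, "low"); // "autonomous"
const monthFourCritical = modeFor(4, "critical"); // "act-with-oversight"
```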

---

## Agent Coordination Hints

```yaml
coordination:
  topology: hierarchical
  commander: qe-fleet-commander
  memory_namespace: aqe/coordination
  blackboard_topic: qe-fleet

preload_skills:
  - agentic-quality-engineering  # Always (this skill)
  - risk-based-testing           # For prioritization
  - quality-metrics              # For measurement

agent_assignments:
  qe-test-generator: [api-testing-patterns, tdd-london-chicago]
  qe-coverage-analyzer: [quality-metrics, risk-based-testing]
  qe-security-scanner: [security-testing, risk-based-testing]
  qe-performance-tester: [performance-testing]
```
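
To see how these hints might be consumed, the sketch below mirrors the YAML as a typed object and resolves the skill list for one agent. It assumes that `preload_skills` apply to every agent and that `agent_assignments` add to them; the YAML itself does not say so. `CoordinationHints` and `skillsFor` are hypothetical names.

```typescript
interface CoordinationHints {
  preloadSkills: string[];
  agentAssignments: Record<string, string[]>;
}

// Values copied from the YAML above.
const hints: CoordinationHints = {
  preloadSkills: ["agentic-quality-engineering", "risk-based-testing", "quality-metrics"],
  agentAssignments: {
    "qe-test-generator": ["api-testing-patterns", "tdd-london-chicago"],
    "qe-coverage-analyzer": ["quality-metrics", "risk-based-testing"],
    "qe-security-scanner": ["security-testing", "risk-based-testing"],
    "qe-performance-tester": ["performance-testing"],
  },
};

// Preloaded skills first, then agent-specific ones, with duplicates removed.
function skillsFor(agent: string, config: CoordinationHints): string[] {
  const assigned = config.agentAssignments[agent] ?? [];
  return Array.from(new Set([...config.preloadSkills, ...assigned]));
}

const generatorSkills = skillsFor("qe-test-generator", hints);
// ["agentic-quality-engineering", "risk-based-testing", "quality-metrics",
//  "api-testing-patterns", "tdd-london-chicago"]
```

Under this reading, `qe-coverage-analyzer` resolves to just the three preloaded skills, since both of its assignments are already preloaded.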

---

## Related Skills
- `holistic-testing-pact` - PACT principles deep dive
- `risk-based-testing` - Prioritize agent focus
- `quality-metrics` - Measure agent effectiveness
- `api-testing-patterns`, `security-testing`, `performance-testing` - Specialized testing

## Resources
- Agent definitions: `.claude/agents/`
- CLI: `aqe agent --help`
- Fleet status: `aqe fleet status`

---

**Success Metric:** Deploy 10x more frequently with same or better quality through intelligent agent collaboration.

Related Skills

data-quality-frameworks

from diegosouzapw/awesome-omni-skill

Implement data quality validation with Great Expectations, dbt tests, and data contracts. Use when building data quality pipelines, implementing validation rules, or establishing data contracts.

data-engineering-data-pipeline

from diegosouzapw/awesome-omni-skill

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

context-engineering

from diegosouzapw/awesome-omni-skill

Use when designing agent system prompts, optimizing RAG retrieval, or when context is too expensive or slow. Reduces tokens while maintaining quality through strategic positioning and attention-aware design.

Build Your Data Engineering Skill

from diegosouzapw/awesome-omni-skill

Create your LLMOps data engineering skill in one prompt, then learn to improve it throughout the chapter

ai-engineering-skill

from diegosouzapw/awesome-omni-skill

Practical guide for building production ML systems based on Chip Huyen's AI Engineering book. Use when users ask about model evaluation, deployment strategies, monitoring, data pipelines, feature engineering, cost optimization, or MLOps. Covers metrics, A/B testing, serving patterns, drift detection, and production best practices.

ai-data-engineering

from diegosouzapw/awesome-omni-skill

Data pipelines, feature stores, and embedding generation for AI/ML systems. Use when building RAG pipelines, ML feature serving, or data transformations. Covers feature stores (Feast, Tecton), embedding pipelines, chunking strategies, orchestration (Dagster, Prefect, Airflow), dbt transformations, data versioning (LakeFS), and experiment tracking (MLflow, W&B).

agenticmail

from diegosouzapw/awesome-omni-skill

🎀 AgenticMail — Full email, SMS, storage & multi-agent coordination for AI agents. 63 tools.

agentic-issue-assistant

from diegosouzapw/awesome-omni-skill

Install common docs/backlog skeleton plus an AGENTS template, and wrap issue/finalization operations for an agentic workflow.

agentic-chat

from diegosouzapw/awesome-omni-skill

AI assistant for creating clear, actionable task descriptions for GitHub Copilot agents

51-execute-quality-150

from diegosouzapw/awesome-omni-skill

[51] EXECUTE. Commitment to maximum quality work with 150% coverage. Use when you need the highest quality output for critical tasks, complex problems, important decisions, or when standard work isn't enough. Triggers on "maximum quality", "150% mode", "full quality", "critical task", or when you explicitly want AI to work at its best.

ai-content-quality-checker

from diegosouzapw/awesome-omni-skill

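Comprehensive quality-check skill for AI-generated content. Evaluates readability, accuracy, relevance, originality, SEO, accessibility, engagement, and grammar and style from multiple angles.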

1k-code-quality

from diegosouzapw/awesome-omni-skill

Code quality standards for OneKey. Use when fixing lint warnings, running pre-commit tasks, handling unused variables, writing comments, or ensuring code quality. All comments must be in English. Triggers on lint, linting, eslint, oxlint, tsc, type check, unused variable, comment, documentation, spellcheck, code quality, pre-commit, yarn lint.