multi-model-reviewer

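Coordinates multiple AI models (ChatGPT, Gemini, Codex, QWEN, Claude) for triangular verification, ensuring that "Specification == Program == Test" stays consistent. It filters out false positives before producing a report, substantially reducing the time spent on manual intervention.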

16 stars

Best use case

multi-model-reviewer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using multi-model-reviewer should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/multi-model-reviewer/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/ai-agents/multi-model-reviewer/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/multi-model-reviewer/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How multi-model-reviewer Compares

Feature / Agent	multi-model-reviewer	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

It coordinates multiple AI models (ChatGPT, Gemini, Codex, QWEN, Claude) to cross-check that the specification, the program, and the tests agree, filters out false positives, and outputs a report.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

Cursor vs Codex for AI Workflows

Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.

SKILL.md Source

# Multi-Model Reviewer Skill

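## When to Trigger

- After a specification change, when consistency needs to be verified
- During pull request review
- As a quality gate in a CI/CD pipeline
- When a developer requests a comprehensive review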

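## Core Task

Use multi-model cross-validation to ensure:
1. **Specification ↔ Program**: everything in the specification must be implemented in the program
2. **Program ↔ Test**: everything in the program must be covered by tests
3. **Test ↔ Specification**: everything the tests verify must be defined in the specification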

---

## Verification Triangle

```
        ┌─────────────────┐
        │  Specification  │
        │   (YAML Specs)  │
        └────────┬────────┘
                 │
    ┌────────────┼────────────┐
    │            │            │
    ▼            ▼            ▼
┌───────┐   ┌───────┐   ┌───────┐
│ Spec  │   │ Spec  │   │ Test  │
│  ==   │   │  ==   │   │  ==   │
│Program│   │ Test  │   │Program│
└───────┘   └───────┘   └───────┘
```
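
Each edge of the triangle can be read as a pairwise comparison. The following is a minimal sketch, assuming each artifact has already been reduced to a set of behavior names; the function `triangle_gaps` and the sample data are hypothetical and not part of the skill, which relies on AI models rather than set arithmetic.

```python
from itertools import permutations


def triangle_gaps(artifacts: dict[str, set[str]]) -> dict[str, set[str]]:
    """For every ordered pair (a, b), list items present in a but absent from b."""
    gaps = {}
    for a, b in permutations(artifacts, 2):
        missing = artifacts[a] - artifacts[b]
        if missing:
            gaps[f"{a} -> {b}"] = missing
    return gaps


# Hypothetical sample data: behavior names extracted from each artifact.
artifacts = {
    "spec": {"WorkflowCreated", "WorkflowRenamed"},
    "program": {"WorkflowCreated", "WorkflowRenamed", "WorkflowMoved"},
    "test": {"WorkflowCreated"},
}

for edge, missing in sorted(triangle_gaps(artifacts).items()):
    print(edge, sorted(missing))
```

An empty result would mean all three artifacts describe the same set of behaviors.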

---

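## Multi-Model Architecture

### Participating AI Agents

| # | Model | Invocation | Strengths |
|---|-------|------------|-----------|
| 1 | ChatGPT 5.2 | API (OpenAI) | Semantic understanding, logical reasoning |
| 2 | Gemini | CLI (local) | Multimodal, long context |
| 3 | Codex | CLI (local) | Code generation and comprehension |
| 4 | QWEN 32B | Local LLM | Chinese-language understanding, fast inference |
| 5 | Claude | CLI (local) | Specification analysis, false-positive filtering |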

### Review Flow

```
┌─────────────────────────────────────────────────────────────────┐
│                    Multi-Model Review Pipeline                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐                     │
│  │   Spec   │   │  Program │   │   Test   │                     │
│  │  (YAML)  │   │  (Code)  │   │  (BDD)   │                     │
│  └────┬─────┘   └────┬─────┘   └────┬─────┘                     │
│       │              │              │                            │
│       └──────────────┼──────────────┘                            │
│                      ▼                                           │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                   Parallel Review                          │  │
│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌──────┐ │  │
│  │  │ChatGPT  │ │ Gemini  │ │  Codex  │ │  QWEN   │ │Claude│ │  │
│  │  │  5.2    │ │   CLI   │ │   CLI   │ │  32B    │ │ CLI  │ │  │
│  │  └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └──┬───┘ │  │
│  │       │           │           │           │         │      │  │
│  │       └───────────┴───────────┴───────────┴─────────┘      │  │
│  │                           │                                 │  │
│  └───────────────────────────┼─────────────────────────────────┘  │
│                              ▼                                   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │              Claude: False Positive Filter                 │  │
│  │                                                            │  │
│  │  • Cross-compare findings from each model                  │  │
│  │  • Filter false positives (≥3 models agree = real issue)   │  │
│  │  • Classify severity                                       │  │
│  └───────────────────────────────────────────────────────────┘  │
│                              │                                   │
│                              ▼                                   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Final Report                            │  │
│  │                                                            │  │
│  │  ✅ PASS / ⚠️ WARNINGS / ❌ ERRORS                         │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```
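
The fan-out and fan-in stages of the pipeline can be sketched as follows. This is an illustration under stated assumptions, not the skill's actual implementation: the reviewers are stub functions standing in for the real API and CLI calls, and the names `run_parallel_review`, `invert`, and the finding keys are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A reviewer takes the bundled spec/program/test text and returns finding keys.
Reviewer = Callable[[str], set[str]]


def run_parallel_review(reviewers: dict[str, Reviewer], bundle: str) -> dict[str, set[str]]:
    """Run every reviewer concurrently and collect the findings per model."""
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {name: pool.submit(fn, bundle) for name, fn in reviewers.items()}
        return {name: future.result() for name, future in futures.items()}


def invert(findings: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each finding to the sorted list of models that reported it."""
    by_issue: dict[str, list[str]] = {}
    for model in sorted(findings):
        for issue in findings[model]:
            by_issue.setdefault(issue, []).append(model)
    return by_issue


# Stub reviewers standing in for the real model calls (hypothetical findings).
reviewers = {
    "chatgpt": lambda _: {"metadata-missing-in-spec"},
    "gemini": lambda _: {"metadata-missing-in-spec"},
    "codex": lambda _: {"no-event-publication-test"},
    "qwen": lambda _: {"no-event-publication-test"},
    "claude": lambda _: {"metadata-missing-in-spec"},
}

votes = invert(run_parallel_review(reviewers, "spec + program + test"))
for issue, models in sorted(votes.items()):
    print(issue, models)
```

Threads are a reasonable fit here because each real reviewer would spend most of its time waiting on a network call or a subprocess.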

---

## Verification Checks

### 1. Specification → Program

Check that everything defined in the specification has a corresponding implementation:

```yaml
checks:
  - id: SP1
    name: "Domain Events implementation completeness"
    rule: "Every domain_events entry defined in frame.yaml must have a corresponding implementation in the program"
    example: "WorkflowCreated event defined but not published in CreateWorkflowUseCase"

  - id: SP2
    name: "Use Case implementation completeness"
    rule: "The input/output defined in use-case.yaml must match the actual Service"

  - id: SP3
    name: "Invariants implementation"
    rule: "Each invariant in aggregate.yaml must be validated in the Aggregate at the location named by enforced_in"

  - id: SP4
    name: "Pre/Post Conditions"
    rule: "The pre_conditions defined in contracts must be checked in the program"
```
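
To make check SP1 concrete, here is a purely textual stand-in for the judgment the models make semantically. It assumes that publishing an event involves constructing it with `new EventName(...)` somewhere in the Java source; that convention, the function `unpublished_events`, and the sample source fragment are all assumptions for illustration.

```python
import re


def unpublished_events(spec_events: list[str], program_source: str) -> list[str]:
    """Return spec-defined events that are never constructed in the source text."""
    missing = []
    for event in spec_events:
        # Assumption: publishing an event involves `new EventName(` somewhere.
        if not re.search(rf"\bnew\s+{re.escape(event)}\s*\(", program_source):
            missing.append(event)
    return missing


# Hypothetical inputs: event names from the spec and a fragment of Java source.
spec_events = ["WorkflowCreated", "WorkflowRenamed"]
program_source = """
public void execute(CreateWorkflowInput input) {
    eventBus.publish(new WorkflowCreated(workflowId, boardId, name, metadata));
}
"""

print(unpublished_events(spec_events, program_source))  # ['WorkflowRenamed']
```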

### 2. Program → Test

Check that every implemented feature is covered by tests:

```yaml
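checks:
  - id: PT1
    name: "Use Case test coverage"
    rule: "Every Service class must have a corresponding test class"

  - id: PT2
    name: "Domain Event tests"
    rule: "Domain Events published by the program must be verified in tests"

  - id: PT3
    name: "Error Path tests"
    rule: "Every Exception type the program throws must have a corresponding test case"

  - id: PT4
    name: "Invariant tests"
    rule: "Every Aggregate invariant must have a test for its violation"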
```

### 3. Test → Specification

Check that everything the tests verify is defined in the specification:

```yaml
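checks:
  - id: TS1
    name: "Test traceability"
    rule: "Every test case must be traceable to an AC in acceptance.yaml"

  - id: TS2
    name: "Frame Concerns coverage"
    rule: "All frame_concerns must be covered by tests"

  - id: TS3
    name: "Implicit behavior"
    rule: "If a test verifies behavior the specification does not define, flag it as a potential omission"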
```

---

## Output Format

### Review Summary Report

```
╔═══════════════════════════════════════════════════════════════════╗
║            DOMAIN EVENT STANDARD UPDATE SUMMARY                    ║
╠═══════════════════════════════════════════════════════════════════╣
║ #  │ Use Case              │ Event                │ Status        ║
╠════╪═══════════════════════╪══════════════════════╪═══════════════╣
║ 1  │ create-workflow       │ WorkflowCreated      │ ✅ DONE       ║
║ 2  │ create-stage          │ StageCreated         │ ✅ DONE       ║
║ 3  │ create-swimlane       │ SwimLaneCreated      │ ✅ DONE       ║
║ 4  │ copy-lane             │ LaneCopied           │ ✅ DONE       ║
║ 5  │ move-lane             │ LaneMoved            │ ✅ DONE       ║
║ 6  │ delete-lane           │ LaneDeleted          │ ✅ DONE       ║
║ 7  │ rename-lane           │ LaneRenamed          │ ✅ DONE       ║
║ 8  │ rename-workflow       │ WorkflowRenamed      │ ✅ DONE       ║
║ 9  │ delete-workflow       │ WorkflowDeleted      │ ✅ DONE       ║
║ 10 │ move-workflow         │ WorkflowMoved        │ ✅ DONE       ║
║ 11 │ set-wip-limit         │ WipLimitSet          │ ✅ DONE       ║
║ 12 │ change-workflow-note  │ WorkflowNoteChanged  │ ✅ DONE       ║
╠════╧═══════════════════════╧══════════════════════╧═══════════════╣
║ TOTAL: 12/12 (100%) ✅                                             ║
╚═══════════════════════════════════════════════════════════════════╝

Change summary:
The domain_events block in each aggregate.yaml is now uniform:
1. Added includes_standard: true
2. Added standard_ref: "../../../../shared/domain-event-standard.yaml"
3. Removed the duplicated id and occurredOn properties
4. Added a comment explaining the metadata property
5. Moved workflowId to be the first property (consistent ordering)

Shared standard file:
- /.dev/problem-frames/ezkanban/board-management/shared/domain-event-standard.yaml

This resolves the metadata spec mismatch found by the multi-model review;
all 12 domain events of the Workflow aggregate now conform to the standard.
```

### Issue Report Format

```yaml
review_report:
  timestamp: "2025-12-31T10:30:00Z"
  spec_dir: "docs/specs/create-workflow/"
  
  summary:
    total_checks: 24
    passed: 22
    warnings: 1
    errors: 1
    
  issues:
    - id: ISSUE-001
      severity: error
      type: "spec_program_mismatch"
      description: "Domain event 'WorkflowCreated' missing 'metadata' property in spec"
      detected_by: ["chatgpt", "gemini", "claude"]  # 3/5 models
      confidence: high
      
      spec_location: "aggregate.yaml#domain_events.WorkflowCreated"
      program_location: "WorkflowEvents.java#WorkflowCreated"
      
      spec_definition: |
        properties:
          - workflowId
          - boardId
          - name
          
      program_implementation: |
        record WorkflowCreated(
            WorkflowId workflowId,
            BoardId boardId,
            String name,
            EventMetadata metadata  // ← Missing in spec
        )
      
      suggested_fix: |
        Add 'metadata' property to aggregate.yaml:
        ```yaml
        domain_events:
          - name: WorkflowCreated
            includes_standard: true
            standard_ref: "../shared/domain-event-standard.yaml"
        ```
    
    - id: ISSUE-002
      severity: warning
      type: "test_coverage_gap"
      description: "No test for 'WorkflowCreated' event publication"
      detected_by: ["codex", "qwen"]  # 2/5 models - warning level
      confidence: medium
      
      test_location: "CreateWorkflowAcceptanceTest.java"
      suggestion: "Add assertion for event publication in ThenSuccess block"
```
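
The `summary` block can be derived from the `issues` list rather than maintained by hand. The sketch below assumes that each reported issue corresponds to exactly one failed check, which the report format does not state explicitly; the function name `summarize` is hypothetical.

```python
from collections import Counter


def summarize(total_checks: int, issues: list[dict]) -> dict[str, int]:
    """Derive the summary block from the issue list."""
    by_severity = Counter(issue["severity"] for issue in issues)
    return {
        "total_checks": total_checks,
        # Assumption: one issue equals one failed check.
        "passed": total_checks - len(issues),
        "warnings": by_severity["warning"],
        "errors": by_severity["error"],
    }


# The two issues from the sample report, reduced to the fields used here.
issues = [
    {"id": "ISSUE-001", "severity": "error", "detected_by": ["chatgpt", "gemini", "claude"]},
    {"id": "ISSUE-002", "severity": "warning", "detected_by": ["codex", "qwen"]},
]

summary = summarize(24, issues)
print(summary)
```

With the sample data this reproduces the counts shown in the report: 24 checks, 22 passed, 1 warning, 1 error.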

---

## False-Positive Filtering Rules

### Consensus Thresholds

```yaml
consensus_rules:
  error:
    threshold: 3  # ≥3 models agree = confirmed error
    action: "report as error"
  
  warning:
    threshold: 2  # 2 models agree = warning
    action: "report as warning"
  
  ignored:
    threshold: 1  # only 1 model = likely false positive
    action: "log but don't report"
```
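
The thresholds translate directly into a small classification function. This is a minimal sketch; the function name `classify` is hypothetical, and the defaults are taken from the rules above.

```python
def classify(votes: int, error_threshold: int = 3, warning_threshold: int = 2) -> str:
    """Map the number of agreeing models to a report level."""
    if votes >= error_threshold:
        return "error"
    if votes >= warning_threshold:
        return "warning"
    return "ignored"


for votes in (1, 2, 3, 5):
    print(votes, classify(votes))
```

One consequence of fixed thresholds: with fewer than three models enabled, no finding can reach the error level under these defaults.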

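### Claude as Final Arbiter

As the final reviewer, Claude is responsible for:

1. **Semantic analysis**: determining whether findings from different models point to the same issue
2. **Contextual judgment**: taking project-specific conventions into account
3. **Severity classification**: grading issues by scope of impact
4. **Fix suggestions**: providing actionable repair recommendations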

---

## Tool Scripts (scripts/)

### multi_model_review.py

```bash
# Run the multi-model review
python ~/.claude/skills/multi-model-reviewer/scripts/multi_model_review.py \
    --spec-dir docs/specs/create-workflow/ \
    --program-dir src/application/workflow/ \
    --test-dir tests/acceptance/workflow/ \
    --output review-report.yaml

# Verify only specification against program
python ~/.claude/skills/multi-model-reviewer/scripts/multi_model_review.py \
    --spec-dir docs/specs/create-workflow/ \
    --program-dir src/application/workflow/ \
    --check spec-program

# Use a specific subset of models
python ~/.claude/skills/multi-model-reviewer/scripts/multi_model_review.py \
    --spec-dir docs/specs/create-workflow/ \
    --models chatgpt,claude,gemini
```
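
The flags used above suggest a command-line interface along the following lines. This is a sketch with `argparse`, not the script's actual source: only the flag names and the `spec-program` value appear in the examples, so the other `--check` values, the defaults, and which flags are required are assumptions.

```python
import argparse

ALL_MODELS = ["chatgpt", "gemini", "codex", "qwen", "claude"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="multi_model_review.py")
    parser.add_argument("--spec-dir", required=True)
    parser.add_argument("--program-dir")
    parser.add_argument("--test-dir")
    parser.add_argument("--output")
    # Only "spec-program" appears in the examples; the other values are assumed.
    parser.add_argument(
        "--check",
        choices=["spec-program", "program-test", "test-spec", "all"],
        default="all",
    )
    # Comma-separated list, e.g. "chatgpt,claude,gemini".
    parser.add_argument("--models", type=lambda s: s.split(","), default=ALL_MODELS)
    return parser


args = build_parser().parse_args(
    ["--spec-dir", "docs/specs/create-workflow/", "--models", "chatgpt,claude,gemini"]
)
print(args.spec_dir, args.check, args.models)
```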

---

## Collaboration with Other Skills

```
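spec-compliance-validator
    │
    ├── Validates the completeness of a single specification
    │
    └── Passes spec data to →
                            │
                            ▼
            multi-model-reviewer (this Skill)
                            │
                            ├── Coordinates 5 AI Agents
                            ├── Cross-validates Spec == Program == Test
                            ├── Claude filters false positives
                            │
                            └── Sends report to →
                                                │
                                                ▼
                                          code-reviewer
                                                │
                                                └── Developer confirms → AI revises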
```

---

## Configuration File

### .multi-model-review.yaml

```yaml
# Configuration at the project root
models:
  chatgpt:
    enabled: true
    api_key_env: "OPENAI_API_KEY"
    model: "gpt-5.2"
    
  gemini:
    enabled: true
    cli_command: "gemini"
    
  codex:
    enabled: true
    cli_command: "codex"
    
  qwen:
    enabled: true
    endpoint: "http://localhost:11434/api/generate"
    model: "qwen2.5:32b"
    
  claude:
    enabled: true
    cli_command: "claude"
    role: "final_arbiter"  # final arbiter

paths:
  specs: "docs/specs/"
  source: "src/"
  tests: "tests/"

shared_standards:
  domain_events: "shared/domain-event-standard.yaml"
  
consensus:
  error_threshold: 3
  warning_threshold: 2
```
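
Because the consensus thresholds interact with the number of enabled models, a quick sanity check on the loaded configuration can be useful. The sketch below works on an already-parsed dictionary trimmed to the relevant keys; the helper names `enabled_models` and `validate_consensus` are hypothetical, and how the skill itself loads or validates this file is not documented here.

```python
import copy


def enabled_models(config: dict) -> list[str]:
    """Names of the models whose `enabled` flag is true."""
    return [name for name, settings in config["models"].items() if settings.get("enabled")]


def validate_consensus(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means the config is usable."""
    problems = []
    active = enabled_models(config)
    consensus = config["consensus"]
    if consensus["warning_threshold"] > consensus["error_threshold"]:
        problems.append("warning_threshold exceeds error_threshold")
    if consensus["error_threshold"] > len(active):
        problems.append(
            f"error_threshold {consensus['error_threshold']} is unreachable "
            f"with {len(active)} enabled model(s)"
        )
    return problems


# The configuration above, reduced to the keys this check reads.
config = {
    "models": {
        "chatgpt": {"enabled": True},
        "gemini": {"enabled": True},
        "codex": {"enabled": True},
        "qwen": {"enabled": True},
        "claude": {"enabled": True, "role": "final_arbiter"},
    },
    "consensus": {"error_threshold": 3, "warning_threshold": 2},
}
print(validate_consensus(config))  # []

# Disabling three models leaves two reviewers, so no finding can reach error level.
reduced = copy.deepcopy(config)
for name in ("gemini", "codex", "qwen"):
    reduced["models"][name]["enabled"] = False
print(validate_consensus(reduced))
```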

---

## Developer Workflow

```
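1. Development complete
       │
       ▼
2. Run multi-model-review
       │
       ▼
3. Receive the report
   ├── ✅ PASS → submit PR
   │
   └── ❌ ISSUES FOUND
           │
           ▼
4. Developer confirms
   ├── Real issue → ask AI to revise
   │              │
   │              ▼
   │         5. AI applies the fix automatically
   │              │
   │              ▼
   │         6. Re-verify → back to step 2
   │
   └── False positive → add an ignore rule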
```
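
The workflow above is a loop that ends when a review comes back clean. The following sketch expresses it with three injected callbacks standing in for the review run, the developer's confirmation, and the AI fix. All names are hypothetical, and the `max_rounds` cap is an added assumption, since the workflow as drawn has no iteration limit.

```python
from typing import Callable


def review_loop(
    run_review: Callable[[], list[str]],
    is_real: Callable[[str], bool],
    apply_fix: Callable[[str], None],
    max_rounds: int = 5,
) -> tuple[str, set[str]]:
    """Repeat review -> confirm -> fix until the review passes or rounds run out."""
    ignored: set[str] = set()
    for _ in range(max_rounds):
        issues = [issue for issue in run_review() if issue not in ignored]
        if not issues:
            return "PASS", ignored
        for issue in issues:
            if is_real(issue):
                apply_fix(issue)      # steps 5 and 6: fix, then re-verify next round
            else:
                ignored.add(issue)    # false positive: record an ignore rule
    return "ISSUES FOUND", ignored


# Hypothetical state: one real issue and one false positive.
open_issues = {"metadata-missing-in-spec", "style-nit"}

result = review_loop(
    run_review=lambda: sorted(open_issues),
    is_real=lambda issue: issue != "style-nit",
    apply_fix=open_issues.discard,
)
print(result)  # ('PASS', {'style-nit'})
```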

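### Benefits

- **Less manual review time**: developers only need to confirm the issues the AI finds
- **Greater consistency**: multi-model cross-validation lowers the miss rate
- **Automated fixes**: once an issue is confirmed, the AI generates the fix
- **Standardization**: shared standard files unify the specification format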

Related Skills

math-modeling

from diegosouzapw/awesome-omni-skill

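This skill should be used when the user asks for "mathematical modeling", "modeling competition", "mathematical modeling paper", "mathematical modeling contest", "modeling analysis", or "modeling solution", or mentions tasks related to mathematical modeling. It applies to mathematical modeling competitions of all kinds, such as the China Undergraduate Mathematical Contest in Modeling (CUMCM) and the Mathematical Contest in Modeling / Interdisciplinary Contest in Modeling (MCM/ICM).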

ios-foundation-models-diag

from diegosouzapw/awesome-omni-skill

Use when debugging Foundation Models issues — context exceeded, guardrail violations, slow generation, availability problems, unsupported language, or unexpected output. Systematic diagnostics with production crisis defense.

fair-data-model-assessment

from diegosouzapw/awesome-omni-skill

Assess data models against FAIR principles using RDA-FDMM indicators. Use when: (1) Evaluating vendor-delivered data models for FAIR compliance, (2) Reviewing schemas, ontologies, or data dictionaries before integration, (3) Creating FAIR assessment reports for data governance reviews, (4) Preparing data model documentation for enterprise or regulatory standards, (5) Auditing existing data assets for FAIRness gaps. Covers 41 RDA indicators across Findable, Accessible, Interoperable, Reusable dimensions with maturity scoring (0-4 scale).

ethics-reviewer

from diegosouzapw/awesome-omni-skill

This skill should be used when the user mentions "dark patterns", "accessibility", "a11y", "privacy", "tracking", "analytics", "notifications", "user data", "GDPR", "consent", "manipulation", "sustainability", "performance budget", or when building user-facing features that collect data, send notifications, display urgency, or gate access. Addresses ethical constraints in software design — manipulation, accessibility, privacy, and sustainability.

error-debugging-multi-agent-review

from diegosouzapw/awesome-omni-skill

Use when working with error debugging multi agent review

data-model

from diegosouzapw/awesome-omni-skill

Generate comprehensive data model documentation with ERD, DTOs, and data flow diagrams

data-model-creation

from diegosouzapw/awesome-omni-skill

Professional rules for AI-driven data modeling and creation. Use this skill when users need to create and manage MySQL databases, design data models using Mermaid ER diagrams, and implement database schemas.

codex-reviewer

from diegosouzapw/awesome-omni-skill

Use OpenAI's Codex CLI as an independent code reviewer to provide second opinions on code implementations, architectural decisions, code specifications, and pull requests. Trigger when users request code review, second opinion, independent review, architecture validation, or mention Codex review. Provides unbiased analysis using GPT-5-Codex model through the codex exec command for non-interactive reviews.

code-reviewer

from diegosouzapw/awesome-omni-skill

Elite code review expert specializing in modern AI-powered code analysis, security vulnerabilities, performance optimization, and production reliability. Masters static analysis tools, security scanning, and configuration review with 2024/2025 best practices. Use PROACTIVELY for code quality assurance.

Build Your Model Serving Skill

from diegosouzapw/awesome-omni-skill

Create your model-serving skill from Ollama documentation before learning deployment theory

Build Your Model Merging Skill

from diegosouzapw/awesome-omni-skill

No description provided.

banking-domain-reviewer

from diegosouzapw/awesome-omni-skill

Code review agent with banking domain knowledge — validates business flows, compliance requirements, double-entry accounting, payment processing, and regulatory patterns in the Firefly Banking Platform