skill-test

Manage the skills-for-fabric evaluation framework: add eval plans for new or existing skills, list available tests and their results, generate eval datasets, review metrics, and check test coverage. Directs test execution to the tests/ folder. Triggers: "add tests", "add evals", "list tests", "show eval results", "run tests", "generate eval data", "eval metrics", "test coverage", "missing tests", "show tests".

245 stars

by microsoft

Best use case

skill-test is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using skill-test should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/skill-test/SKILL.md --create-dirs "https://raw.githubusercontent.com/microsoft/skills-for-fabric/main/.github/skills/skill-test/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/skill-test/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How skill-test Compares

Feature / Agent	skill-test	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

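What does this skill do?

It manages the skills-for-fabric evaluation framework: adding eval plans, listing tests and their results, generating eval datasets, reviewing metrics, and checking test coverage. It does not run tests itself; it tells the user how to run them from the tests/ folder.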

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Skill Test — skills-for-fabric Evaluation Framework

Manage the end-to-end evaluation framework for skills-for-fabric. This skill routes requests to the correct workflow based on user intent — adding tests, listing tests, running tests, viewing results, generating data, or checking coverage.

## When to Use

- When a contributor wants to add evaluation test cases for a new or existing skill
- When someone asks to see what tests exist or what results look like
- When a user wants to run the test suite
- When reviewing eval metrics or checking which skills lack test coverage

## Intent Routing

Parse the user request and route to the appropriate workflow:

| User Intent | Trigger Phrases | Action |
|-------------|----------------|--------|
| **Add evals** | "add tests", "add evals", "add evals for missing skills", "create eval plan" | → [Workflow: Add Evals](#workflow-add-evals) |
| **List tests** | "list tests", "list evals", "show me the list of tests", "what tests exist", "show eval plans" | → [Workflow: List Tests](#workflow-list-tests) |
| **Run tests** | "run tests", "run evals", "execute tests", "run the eval suite" | → [Workflow: Run Tests](#workflow-run-tests) |
| **View results** | "show eval results", "test results", "eval results", "executive summary" | → [Workflow: View Results](#workflow-view-results) |
| **Generate data** | "generate eval data", "generate test data", "create eval datasets" | → [Workflow: Generate Data](#workflow-generate-data) |
| **View metrics** | "eval metrics", "test metrics", "what metrics", "how are tests scored" | → [Workflow: View Metrics](#workflow-view-metrics) |
| **Check coverage** | "test coverage", "which skills have tests", "missing tests", "skills without evals" | → [Workflow: Check Coverage](#workflow-check-coverage) |
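
The routing table above can be sketched as a simple phrase lookup. This is a minimal illustration, not the agent's actual matching logic: the phrase lists are copied from the table, while the `ROUTES` mapping, the `route_intent` helper, and the matching strategy (case-insensitive substring match, longest phrase wins) are assumptions made for the example.

```python
# Hypothetical sketch: route a user request to a workflow by trigger phrase.
# Phrases come from the routing table; the matching rule is an assumption.
ROUTES = {
    "Add Evals": ["add tests", "add evals", "add evals for missing skills", "create eval plan"],
    "List Tests": ["list tests", "list evals", "show me the list of tests",
                   "what tests exist", "show eval plans"],
    "Run Tests": ["run tests", "run evals", "execute tests", "run the eval suite"],
    "View Results": ["show eval results", "test results", "eval results", "executive summary"],
    "Generate Data": ["generate eval data", "generate test data", "create eval datasets"],
    "View Metrics": ["eval metrics", "test metrics", "what metrics", "how are tests scored"],
    "Check Coverage": ["test coverage", "which skills have tests", "missing tests",
                       "skills without evals"],
}


def route_intent(request):
    """Return the workflow name whose longest trigger phrase appears in the request."""
    text = request.lower()
    best = None  # (workflow, matched phrase)
    for workflow, phrases in ROUTES.items():
        for phrase in phrases:
            if phrase in text and (best is None or len(phrase) > len(best[1])):
                best = (workflow, phrase)
    return best[0] if best else None


print(route_intent("Please run the eval suite"))  # Run Tests
print(route_intent("Which skills have tests?"))   # Check Coverage
```

A request that matches no phrase returns `None`, which a caller could treat as a cue to ask the user to clarify.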

---

## Workflow: Add Evals

Follow the instructions in `tests/full-eval-tests/README.md` § "Adding Evals for New Skills".

### Automated Path (Recommended)

Give the agent the prompt:

```
Add evals for the missing skills
```

The agent will:
1. Detect missing skills by comparing installed skills against existing eval plans in `tests/full-eval-tests/plan/03-individual-skills/`
2. Generate individual eval plans (`plan/03-individual-skills/eval-<skill-name>.md`) with 10–12 test cases
3. Generate combined eval plans (`plan/04-combined-skills/eval-<skill>-authoring-plus-consumption.md`)
4. Create golden data in `tests/full-eval-tests/evalsets/expected-results/`
5. Update tracking files: `plan/00-overview.md`, `README.md`, `plan/04-combined-skills/eval-full-pipeline.md`

### Manual Path

To add evals for a specific skill `<new-skill>`:

1. Create `tests/full-eval-tests/plan/03-individual-skills/eval-<new-skill>.md` using the template in the README
2. Each test case needs a Case ID (unique prefix), Prompt, Expected result, and Pass criteria; the plan as a whole needs at least one negative/ambiguous test
3. If the skill has an authoring+consumption pair, create `tests/full-eval-tests/plan/04-combined-skills/eval-<new-skill>-authoring-plus-consumption.md`
4. Add golden data to `tests/full-eval-tests/evalsets/expected-results/`
5. Update `plan/00-overview.md`, `README.md` directory tree, and `plan/04-combined-skills/eval-full-pipeline.md`

### Eval Plan Template

Use the template from `tests/full-eval-tests/README.md` § "Eval Plan Template". Every eval plan must include:
- Skill overview (name, category, R/W, purpose)
- Pre-requisites
- Numbered test cases (XX-01 through XX-10+) with Prompt / Expected / Pass criteria
- At least one negative/ambiguous test case as the last case
- Write Operations table (if the skill writes data)
- Expected Token Range
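
The structural requirements above could be checked mechanically before a plan is committed. The sketch below is illustrative only: the `EvalCase` class, its field names, and the `validate_plan` helper are hypothetical and do not reflect the real template format in the README; the minimum of 10 cases is taken from the "XX-01 through XX-10+" numbering.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    """Hypothetical in-memory form of one test case from an eval plan."""
    case_id: str
    prompt: str
    expected: str
    pass_criteria: str
    negative: bool = False  # True for a negative/ambiguous test


def validate_plan(cases, min_cases=10):
    """Return a list of problems; an empty list means the plan looks complete."""
    problems = []
    if len(cases) < min_cases:
        problems.append(f"expected at least {min_cases} test cases, found {len(cases)}")
    ids = [c.case_id for c in cases]
    if len(set(ids)) != len(ids):
        problems.append("case IDs are not unique")
    for c in cases:
        for name in ("case_id", "prompt", "expected", "pass_criteria"):
            if not getattr(c, name).strip():
                problems.append(f"{c.case_id or '<no id>'}: missing {name}")
    # The template asks for a negative/ambiguous case as the last case.
    if cases and not cases[-1].negative:
        problems.append("last case should be a negative/ambiguous test")
    return problems


cases = [EvalCase(f"XX-{i:02d}", "prompt", "expected", "criteria") for i in range(1, 10)]
cases.append(EvalCase("XX-10", "ambiguous prompt", "clarifying question", "criteria", negative=True))
print(validate_plan(cases))  # []
```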

---

## Workflow: List Tests

Show the user what eval plans and test cases exist.

### Individual Skill Evals

List files in `tests/full-eval-tests/plan/03-individual-skills/`:

```bash
ls tests/full-eval-tests/plan/03-individual-skills/
```

### Combined Skill Evals

List files in `tests/full-eval-tests/plan/04-combined-skills/`:

```bash
ls tests/full-eval-tests/plan/04-combined-skills/
```

### Quick Tests (tests.json)

Show the test cases defined in `tests/tests.json` — these are the prompt-based tests run by the test runner.

### Recommended Execution Order

| Order | Eval Plan | Reason |
|-------|-----------|--------|
| 1 | eval-check-updates.md | Verify skills are installed |
| 2 | eval-spark-authoring.md | Create lakehouses and load data |
| 3 | eval-sqldw-authoring.md | Create warehouse tables and load data |
| 4 | eval-eventhouse-authoring.md | Create Eventhouse tables and ingest data |
| 5 | eval-spark-consumption.md | Read back lakehouse data |
| 6 | eval-sqldw-consumption.md | Read back warehouse data |
| 7 | eval-eventhouse-consumption.md | Read back Eventhouse data |
| 8 | eval-medallion.md | End-to-end medallion pipeline |

---

## Workflow: Run Tests

> **⛔ DO NOT execute tests from this skill.** The agent must NEVER run `copilot`, `run-full-tests.ps1`, or any eval prompt directly. Instead, tell the user the exact commands to run manually.

When the user asks to run tests, respond **only** with instructions. Do not execute any commands. Tell the user:

1. Open a terminal and navigate to the `tests/` directory at the repository root:
   ```powershell
   cd tests
   ```

2. Run the full test suite:
   ```powershell
   .\run-full-tests.ps1
   ```

3. To specify an output directory:
   ```powershell
   .\run-full-tests.ps1 -TestFolder C:\temp\eval-run-01
   ```

### Important

- **The agent must NEVER run tests itself** — only provide the user with instructions
- **Tests must be run by the user** from inside the `tests/` folder
- The script copies the eval framework to a working folder and launches copilot there

---

## Workflow: View Results

Show the user existing evaluation results.

### Detailed Results

Read `tests/full-eval-tests/eval-results.md` — contains per-skill, per-test-case pass/fail with notes, consistency test results, failure analysis, and skip reasons.

### Executive Summary

Read `tests/full-eval-tests/executive-summary.md` — contains the high-level summary: overall pass rate, results by skill, data consistency scores, failure analysis, and recommendations.

### Key Metrics from Latest Run

| Metric | Value |
|--------|-------|
| Overall pass rate | 94.7% (54/57 executed) |
| Write/Read consistency | 100% (5/5 exact matches) |
| Total test cases | 74 |
| Skipped | 17 |
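
As a quick sanity check, the figures in this table are mutually consistent: the pass rate is computed over executed cases only, not over all 74. The variable names below are illustrative.

```python
# Figures from the "Key Metrics from Latest Run" table.
total_cases = 74
skipped = 17
passed = 54

executed = total_cases - skipped              # cases that actually ran
failed = executed - passed
pass_rate = round(100 * passed / executed, 1)

print(executed, failed, pass_rate)  # 57 3 94.7
```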

---

## Workflow: Generate Data

Generate synthetic evaluation datasets using the specifications in `tests/full-eval-tests/plan/01-data-generation.md`.

### Using the Generation Script

```bash
python tests/full-eval-tests/evalsets/data-generation/generate.py
```

### Datasets

| Dataset | Rows | Format | Used By |
|---------|------|--------|---------|
| sales_transactions | 100 / 1K / 10K | CSV | SQL DW, Spark |
| customers | 100 | CSV | Join testing |
| products | 50 | CSV | Join testing |
| sensor_readings | 500 | JSON | Spark semi-structured |

### Golden Results

Pre-computed expected results are in `tests/full-eval-tests/evalsets/expected-results/` and are used to verify consistency.

---

## Workflow: View Metrics

Explain the evaluation metrics defined in `tests/full-eval-tests/plan/02-metrics.md`.

| Metric | Definition |
|--------|-----------|
| **Success Rate** | `passed / total × 100` — whether the skill executed correctly |
| **Token Usage** | Input + output tokens consumed per eval prompt |
| **Read/Write Consistency** | Data written by authoring skill must be exactly retrievable by consumption skill |

### Grading

| Grade | Criteria |
|-------|----------|
| PASS | Skill invoked correctly, output matches expected |
| FAIL_INVOCATION | Wrong skill invoked, or no skill invoked |
| FAIL_EXECUTION | Skill invoked but errored |
| FAIL_RESULT | Skill completed but output mismatches |
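
One way to read the grading table is as a sequence of checks applied in order. The sketch below assumes that order (invocation, then execution, then result); the `Grade` enum and the `grade` function with its parameters are hypothetical names for illustration, not part of the framework.

```python
from enum import Enum


class Grade(Enum):
    PASS = "PASS"
    FAIL_INVOCATION = "FAIL_INVOCATION"
    FAIL_EXECUTION = "FAIL_EXECUTION"
    FAIL_RESULT = "FAIL_RESULT"


def grade(expected_skill, invoked_skill, errored, output_matches):
    """Classify one test case; invoked_skill is None when no skill was invoked."""
    if invoked_skill != expected_skill:
        return Grade.FAIL_INVOCATION  # wrong skill, or none at all
    if errored:
        return Grade.FAIL_EXECUTION   # right skill, but it raised an error
    if not output_matches:
        return Grade.FAIL_RESULT      # completed, but output differs from expected
    return Grade.PASS


print(grade("sqldw-consumption-cli", "sqldw-consumption-cli", False, True).value)  # PASS
print(grade("sqldw-consumption-cli", None, False, False).value)  # FAIL_INVOCATION
```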

### Pass Thresholds

| Metric | Threshold |
|--------|-----------|
| Success Rate | ≥ 90% per skill |
| Token Usage | Within 2× of baseline |
| Read/Write Consistency | 100% exact match |
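
The thresholds can be expressed as three boolean checks. This is a sketch under stated assumptions: `meets_thresholds` and its parameters are hypothetical, and "within 2× of baseline" is interpreted here as "at most twice the baseline token count", which the metrics plan may define differently.

```python
def meets_thresholds(passed, total, tokens, baseline_tokens,
                     consistent_checks, consistency_checks):
    """Return a per-metric pass/fail map for one skill (illustrative only)."""
    success_rate = 100 * passed / total if total else 0.0
    return {
        "success_rate": success_rate >= 90,                     # >= 90% per skill
        "token_usage": tokens <= 2 * baseline_tokens,           # assumed reading of "within 2x"
        "consistency": consistent_checks == consistency_checks, # 100% exact match
    }


print(meets_thresholds(9, 10, 1500, 1000, 5, 5))
# {'success_rate': True, 'token_usage': True, 'consistency': True}
```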

---

## Workflow: Check Coverage

Compare installed skills against existing eval plans to identify gaps.

### Steps

1. List all skills from the marketplace/plugin:
   ```
   check-updates, spark-authoring-cli, spark-consumption-cli, sqldw-authoring-cli,
   sqldw-consumption-cli, eventhouse-authoring-cli, eventhouse-consumption-cli, e2e-medallion-architecture
   ```

2. List existing individual eval plans:
   ```bash
   ls tests/full-eval-tests/plan/03-individual-skills/
   ```

3. Compare and report which skills have eval coverage and which are missing.

4. For missing skills, suggest running the [Add Evals](#workflow-add-evals) workflow.
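
The comparison in steps 1 to 3 can be sketched as a set difference between expected plan filenames and the files actually present. The naming rule used here (drop a trailing `-cli`, prefix `eval-`) and the `eval-medallion.md` override are inferred from the filenames in the Recommended Execution Order table and are assumptions; `plan_name` and `missing_coverage` are hypothetical helpers.

```python
SKILLS = [
    "check-updates", "spark-authoring-cli", "spark-consumption-cli", "sqldw-authoring-cli",
    "sqldw-consumption-cli", "eventhouse-authoring-cli", "eventhouse-consumption-cli",
    "e2e-medallion-architecture",
]

# Assumed exception to the naming rule, inferred from the execution-order table.
PLAN_NAME_OVERRIDES = {"e2e-medallion-architecture": "eval-medallion.md"}


def plan_name(skill):
    """Map a skill name to its expected eval plan filename (assumed convention)."""
    if skill in PLAN_NAME_OVERRIDES:
        return PLAN_NAME_OVERRIDES[skill]
    base = skill[:-len("-cli")] if skill.endswith("-cli") else skill
    return f"eval-{base}.md"


def missing_coverage(skills, plan_files):
    """Return the skills that have no matching eval plan file."""
    existing = set(plan_files)
    return [s for s in skills if plan_name(s) not in existing]


plans = ["eval-check-updates.md", "eval-spark-authoring.md", "eval-sqldw-authoring.md"]
print(missing_coverage(SKILLS, plans))
```

In practice `plan_files` would come from listing `tests/full-eval-tests/plan/03-individual-skills/`, for example with `os.listdir`.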

---

## Must

- **NEVER execute tests, eval prompts, or the test runner script** — only provide instructions for the user to run manually
- **Always route "run tests" to the `tests/` folder** — tell the user to navigate there and run `run-full-tests.ps1`
- **Follow the eval plan template** when creating new eval plans — every test case needs Case ID, Prompt, Expected, Pass criteria
- **Include at least one negative/ambiguous test** in every new eval plan
- **Update tracking files** when adding evals — `plan/00-overview.md`, `README.md`, `eval-full-pipeline.md`
- **Reference golden data** in `evalsets/expected-results/` for consistency tests

## Prefer

- Automated eval generation ("add evals for missing skills") over manual creation
- Reading existing eval plans as templates before creating new ones
- Running the full suite rather than individual tests for comprehensive coverage
- Checking the executive summary before diving into detailed results

## Avoid

- Running eval prompts outside the `tests/` directory
- **Executing tests, copilot commands, or the run-full-tests.ps1 script** — the agent must only tell the user how to run them
- Creating eval plans without a negative/ambiguous test case
- Duplicating content already in the README or plan documents — reference them instead
- Modifying golden result files unless the underlying data generation rules change
- Skipping the tracking file updates when adding new evals

## Examples

### Adding Tests

**User:** "Add tests for the powerbi-consumption-cli skill"

**Agent:** Creates `tests/full-eval-tests/plan/03-individual-skills/eval-powerbi-consumption.md` with 10–12 test cases following the template, adds golden data to `evalsets/expected-results/`, and updates tracking files.

### Listing Tests

**User:** "What tests exist?"

**Agent:** Lists all eval plans in `plan/03-individual-skills/` and `plan/04-combined-skills/`, plus the quick tests in `tests/tests.json`, with the recommended execution order.

### Running Tests

**User:** "Run the tests"

**Agent:**
> Tests must be run from the `tests/` folder. Navigate there and execute:
> ```powershell
> cd tests
> .\run-full-tests.ps1
> ```

### Viewing Results

**User:** "Show me the eval results"

**Agent:** Reads `tests/full-eval-tests/eval-results.md` and presents the summary table, highlighting pass rates and any failures.

Related Skills

sqldw-consumption-cli

245

from microsoft/skills-for-fabric

Execute read-only T-SQL queries against Fabric Data Warehouse, Lakehouse SQL Endpoints, and Mirrored Databases via CLI. Default skill for any lakehouse data query (row counts, SELECT, filtering, aggregation) unless the user explicitly requests PySpark or Spark DataFrames. Use when the user wants to: (1) query warehouse/lakehouse data, (2) count rows or explore lakehouse tables, (3) discover schemas/columns, (4) generate T-SQL scripts, (5) monitor SQL performance, (6) export results to CSV/JSON. Triggers: "warehouse", "SQL query", "T-SQL", "query warehouse", "show warehouse tables", "show lakehouse tables", "query lakehouse", "lakehouse table", "how many rows", "count rows", "SQL endpoint", "describe warehouse schema", "generate T-SQL script", "warehouse performance", "export SQL data", "connect to warehouse", "lakehouse data", "explore lakehouse".

sqldw-authoring-cli

245

from microsoft/skills-for-fabric

Execute authoring T-SQL (DDL, DML, data ingestion, transactions, schema changes) against Microsoft Fabric Data Warehouse and SQL endpoints from agentic CLI environments. Use when the user wants to: (1) create/alter/drop tables from terminal, (2) insert/update/delete/merge data via CLI, (3) run COPY INTO or OPENROWSET ingestion, (4) manage transactions or stored procedures, (5) perform schema evolution, (6) use time travel or snapshots, (7) generate ETL/ELT shell scripts, (8) create views/functions/procedures on Lakehouse SQLEP. Triggers: "create table in warehouse", "insert data via T-SQL", "load from ADLS", "COPY INTO", "run ETL with T-SQL", "alter warehouse table", "upsert with T-SQL", "merge into warehouse", "create T-SQL procedure", "warehouse time travel", "recover deleted warehouse data", "create warehouse schema", "deploy warehouse", "transaction conflict", "snapshot isolation error".

spark-consumption-cli

245

from microsoft/skills-for-fabric

Analyze lakehouse data interactively using Fabric Livy sessions and PySpark/Spark SQL for advanced analytics, DataFrames, cross-lakehouse joins, Delta time-travel, and unstructured/JSON data. Use when the user explicitly asks for PySpark, Spark DataFrames, Livy sessions, or Python-based analysis — NOT for simple SQL queries. Triggers: "PySpark", "Spark SQL", "analyze with PySpark", "Spark DataFrame", "Livy session", "lakehouse with Python", "PySpark analysis", "PySpark data quality", "Delta time-travel with Spark".

spark-authoring-cli

245

from microsoft/skills-for-fabric

Develop Microsoft Fabric Spark/data engineering workflows with intelligent routing to specialized resources. Provides core workspace/lakehouse management and routes to: data engineering patterns, development workflow, or infrastructure orchestration. Use when the user wants to: (1) manage Fabric workspaces and resources, (2) develop notebooks and PySpark applications, (3) design data pipelines and orchestration, (4) provision infrastructure as code. Triggers: "develop notebook", "data engineering", "workspace setup", "pipeline design", "infrastructure provisioning", "Delta Lake patterns", "Spark development", "lakehouse configuration", "organize lakehouse tables", "create Livy session", "notebook deployment".

powerbi-consumption-cli

245

from microsoft/skills-for-fabric

The ONLY supported path for read-only Microsoft Fabric Power BI semantic model (formerly "Power BI dataset") query interactions. Execute DAX queries via the MCP server ExecuteQuery tool to: (1) discover semantic model metadata (tables, columns, measures, relationships, hierarchies, etc.) and their properties, (2) retrieve data from a semantic model. Triggers: "DAX query", "semantic model metadata", "list semantic model tables", "run EVALUATE", "get measure expression".

powerbi-authoring-cli

245

from microsoft/skills-for-fabric

Create, manage, and deploy Power BI semantic models inside Microsoft Fabric workspaces via `az rest` CLI against Fabric and Power BI REST APIs. Use when the user wants to: (1) create a semantic model from TMDL definition files, (2) retrieve or download semantic model definitions, (3) update a semantic model definition with modified TMDL, (4) trigger or manage dataset refresh operations, (5) configure data sources, parameters, or permissions, (6) deploy semantic models between pipeline stages. Covers Fabric Items API (CRUD) and Power BI Datasets API (refresh, data sources, permissions). For read-only DAX queries, use `powerbi-consumption-cli`. For fine-grained modeling changes, route to `powerbi-modeling-mcp`. Triggers: "create semantic model", "upload TMDL", "download semantic model TMDL", "refresh dataset", "semantic model deployment pipeline", "dataset permissions", "list dataset users", "semantic model authoring".

eventhouse-consumption-cli

245

from microsoft/skills-for-fabric

Run KQL queries against Fabric Eventhouse for real-time intelligence and time-series analytics using `az rest` against the Kusto REST API. Covers KQL operators (where, summarize, join, render), Eventhouse schema discovery (.show tables), time-series patterns with bin(), and ingestion monitoring. Use when the user wants to: (1) run read-only KQL queries against an Eventhouse or KQL Database, (2) discover Eventhouse table schema and metadata, (3) analyse real-time or time-series data with KQL operators, (4) monitor ingestion health and active KQL queries, (5) export KQL results to JSON. Triggers: "kql query", "kusto query", "eventhouse query", "kql database", "real-time intelligence", "time-series kql", "query eventhouse", "explore eventhouse", "show tables kql".

eventhouse-authoring-cli

245

from microsoft/skills-for-fabric

Execute KQL management commands (table management, ingestion, policies, functions, materialized views) against Fabric Eventhouse and KQL Databases via CLI. Use when the user wants to: (1) create or alter KQL tables, columns, or functions, (2) ingest data into an Eventhouse (inline, from storage, streaming), (3) configure retention, caching, or partitioning policies, (4) create or manage materialized views and update policies, (5) manage data mappings for ingestion pipelines, (6) deploy KQL schema via scripts. Triggers: "create kql table", "kql ingestion", "ingest into eventhouse", "kql function", "materialized view", "kql retention policy", "eventhouse schema", "kql authoring", "create eventhouse table", "kql mapping".

e2e-medallion-architecture

245

from microsoft/skills-for-fabric

Implement end-to-end Medallion Architecture (Bronze/Silver/Gold) lakehouse patterns in Microsoft Fabric using PySpark, Delta Lake, and Fabric Pipelines. Use when the user wants to: (1) design a Bronze/Silver/Gold data lakehouse, (2) set up multi-layer workspace with lakehouses for each tier, (3) build ingestion-to-analytics pipelines with data quality enforcement, (4) optimize Spark configurations per medallion layer, (5) orchestrate Bronze-to-Silver-to-Gold flows via notebooks. Triggers: "medallion architecture", "bronze silver gold", "lakehouse layers", "e2e data pipeline", "end-to-end lakehouse", "data lakehouse pattern", "multi-layer lakehouse", "build medallion", "setup medallion".

check-updates

245

from microsoft/skills-for-fabric

Check for skills-for-fabric marketplace updates at session start. Compares local version against GitHub releases and shows changelog if updates are available. Use when the user wants to: (1) check for skill updates, (2) see what's new in skills-for-fabric, (3) verify current version. Triggers: "check for updates", "am I up to date", "what version", "update skills", "show changelog".

quality-check

245

from microsoft/skills-for-fabric

Run local quality checks on skills-for-fabric before committing. Validates all skills in the skills/ folder for structural compliance, semantic disambiguation, broken references, and content quality. Use before submitting a PR to catch issues early. Triggers: "check my skills", "run quality check", "validate skills", "pre-commit check", "lint skills".

best-practices-check

245

from microsoft/skills-for-fabric

Verify skills-for-fabric against Microsoft Fabric best practices from the internet. Searches for current best practices, compares them against skill content, and identifies gaps or improvements. Use when the user wants to: (1) validate a skill covers industry best practices, (2) find missing guidance, (3) improve skill quality with current recommendations. Triggers: "check best practices", "validate best practices", "best practices for", "compare against best practices", "skill coverage".