dbt-migration-validation

Comprehensive validation skill for dbt models and schema YAML files. Defines validation rules, common anti-patterns to detect, and auto-fix suggestions. Integrates with Claude Code hooks to enforce quality standards during migration.

31 stars

bysfc-gh-dflippo

View on GitHub Installation ↓

Best use case

dbt-migration-validation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using dbt-migration-validation should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/dbt-migration-validation/SKILL.md --create-dirs "https://raw.githubusercontent.com/sfc-gh-dflippo/snowflake-dbt-demo/main/.claude/skills/dbt-migration-validation/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/dbt-migration-validation/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How dbt-migration-validation Compares

Feature / Agent	dbt-migration-validation	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# dbt Migration Validation Skill

## Purpose

Define and enforce validation rules for dbt models during migration. This skill provides
comprehensive validation rules, common anti-patterns to detect, and auto-fix suggestions that are
implemented by the validation hooks.

## When to Use This Skill

Activate this skill when:

- Reviewing dbt models for quality issues
- Diagnosing validation hook failures
- Understanding validation rules and their rationale
- Looking up auto-fix suggestions for common issues
- Configuring validation thresholds

---

## Validation Rules Reference

### Schema YAML Rules

#### YAML001: Model Description Required

**Severity:** Error

**Description:** Every model must have a description in the schema YAML file.

**Rationale:** Descriptions are essential for documentation, lineage understanding, and team
collaboration. They appear in dbt docs and help stakeholders understand data assets.

**Detection:** Model entry exists but `description` field is missing or empty.

**Fix:**

```yaml
## Before
models:
  - name: dim_customers
    columns:
      - name: customer_id

## After
models:
  - name: dim_customers
    description: |
      Customer dimension containing customer attributes, contact info,
      and segmentation. Updated daily from CRM system.

      Source: CRM Database
      Owner: Analytics Team
    columns:
      - name: customer_id
```

---

#### YAML002: Primary Key Test Required

**Severity:** Error

**Description:** Columns that appear to be primary keys (ending in `_id`, `_key`, `_sk`) must have a
`dbt_constraints.primary_key` test.

**Rationale:** Primary key validation ensures data integrity. The `dbt_constraints` package creates
actual database constraints for validation.

**Detection:** Column name matches primary key pattern but lacks required test.

**Primary Key Patterns:**

- `*_id` (e.g., `customer_id`, `order_id`)
- `*_key` (e.g., `surrogate_key`, `natural_key`)
- `*_sk` (e.g., `customer_sk`)
- `id` (exact match)

**Fix:**

```yaml
## Before
columns:
  - name: customer_id
    description: "Unique customer identifier"

## After
columns:
  - name: customer_id
    description: "Unique customer identifier"
    data_type: integer
    tests:
      - dbt_constraints.primary_key
```

---

#### YAML003: Foreign Key Relationship Test

**Severity:** Warning

**Description:** Columns that appear to be foreign keys should have a `relationships` or
`dbt_constraints.foreign_key` test.

**Rationale:** Relationship tests validate referential integrity between tables.

**Detection:** Column name matches foreign key pattern but lacks relationship test.

**Foreign Key Patterns:**

- `fk_*` (e.g., `fk_customer`)
- `*_fk` (e.g., `customer_fk`)

**Fix:**

```yaml
## Before
columns:
  - name: fk_customer_id
    description: "Reference to customer"

## After
columns:
  - name: fk_customer_id
    description: "Reference to customer"
    tests:
      - relationships:
          to: ref('dim_customers')
          field: customer_id
```

---

#### YAML004: Column Description

**Severity:** Warning

**Description:** All columns should have descriptions.

**Rationale:** Column descriptions improve documentation and make data assets more discoverable.

**Fix:**

```yaml
## Before
columns:
  - name: signup_date

## After
columns:
  - name: signup_date
    description: "Date the customer signed up for service"
```

---

#### YAML005: Model Naming Convention

**Severity:** Error

**Description:** Model names must follow layer-specific naming conventions.

**Conventions by Layer:**

| Layer               | Prefix                          | Pattern                       | Example                      |
| ------------------- | ------------------------------- | ----------------------------- | ---------------------------- |
| Bronze/Staging      | `stg_`                          | `stg_{source}__{table}`       | `stg_sqlserver__customers`   |
| Silver/Intermediate | `int_`, `lookup_`               | `int_{entity}__{description}` | `int_customers__with_orders` |
| Gold/Mart           | `dim_`, `fct_`, `mart_`, `agg_` | `dim_{entity}`                | `dim_customers`              |

**Fix:**

```yaml
## Before (in gold layer)
models:
  - name: customers

## After
models:
  - name: dim_customers
```

---

#### YAML006: Column Data Type

**Severity:** Warning

**Description:** All columns should have `data_type` specified.

**Rationale:** Explicit data types improve documentation and enable contract testing.

**Fix:**

```yaml
## Before
columns:
  - name: amount
    description: "Order amount"

## After
columns:
  - name: amount
    description: "Order amount in USD"
    data_type: number(18,2)
```

---

### SQL Model Rules

#### SQL001: Config Block

**Severity:** Warning

**Description:** Models should have a config block specifying materialization.

**Rationale:** Explicit configuration makes materialization strategy clear and enables per-model
customization.

**Fix:**

```sql
-- Before
select * from {{ ref('stg_customers') }}

-- After
{{ config(
    materialized='table',
    tags=['gold', 'customer']
) }}

select * from {{ ref('stg_customers') }}
```

---

#### SQL002: CTE Pattern

**Severity:** Warning

**Description:** Models should use the standard CTE pattern for readability.

**Pattern:**

1. Import CTEs - Reference source data
2. Logical CTEs - Apply transformations
3. Final CTE - Prepare output
4. Final SELECT from final CTE

**Fix:**

```sql
-- Before
select
    c.customer_id,
    c.customer_name,
    count(o.order_id) as order_count
from {{ ref('stg_customers') }} c
left join {{ ref('stg_orders') }} o on c.customer_id = o.customer_id
group by 1, 2

-- After
with customers as (
    select * from {{ ref('stg_customers') }}
),

orders as (
    select * from {{ ref('stg_orders') }}
),

customer_orders as (
    select
        c.customer_id,
        c.customer_name,
        count(o.order_id) as order_count
    from customers c
    left join orders o on c.customer_id = o.customer_id
    group by c.customer_id, c.customer_name
),

final as (
    select
        customer_id,
        customer_name,
        order_count
    from customer_orders
)

select * from final
```

---

#### SQL003: No SELECT \* in Final Output

**Severity:** Error

**Description:** The final query should explicitly list columns, not use `SELECT *`.

**Rationale:** Explicit columns make the contract clear, prevent accidental exposure of new columns,
and improve query performance.

**Exception:** `SELECT * FROM final` is acceptable when `final` CTE explicitly lists columns.

**Fix:**

```sql
-- Before (problematic)
select * from {{ ref('stg_customers') }}

-- After
select
    customer_id,
    customer_name,
    email,
    signup_date
from {{ ref('stg_customers') }}
```

---

#### SQL004: Use ref() and source()

**Severity:** Error

**Description:** All table references must use `{{ ref() }}` or `{{ source() }}`, not hardcoded
table names.

**Rationale:** Using ref() and source() enables:

- Automatic dependency tracking
- Environment-aware table resolution
- Proper lineage documentation

**Detection:** Pattern `FROM schema.table` or `JOIN database.schema.table` without Jinja braces.

**Fix:**

```sql
-- Before
select * from raw_data.customers

-- After (for source tables)
select * from {{ source('raw_data', 'customers') }}

-- After (for dbt models)
select * from {{ ref('stg_customers') }}
```

---

#### SQL005: Migration Header Comment

**Severity:** Warning

**Description:** Migrated models should include a header comment documenting the original source and
conversion notes.

**Detection:** Model contains migration indicators but lacks proper header.

**Required Header Elements:**

- Original object name and schema
- Source platform
- Migration date
- Conversion notes
- Breaking changes (if any)

**Template:**

```sql
/* Original Object: {schema}.{object_name}
   Source Platform: {SQL Server|Oracle|Teradata|etc.}
   Original Type: {Stored Procedure|View|Function}
   Migration Date: YYYY-MM-DD

   Conversion Notes:
   - Replaced ISNULL() with COALESCE()
   - Converted TOP to LIMIT
   - Removed NOLOCK hints

   Breaking Changes:
   - Output column 'old_name' renamed to 'new_name'
*/

{{ config(materialized='table') }}

-- Model implementation
```

---

#### SQL006: Snowflake-Incompatible Syntax

**Severity:** Error

**Description:** Model contains SQL syntax that is not compatible with Snowflake.

**Common Issues:**

| Pattern              | Platform   | Snowflake Equivalent     |
| -------------------- | ---------- | ------------------------ |
| `TOP N`              | SQL Server | `LIMIT N`                |
| `ISNULL(a, b)`       | SQL Server | `COALESCE(a, b)`         |
| `GETDATE()`          | SQL Server | `CURRENT_TIMESTAMP()`    |
| `LEN(s)`             | SQL Server | `LENGTH(s)`              |
| `CHARINDEX(a, b)`    | SQL Server | `POSITION(a IN b)`       |
| `CONVERT(type, val)` | SQL Server | `CAST(val AS type)`      |
| `WITH (NOLOCK)`      | SQL Server | (remove)                 |
| `@@ROWCOUNT`         | SQL Server | (use different approach) |
| `ROWNUM`             | Oracle     | `ROW_NUMBER() OVER()`    |
| `DECODE(...)`        | Oracle     | `CASE WHEN...`           |
| `CONNECT BY`         | Oracle     | Recursive CTE            |
| `SYSDATE`            | Oracle     | `CURRENT_DATE()`         |
| `SEL`                | Teradata   | `SELECT`                 |
| Backticks            | MySQL      | Double quotes            |

**Fix Examples:**

```sql
-- SQL Server Before
SELECT TOP 100 *
FROM customers WITH (NOLOCK)
WHERE ISNULL(email, '') = ''
  AND GETDATE() > signup_date

-- Snowflake After
SELECT *
FROM customers
WHERE COALESCE(email, '') = ''
  AND CURRENT_TIMESTAMP() > signup_date
LIMIT 100
```

---

## Anti-Patterns to Detect

### Common Anti-Patterns

#### 1. Direct Source References

**Problem:** Using hardcoded table names instead of `source()` or `ref()`.

**Impact:** Breaks dependency tracking, environment portability, and lineage.

#### 2. Generic Column Names

**Problem:** Columns named `col1`, `field1`, `temp`, etc.

**Impact:** Poor documentation, confusing for consumers.

#### 3. Mixed Naming Conventions

**Problem:** Inconsistent casing or naming patterns within a model.

**Impact:** Confusion, maintenance difficulty.

#### 4. Missing Tests on Key Columns

**Problem:** Primary/foreign keys without uniqueness or relationship tests.

**Impact:** Data quality issues may go undetected.

#### 5. Overly Complex Models

**Problem:** Models with excessive CTEs, complex logic, or doing too much.

**Impact:** Hard to maintain, test, and understand.

#### 6. Platform-Specific Syntax

**Problem:** SQL syntax from source database that won't work in Snowflake.

**Impact:** Runtime errors, compilation failures.

---

## Hook Integration

### Validation Hook Configuration

Hooks are configured in `.claude/settings.local.json`:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write",
        "hooks": [
          {
            "type": "command",
            "command": "python3 .claude/hooks/dbt-validation/validate_file.py \"$FILE_PATH\"",
            "timeout": 30000
          }
        ]
      },
      {
        "matcher": "Edit",
        "hooks": [
          {
            "type": "command",
            "command": "python3 .claude/hooks/dbt-validation/validate_file.py \"$FILE_PATH\"",
            "timeout": 30000
          }
        ]
      }
    ]
  }
}
```

### Exit Codes

- **Exit 0**: Validation passed (or file not in scope)
- **Exit 1**: Validation failed with errors

Warnings are reported but don't cause exit code 1.

### File Scope

Validation runs only on files matching:

- `models/**/_models.yml` - Schema YAML validation
- `models/**/_sources.yml` - Source YAML validation
- `models/**/*.sql` - SQL model validation

Other files are skipped (exit 0).

---

## Validation Scripts

Validation is implemented in `.claude/hooks/dbt-validation/`:

| Script                        | Purpose                                      |
| ----------------------------- | -------------------------------------------- |
| `validate_file.py`            | Entry point, routes to appropriate validator |
| `validate_schema_yaml.py`     | YAML rule validation                         |
| `validate_dbt_model.py`       | SQL rule validation                          |
| `check_migration_status.py`   | Cross-validation and reporting               |
| `rules/naming_conventions.py` | Naming convention checks                     |
| `rules/cte_patterns.py`       | CTE structure validation                     |
| `rules/snowflake_syntax.py`   | Platform syntax detection                    |

---

## Related Skills

- $dbt-migration - Migration workflow
- $dbt-testing - Test strategies
- $dbt-architecture - Naming conventions
- $dbt-modeling - CTE patterns

Related Skills

dbt-migration

from sfc-gh-dflippo/snowflake-dbt-demo

Complete workflow for migrating database tables, views, and stored procedures to dbt projects on Snowflake. Orchestrates discovery, planning, placeholder creation, view/procedure conversion, testing, and deployment. Delegates platform-specific syntax translation to source-specific skills.

dbt-migration-vertica

from sfc-gh-dflippo/snowflake-dbt-demo

Convert Vertica DDL to dbt models compatible with Snowflake. This skill should be used when converting views, tables, or stored procedures from Vertica to dbt code, generating schema.yml files with tests and documentation, or migrating Vertica SQL to follow dbt best practices.

dbt-migration-teradata

from sfc-gh-dflippo/snowflake-dbt-demo

Convert Teradata DDL to dbt models compatible with Snowflake. This skill should be used when converting views, tables, or stored procedures from Teradata to dbt code, generating schema.yml files with tests and documentation, or migrating Teradata SQL to follow dbt best practices.

dbt-migration-sybase

from sfc-gh-dflippo/snowflake-dbt-demo

Convert Sybase IQ DDL to dbt models compatible with Snowflake. This skill should be used when converting views, tables, or stored procedures from Sybase IQ to dbt code, generating schema.yml files with tests and documentation, or migrating Sybase SQL to follow dbt best practices.

dbt-migration-snowflake

from sfc-gh-dflippo/snowflake-dbt-demo

Convert Snowflake DDL to dbt models. This skill should be used when converting views, tables, or stored procedures from Snowflake to dbt code, generating schema.yml files with tests and documentation, or migrating existing Snowflake SQL to follow dbt best practices.

dbt-migration-redshift

from sfc-gh-dflippo/snowflake-dbt-demo

Convert Amazon Redshift DDL to dbt models compatible with Snowflake. This skill should be used when converting views, tables, or stored procedures from Redshift to dbt code, generating schema.yml files with tests and documentation, or migrating Redshift SQL to follow dbt best practices.

dbt-migration-postgres

from sfc-gh-dflippo/snowflake-dbt-demo

Convert PostgreSQL/Greenplum/Netezza DDL to dbt models compatible with Snowflake. This skill should be used when converting views, tables, or stored procedures from PostgreSQL, Greenplum, or Netezza to dbt code, generating schema.yml files with tests and documentation, or migrating PostgreSQL SQL to follow dbt best practices.

dbt-migration-oracle

from sfc-gh-dflippo/snowflake-dbt-demo

Convert Oracle DDL to dbt models compatible with Snowflake. This skill should be used when converting views, tables, or stored procedures from Oracle to dbt code, generating schema.yml files with tests and documentation, or migrating Oracle PL/SQL to follow dbt best practices.

dbt-migration-ms-sql-server

from sfc-gh-dflippo/snowflake-dbt-demo

Convert Microsoft SQL Server/Azure Synapse T-SQL DDL to dbt models compatible with Snowflake. This skill should be used when converting views, tables, or stored procedures from SQL Server to dbt code, generating schema.yml files with tests and documentation, or migrating T-SQL to follow dbt best practices.

dbt-migration-hive

from sfc-gh-dflippo/snowflake-dbt-demo

Convert Hive/Spark/Databricks DDL to dbt models compatible with Snowflake. This skill should be used when converting views, tables, or UDFs from Hive, Spark, or Databricks to dbt code, generating schema.yml files with tests and documentation, or migrating HiveQL to follow dbt best practices.

dbt-migration-db2

from sfc-gh-dflippo/snowflake-dbt-demo

Convert IBM DB2 DDL to dbt models compatible with Snowflake. This skill should be used when converting views, tables, or stored procedures from DB2 to dbt code, generating schema.yml files with tests and documentation, or migrating DB2 SQL to follow dbt best practices.

dbt-migration-bigquery

from sfc-gh-dflippo/snowflake-dbt-demo

Convert Google BigQuery DDL to dbt models compatible with Snowflake. This skill should be used when converting views, tables, or stored procedures from BigQuery to dbt code, generating schema.yml files with tests and documentation, or migrating BigQuery SQL to follow dbt best practices.