dbt-project-analyzer

Analyzes dbt projects for best practices, performance, maintainability, and generates actionable recommendations for improvement.

509 stars

by a5c-ai


Best use case

dbt-project-analyzer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using dbt-project-analyzer should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/dbt-project-analyzer/SKILL.md --create-dirs "https://raw.githubusercontent.com/a5c-ai/babysitter/main/library/specializations/data-engineering-analytics/skills/dbt-project-analyzer/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it at .claude/skills/dbt-project-analyzer/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How dbt-project-analyzer Compares

| Feature / Agent | dbt-project-analyzer | Standard Approach |
|-----------------|----------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Analyzes dbt projects for best practices, performance, maintainability, and generates actionable recommendations for improvement.

Where can I find the source code?

The source code is in the a5c-ai/babysitter repository on GitHub, under library/specializations/data-engineering-analytics/skills/dbt-project-analyzer/.

SKILL.md Source

# dbt Project Analyzer

Analyzes dbt projects for best practices, performance, and maintainability following dbt Labs recommended patterns.

## Overview

This skill examines dbt project structure, model dependencies, test coverage, documentation completeness, and adherence to naming conventions. It provides actionable recommendations for improving project health and maintainability.

## Capabilities

- **Model dependency graph analysis** - Visualize and analyze model relationships, detect circular dependencies
- **Incremental model optimization** - Evaluate incremental strategies and suggest improvements
- **Materialization strategy recommendations** - Recommend optimal materializations based on usage patterns
- **Test coverage analysis** - Measure and report on test coverage across models
- **Documentation completeness check** - Identify undocumented models, columns, and sources
- **Naming convention validation** - Enforce consistent naming patterns (staging, marts, intermediate)
- **Ref/source usage validation** - Detect hardcoded references and missing source definitions
- **Macro efficiency analysis** - Evaluate macro usage and suggest optimizations
- **Slim CI optimization** - Configure efficient CI builds with state comparison
- **Model contract validation** - Verify model contracts for type safety
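
As an illustration of the dependency-graph capability, here is a minimal sketch that computes each node's depth and detects circular dependencies. It assumes the graph is available as a mapping from a node id to the ids it depends on (the shape of `parent_map` in dbt's `manifest.json`). The function name `model_depths` is hypothetical and nothing here is taken from the skill's own implementation.

```python
from collections import deque


def model_depths(parents: dict[str, list[str]]) -> dict[str, int]:
    """Return the DAG depth of every node (nodes without parents are depth 0).

    Raises ValueError if the graph contains a cycle.
    """
    # Collect every node, including parents that never appear as keys.
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    # Number of not-yet-processed parents per node.
    pending = {n: len(set(parents.get(n, []))) for n in nodes}
    # Reverse edges: parent -> children.
    children: dict[str, list[str]] = {n: [] for n in nodes}
    for node, ps in parents.items():
        for p in set(ps):
            children[p].append(node)

    depth = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if pending[n] == 0)
    visited = 0
    while queue:  # Kahn's topological sort
        current = queue.popleft()
        visited += 1
        for child in children[current]:
            depth[child] = max(depth[child], depth[current] + 1)
            pending[child] -= 1
            if pending[child] == 0:
                queue.append(child)

    # Any node left unvisited sits on (or downstream of) a cycle.
    if visited != len(nodes):
        raise ValueError("circular dependency detected")
    return depth
```

Averaging the returned values (`sum(d.values()) / len(d)`) gives one plausible reading of an "average depth in DAG" metric.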

## Input Schema

```json
{
  "projectPath": {
    "type": "string",
    "description": "Path to the dbt project root directory",
    "required": true
  },
  "manifestJson": {
    "type": "object",
    "description": "Parsed manifest.json from target/ directory (optional, will be loaded if not provided)"
  },
  "catalogJson": {
    "type": "object",
    "description": "Parsed catalog.json from target/ directory (optional)"
  },
  "runResults": {
    "type": "object",
    "description": "Parsed run_results.json for performance analysis (optional)"
  },
  "analysisDepth": {
    "type": "string",
    "enum": ["quick", "standard", "deep"],
    "default": "standard",
    "description": "Depth of analysis to perform"
  },
  "focusAreas": {
    "type": "array",
    "items": {
      "type": "string",
      "enum": ["performance", "testing", "documentation", "naming", "incremental", "dependencies"]
    },
    "description": "Specific areas to focus analysis on (all if not specified)"
  }
}
```
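
The schema above implies a small amount of validation and defaulting before analysis starts. The following is a sketch of that step under stated assumptions: `normalize_request` is a hypothetical helper, not part of the skill, and it treats a missing `focusAreas` as "all areas", following the field description.

```python
ALLOWED_DEPTHS = {"quick", "standard", "deep"}
ALLOWED_FOCUS = {
    "performance", "testing", "documentation",
    "naming", "incremental", "dependencies",
}


def normalize_request(raw: dict) -> dict:
    """Validate a request dict against the input schema and fill in defaults."""
    project_path = raw.get("projectPath")
    if not isinstance(project_path, str) or not project_path:
        raise ValueError("projectPath is required and must be a non-empty string")

    depth = raw.get("analysisDepth", "standard")  # schema default
    if depth not in ALLOWED_DEPTHS:
        raise ValueError(f"analysisDepth must be one of {sorted(ALLOWED_DEPTHS)}")

    focus = raw.get("focusAreas")
    if focus is None:
        focus = sorted(ALLOWED_FOCUS)  # "all if not specified"
    unknown = set(focus) - ALLOWED_FOCUS
    if unknown:
        raise ValueError(f"unknown focusAreas: {sorted(unknown)}")

    return {**raw, "analysisDepth": depth, "focusAreas": list(focus)}
```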

## Output Schema

```json
{
  "healthScore": {
    "type": "number",
    "description": "Overall project health score (0-100)"
  },
  "issues": {
    "type": "array",
    "items": {
      "severity": "error|warning|info",
      "category": "string",
      "model": "string",
      "message": "string",
      "recommendation": "string",
      "line": "number"
    }
  },
  "metrics": {
    "testCoverage": {
      "type": "number",
      "description": "Percentage of models with tests"
    },
    "docCoverage": {
      "type": "number",
      "description": "Percentage of models/columns documented"
    },
    "incrementalRatio": {
      "type": "number",
      "description": "Percentage of eligible models using incremental"
    },
    "avgModelDepth": {
      "type": "number",
      "description": "Average depth in DAG"
    },
    "totalModels": {
      "type": "number"
    },
    "totalTests": {
      "type": "number"
    }
  },
  "recommendations": {
    "type": "array",
    "items": {
      "priority": "high|medium|low",
      "category": "string",
      "description": "string",
      "effort": "string",
      "impact": "string"
    }
  },
  "dependencyGraph": {
    "type": "object",
    "description": "Simplified dependency graph for visualization"
  }
}
```
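
To make the `metrics` fields concrete, here is a hedged sketch of how `testCoverage`, `totalModels`, and `totalTests` could be derived from a parsed manifest. It assumes the usual manifest layout, where `nodes` maps unique ids to entries carrying `resource_type` and `depends_on.nodes`, and it reads "percentage of models with tests" as "models referenced by at least one test node". The function name `coverage_metrics` is hypothetical.

```python
def coverage_metrics(manifest: dict) -> dict:
    """Compute model/test counts and model-level test coverage from a manifest dict."""
    nodes = manifest.get("nodes", {})
    models = {uid for uid, n in nodes.items() if n.get("resource_type") == "model"}
    tests = [n for n in nodes.values() if n.get("resource_type") == "test"]

    # A model counts as tested if any test node depends on it.
    tested = set()
    for test in tests:
        for uid in test.get("depends_on", {}).get("nodes", []):
            if uid in models:
                tested.add(uid)

    pct = 100.0 * len(tested) / len(models) if models else 0.0
    return {
        "totalModels": len(models),
        "totalTests": len(tests),
        "testCoverage": round(pct, 1),
    }
```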

## Usage Examples

### Basic Project Analysis

```bash
# Invoke skill for standard analysis
/skill dbt-project-analyzer --projectPath ./my-dbt-project
```

### Deep Analysis with Focus Areas

```json
{
  "projectPath": "./analytics",
  "analysisDepth": "deep",
  "focusAreas": ["performance", "testing", "incremental"]
}
```

### CI Integration Analysis

The input schema declares `manifestJson` and `runResults` as objects, so read the file paths in this example as placeholders for the parsed contents of those artifacts.

```json
{
  "projectPath": "./dbt_project",
  "manifestJson": "./target/manifest.json",
  "runResults": "./target/run_results.json",
  "focusAreas": ["performance"]
}
```
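
Because the schema expects parsed artifacts rather than paths, a caller may want to load the files before building the request. This is a minimal sketch, assuming the artifacts sit in the project's `target/` directory as the schema descriptions state; `load_artifacts` is a hypothetical helper, and missing files are simply skipped since all three artifacts are optional.

```python
import json
from pathlib import Path

# Request key -> artifact file name under target/.
ARTIFACTS = (
    ("manifestJson", "manifest.json"),
    ("catalogJson", "catalog.json"),
    ("runResults", "run_results.json"),
)


def load_artifacts(project_path: str) -> dict:
    """Build a request dict with every artifact found under <project>/target/."""
    target = Path(project_path) / "target"
    request: dict = {"projectPath": project_path}
    for key, filename in ARTIFACTS:
        path = target / filename
        if path.is_file():
            request[key] = json.loads(path.read_text(encoding="utf-8"))
    return request
```

Adding `"focusAreas": ["performance"]` to the returned dict would reproduce the CI request above with parsed objects in place of paths.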

## Analysis Rules

### Naming Conventions

| Layer | Pattern | Example |
|-------|---------|---------|
| Staging | `stg_<source>__<entity>` | `stg_stripe__payments` |
| Intermediate | `int_<entity>_<verb>` | `int_payments_pivoted` |
| Marts | `fct_<entity>` or `dim_<entity>` | `fct_orders`, `dim_customers` |
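
The patterns in this table translate naturally into regular expressions. The sketch below is one possible encoding, assuming names are lowercase snake_case and that `<source>` and `<entity>` may themselves contain single underscores; the names `LAYER_PATTERNS` and `follows_convention` are hypothetical.

```python
import re

# One lowercase snake_case identifier, e.g. "payments" or "order_items".
_IDENT = r"[a-z0-9]+(?:_[a-z0-9]+)*"

LAYER_PATTERNS = {
    # stg_<source>__<entity> (note the double underscore)
    "staging": re.compile(rf"^stg_{_IDENT}__{_IDENT}$"),
    # int_<entity>_<verb>: at least two segments after the prefix
    "intermediate": re.compile(r"^int_[a-z0-9]+(?:_[a-z0-9]+)+$"),
    # fct_<entity> or dim_<entity>
    "marts": re.compile(rf"^(?:fct|dim)_{_IDENT}$"),
}


def follows_convention(layer: str, model_name: str) -> bool:
    """Return True if model_name matches the naming pattern for its layer."""
    return LAYER_PATTERNS[layer].match(model_name) is not None
```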

### Test Coverage Requirements

| Severity | Condition |
|----------|-----------|
| Error | No unique/not_null test on primary key |
| Warning | < 50% columns have tests |
| Info | Missing relationship tests |
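
The severity table can be read as three independent checks per model. The following sketch encodes that reading; it assumes the error fires when the primary key lacks either a `unique` or a `not_null` test, which is one interpretation of the first row. The function name and its parameters are hypothetical.

```python
def coverage_issues(
    pk_tests: set[str],
    tested_columns: int,
    total_columns: int,
    has_relationship_test: bool,
) -> list[tuple[str, str]]:
    """Return (severity, message) pairs for one model, following the table above."""
    issues = []
    # Error: primary key must carry both unique and not_null (assumed reading).
    if not {"unique", "not_null"} <= pk_tests:
        issues.append(("error", "primary key lacks a unique/not_null test"))
    # Warning: fewer than 50% of columns have at least one test.
    if total_columns and tested_columns / total_columns < 0.5:
        issues.append(("warning", "fewer than 50% of columns have tests"))
    # Info: no relationship test defined on the model.
    if not has_relationship_test:
        issues.append(("info", "missing relationship tests"))
    return issues
```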

### Materialization Guidelines

| Model Type | Recommended | Reason |
|------------|-------------|--------|
| Staging | View or Ephemeral | Source transformations, low compute |
| Intermediate | Ephemeral | Reduce warehouse clutter |
| Marts | Table or Incremental | End-user queries, performance |
| Large tables (>1M rows) | Incremental | Reduce build time |
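
As a rough sketch, the guidelines above can be expressed as a lookup plus a row-count override. Giving the large-table rule precedence over the layer rule is an assumption made here, as are the names `GUIDELINES` and `recommended_materializations`.

```python
from typing import Optional

LARGE_TABLE_ROWS = 1_000_000

# Acceptable materializations per layer, taken from the table above.
GUIDELINES = {
    "staging": ["view", "ephemeral"],
    "intermediate": ["ephemeral"],
    "marts": ["table", "incremental"],
}


def recommended_materializations(layer: str, row_count: Optional[int] = None) -> list[str]:
    """Return the materializations the guidelines recommend for a model."""
    # Assumed precedence: large tables are incremental regardless of layer.
    if row_count is not None and row_count > LARGE_TABLE_ROWS:
        return ["incremental"]
    return GUIDELINES[layer]
```

A model whose configured materialization is not in the returned list would be a candidate for a recommendation entry.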

## Integration Points

### MCP Server Integration

This skill integrates with the official dbt MCP server for enhanced capabilities:

- **dbt-labs/dbt-mcp** - Project metadata discovery, model information, semantic layer querying
- **dbt Remote MCP Server** - Cloud-hosted dbt MCP with secure endpoint access

### Applicable Processes

- dbt Project Setup (`dbt-project-setup.js`)
- dbt Model Development (`dbt-model-development.js`)
- Metrics Layer (`metrics-layer.js`)
- Incremental Model Setup (`incremental-model.js`)

## References

- [dbt Best Practices](https://docs.getdbt.com/best-practices)
- [dbt Style Guide](https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md)
- [How we structure our dbt projects](https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview)
- [dbt MCP Server](https://github.com/dbt-labs/dbt-mcp)

## Version History

- **1.0.0** - Initial release with core analysis capabilities
