answering-natural-language-questions-with-dbt
Writes and executes SQL queries against the data warehouse using dbt's Semantic Layer or ad-hoc SQL to answer business questions. Use when a user asks about analytics, metrics, KPIs, or data (e.g., "What were total sales last quarter?", "Show me top customers by revenue"). NOT for validating, testing, or building dbt models during development.
Best use case
answering-natural-language-questions-with-dbt is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using answering-natural-language-questions-with-dbt should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it at .claude/skills/answering-natural-language-questions-with-dbt/SKILL.md inside your project
- Restart your AI agent — it will auto-discover the skill
How answering-natural-language-questions-with-dbt Compares
| Feature / Agent | answering-natural-language-questions-with-dbt | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Writes and executes SQL queries against the data warehouse using dbt's Semantic Layer or ad-hoc SQL to answer business questions. Use when a user asks about analytics, metrics, KPIs, or data (e.g., "What were total sales last quarter?", "Show me top customers by revenue"). NOT for validating, testing, or building dbt models during development.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Answering Natural Language Questions with dbt
## Overview
Answer data questions using the best available method: semantic layer first, then SQL modification, then model discovery, then manifest analysis. Always exhaust options before saying "cannot answer."
**Use for:** Business questions from users that need data answers
- "What were total sales last month?"
- "How many active customers do we have?"
- "Show me revenue by region"
**Not for:**
- Validating model logic during development
- Testing dbt models or semantic layer definitions
- Building or modifying dbt models
- `dbt run`, `dbt test`, or `dbt build` workflows
## Decision Flow
```mermaid
flowchart TD
start([Business question received])
check_sl{Semantic layer tools available?}
list_metrics[list_metrics]
metric_exists{Relevant metric exists?}
get_dims[get_dimensions]
sl_sufficient{SL can answer directly?}
query_metrics[query_metrics]
answer([Return answer])
try_compiled[get_metrics_compiled_sql<br/>Modify SQL, execute_sql]
check_discovery{Model discovery tools available?}
try_discovery[get_mart_models<br/>get_model_details<br/>Write SQL, execute]
check_manifest{In dbt project?}
try_manifest[Analyze manifest/catalog<br/>Write SQL]
cannot([Cannot answer])
suggest{In dbt project?}
improvements[Suggest semantic layer changes]
done([Done])
start --> check_sl
check_sl -->|yes| list_metrics
check_sl -->|no| check_discovery
list_metrics --> metric_exists
metric_exists -->|yes| get_dims
metric_exists -->|no| check_discovery
get_dims --> sl_sufficient
sl_sufficient -->|yes| query_metrics
sl_sufficient -->|no| try_compiled
query_metrics --> answer
try_compiled -->|success| answer
try_compiled -->|fail| check_discovery
check_discovery -->|yes| try_discovery
check_discovery -->|no| check_manifest
try_discovery -->|success| answer
try_discovery -->|fail| check_manifest
check_manifest -->|yes| try_manifest
check_manifest -->|no| cannot
try_manifest -->|SQL ready| answer
answer --> suggest
cannot --> done
suggest -->|yes| improvements
suggest -->|no| done
improvements --> done
```
## Quick Reference
| Priority | Condition | Approach | Tools |
|----------|-----------|----------|-------|
| 1 | Semantic layer active | Query metrics directly | `list_metrics`, `get_dimensions`, `query_metrics` |
| 2 | SL active but minor modifications needed (missing dimension, custom filter, case when, different aggregation) | Modify compiled SQL | `get_metrics_compiled_sql`, then `execute_sql` |
| 3 | No SL, discovery tools active | Explore models, write SQL | `get_mart_models`, `get_model_details`, then `show`/`execute_sql` |
| 4 | No MCP, in dbt project | Analyze artifacts, write SQL | Read `target/manifest.json`, `target/catalog.json` |
## Approach 1: Semantic Layer Query
When `list_metrics` and `query_metrics` are available:
1. `list_metrics` - find relevant metric
2. `get_dimensions` - verify required dimensions exist
3. `query_metrics` - execute with appropriate filters
If semantic layer can't answer directly (missing dimension, need custom logic) → go to Approach 2.
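As a sketch, a question like "What were total sales last quarter?" might resolve to a `query_metrics` call shaped roughly like this. The metric and dimension names here are hypothetical — confirm them with `list_metrics` and `get_dimensions` first, and the exact parameter names depend on your MCP server version:

```json
{
  "metrics": ["total_sales"],
  "group_by": ["metric_time__quarter"],
  "order_by": ["-metric_time__quarter"],
  "limit": 1
}
```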
## Approach 2: Modified Compiled SQL
When semantic layer has the metric but needs minor modifications:
- Missing dimension (join + group by)
- Custom filter not available as a dimension
- Case when logic for custom categorization
- Different aggregation than what's defined
1. `get_metrics_compiled_sql` - get the SQL that would run (returns raw SQL, not Jinja)
2. Modify SQL to add what's needed
3. `execute_sql` to run the raw SQL
4. **Always suggest** updating the semantic model if the modification would be reusable
```sql
-- Example: Adding a sales_rep dimension (column names are illustrative)
WITH base AS (
    -- ... compiled metric logic (already resolved to table names) ...
)
SELECT reps.sales_rep_name, SUM(base.amount) AS total_amount
FROM base
JOIN analytics.dim_sales_reps reps ON base.rep_id = reps.id
GROUP BY reps.sales_rep_name

-- Example: Custom filter
SELECT * FROM (compiled_metric_sql) WHERE region = 'EMEA'

-- Example: Case-when categorization
SELECT
    CASE WHEN amount > 1000 THEN 'large' ELSE 'small' END AS deal_size,
    SUM(amount) AS total_amount
FROM (compiled_metric_sql)
GROUP BY 1
```
**Note:** The compiled SQL contains resolved table names, not `{{ ref() }}`. Work with the raw SQL as returned.
## Approach 3: Model Discovery
When there is no semantic layer but discovery tools (`get_mart_models`, `get_model_details`) are available:
1. `get_mart_models` - start with marts, not staging
2. `get_model_details` for relevant models - understand schema
3. Write SQL using `{{ ref('model_name') }}`
4. `show --inline "..."` or `execute_sql`
**Prefer marts over staging** - marts have business logic applied.
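A sketch of steps 3–4, assuming discovery surfaced a hypothetical `fct_orders` mart with `region` and `amount` columns:

```sql
-- Revenue by region from a mart model (model and column names are hypothetical)
select
    region,
    sum(amount) as total_revenue
from {{ ref('fct_orders') }}
group by region
order by total_revenue desc
```

Run it with `show --inline` or `execute_sql` so dbt resolves the `ref()` to the actual warehouse table.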
## Approach 4: Manifest/Catalog Analysis
When in a dbt project but no MCP server:
1. Check for `target/manifest.json` and `target/catalog.json`
2. **Filter before reading** - these files can be large
```bash
# Find mart models in manifest
jq '.nodes | to_entries | map(select(.key | startswith("model.") and contains("mart"))) | .[].value | {name: .name, schema: .schema, columns: .columns}' target/manifest.json
# Get column info from catalog
jq '.nodes["model.project_name.model_name"].columns' target/catalog.json
```
3. Write SQL based on discovered schema
4. Explain: "This SQL should run in your warehouse. I cannot execute it without database access."
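For instance, if the catalog showed an `analytics.fct_orders` table (hypothetical), the handed-off SQL would use the fully qualified names from the catalog rather than `{{ ref() }}`, since dbt will not compile it:

```sql
-- SQL for the user to run themselves (schema, table, and column names are hypothetical)
select region, sum(amount) as total_revenue
from analytics.fct_orders
group by region
order by total_revenue desc
```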
## Suggesting Improvements
**When in a dbt project**, suggest semantic layer changes after answering (or when cannot answer):
| Gap | Suggestion |
|-----|------------|
| Metric doesn't exist | "Add a metric definition to your semantic model" |
| Dimension missing | "Add `dimension_name` to the dimensions list in the semantic model" |
| No semantic layer | "Consider adding a semantic layer for this data" |
**Stay at semantic layer level.** Do NOT suggest:
- Database schema changes
- ETL pipeline modifications
- "Ask your data engineering team to..."
## Rationalizations to Resist
| You're Thinking... | Reality |
|--------------------|---------|
| "Semantic layer doesn't support this exact query" | Get compiled SQL and modify it (Approach 2) |
| "No MCP tools, can't help" | Check for manifest/catalog locally |
| "User needs this quickly, skip the systematic check" | Systematic approach IS the fastest path |
| "Just write SQL, it's faster" | Semantic layer exists for a reason - use it first |
| "The dimension doesn't exist in the data" | Maybe it exists but not in semantic layer config |
## Red Flags - STOP
- Writing SQL without checking if semantic layer can answer
- Saying "cannot answer" without trying all 4 approaches
- Suggesting database-level fixes for semantic layer gaps
- Reading entire manifest.json without filtering
- Using staging models when mart models exist
- Using this to validate model correctness rather than answer business questions
## Common Mistakes
| Mistake | Fix |
|---------|-----|
| Giving up when SL can't answer directly | Get compiled SQL and modify it |
| Querying staging models | Use `get_mart_models` first |
| Reading full manifest.json | Use jq to filter |
| Suggesting ETL changes | Keep suggestions at semantic layer |
| Not checking tool availability | List available tools before choosing approach |