sql-query-optimizer

Analyzes and optimizes SQL queries across different data warehouse platforms (Snowflake, BigQuery, Redshift, Databricks) with platform-specific recommendations.

509 stars

Best use case

sql-query-optimizer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Analyzes and optimizes SQL queries across different data warehouse platforms (Snowflake, BigQuery, Redshift, Databricks) with platform-specific recommendations.

Teams using sql-query-optimizer should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/sql-query-optimizer/SKILL.md --create-dirs "https://raw.githubusercontent.com/a5c-ai/babysitter/main/library/specializations/data-engineering-analytics/skills/sql-query-optimizer/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/sql-query-optimizer/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How sql-query-optimizer Compares

Feature / Agentsql-query-optimizerStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Analyzes and optimizes SQL queries across different data warehouse platforms (Snowflake, BigQuery, Redshift, Databricks) with platform-specific recommendations.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# SQL Query Optimizer

Analyzes and optimizes SQL queries across different data warehouse platforms with platform-specific recommendations.

## Overview

This skill examines SQL queries to identify performance bottlenecks, suggest optimizations, and provide platform-specific recommendations for Snowflake, BigQuery, Redshift, and Databricks. It analyzes query execution plans, recommends indexes/clustering keys, and identifies anti-patterns.

## Capabilities

- **Query execution plan analysis** - Parse and analyze EXPLAIN outputs
- **Index recommendations** - Suggest clustering keys, sort keys, partition keys
- **Join optimization** - Identify inefficient join patterns and suggest improvements
- **Subquery elimination** - Convert correlated subqueries to CTEs or joins
- **CTE optimization** - Materialize vs reference optimization
- **Window function optimization** - Frame and partition optimization
- **Predicate pushdown validation** - Verify filter pushdown effectiveness
- **Clustering key recommendations** - Platform-specific clustering strategies
- **Materialized view suggestions** - Identify candidates for materialized views
- **Platform-specific optimizations** - Snowflake, BigQuery, Redshift, Databricks

## Input Schema

```json
{
  "query": {
    "type": "string",
    "description": "The SQL query to analyze",
    "required": true
  },
  "platform": {
    "type": "string",
    "enum": ["snowflake", "bigquery", "redshift", "databricks", "postgres"],
    "required": true,
    "description": "Target data warehouse platform"
  },
  "tableStatistics": {
    "type": "object",
    "description": "Table statistics including row counts, column cardinality",
    "properties": {
      "tables": {
        "type": "array",
        "items": {
          "name": "string",
          "rowCount": "number",
          "sizeGB": "number",
          "columns": "array"
        }
      }
    }
  },
  "executionPlan": {
    "type": "object",
    "description": "Query execution plan (EXPLAIN output)"
  },
  "queryHistory": {
    "type": "object",
    "description": "Historical query performance metrics"
  },
  "optimizationGoals": {
    "type": "array",
    "items": {
      "type": "string",
      "enum": ["latency", "cost", "throughput", "scan_reduction"]
    },
    "default": ["latency", "cost"]
  }
}
```

## Output Schema

```json
{
  "optimizedQuery": {
    "type": "string",
    "description": "The optimized SQL query"
  },
  "improvements": {
    "type": "array",
    "items": {
      "type": {
        "type": "string",
        "enum": ["join", "predicate", "aggregation", "cte", "window", "scan", "index"]
      },
      "description": "string",
      "impact": "high|medium|low",
      "lineNumber": "number",
      "originalCode": "string",
      "optimizedCode": "string"
    }
  },
  "indexRecommendations": {
    "type": "array",
    "items": {
      "table": "string",
      "type": "clustering|sort|partition|index",
      "columns": "array",
      "rationale": "string",
      "ddl": "string"
    }
  },
  "estimatedImprovement": {
    "scanReduction": {
      "type": "number",
      "description": "Percentage reduction in data scanned"
    },
    "timeReduction": {
      "type": "number",
      "description": "Percentage reduction in execution time"
    },
    "costReduction": {
      "type": "number",
      "description": "Percentage reduction in query cost"
    }
  },
  "antiPatterns": {
    "type": "array",
    "items": {
      "pattern": "string",
      "severity": "high|medium|low",
      "location": "string",
      "suggestion": "string"
    }
  },
  "platformSpecificNotes": {
    "type": "array",
    "items": "string"
  }
}
```

## Usage Examples

### Basic Query Optimization

```json
{
  "query": "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE o.created_at > '2024-01-01'",
  "platform": "snowflake"
}
```

### With Execution Plan Analysis

```json
{
  "query": "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id",
  "platform": "bigquery",
  "executionPlan": {
    "stages": [...],
    "totalBytesProcessed": 1073741824
  },
  "optimizationGoals": ["cost", "scan_reduction"]
}
```

### With Table Statistics

```json
{
  "query": "SELECT ... complex query ...",
  "platform": "redshift",
  "tableStatistics": {
    "tables": [
      {
        "name": "orders",
        "rowCount": 10000000,
        "sizeGB": 50,
        "columns": [
          {"name": "order_id", "cardinality": 10000000},
          {"name": "customer_id", "cardinality": 500000}
        ]
      }
    ]
  }
}
```

## Platform-Specific Optimizations

### Snowflake

| Optimization | Description |
|--------------|-------------|
| Clustering keys | Recommend micro-partition clustering |
| Result cache | Identify queries benefiting from caching |
| Query acceleration | Suggest QUERY_ACCELERATION_MAX_SCALE_FACTOR |
| Warehouse sizing | Right-size warehouse recommendations |

### BigQuery

| Optimization | Description |
|--------------|-------------|
| Partitioning | DATE/TIMESTAMP partitioning recommendations |
| Clustering | Up to 4 clustering columns |
| BI Engine | Identify BI Engine-eligible queries |
| Slots | Estimate slot usage optimization |

### Redshift

| Optimization | Description |
|--------------|-------------|
| Sort keys | COMPOUND vs INTERLEAVED recommendations |
| Distribution | KEY, EVEN, ALL distribution strategies |
| Compression | Column encoding recommendations |
| Vacuum | VACUUM and ANALYZE recommendations |

### Databricks

| Optimization | Description |
|--------------|-------------|
| Z-ordering | Multi-column Z-order recommendations |
| Delta cache | Caching strategy recommendations |
| Photon | Photon-eligible query patterns |
| Adaptive execution | AQE configuration suggestions |

## Common Anti-Patterns Detected

### Query Structure

| Anti-Pattern | Impact | Fix |
|--------------|--------|-----|
| SELECT * | High | Specify columns explicitly |
| Correlated subqueries | High | Convert to JOIN or CTE |
| DISTINCT on large datasets | Medium | Use GROUP BY or window functions |
| Non-SARGable predicates | High | Rewrite for index usage |

### Join Issues

| Anti-Pattern | Impact | Fix |
|--------------|--------|-----|
| Cartesian products | Critical | Add join conditions |
| Implicit joins | Medium | Use explicit JOIN syntax |
| Wrong join order | High | Reorder by selectivity |
| Missing indexes on join keys | High | Add clustering/sort keys |

### Aggregation Issues

| Anti-Pattern | Impact | Fix |
|--------------|--------|-----|
| GROUP BY ordinal | Low | Use column names |
| Aggregating before filter | High | Filter first, then aggregate |
| Over-grouping | Medium | Reduce GROUP BY columns |

## Integration Points

### MCP Server Integration

- **Snowflake MCP** - Real-time execution plan analysis
- **BigQuery MCP** - Cost estimation and slot analysis
- **Redshift MCP** - Query execution and statistics

### Related Skills

- Data Quality Profiler (SK-DEA-005) - Table statistics gathering
- dbt Project Analyzer (SK-DEA-003) - Model query optimization

### Applicable Processes

- Query Optimization (`query-optimization.js`)
- Data Warehouse Setup (`data-warehouse-setup.js`)
- BI Dashboard Development (`bi-dashboard.js`)
- OBT Creation (`obt-creation.js`)

## References

- [Snowflake Query Performance](https://docs.snowflake.com/en/user-guide/performance-query)
- [BigQuery Best Practices](https://cloud.google.com/bigquery/docs/best-practices-performance-overview)
- [Redshift Performance Tuning](https://docs.aws.amazon.com/redshift/latest/dg/c_designing-queries-best-practices.html)
- [Databricks SQL Best Practices](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select.html)

## Version History

- **1.0.0** - Initial release with multi-platform support

Related Skills

react-query

509
from a5c-ai/babysitter

TanStack Query (React Query) patterns for server state management, caching, mutations, optimistic updates, and infinite queries.

svg-optimizer

509
from a5c-ai/babysitter

Optimize SVG assets, generate sprites, and convert to React components

db-query-analyzer

509
from a5c-ai/babysitter

Analyze database query performance with execution plans and index recommendations

clinical-documentation-query

509
from a5c-ai/babysitter

Generate compliant physician queries to clarify clinical documentation for accurate coding, severity of illness, and risk of mortality capture

doe-optimizer

509
from a5c-ai/babysitter

Skill for optimizing experimental designs using DOE principles

circuit-optimizer

509
from a5c-ai/babysitter

Quantum circuit optimization skill for gate reduction, depth minimization, and hardware-aware compilation

nanoparticle-synthesis-optimizer

509
from a5c-ai/babysitter

Synthesis parameter optimization skill for metal, semiconductor, and oxide nanoparticle production with automated protocol generation and reproducibility validation

drug-encapsulation-optimizer

509
from a5c-ai/babysitter

Drug delivery formulation skill for optimizing drug loading, encapsulation efficiency, and release kinetics

warehouse-slotting-optimizer

509
from a5c-ai/babysitter

Warehouse slotting and layout optimization skill for pick path minimization and space utilization.

network-optimizer

509
from a5c-ai/babysitter

Network optimization skill for transportation, assignment, and flow problems on graph structures.

facility-layout-optimizer

509
from a5c-ai/babysitter

Facility layout optimization skill for material flow minimization and space utilization.

signal-timing-optimizer

509
from a5c-ai/babysitter

Traffic signal timing optimization skill for cycle length, phasing, and coordination