sql-databases

SQL query optimization, schema design, indexing strategies, and relational database mastery for production data systems

16 stars

bydiegosouzapw

View on GitHub Installation ↓

Best use case

sql-databases is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

SQL query optimization, schema design, indexing strategies, and relational database mastery for production data systems

Teams using sql-databases should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/sql-databases/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/development/sql-databases/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/sql-databases/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How sql-databases Compares

Feature / Agent	sql-databases	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

SQL query optimization, schema design, indexing strategies, and relational database mastery for production data systems

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# SQL & Relational Databases

Production-grade SQL skills for designing, querying, and optimizing relational databases in data engineering workflows.

## Quick Start

```sql
-- Modern PostgreSQL 16+ query with window functions and CTEs
WITH daily_metrics AS (
    SELECT
        DATE_TRUNC('day', created_at) AS metric_date,
        category,
        COUNT(*) AS event_count,
        SUM(amount) AS total_amount,
        AVG(amount) AS avg_amount
    FROM events
    WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY 1, 2
),
ranked_categories AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY metric_date ORDER BY total_amount DESC) AS daily_rank,
        LAG(total_amount) OVER (PARTITION BY category ORDER BY metric_date) AS prev_day_amount
    FROM daily_metrics
)
SELECT
    metric_date,
    category,
    event_count,
    total_amount,
    ROUND(100.0 * (total_amount - prev_day_amount) / NULLIF(prev_day_amount, 0), 2) AS day_over_day_pct
FROM ranked_categories
WHERE daily_rank <= 5
ORDER BY metric_date DESC, daily_rank;
```

## Core Concepts

### 1. Query Optimization with EXPLAIN ANALYZE

```sql
-- Always analyze before optimizing
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.id, u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at >= '2024-01-01'
GROUP BY u.id, u.email
HAVING COUNT(o.id) > 10;

-- Reading EXPLAIN output:
-- Seq Scan = Table scan (often slow, needs index)
-- Index Scan = Using index (good)
-- Bitmap Index Scan = Multiple index conditions (acceptable)
-- Hash Join = Building hash table (memory intensive)
-- Nested Loop = Row-by-row join (slow for large tables)
```

### 2. Indexing Strategies

```sql
-- B-tree index (default, most common)
CREATE INDEX idx_users_email ON users(email);

-- Partial index (smaller, faster for specific queries)
CREATE INDEX idx_active_users ON users(created_at)
WHERE status = 'active';

-- Composite index (column order matters!)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);

-- Covering index (includes all columns query needs)
CREATE INDEX idx_orders_covering ON orders(user_id)
INCLUDE (total_amount, status);

-- Expression index (for computed values)
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

-- GIN index for JSONB
CREATE INDEX idx_metadata_gin ON events USING GIN(metadata jsonb_path_ops);

-- Check index usage
SELECT
    schemaname, tablename, indexname,
    idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
```

### 3. Window Functions (2024-2025 Essential)

```sql
-- Running totals and moving averages
SELECT
    order_date,
    amount,
    -- Running total
    SUM(amount) OVER (ORDER BY order_date) AS running_total,
    -- 7-day moving average
    AVG(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7d,
    -- Rank within partition
    RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS customer_rank,
    -- Percent of total within partition
    100.0 * amount / SUM(amount) OVER (PARTITION BY customer_id) AS pct_of_customer_total
FROM orders;

-- Gap and island detection
WITH numbered AS (
    SELECT
        event_date,
        ROW_NUMBER() OVER (ORDER BY event_date) AS rn,
        event_date - (ROW_NUMBER() OVER (ORDER BY event_date) * INTERVAL '1 day') AS grp
    FROM events
)
SELECT
    MIN(event_date) AS streak_start,
    MAX(event_date) AS streak_end,
    COUNT(*) AS streak_length
FROM numbered
GROUP BY grp
ORDER BY streak_start;
```

### 4. Transaction Isolation & ACID

```sql
-- Read Committed (default in PostgreSQL)
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- Sees committed data at query start
COMMIT;

-- Repeatable Read (for consistent snapshots)
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Same query returns same results within transaction
COMMIT;

-- Serializable (strongest, may cause retries)
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Full isolation, but prepare for serialization failures
COMMIT;

-- Optimistic locking pattern
UPDATE products
SET stock = stock - 1, version = version + 1
WHERE id = $1 AND version = $2;
-- Check rows affected; if 0, concurrent modification occurred
```

### 5. Partitioning for Scale

```sql
-- Range partitioning (time-series data)
CREATE TABLE events (
    id BIGSERIAL,
    event_type TEXT NOT NULL,
    payload JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);

-- Create partitions
CREATE TABLE events_2024_q1 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE events_2024_q2 PARTITION OF events
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

-- Auto-partition with pg_partman (production setup)
SELECT partman.create_parent(
    p_parent_table := 'public.events',
    p_control := 'created_at',
    p_type := 'native',
    p_interval := 'monthly',
    p_premake := 3
);

-- List partitioning (categorical data)
CREATE TABLE orders (
    id BIGSERIAL,
    region TEXT NOT NULL,
    total NUMERIC(10,2)
) PARTITION BY LIST (region);

CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('US', 'CA');
CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('UK', 'DE', 'FR');
```

## Tools & Technologies

| Tool | Purpose | Version (2025) |
|------|---------|----------------|
| **PostgreSQL** | Primary OLTP database | 16+ |
| **MySQL** | Alternative OLTP | 8.0+ |
| **DBeaver** | Universal SQL client | Latest |
| **pgAdmin** | PostgreSQL admin | 8+ |
| **pgcli** | CLI with autocomplete | Latest |
| **pg_stat_statements** | Query performance | Built-in |
| **pgBouncer** | Connection pooling | 1.21+ |
| **Flyway/Liquibase** | Schema migrations | Latest |

## Learning Path

### Phase 1: Foundations (Weeks 1-2)
```
Week 1: SELECT, WHERE, JOINs, GROUP BY, ORDER BY
Week 2: Aggregations, HAVING, subqueries, UNION
```

### Phase 2: Intermediate (Weeks 3-4)
```
Week 3: Window functions, CTEs, CASE expressions
Week 4: Index fundamentals, EXPLAIN basics
```

### Phase 3: Advanced (Weeks 5-7)
```
Week 5: Query optimization, execution plans
Week 6: Transactions, locking, isolation levels
Week 7: Partitioning, schema design patterns
```

### Phase 4: Production (Weeks 8-10)
```
Week 8: Connection pooling, high availability
Week 9: Migrations, versioning, rollback strategies
Week 10: Monitoring, alerting, performance tuning
```

## Production Patterns

### Schema Design: Star Schema

```sql
-- Fact table (measures, metrics)
CREATE TABLE fact_sales (
    sale_id BIGSERIAL PRIMARY KEY,
    date_key INT NOT NULL REFERENCES dim_date(date_key),
    product_key INT NOT NULL REFERENCES dim_product(product_key),
    customer_key INT NOT NULL REFERENCES dim_customer(customer_key),
    quantity INT NOT NULL,
    unit_price NUMERIC(10,2) NOT NULL,
    total_amount NUMERIC(12,2) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
) PARTITION BY RANGE (date_key);

-- Dimension table (descriptive attributes)
CREATE TABLE dim_product (
    product_key SERIAL PRIMARY KEY,
    product_id VARCHAR(50) NOT NULL,
    product_name VARCHAR(255) NOT NULL,
    category VARCHAR(100),
    subcategory VARCHAR(100),
    -- SCD Type 2 fields
    valid_from DATE NOT NULL DEFAULT CURRENT_DATE,
    valid_to DATE DEFAULT '9999-12-31',
    is_current BOOLEAN DEFAULT TRUE,
    UNIQUE(product_id, valid_from)
);
```

### Slow Query Detection

```sql
-- Enable pg_stat_statements
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slowest queries
SELECT
    ROUND((total_exec_time / 1000)::numeric, 2) AS total_secs,
    calls,
    ROUND((mean_exec_time / 1000)::numeric, 4) AS mean_secs,
    ROUND((stddev_exec_time / 1000)::numeric, 4) AS stddev_secs,
    rows,
    query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Reset stats after optimization
SELECT pg_stat_statements_reset();
```

### Connection Pooling Configuration

```ini
# pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

# Pool modes:
# session = 1 connection per client session
# transaction = 1 connection per transaction (recommended)
# statement = 1 connection per statement (limited use)
pool_mode = transaction

default_pool_size = 20
max_client_conn = 1000
reserve_pool_size = 5
```

## Troubleshooting Guide

### Common Failure Modes

| Issue | Symptoms | Root Cause | Fix |
|-------|----------|------------|-----|
| **Slow Query** | Query > 1s | Missing index, bad plan | EXPLAIN ANALYZE, add index |
| **Lock Wait** | Queries hang | Long transaction, deadlock | Check `pg_locks`, kill blocking |
| **Connection Exhausted** | "Too many connections" | No pooling, connection leak | Use pgbouncer, fix app |
| **Disk Full** | Write errors | Table bloat, WAL growth | VACUUM FULL, archive WAL |
| **Replication Lag** | Stale reads | Slow replica, network | Check `pg_stat_replication` |

### Debug Checklist

```sql
-- 1. Check active queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- 2. Find blocking queries
SELECT
    blocked.pid AS blocked_pid,
    blocked.query AS blocked_query,
    blocking.pid AS blocking_pid,
    blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_locks blocked_locks ON blocked.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocked_locks.locktype = blocking_locks.locktype
    AND blocked_locks.relation = blocking_locks.relation
    AND blocked_locks.pid != blocking_locks.pid
JOIN pg_stat_activity blocking ON blocking_locks.pid = blocking.pid
WHERE NOT blocked_locks.granted;

-- 3. Check table bloat
SELECT
    schemaname, tablename,
    pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
    n_dead_tup AS dead_rows,
    n_live_tup AS live_rows,
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- 4. Check index usage
SELECT
    schemaname, tablename, indexname,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC  -- Least used first
LIMIT 20;
```

### Kill Long-Running Query

```sql
-- Graceful cancel (SIGINT)
SELECT pg_cancel_backend(pid);

-- Force terminate (SIGTERM)
SELECT pg_terminate_backend(pid);

-- Kill all queries older than 5 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND query_start < NOW() - INTERVAL '5 minutes'
  AND pid != pg_backend_pid();
```

## Unit Test Template

```python
import pytest
from sqlalchemy import create_engine, text
from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="module")
def postgres_container():
    """Spin up real PostgreSQL for integration tests."""
    with PostgresContainer("postgres:16-alpine") as postgres:
        yield postgres

@pytest.fixture
def db_connection(postgres_container):
    """Create connection with transaction rollback."""
    engine = create_engine(postgres_container.get_connection_url())
    with engine.connect() as conn:
        trans = conn.begin()
        yield conn
        trans.rollback()

class TestQueryOptimization:

    def test_index_is_used(self, db_connection):
        # Arrange
        db_connection.execute(text("""
            CREATE TABLE test_users (
                id SERIAL PRIMARY KEY,
                email VARCHAR(255) NOT NULL
            );
            CREATE INDEX idx_email ON test_users(email);
            INSERT INTO test_users (email)
            SELECT 'user' || i || '@test.com' FROM generate_series(1, 10000) i;
            ANALYZE test_users;
        """))

        # Act
        result = db_connection.execute(text("""
            EXPLAIN (FORMAT JSON)
            SELECT * FROM test_users WHERE email = 'user5000@test.com'
        """)).fetchone()

        plan = result[0][0]

        # Assert
        assert "Index Scan" in str(plan) or "Index Only Scan" in str(plan)

    def test_window_function_correctness(self, db_connection):
        # Arrange
        db_connection.execute(text("""
            CREATE TABLE test_sales (id SERIAL, amount NUMERIC);
            INSERT INTO test_sales (amount) VALUES (100), (200), (300);
        """))

        # Act
        result = db_connection.execute(text("""
            SELECT amount, SUM(amount) OVER (ORDER BY id) AS running_total
            FROM test_sales ORDER BY id
        """)).fetchall()

        # Assert
        assert result[0][1] == 100  # First running total
        assert result[1][1] == 300  # 100 + 200
        assert result[2][1] == 600  # 100 + 200 + 300
```

## Best Practices

### Query Writing
```sql
-- ✅ DO: Use explicit JOINs
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.id = o.user_id;

-- ❌ DON'T: Use implicit joins
SELECT u.name, o.total
FROM users u, orders o
WHERE u.id = o.user_id;

-- ✅ DO: Use EXISTS for existence checks
SELECT * FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);

-- ❌ DON'T: Use IN with subquery for large datasets
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders);

-- ✅ DO: Paginate with keyset (cursor)
SELECT * FROM events
WHERE id > $last_seen_id
ORDER BY id
LIMIT 100;

-- ❌ DON'T: Use OFFSET for deep pagination
SELECT * FROM events
ORDER BY id
LIMIT 100 OFFSET 1000000;  -- Slow!
```

### Schema Design
```sql
-- ✅ DO: Use appropriate data types
CREATE TABLE metrics (
    id BIGSERIAL PRIMARY KEY,  -- Use BIGSERIAL for high-volume
    value NUMERIC(10,2),       -- Precise decimal
    created_at TIMESTAMPTZ     -- Always use timezone-aware
);

-- ✅ DO: Add constraints
ALTER TABLE orders
ADD CONSTRAINT chk_positive_amount CHECK (amount > 0);

-- ✅ DO: Use JSONB for flexible schema
CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    event_type TEXT NOT NULL,
    payload JSONB NOT NULL DEFAULT '{}'
);
```

## Resources

### Official Documentation
- [PostgreSQL Docs](https://www.postgresql.org/docs/)
- [MySQL Reference Manual](https://dev.mysql.com/doc/refman/8.0/en/)
- [SQL Standard](https://modern-sql.com/)

### Performance Tuning
- [Use The Index, Luke](https://use-the-index-luke.com/)
- [PostgreSQL EXPLAIN Visualizer](https://explain.dalibo.com/)
- [pgMustard](https://www.pgmustard.com/)

### Books
- "SQL Performance Explained" by Markus Winand
- "High Performance MySQL" by Schwartz et al.
- "The Art of PostgreSQL" by Dimitri Fontaine

## Next Skills

After mastering SQL databases:
- → `data-warehousing` - Snowflake, BigQuery, dimensional modeling
- → `etl-tools` - Build pipelines with Airflow
- → `nosql-databases` - MongoDB, Redis, DynamoDB
- → `big-data` - Spark SQL at scale

---

**Skill Certification Checklist:**
- [ ] Can write complex queries with CTEs and window functions
- [ ] Can analyze and optimize query performance with EXPLAIN
- [ ] Can design normalized schemas (3NF, BCNF)
- [ ] Can implement partitioning strategies
- [ ] Can configure connection pooling and monitoring

Related Skills

nosql-databases

from diegosouzapw/awesome-omni-skill

Apply NoSQL best practices for MongoDB, Convex, and document databases. Use when designing schemas, writing queries, optimizing performance, or building applications with non-relational databases. Use with database-expert for query optimization and DBA-level tuning (20+ years experience).

designing-databases

from diegosouzapw/awesome-omni-skill

データベーススキーマ設計と最適化を支援します。正規化戦略、インデックス設計、パフォーマンス最適化を提供します。データモデル設計、データベース構造の最適化が必要な場合に使用してください。

databases

from diegosouzapw/awesome-omni-skill

Work with MongoDB (document database, BSON documents, aggregation pipelines, Atlas cloud) and PostgreSQL (relational database, SQL queries, psql CLI, pgAdmin). Use when designing database schemas, writing queries and aggregations, optimizing indexes for performance, performing database migrations, configuring replication and sharding, implementing backup and restore strategies, managing database users and permissions, analyzing query performance, or administering production databases.

databases-architecture-skill

from diegosouzapw/awesome-omni-skill

Master database design (SQL, NoSQL), system architecture, API design (REST, GraphQL), and building scalable systems. Learn PostgreSQL, MongoDB, system design patterns, and enterprise architectures.

bio-clinical-databases-gnomad-frequencies

from diegosouzapw/awesome-omni-skill

Query gnomAD for population allele frequencies to assess variant rarity. Use when filtering variants by population frequency for rare disease analysis or determining if a variant is common in the general population.

acsets-algebraic-databases

from diegosouzapw/awesome-omni-skill

ACSets (Attributed C-Sets): Algebraic databases as in-memory data structures. Category-theoretic formalism for relational databases generalizing graphs and data frames.

bgo

from diegosouzapw/awesome-omni-skill

Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.

Coding & Development

swift-human-guidelines

from diegosouzapw/awesome-omni-skill

Comprehensive Swift 6 and SwiftUI development guidelines for building iOS 26, iOS 18, iPadOS, macOS, watchOS, visionOS, and tvOS applications. Covers Foundation Models API, BGContinuedProcessingTask, Call Translation API, Liquid Glass design system, data-race safety, typed throws, synchronization primitives, SwiftUI/UIKit interoperability, zoom transitions, and document-based apps. Use when building new Apple platform apps, implementing Apple Intelligence features, optimizing performance with Swift 6 concurrency, following Apple Human Interface Guidelines, creating cross-platform applications, or working with iOS 26/18 APIs. Triggers on Swift code, SwiftUI views, Xcode projects, app architecture, background processing, translation features, Foundation Models, synchronization, actors, Sendable types, or modern Apple platform development.

swift-conventions

from diegosouzapw/awesome-omni-skill

Swift coding conventions and best practices for modern Swift development. Use when writing, reviewing, or refactoring Swift code to ensure consistency with naming conventions, access control, async/await patterns, and SwiftUI/framework best practices.

swift-concurrency

from diegosouzapw/awesome-omni-skill

Expert guidance on Swift Concurrency best practices, patterns, and implementation. Use when developers mention: (1) Swift Concurrency, async/await, actors, or tasks, (2) "use Swift Concurrency" or "modern concurrency patterns", (3) migrating to Swift 6, (4) data races or thread safety issues, (5) refactoring closures to async/await, (6) @MainActor, Sendable, or actor isolation, (7) concurrent code architecture or performance optimization, (8) concurrency-related linter warnings (SwiftLint or similar; e.g. async_without_await, Sendable/actor isolation/MainActor lint).

swedish-medications

from diegosouzapw/awesome-omni-skill

Look up Swedish medication information from FASS (Farmaceutiska Specialiteter i Sverige). Use when users ask about medications, drugs, läkemedel, dosages, side effects (biverkningar), interactions, or need to understand prescriptions in Sweden. Covers all medications approved for use in Sweden.

swe-programming-elixir-phoenix

from diegosouzapw/awesome-omni-skill

Phoenix Framework coding standards from authoritative docs/explanation/software-engineering/platform-web/tools/elixir-phoenix/ documentation