clickhouse-performance-tuning

Optimize ClickHouse query performance with indexing, projections, settings tuning, and query analysis using system tables. Use when queries are slow, investigating performance bottlenecks, or tuning ClickHouse server settings. Trigger: "clickhouse performance", "optimize clickhouse query", "clickhouse slow query", "clickhouse indexing", "clickhouse tuning", "clickhouse projections".

1,868 stars

Best use case

clickhouse-performance-tuning is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Optimize ClickHouse query performance with indexing, projections, settings tuning, and query analysis using system tables. Use when queries are slow, investigating performance bottlenecks, or tuning ClickHouse server settings. Trigger: "clickhouse performance", "optimize clickhouse query", "clickhouse slow query", "clickhouse indexing", "clickhouse tuning", "clickhouse projections".

Teams using clickhouse-performance-tuning should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/clickhouse-performance-tuning/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/saas-packs/clickhouse-pack/skills/clickhouse-performance-tuning/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/clickhouse-performance-tuning/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How clickhouse-performance-tuning Compares

Feature / Agentclickhouse-performance-tuningStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Optimize ClickHouse query performance with indexing, projections, settings tuning, and query analysis using system tables. Use when queries are slow, investigating performance bottlenecks, or tuning ClickHouse server settings. Trigger: "clickhouse performance", "optimize clickhouse query", "clickhouse slow query", "clickhouse indexing", "clickhouse tuning", "clickhouse projections".

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# ClickHouse Performance Tuning

## Overview

Diagnose and fix ClickHouse performance issues using query analysis, proper indexing,
projections, materialized views, and server settings tuning.

## Prerequisites

- ClickHouse tables with data (see `clickhouse-core-workflow-a`)
- Access to `system.query_log` and `system.parts`

## Instructions

### Step 1: Diagnose Slow Queries

```sql
-- Find the slowest queries in the last 24 hours
SELECT
    event_time,
    query_duration_ms,
    read_rows,
    read_bytes,
    result_rows,
    memory_usage,
    substring(query, 1, 300) AS query_preview
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 24 HOUR
  AND query_duration_ms > 1000   -- > 1 second
ORDER BY query_duration_ms DESC
LIMIT 20;

-- Analyze a specific query with EXPLAIN
EXPLAIN PLAN
SELECT event_type, count() FROM events WHERE created_at >= today() - 7 GROUP BY event_type;

-- Full pipeline analysis
EXPLAIN PIPELINE
SELECT event_type, count() FROM events WHERE created_at >= today() - 7 GROUP BY event_type;
```

### Step 2: ORDER BY Key Optimization

The ORDER BY key is ClickHouse's primary performance lever. Queries that filter
on the ORDER BY prefix skip entire granules (8192-row chunks).

```sql
-- Check what your current ORDER BY key is
SELECT
    database, table, sorting_key, primary_key,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.tables
JOIN system.parts ON tables.name = parts.table AND tables.database = parts.database
WHERE tables.database = 'analytics' AND tables.name = 'events' AND parts.active
GROUP BY database, table, sorting_key, primary_key;

-- If your queries filter on (tenant_id, event_type, created_at)
-- but ORDER BY is (created_at), you're scanning too much data.
-- Fix: recreate table with correct ORDER BY
CREATE TABLE analytics.events_v2 AS analytics.events
ENGINE = MergeTree()
ORDER BY (tenant_id, event_type, toDate(created_at));

INSERT INTO analytics.events_v2 SELECT * FROM analytics.events;
RENAME TABLE analytics.events TO analytics.events_old,
             analytics.events_v2 TO analytics.events;
```

### Step 3: Data Skipping Indexes

```sql
-- Add a bloom filter index for high-cardinality lookups
ALTER TABLE analytics.events
    ADD INDEX idx_session_id session_id TYPE bloom_filter(0.01) GRANULARITY 4;

-- Add a set index for low-cardinality columns
ALTER TABLE analytics.events
    ADD INDEX idx_country country TYPE set(100) GRANULARITY 4;

-- Add a minmax index for range queries on non-ORDER-BY columns
ALTER TABLE analytics.events
    ADD INDEX idx_amount amount TYPE minmax GRANULARITY 4;

-- Materialize indexes for existing data
ALTER TABLE analytics.events MATERIALIZE INDEX idx_session_id;

-- Verify index usage
EXPLAIN indexes = 1
SELECT * FROM analytics.events WHERE session_id = 'abc-123';
```

### Step 4: Projections (Automatic Pre-Aggregation)

```sql
-- Add a projection for a common aggregation pattern
ALTER TABLE analytics.events
    ADD PROJECTION events_by_hour (
        SELECT
            toStartOfHour(created_at) AS hour,
            tenant_id,
            event_type,
            count() AS cnt,
            uniq(user_id) AS unique_users
        GROUP BY hour, tenant_id, event_type
    );

-- Materialize for existing data
ALTER TABLE analytics.events MATERIALIZE PROJECTION events_by_hour;

-- ClickHouse automatically uses the projection when the query matches
SELECT toStartOfHour(created_at) AS hour, count()
FROM analytics.events
WHERE tenant_id = 1
GROUP BY hour;
-- ^ This query reads from the projection (much smaller) instead of full table
```

### Step 5: Key Server Settings

```sql
-- Per-query performance settings
SET max_threads = 8;                       -- Threads per query (default: CPU cores)
SET max_memory_usage = 10000000000;        -- 10GB per query
SET max_bytes_before_external_sort = 10000000000;  -- Spill sorts to disk
SET max_bytes_before_external_group_by = 10000000000;  -- Spill GROUP BY to disk
SET optimize_read_in_order = 1;            -- Skip sorting if ORDER BY matches
SET compile_expressions = 1;               -- JIT compile expressions
SET max_execution_time = 60;               -- 60s timeout

-- Insert performance settings
SET async_insert = 1;                      -- Server-side batching for small inserts
SET async_insert_max_data_size = 10000000; -- 10MB flush threshold
SET async_insert_busy_timeout_ms = 5000;   -- 5s flush interval
SET min_insert_block_size_rows = 100000;   -- Min rows per insert block
```

### Step 6: Materialized Views for Dashboards

```sql
-- Pre-aggregate for dashboard queries (runs on INSERT, not on query)
CREATE TABLE analytics.dashboard_daily (
    date          Date,
    tenant_id     UInt32,
    total_events  UInt64,
    unique_users  AggregateFunction(uniq, UInt64),
    p95_latency   AggregateFunction(quantile(0.95), Float64)
)
ENGINE = AggregatingMergeTree()
ORDER BY (tenant_id, date);

CREATE MATERIALIZED VIEW analytics.dashboard_daily_mv
TO analytics.dashboard_daily
AS SELECT
    toDate(created_at) AS date,
    tenant_id,
    count() AS total_events,
    uniqState(user_id) AS unique_users,
    quantileState(0.95)(latency_ms) AS p95_latency
FROM analytics.events
GROUP BY date, tenant_id;

-- Query pre-aggregated data (milliseconds instead of seconds)
SELECT
    date,
    sum(total_events)                       AS events,
    uniqMerge(unique_users)                 AS users,
    quantileMerge(0.95)(p95_latency)        AS p95
FROM analytics.dashboard_daily
WHERE tenant_id = 1 AND date >= today() - 30
GROUP BY date ORDER BY date;
```

### Step 7: Query Optimization Patterns

```sql
-- Use PREWHERE for large tables (reads less data than WHERE)
SELECT * FROM analytics.events
PREWHERE event_type = 'purchase'   -- Evaluated first, skips non-matching granules
WHERE user_id > 1000;              -- Evaluated second, only on matching granules

-- Use LIMIT BY for top-N per group (more efficient than window functions)
SELECT tenant_id, event_type, count() AS cnt
FROM analytics.events
GROUP BY tenant_id, event_type
ORDER BY cnt DESC
LIMIT 5 BY tenant_id;   -- Top 5 event types per tenant

-- Use FINAL sparingly with ReplacingMergeTree
-- Instead of: SELECT * FROM users FINAL  (slow, full scan)
-- Prefer: SELECT argMax(email, updated_at) AS email FROM users GROUP BY user_id
```

## Performance Benchmarks

```sql
-- Measure bytes read and time for a specific query
SELECT
    query_duration_ms,
    read_rows,
    formatReadableSize(read_bytes) AS read_size,
    result_rows,
    formatReadableSize(memory_usage) AS memory
FROM system.query_log
WHERE query_id = currentQueryId()
  AND type = 'QueryFinish';
```

## Error Handling

| Issue | Indicator | Solution |
|-------|-----------|----------|
| Full table scan | `read_rows` = total rows | Fix ORDER BY to match filters |
| Memory exceeded | Error 241 | Add LIMIT, use streaming, increase limit |
| Slow GROUP BY | High `read_bytes` | Add materialized view or projection |
| Merge backlog | Parts > 300 | Reduce insert frequency, increase merge threads |

## Resources

- [Query Optimization Guide](https://clickhouse.com/docs/guides/developer/query-optimization)
- [Projections](https://clickhouse.com/docs/sql-reference/statements/alter/projection)
- [Data Skipping Indexes](https://clickhouse.com/docs/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-data_skipping-indexes)
- [MergeTree Settings](https://clickhouse.com/docs/operations/settings/merge-tree-settings)

## Next Steps

For cost optimization, see `clickhouse-cost-tuning`.

Related Skills

running-performance-tests

1868
from jeremylongshore/claude-code-plugins-plus-skills

Execute load testing, stress testing, and performance benchmarking. Use when performing specialized testing. Trigger with phrases like "run load tests", "test performance", or "benchmark the system".

workhuman-performance-tuning

1868
from jeremylongshore/claude-code-plugins-plus-skills

Workhuman performance tuning for employee recognition and rewards API. Use when integrating Workhuman Social Recognition, or building recognition workflows with HRIS systems. Trigger: "workhuman performance tuning".

workhuman-cost-tuning

1868
from jeremylongshore/claude-code-plugins-plus-skills

Workhuman cost tuning for employee recognition and rewards API. Use when integrating Workhuman Social Recognition, or building recognition workflows with HRIS systems. Trigger: "workhuman cost tuning".

wispr-performance-tuning

1868
from jeremylongshore/claude-code-plugins-plus-skills

Wispr Flow performance tuning for voice-to-text API integration. Use when integrating Wispr Flow dictation, WebSocket streaming, or building voice-powered applications. Trigger: "wispr performance tuning".

wispr-cost-tuning

1868
from jeremylongshore/claude-code-plugins-plus-skills

Wispr Flow cost tuning for voice-to-text API integration. Use when integrating Wispr Flow dictation, WebSocket streaming, or building voice-powered applications. Trigger: "wispr cost tuning".

windsurf-performance-tuning

1868
from jeremylongshore/claude-code-plugins-plus-skills

Optimize Windsurf IDE performance: indexing speed, Cascade responsiveness, and memory usage. Use when Windsurf is slow, indexing takes too long, Cascade times out, or the IDE uses too much memory. Trigger with phrases like "windsurf slow", "windsurf performance", "optimize windsurf", "windsurf memory", "cascade slow", "indexing slow".

windsurf-cost-tuning

1868
from jeremylongshore/claude-code-plugins-plus-skills

Optimize Windsurf licensing costs through seat management, tier selection, and credit monitoring. Use when analyzing Windsurf billing, reducing per-seat costs, or implementing usage monitoring and budget controls. Trigger with phrases like "windsurf cost", "windsurf billing", "reduce windsurf costs", "windsurf pricing", "windsurf budget".

webflow-performance-tuning

1868
from jeremylongshore/claude-code-plugins-plus-skills

Optimize Webflow API performance with response caching, bulk endpoint batching, CDN-cached live item reads, pagination optimization, and connection pooling. Use when experiencing slow API responses or optimizing request throughput. Trigger with phrases like "webflow performance", "optimize webflow", "webflow latency", "webflow caching", "webflow slow", "webflow batch".

webflow-cost-tuning

1868
from jeremylongshore/claude-code-plugins-plus-skills

Optimize Webflow costs through plan selection, CDN read optimization, bulk endpoint usage, and API usage monitoring with budget alerts. Use when analyzing Webflow billing, reducing API costs, or implementing usage monitoring for Webflow integrations. Trigger with phrases like "webflow cost", "webflow billing", "reduce webflow costs", "webflow pricing", "webflow budget".

vercel-performance-tuning

1868
from jeremylongshore/claude-code-plugins-plus-skills

Optimize Vercel deployment performance with caching, bundle optimization, and cold start reduction. Use when experiencing slow page loads, optimizing Core Web Vitals, or reducing serverless function cold start times. Trigger with phrases like "vercel performance", "optimize vercel", "vercel latency", "vercel caching", "vercel slow", "vercel cold start".

vercel-cost-tuning

1868
from jeremylongshore/claude-code-plugins-plus-skills

Optimize Vercel costs through plan selection, function efficiency, and usage monitoring. Use when analyzing Vercel billing, reducing function execution costs, or implementing spend management and budget alerts. Trigger with phrases like "vercel cost", "vercel billing", "reduce vercel costs", "vercel pricing", "vercel expensive", "vercel budget".

veeva-performance-tuning

1868
from jeremylongshore/claude-code-plugins-plus-skills

Veeva Vault performance tuning for REST API and clinical operations. Use when working with Veeva Vault document management and CRM. Trigger: "veeva performance tuning".