clickhouse-io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

144,923 stars
Complexity: easy

About this skill

This skill provides a comprehensive guide to ClickHouse database patterns, query optimization techniques, analytics strategies, and data engineering best practices. It is specifically tailored for building and managing high-performance analytical workloads with ClickHouse, a column-oriented DBMS. The skill equips AI agents, particularly Claude, with the foundational knowledge to design efficient ClickHouse schemas, write optimized queries, and apply robust data engineering principles to leverage ClickHouse's capabilities for large-scale, real-time data analysis. It covers core concepts like MergeTree engines, data compression, parallel and distributed query execution, and more, as demonstrated through practical SQL examples for table design.

Best use case

An AI agent can leverage this skill to assist users with designing optimal ClickHouse database schemas, troubleshooting performance issues in existing ClickHouse queries, generating efficient analytical queries, or providing recommendations for data engineering pipelines utilizing ClickHouse for high-performance analytics.

The AI agent will provide well-informed, ClickHouse-specific recommendations, generate optimized SQL queries, suggest efficient table designs (e.g., using MergeTree engines with appropriate partitioning and ordering), and offer best practices for managing and analyzing large datasets in a high-performance analytical environment.

Practical example

Example input

Design a ClickHouse table for real-time market analytics, including date, market ID, volume, and number of unique traders. Ensure it's optimized for time-series queries and high data ingestion rates.

Example output

```sql
CREATE TABLE markets_analytics (
    date Date,
    market_id String,
    market_name String,
    volume UInt64,
    trades UInt32,
    unique_traders UInt32,
    avg_trade_size Float64,
    created_at DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (date, market_id)
SETTINGS index_granularity = 8192;
```

**Explanation:**
- **ENGINE = MergeTree()**: The most commonly used engine family for OLAP workloads in ClickHouse. It offers fast inserts and queries and good data compression; replication requires the ReplicatedMergeTree variant.
- **PARTITION BY toYYYYMM(date)**: Partitioning by year-month lets ClickHouse skip whole partitions for time-range queries, reducing the amount of data it has to scan.
- **ORDER BY (date, market_id)**: Sorting by `date` first makes time-range retrieval efficient. Adding `market_id` forms a compound sorting key (which also serves as the primary key here) that groups each market's rows within a date, which helps queries that filter on both columns.
- **SETTINGS index_granularity = 8192**: This is the default value; it balances index size against query performance.
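
To make the partition-pruning point concrete, here is a minimal TypeScript sketch of how a date maps to a monthly partition id and which partitions a date-range filter can touch. It is an illustration only, not ClickHouse's implementation: `toYYYYMM` is a local helper that mirrors the result of the ClickHouse function of the same name, and `partitionsForRange` is a hypothetical name.

```typescript
// Local helper mirroring the result of ClickHouse's toYYYYMM(date), e.g. 2025-01-15 -> 202501.
function toYYYYMM(date: Date): number {
  return date.getUTCFullYear() * 100 + (date.getUTCMonth() + 1);
}

// Hypothetical helper: list the monthly partitions a [from, to] date filter can touch.
function partitionsForRange(from: Date, to: Date): number[] {
  const result: number[] = [];
  let year = from.getUTCFullYear();
  let month = from.getUTCMonth(); // 0-based
  const last = toYYYYMM(to);
  while (year * 100 + month + 1 <= last) {
    result.push(year * 100 + month + 1);
    month += 1;
    if (month === 12) {
      month = 0;
      year += 1;
    }
  }
  return result;
}

const touched = partitionsForRange(
  new Date("2024-11-20T00:00:00Z"),
  new Date("2025-01-05T00:00:00Z")
);
console.log(touched); // [202411, 202412, 202501]
```

A filter spanning about six weeks touches three monthly partitions here; every other partition in the table can be skipped without being read.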

When to use this skill

Use this skill when you need an AI agent to:

  • Design a new ClickHouse table structure for analytical data.
  • Optimize an existing ClickHouse query for faster execution.
  • Understand ClickHouse-specific best practices for large datasets.
  • Get recommendations on data partitioning, indexing, or engine selection within ClickHouse.
  • Analyze and improve data engineering workflows involving ClickHouse for OLAP.

When not to use this skill

Do not use this skill when the task involves:

  • Databases other than ClickHouse.
  • Transactional (OLTP) workloads, as ClickHouse is optimized for OLAP.
  • Basic data entry or CRUD operations where high-performance analytical patterns are not relevant.
  • General software development tasks unrelated to database design or data engineering.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/clickhouse-io/SKILL.md --create-dirs "https://raw.githubusercontent.com/affaan-m/everything-claude-code/main/docs/ja-JP/skills/clickhouse-io/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/clickhouse-io/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How clickhouse-io Compares

| Feature / Agent | clickhouse-io | Standard Approach |
| --- | --- | --- |
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | easy | N/A |

Frequently Asked Questions

What does this skill do?

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

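# ClickHouse Analytics Patterns

ClickHouse-specific patterns for high-performance analytics and data engineering.

## Overview

ClickHouse is a column-oriented database management system (DBMS) for online analytical processing (OLAP). It is optimized for fast analytical queries over large datasets.

**Key features:**
- Column-oriented storage
- Data compression
- Parallel query execution
- Distributed queries
- Real-time analytics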

## Table Design Patterns

### MergeTree Engine (Most Common)

```sql
CREATE TABLE markets_analytics (
    date Date,
    market_id String,
    market_name String,
    volume UInt64,
    trades UInt32,
    unique_traders UInt32,
    avg_trade_size Float64,
    created_at DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (date, market_id)
SETTINGS index_granularity = 8192;
```
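
As a rough mental model of what `index_granularity = 8192` controls: ClickHouse keeps one index mark per granule of rows rather than one per row, and uses those marks to decide which granules a key-range filter needs to read. The sketch below is a simplified TypeScript illustration, not ClickHouse's implementation. `buildMarks` and `granulesToRead` are hypothetical names, and a granularity of 4 is used so the result is easy to follow.

```typescript
// Illustrative sparse index: keep only the first key of every granule.
function buildMarks(sortedKeys: number[], granularity: number): number[] {
  const marks: number[] = [];
  for (let i = 0; i < sortedKeys.length; i += granularity) {
    marks.push(sortedKeys[i]);
  }
  return marks;
}

// Return the indexes of granules that may contain keys in [lo, hi].
function granulesToRead(marks: number[], lo: number, hi: number): number[] {
  const selected: number[] = [];
  for (let g = 0; g < marks.length; g++) {
    const first = marks[g];
    const nextFirst = g + 1 < marks.length ? marks[g + 1] : Infinity;
    // Granule g holds keys from `first` up to, and possibly including, `nextFirst`.
    if (first <= hi && nextFirst >= lo) selected.push(g);
  }
  return selected;
}

const keys = Array.from({ length: 20 }, (_, i) => i); // sorted keys 0..19
const marks = buildMarks(keys, 4);
console.log(marks); // [0, 4, 8, 12, 16]
console.log(granulesToRead(marks, 9, 13)); // [2, 3]
```

Only two of the five granules are read for the range 9..13. A smaller granularity narrows what is read but makes the index larger, which is the trade-off the setting tunes.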

### ReplacingMergeTree (Deduplication)

```sql
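-- For data that may contain duplicates (e.g. from multiple sources).
-- Rows with the same sorting key (ORDER BY) are collapsed during background merges,
-- so duplicates can remain visible until a merge runs.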
CREATE TABLE user_events (
    event_id String,
    user_id String,
    event_type String,
    timestamp DateTime,
    properties String
) ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (user_id, event_id, timestamp)
PRIMARY KEY (user_id, event_id);
```
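
The deduplication behaviour can be pictured with a small TypeScript model. This is a sketch under stated assumptions, not ClickHouse code: it models a ReplacingMergeTree without a version column, where the most recently inserted row wins among rows sharing the sorting key `(user_id, event_id, timestamp)`. `mergeReplacing` is a hypothetical name.

```typescript
interface UserEvent {
  user_id: string;
  event_id: string;
  timestamp: string;
  event_type: string;
}

// Model of what a merge leaves behind: one row per sorting key, last insert wins.
function mergeReplacing(rows: UserEvent[]): UserEvent[] {
  const latest = new Map<string, UserEvent>();
  for (const row of rows) {
    const key = JSON.stringify([row.user_id, row.event_id, row.timestamp]);
    latest.set(key, row); // a later row with the same key overwrites the earlier one
  }
  return Array.from(latest.values());
}

const merged = mergeReplacing([
  { user_id: "u1", event_id: "e1", timestamp: "2025-01-01 10:00:00", event_type: "click" },
  { user_id: "u1", event_id: "e1", timestamp: "2025-01-01 10:00:00", event_type: "click_v2" },
  { user_id: "u2", event_id: "e2", timestamp: "2025-01-01 10:05:00", event_type: "view" },
]);
console.log(merged.length); // 2
console.log(merged[0].event_type); // "click_v2"
```

Because `timestamp` is part of the sorting key, two rows with the same `event_id` but different timestamps count as distinct and both survive. If `event_id` alone identifies an event, a sorting key without `timestamp` may be the better fit.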

### AggregatingMergeTree (Pre-aggregation)

```sql
-- For maintaining aggregated metrics
CREATE TABLE market_stats_hourly (
    hour DateTime,
    market_id String,
    total_volume AggregateFunction(sum, UInt64),
    total_trades AggregateFunction(count),  -- matches countState(), which takes no argument
    unique_users AggregateFunction(uniq, String)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, market_id);

-- Query the aggregated data
SELECT
    hour,
    market_id,
    sumMerge(total_volume) AS volume,
    countMerge(total_trades) AS trades,
    uniqMerge(unique_users) AS users
FROM market_stats_hourly
WHERE hour >= toStartOfHour(now() - INTERVAL 24 HOUR)
GROUP BY hour, market_id
ORDER BY hour DESC;
```
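
The `-State` / `-Merge` pairing can be hard to picture, so here is a minimal TypeScript sketch of the idea: each inserted block is reduced to a partial state, and states are combined later. It is only an analogy. `toState` and `mergeStates` are hypothetical names, and a `Set` gives an exact distinct count whereas ClickHouse's `uniq` is approximate.

```typescript
interface Trade {
  amount: number;
  user_id: string;
}

interface PartialState {
  volume: number;
  trades: number;
  users: Set<string>;
}

// Analogous to sumState / countState / uniqState over one inserted block.
function toState(trades: Trade[]): PartialState {
  const state: PartialState = { volume: 0, trades: 0, users: new Set<string>() };
  for (const t of trades) {
    state.volume += t.amount;
    state.trades += 1;
    state.users.add(t.user_id);
  }
  return state;
}

// Analogous to combining states during a merge or with sumMerge / countMerge / uniqMerge.
function mergeStates(a: PartialState, b: PartialState): PartialState {
  const users = new Set<string>(a.users);
  b.users.forEach(u => users.add(u));
  return { volume: a.volume + b.volume, trades: a.trades + b.trades, users };
}

const block1 = toState([{ amount: 100, user_id: "u1" }, { amount: 50, user_id: "u2" }]);
const block2 = toState([{ amount: 25, user_id: "u1" }]);
const total = mergeStates(block1, block2);
console.log(total.volume, total.trades, total.users.size); // 175 3 2
```

The distinct-user count is 2, not 3, because the state keeps the set of users rather than a number. This is why the table stores `AggregateFunction` states instead of final values.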

## Query Optimization Patterns

### Efficient Filtering

```sql
-- PASS: Good: filters on the sorting-key columns (date, market_id), so the primary index can skip granules
SELECT *
FROM markets_analytics
WHERE date >= '2025-01-01'
  AND market_id = 'market-123'
  AND volume > 1000
ORDER BY date DESC
LIMIT 100;

-- FAIL: Bad: only date can use the primary index; volume and the leading-wildcard LIKE on market_name cannot
SELECT *
FROM markets_analytics
WHERE volume > 1000
  AND market_name LIKE '%election%'
  AND date >= '2025-01-01';
```

### Aggregation

```sql
-- PASS: Good: use ClickHouse-specific aggregate functions
SELECT
    toStartOfDay(created_at) AS day,
    market_id,
    sum(volume) AS total_volume,
    count() AS total_trades,
    uniq(trader_id) AS unique_traders,
    avg(trade_size) AS avg_size
FROM trades
WHERE created_at >= today() - INTERVAL 7 DAY
GROUP BY day, market_id
ORDER BY day DESC, total_volume DESC;

-- PASS: Use quantile for percentiles (approximate, and cheaper than quantileExact)
SELECT
    quantile(0.50)(trade_size) AS median,
    quantile(0.95)(trade_size) AS p95,
    quantile(0.99)(trade_size) AS p99
FROM trades
WHERE created_at >= now() - INTERVAL 1 HOUR;
```

### Window Functions

```sql
-- Running total
SELECT
    date,
    market_id,
    volume,
    sum(volume) OVER (
        PARTITION BY market_id
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_volume
FROM markets_analytics
WHERE date >= today() - INTERVAL 30 DAY
ORDER BY market_id, date;
```
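
For readers less familiar with window frames, the running total above can be expressed imperatively. The following TypeScript sketch reproduces the same logic on an in-memory array (partition by `market_id`, order by `date`, accumulate `volume`). `cumulativeVolume` is a hypothetical helper used only for illustration.

```typescript
interface DailyVolume {
  date: string; // YYYY-MM-DD
  market_id: string;
  volume: number;
}

interface WithCumulative extends DailyVolume {
  cumulative_volume: number;
}

function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

function cumulativeVolume(rows: DailyVolume[]): WithCumulative[] {
  // ORDER BY market_id, date
  const sorted = rows.slice().sort(
    (a, b) => compareStrings(a.market_id, b.market_id) || compareStrings(a.date, b.date)
  );
  // One running sum per partition (market_id)
  const running = new Map<string, number>();
  return sorted.map(row => {
    const total = (running.get(row.market_id) ?? 0) + row.volume;
    running.set(row.market_id, total);
    return { ...row, cumulative_volume: total };
  });
}

const result = cumulativeVolume([
  { date: "2025-01-02", market_id: "m1", volume: 5 },
  { date: "2025-01-01", market_id: "m1", volume: 10 },
  { date: "2025-01-01", market_id: "m2", volume: 7 },
]);
console.log(result.map(r => r.cumulative_volume)); // [10, 15, 7]
```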

## Data Insertion Patterns

### Bulk Insert (Recommended)

```typescript
import { ClickHouse } from 'clickhouse'

const clickhouse = new ClickHouse({
  url: process.env.CLICKHOUSE_URL,
  port: 8123,
  basicAuth: {
    username: process.env.CLICKHOUSE_USER,
    password: process.env.CLICKHOUSE_PASSWORD
  }
})

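// PASS: Batch insert (efficient)
// Caution: values are interpolated into the SQL text unescaped; only use this with trusted input.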
async function bulkInsertTrades(trades: Trade[]) {
  const values = trades.map(trade => `(
    '${trade.id}',
    '${trade.market_id}',
    '${trade.user_id}',
    ${trade.amount},
    '${trade.timestamp.toISOString()}'
  )`).join(',')

  await clickhouse.query(`
    INSERT INTO trades (id, market_id, user_id, amount, timestamp)
    VALUES ${values}
  `).toPromise()
}

// FAIL: Individual inserts (slow)
async function insertTrade(trade: Trade) {
  // Do not do this in a loop!
  await clickhouse.query(`
    INSERT INTO trades VALUES ('${trade.id}', ...)
  `).toPromise()
}
```
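
Because the batch insert above builds SQL by string interpolation, a value containing a single quote would break the statement or allow injection. The sketch below shows one way to escape values before building a `VALUES` tuple. It assumes backslash escaping for ClickHouse string literals and a UTC `DateTime` column; `TradeRow`, `quoteString`, `formatDateTime`, and `toValuesTuple` are hypothetical names. Where the client library offers structured or parameterized inserts, those are usually the safer choice.

```typescript
// Hypothetical row shape mirroring the columns used in the insert above.
interface TradeRow {
  id: string;
  market_id: string;
  user_id: string;
  amount: number;
  timestamp: Date;
}

// Escape backslashes and single quotes, then wrap the value in single quotes.
function quoteString(value: string): string {
  return "'" + value.replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

// Format as 'YYYY-MM-DD HH:MM:SS' in UTC (assumes the column is read as UTC).
function formatDateTime(date: Date): string {
  return date.toISOString().slice(0, 19).replace("T", " ");
}

function toValuesTuple(row: TradeRow): string {
  if (!Number.isFinite(row.amount)) throw new Error("amount must be a finite number");
  const parts = [
    quoteString(row.id),
    quoteString(row.market_id),
    quoteString(row.user_id),
    String(row.amount),
    quoteString(formatDateTime(row.timestamp)),
  ];
  return "(" + parts.join(", ") + ")";
}

console.log(toValuesTuple({
  id: "t1",
  market_id: "m'1",
  user_id: "u1",
  amount: 12.5,
  timestamp: new Date("2025-01-02T03:04:05.678Z"),
}));
// ('t1', 'm\'1', 'u1', 12.5, '2025-01-02 03:04:05')
```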

### Streaming Insert

```typescript
// For continuous data ingestion.
// Assumes the `clickhouse` client created in the bulk-insert example above.
async function streamInserts(dataSource: AsyncIterable<unknown>) {
  const stream = clickhouse.insert('trades').stream()

  for await (const batch of dataSource) {
    stream.write(batch)
  }

  await stream.end()
}
```
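
When rows arrive one at a time, it usually helps to buffer them into batches before handing them to the insert stream. Below is a minimal sketch of such a buffer. `RowBuffer` is a hypothetical helper that is independent of any ClickHouse client, and a real pipeline would probably also flush on a timer so that rows do not wait indefinitely.

```typescript
class RowBuffer<T> {
  private rows: T[] = [];
  private readonly maxRows: number;

  constructor(maxRows: number) {
    if (maxRows < 1) throw new Error("maxRows must be at least 1");
    this.maxRows = maxRows;
  }

  // Add a row; returns a full batch once the threshold is reached, otherwise null.
  add(row: T): T[] | null {
    this.rows.push(row);
    return this.rows.length >= this.maxRows ? this.drain() : null;
  }

  // Return whatever is buffered (possibly empty) and reset the buffer.
  drain(): T[] {
    const out = this.rows;
    this.rows = [];
    return out;
  }
}

const buffer = new RowBuffer<number>(3);
const flushed: number[][] = [];
for (const n of [1, 2, 3, 4, 5]) {
  const batch = buffer.add(n);
  if (batch) flushed.push(batch); // a real pipeline would insert the batch here
}
const rest = buffer.drain();
if (rest.length > 0) flushed.push(rest);
console.log(flushed); // [[1, 2, 3], [4, 5]]
```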

## Materialized Views

### Real-time Aggregation

```sql
-- Create a materialized view for hourly statistics
CREATE MATERIALIZED VIEW market_stats_hourly_mv
TO market_stats_hourly
AS SELECT
    toStartOfHour(timestamp) AS hour,
    market_id,
    sumState(amount) AS total_volume,
    countState() AS total_trades,
    uniqState(user_id) AS unique_users
FROM trades
GROUP BY hour, market_id;

-- Query the materialized view's target table
SELECT
    hour,
    market_id,
    sumMerge(total_volume) AS volume,
    countMerge(total_trades) AS trades,
    uniqMerge(unique_users) AS users
FROM market_stats_hourly
WHERE hour >= now() - INTERVAL 24 HOUR
GROUP BY hour, market_id;
```

## Performance Monitoring

### Query Performance

```sql
-- Check for slow queries
SELECT
    query_id,
    user,
    query,
    query_duration_ms,
    read_rows,
    read_bytes,
    memory_usage
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_duration_ms > 1000
  AND event_time >= now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 10;
```

### Table Statistics

```sql
-- Check table sizes
SELECT
    database,
    table,
    formatReadableSize(sum(bytes)) AS size,
    sum(rows) AS rows,
    max(modification_time) AS latest_modification
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes) DESC;
```

## Common Analytics Queries

### Time-Series Analysis

```sql
-- Daily active users
SELECT
    toDate(timestamp) AS date,
    uniq(user_id) AS daily_active_users
FROM events
WHERE timestamp >= today() - INTERVAL 30 DAY
GROUP BY date
ORDER BY date;

-- Retention analysis
-- signup_date is each user's first activity date, computed per user (not per user and day)
SELECT
    signup_date,
    uniqExactIf(user_id, days_since_signup = 0) AS day_0,
    uniqExactIf(user_id, days_since_signup = 1) AS day_1,
    uniqExactIf(user_id, days_since_signup = 7) AS day_7,
    uniqExactIf(user_id, days_since_signup = 30) AS day_30
FROM (
    SELECT
        user_id,
        signup_date,
        dateDiff('day', signup_date, activity_date) AS days_since_signup
    FROM (
        SELECT
            user_id,
            toDate(timestamp) AS activity_date,
            min(toDate(timestamp)) OVER (PARTITION BY user_id) AS signup_date
        FROM events
    )
)
GROUP BY signup_date
ORDER BY signup_date DESC;
```
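
Retention logic is easy to get subtly wrong, so it can help to check it against a small in-memory sample. The TypeScript sketch below counts users who were active exactly N days after their first recorded activity, treating first activity as the signup date in the same way as the query. `Activity`, `daysBetween`, and `retainedOnDay` are hypothetical names, and for brevity the sketch aggregates across all signup dates instead of grouping by cohort.

```typescript
interface Activity {
  user_id: string;
  date: string; // YYYY-MM-DD
}

function daysBetween(from: string, to: string): number {
  // Date-only ISO strings are parsed as UTC, so the difference is a whole number of days.
  return Math.round((Date.parse(to) - Date.parse(from)) / 86400000);
}

// Count users active exactly `day` days after their first recorded activity.
function retainedOnDay(events: Activity[], day: number): number {
  const firstSeen = new Map<string, string>();
  for (const e of events) {
    const current = firstSeen.get(e.user_id);
    if (current === undefined || e.date < current) firstSeen.set(e.user_id, e.date);
  }
  const retained = new Set<string>();
  for (const e of events) {
    const signup = firstSeen.get(e.user_id);
    if (signup !== undefined && daysBetween(signup, e.date) === day) retained.add(e.user_id);
  }
  return retained.size;
}

const sample: Activity[] = [
  { user_id: "u1", date: "2025-01-01" },
  { user_id: "u1", date: "2025-01-02" },
  { user_id: "u1", date: "2025-01-08" },
  { user_id: "u2", date: "2025-01-01" },
  { user_id: "u2", date: "2025-01-08" },
  { user_id: "u3", date: "2025-01-05" },
  { user_id: "u3", date: "2025-01-06" },
];
console.log(retainedOnDay(sample, 0)); // 3
console.log(retainedOnDay(sample, 1)); // 2
console.log(retainedOnDay(sample, 7)); // 2
```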

### Funnel Analysis

```sql
-- Conversion funnel (distinct sessions that reached each step)
SELECT
    uniqExactIf(session_id, step = 'viewed_market') AS viewed,
    uniqExactIf(session_id, step = 'clicked_trade') AS clicked,
    uniqExactIf(session_id, step = 'completed_trade') AS completed,
    round(clicked / viewed * 100, 2) AS view_to_click_rate,
    round(completed / clicked * 100, 2) AS click_to_completion_rate
FROM (
    SELECT
        user_id,
        session_id,
        event_type AS step
    FROM events
    WHERE event_date = today()
);
```

### Cohort Analysis

```sql
-- User cohorts by signup month
SELECT
    toStartOfMonth(signup_date) AS cohort,
    toStartOfMonth(activity_date) AS month,
    dateDiff('month', cohort, month) AS months_since_signup,
    count(DISTINCT user_id) AS active_users
FROM (
    SELECT
        user_id,
        min(toDate(timestamp)) OVER (PARTITION BY user_id) AS signup_date,
        toDate(timestamp) AS activity_date
    FROM events
)
GROUP BY cohort, month, months_since_signup
ORDER BY cohort, months_since_signup;
```

## Data Pipeline Patterns

### ETL Pattern

```typescript
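// Extract, transform, load.
// extractFromPostgres and bulkInsertToClickHouse are application-specific helpers not shown here.
async function etlPipeline() {
  // 1. Extract from the source
  const rawData = await extractFromPostgres()

  // 2. Transform
  const transformed = rawData.map(row => ({
    date: new Date(row.created_at).toISOString().split('T')[0],
    market_id: row.market_slug,
    volume: parseFloat(row.total_volume),
    trades: parseInt(row.trade_count, 10)
  }))

  // 3. Load into ClickHouse
  await bulkInsertToClickHouse(transformed)
}

// Run periodically (every hour); log failures instead of leaving the rejection unhandled
setInterval(() => {
  etlPipeline().catch(err => console.error('ETL run failed:', err))
}, 60 * 60 * 1000)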
```
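
The transform step above assumes that every source row parses cleanly. A slightly more defensive version might validate each row and skip the ones that do not parse, as in the sketch below. `SourceRow`, `AnalyticsRow`, and `transformRow` are hypothetical names; the field names follow the example above.

```typescript
// Hypothetical shape of a source row; field names follow the ETL example.
interface SourceRow {
  created_at: string;
  market_slug: string;
  total_volume: string;
  trade_count: string;
}

interface AnalyticsRow {
  date: string; // YYYY-MM-DD
  market_id: string;
  volume: number;
  trades: number;
}

// Returns null for rows that cannot be parsed, so the caller can skip or log them.
function transformRow(row: SourceRow): AnalyticsRow | null {
  const created = new Date(row.created_at);
  const volume = Number.parseFloat(row.total_volume);
  const trades = Number.parseInt(row.trade_count, 10);
  if (Number.isNaN(created.getTime()) || Number.isNaN(volume) || Number.isNaN(trades)) {
    return null;
  }
  return {
    date: created.toISOString().slice(0, 10),
    market_id: row.market_slug,
    volume,
    trades,
  };
}

const rows: SourceRow[] = [
  { created_at: "2025-03-04T10:00:00Z", market_slug: "m1", total_volume: "12.5", trade_count: "7" },
  { created_at: "not a date", market_slug: "m2", total_volume: "1", trade_count: "1" },
];
const clean = rows
  .map(transformRow)
  .filter((r): r is AnalyticsRow => r !== null);
console.log(clean.length); // 1
console.log(clean[0].date); // "2025-03-04"
```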

### Change Data Capture (CDC)

```typescript
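// Listen for PostgreSQL notifications and sync them to ClickHouse.
// Assumes the `clickhouse` client created in the bulk-insert example above.
import { Client } from 'pg'

const pgClient = new Client({ connectionString: process.env.DATABASE_URL })

async function startSync() {
  // The client must be connected before LISTEN can be issued
  await pgClient.connect()

  pgClient.on('notification', async (msg) => {
    if (!msg.payload) return
    const update = JSON.parse(msg.payload)

    await clickhouse.insert('market_updates', [
      {
        market_id: update.id,
        event_type: update.operation,  // INSERT, UPDATE, DELETE
        timestamp: new Date(),
        data: JSON.stringify(update.new_data)
      }
    ])
  })

  await pgClient.query('LISTEN market_updates')
}

startSync().catch(err => console.error('Failed to start sync:', err))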
```

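## Best Practices

### 1. Partitioning Strategy
- Partition by time (usually by month or day)
- Avoid creating too many partitions (it hurts performance)
- Use a Date-typed column for the partition key

### 2. Sorting Key
- Put the most frequently filtered columns first
- Consider cardinality (among key columns, lower-cardinality columns generally go first)
- Column order affects compression

### 3. Data Types
- Use the smallest appropriate type (UInt32 vs UInt64)
- Use LowCardinality for repeated strings
- Use Enum for categorical data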
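
As a small illustration of "smallest appropriate type", the sketch below picks the narrowest unsigned integer type that can hold an expected maximum value. `smallestUInt` is a hypothetical helper. In practice it is worth leaving headroom for growth, since widening a column later means altering the table.

```typescript
type UIntType = "UInt8" | "UInt16" | "UInt32" | "UInt64";

// Pick the narrowest ClickHouse unsigned integer type that can hold maxValue.
function smallestUInt(maxValue: number): UIntType {
  if (!Number.isInteger(maxValue) || maxValue < 0) {
    throw new Error("expected a non-negative integer");
  }
  if (maxValue <= 255) return "UInt8"; // 2^8 - 1
  if (maxValue <= 65535) return "UInt16"; // 2^16 - 1
  if (maxValue <= 4294967295) return "UInt32"; // 2^32 - 1
  // JavaScript numbers are exact only up to 2^53 - 1, which is well inside UInt64.
  return "UInt64";
}

console.log(smallestUInt(200)); // "UInt8"
console.log(smallestUInt(70000)); // "UInt32"
console.log(smallestUInt(5000000000)); // "UInt64"
```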

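### 4. Things to Avoid
- SELECT * (specify the columns instead)
- FINAL (merge the data before querying instead)
- Too many JOINs (denormalize for analytics)
- Small, frequent inserts (batch them instead)

### 5. Monitoring
- Track query performance
- Monitor disk usage
- Check merge operations
- Review the slow query log

**Note**: ClickHouse excels at analytical workloads. Design tables around your query patterns, batch your inserts, and use materialized views for real-time aggregation.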
