clickhouse-reference-architecture
Production reference architecture for ClickHouse-backed applications — project layout, data flow, multi-tenant patterns, and operational topology. Use when designing new ClickHouse systems, reviewing architecture, or establishing standards for ClickHouse integrations. Trigger: "clickhouse architecture", "clickhouse project structure", "clickhouse design", "clickhouse multi-tenant", "clickhouse reference".
Best use case
clickhouse-reference-architecture is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Production reference architecture for ClickHouse-backed applications — project layout, data flow, multi-tenant patterns, and operational topology. Use when designing new ClickHouse systems, reviewing architecture, or establishing standards for ClickHouse integrations. Trigger: "clickhouse architecture", "clickhouse project structure", "clickhouse design", "clickhouse multi-tenant", "clickhouse reference".
Teams using clickhouse-reference-architecture should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/clickhouse-reference-architecture/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How clickhouse-reference-architecture Compares
| Feature / Agent | clickhouse-reference-architecture | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Production reference architecture for ClickHouse-backed applications — project layout, data flow, multi-tenant patterns, and operational topology. Use when designing new ClickHouse systems, reviewing architecture, or establishing standards for ClickHouse integrations. Trigger: "clickhouse architecture", "clickhouse project structure", "clickhouse design", "clickhouse multi-tenant", "clickhouse reference".
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# ClickHouse Reference Architecture
## Overview
Production-grade architecture for ClickHouse analytics platforms covering
project layout, data flow, multi-tenancy, and operational patterns.
## Prerequisites
- Understanding of ClickHouse fundamentals (engines, ORDER BY, partitioning)
- TypeScript/Node.js project
## Instructions
### Step 1: Project Structure
```
my-analytics-platform/
├── src/
│ ├── clickhouse/
│ │ ├── client.ts # Singleton client with health checks
│ │ ├── schemas/ # SQL DDL files (source of truth)
│ │ │ ├── 001-events.sql
│ │ │ ├── 002-users.sql
│ │ │ └── 003-materialized-views.sql
│ │ ├── queries/ # Named query functions
│ │ │ ├── events.ts
│ │ │ ├── users.ts
│ │ │ └── dashboards.ts
│ │ └── migrations/ # Schema migrations
│ │ ├── runner.ts
│ │ └── 001-add-country.sql
│ ├── ingestion/
│ │ ├── webhook-receiver.ts # HTTP webhook endpoint
│ │ ├── kafka-consumer.ts # Kafka consumer (if applicable)
│ │ └── buffer.ts # Insert batching buffer
│ ├── api/
│ │ ├── routes.ts # API endpoints
│ │ └── middleware.ts # Auth, rate limiting
│ └── jobs/
│ ├── daily-rollup.ts # Scheduled aggregations
│ └── cleanup.ts # TTL enforcement
├── tests/
│ ├── unit/
│ └── integration/
├── docker-compose.yml # Local ClickHouse
├── init-db/ # Docker init scripts
└── config/
├── development.env
├── staging.env
└── production.env
```
### Step 2: Data Flow Architecture
```
┌─────────────────┐
│ Data Sources │
│ (Webhooks, API, │
│ Kafka, S3) │
└────────┬────────┘
│
┌────────▼────────┐
│ Ingestion Layer │
│ (Buffer + Batch │
│ 10K+ rows/ins) │
└────────┬────────┘
│
┌──────────────▼──────────────┐
│ ClickHouse Server │
│ │
│ ┌────────────────────────┐ │
│ │ Raw Event Tables │ │
│ │ (MergeTree, append) │ │
│ └───────────┬────────────┘ │
│ │ │
│ ┌───────────▼────────────┐ │
│ │ Materialized Views │ │
│ │ (Auto-aggregate on │ │
│ │ INSERT — hourly, │ │
│ │ daily, tenant-level) │ │
│ └───────────┬────────────┘ │
│ │ │
│ ┌───────────▼────────────┐ │
│ │ Aggregate Tables │ │
│ │ (AggregatingMergeTree)│ │
│ └────────────────────────┘ │
└──────────────┬──────────────┘
│
┌────────▼────────┐
│ API Layer │
│ (Query aggregate│
│ tables, not │
│ raw events) │
└────────┬────────┘
│
┌────────▼────────┐
│ Dashboards / │
│ Client Apps │
└─────────────────┘
```
### Step 3: Schema Design (3-Layer Pattern)
```sql
-- Layer 1: Raw events (append-only, full fidelity)
CREATE TABLE analytics.events_raw (
event_id UUID DEFAULT generateUUIDv4(),
tenant_id UInt32,
event_type LowCardinality(String),
user_id UInt64,
properties String CODEC(ZSTD(3)),
created_at DateTime64(3) DEFAULT now64(3)
)
ENGINE = MergeTree()
ORDER BY (tenant_id, event_type, toDate(created_at), user_id)
PARTITION BY toYYYYMM(created_at)
TTL created_at + INTERVAL 90 DAY;
-- Layer 2: Hourly aggregation (auto-populated via materialized view)
CREATE TABLE analytics.events_hourly (
hour DateTime,
tenant_id UInt32,
event_type LowCardinality(String),
cnt UInt64,
users AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree()
ORDER BY (tenant_id, event_type, hour);
CREATE MATERIALIZED VIEW analytics.events_hourly_mv TO analytics.events_hourly AS
SELECT toStartOfHour(created_at) AS hour, tenant_id, event_type,
count() AS cnt, uniqState(user_id) AS users
FROM analytics.events_raw GROUP BY hour, tenant_id, event_type;
-- Layer 3: Daily rollup for dashboards
CREATE TABLE analytics.events_daily (
date Date,
tenant_id UInt32,
total UInt64,
users AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree()
ORDER BY (tenant_id, date);
CREATE MATERIALIZED VIEW analytics.events_daily_mv TO analytics.events_daily AS
SELECT toDate(created_at) AS date, tenant_id,
count() AS total, uniqState(user_id) AS users
FROM analytics.events_raw GROUP BY date, tenant_id;
```
### Step 4: Multi-Tenant Patterns
**Approach A: Shared table with tenant_id in ORDER BY (recommended)**
```sql
-- Tenant_id first in ORDER BY = queries filter on tenant efficiently
ORDER BY (tenant_id, event_type, created_at)
-- Query: only scans data for this tenant
SELECT count() FROM events_raw WHERE tenant_id = 42;
```
**Approach B: Database per tenant (for strict isolation)**
```sql
CREATE DATABASE tenant_42;
CREATE TABLE tenant_42.events (...) ENGINE = MergeTree() ...;
-- Pros: Full isolation, easy to drop tenant
-- Cons: Schema changes need per-tenant DDL, more operational overhead
```
**Approach C: Row-level security (ClickHouse RBAC)**
```sql
CREATE ROW POLICY tenant_isolation ON analytics.events_raw
FOR SELECT USING tenant_id = getSetting('custom_tenant_id')
TO app_user;
```
### Step 5: Client Module
```typescript
// src/clickhouse/client.ts
import { createClient, ClickHouseClient } from '@clickhouse/client';
let instance: ClickHouseClient | null = null;
export function getClient(): ClickHouseClient {
if (!instance) {
instance = createClient({
url: process.env.CLICKHOUSE_HOST!,
username: process.env.CLICKHOUSE_USER!,
password: process.env.CLICKHOUSE_PASSWORD!,
database: process.env.CLICKHOUSE_DATABASE ?? 'analytics',
max_open_connections: Number(process.env.CH_MAX_CONNECTIONS ?? 10),
request_timeout: 30_000,
compression: { request: true, response: true },
});
}
return instance;
}
// src/clickhouse/queries/dashboards.ts
export async function getTenantDashboard(tenantId: number, days = 30) {
const client = getClient();
const rs = await client.query({
query: `
SELECT date, sum(total) AS events, uniqMerge(users) AS unique_users
FROM analytics.events_daily
WHERE tenant_id = {tid:UInt32} AND date >= today() - {days:UInt32}
GROUP BY date ORDER BY date
`,
query_params: { tid: tenantId, days },
format: 'JSONEachRow',
});
return rs.json<{ date: string; events: string; unique_users: string }>();
}
```
## Architecture Decision Records
| Decision | Choice | Why |
|----------|--------|-----|
| Engine | MergeTree (raw) + AggregatingMergeTree (rollups) | Best for append + pre-agg |
| Multi-tenant | Shared table + tenant_id in ORDER BY | Scales to 10K+ tenants |
| Ingestion | Buffer + batch INSERT | Avoids "too many parts" |
| Aggregation | Materialized views (not cron) | Real-time, zero-lag |
| Format | JSONEachRow | Client support, debugging |
| Compression | ZSTD(3) for strings, Delta for ints | 10-20x compression |
## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Cross-tenant data leak | Missing WHERE tenant_id | Use row policies or middleware |
| Stale dashboard data | MV not created | Verify MV exists and is attached |
| Schema drift | Manual DDL changes | Use migration runner |
| Slow dashboard queries | Querying raw table | Query aggregate tables instead |
## Resources
- [ClickHouse Architecture](https://clickhouse.com/docs/development/architecture)
- [SharedMergeTree (Cloud)](https://clickhouse.com/docs/cloud/reference/shared-merge-tree)
- [Materialized Views](https://clickhouse.com/blog/using-materialized-views-in-clickhouse)
## Next Steps
For multi-environment configuration, see `clickhouse-multi-env-setup`.Related Skills
exa-reference-architecture
Implement Exa reference architecture for search pipelines, RAG, and content discovery. Use when designing new Exa integrations, reviewing project structure, or establishing architecture standards for neural search applications. Trigger with phrases like "exa architecture", "exa project structure", "exa RAG pipeline", "exa reference design", "exa search pipeline".
exa-architecture-variants
Choose and implement Exa architecture patterns at different scales: direct search, cached search, and RAG pipeline. Use when designing Exa integrations, choosing between simple search and full RAG, or planning architecture for different traffic volumes. Trigger with phrases like "exa architecture", "exa blueprint", "how to structure exa", "exa RAG design", "exa at scale".
evernote-reference-architecture
Reference architecture for Evernote integrations. Use when designing system architecture, planning integrations, or building scalable Evernote applications. Trigger with phrases like "evernote architecture", "design evernote system", "evernote integration pattern", "evernote scale".
elevenlabs-reference-architecture
Implement ElevenLabs reference architecture for production TTS/voice applications. Use when designing new ElevenLabs integrations, reviewing project structure, or building a scalable audio generation service. Trigger: "elevenlabs architecture", "elevenlabs project structure", "how to organize elevenlabs", "TTS service architecture", "elevenlabs design patterns", "voice API architecture".
documenso-reference-architecture
Implement Documenso reference architecture with best-practice project layout. Use when designing new Documenso integrations, reviewing project structure, or establishing architecture standards for document signing applications. Trigger with phrases like "documenso architecture", "documenso best practices", "documenso project structure", "how to organize documenso".
deepgram-reference-architecture
Implement Deepgram reference architecture for scalable transcription systems. Use when designing transcription pipelines, building production architectures, or planning Deepgram integration at scale. Trigger: "deepgram architecture", "transcription pipeline", "deepgram system design", "deepgram at scale", "enterprise deepgram", "deepgram queue".
databricks-reference-architecture
Implement Databricks reference architecture with best-practice project layout. Use when designing new Databricks projects, reviewing architecture, or establishing standards for Databricks applications. Trigger with phrases like "databricks architecture", "databricks best practices", "databricks project structure", "how to organize databricks", "databricks layout".
customerio-reference-architecture
Implement Customer.io enterprise reference architecture. Use when designing integration layers, event-driven architectures, or enterprise-grade Customer.io setups. Trigger: "customer.io architecture", "customer.io design", "customer.io enterprise", "customer.io integration pattern".
cursor-reference-architecture
Reference architecture for Cursor IDE projects: directory structure, rules organization, indexing strategy, and team configuration patterns. Triggers on "cursor architecture", "cursor project structure", "cursor best practices", "cursor file structure".
coreweave-reference-architecture
Reference architecture for CoreWeave GPU cloud deployments. Use when designing ML infrastructure, planning multi-model serving, or establishing CoreWeave deployment standards. Trigger with phrases like "coreweave architecture", "coreweave design", "coreweave infrastructure", "coreweave best practices".
configuration-reference-generator
Configuration Reference Generator - Auto-activating skill for Technical Documentation. Triggers on: configuration reference generator, configuration reference generator Part of the Technical Documentation skill category.
cohere-reference-architecture
Implement Cohere reference architecture with layered project layout for RAG and agents. Use when designing new Cohere integrations, reviewing project structure, or establishing architecture standards for Cohere API v2 applications. Trigger with phrases like "cohere architecture", "cohere project structure", "cohere layout", "organize cohere app", "cohere design pattern".