clickhouse

ClickHouse is a columnar database for analytics. Use it for real-time analytics.

Best use case

clickhouse is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using clickhouse can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/clickhouse/SKILL.md --create-dirs "https://raw.githubusercontent.com/G1Joshi/Agent-Skills/main/skills/databases/clickhouse/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/clickhouse/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How clickhouse Compares

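| Feature / Agent         | clickhouse    | Standard Approach |
|-------------------------|---------------|-------------------|
| Platform Support        | Not specified | Limited / Varies  |
| Context Awareness       | High          | Baseline          |
| Installation Complexity | Unknown       | N/A               |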

Frequently Asked Questions

What does this skill do?

It covers ClickHouse, a columnar database for analytics. Use it for real-time analytics.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# ClickHouse

ClickHouse is a columnar DBMS for Online Analytical Processing (OLAP). It is famous for allowing real-time generation of analytical reports using SQL queries on petabytes of data.

## When to Use

- **Real-time Analytics**: User-facing dashboards (Google Analytics style).
- **Log Management**: A cheaper, faster alternative to Elasticsearch/Splunk for logs (Observability).
- **Huge Throughput**: Ingesting millions of rows per second.

## Quick Start

```sql
-- Hourly event count and average duration from an `events` table
SELECT
    toStartOfHour(EventTime) AS Hour,
    count() AS Events,
    avg(Duration) AS AvgDuration
FROM events
GROUP BY Hour
ORDER BY Hour
```

## Core Concepts

### MergeTree Engine

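The most commonly used table engine family. It features a primary key (a sparse index used for sorting and data skipping), data partitioning, and background merging of data parts. Replication is provided by the `ReplicatedMergeTree` variant.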
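
As a rough illustration, a MergeTree table behind the Quick Start query might be declared as below. This is only a sketch: the `UserID` column, the monthly partitioning, and the choice of sorting key are assumptions, not part of the original skill.

```sql
-- Hypothetical schema; EventTime and Duration match the Quick Start query,
-- UserID is an assumed extra column.
CREATE TABLE events
(
    EventTime DateTime,
    UserID    UInt64,
    Duration  UInt32
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(EventTime)  -- one partition per calendar month
ORDER BY (UserID, EventTime);     -- sorting key; also the primary key unless PRIMARY KEY is given

-- Filters on a prefix of the sorting key let ClickHouse skip granules
-- that cannot contain matching rows.
SELECT count()
FROM events
WHERE UserID = 42
  AND EventTime >= toDateTime('2025-01-01 00:00:00');
```

The sorting key should generally follow the columns that queries filter on most often, since only filters on a key prefix benefit from skipping.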

### Columnar Storage

Stores each column's data separately on disk. A query that selects 5 columns out of 100 reads only those 5 columns.
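
One way to see the per-column layout is to look at column sizes in the `system.columns` table. The sketch below assumes a table named `events` (as in the Quick Start) exists in the current database.

```sql
-- Per-column on-disk size for the assumed `events` table.
SELECT
    name,
    type,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE database = currentDatabase()
  AND table = 'events'
ORDER BY data_compressed_bytes DESC;

-- A narrow query like this touches only the Duration column's data,
-- regardless of how many other columns the table has.
SELECT avg(Duration) FROM events;
```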

### Vectorized Execution

Processes data in blocks (Vectors), maximizing CPU cache and SIMD usage.

## Best Practices (2025)

**Do**:

- **Insert in Batches**: Never insert row-by-row. Batch at least 1,000 rows.
- **Use Materialized Views**: ClickHouse MVs function like insert triggers. They calculate aggregations _on write_.
- **Use LowCardinality**: A data type wrapper (e.g. `LowCardinality(String)`) that dictionary-encodes values. It suits string columns with few unique values (Country, OS).
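
The three practices above can be sketched together as follows. All table, view, and column names here are hypothetical. The three-row insert only shows the multi-row syntax, since real batches should hold thousands of rows.

```sql
-- Hypothetical raw table; LowCardinality dictionary-encodes repetitive strings.
CREATE TABLE page_views
(
    EventTime DateTime,
    Country   LowCardinality(String),
    OS        LowCardinality(String),
    Duration  UInt32
)
ENGINE = MergeTree
ORDER BY (Country, EventTime);

-- Target table for the rollup. SummingMergeTree sums the numeric columns of
-- rows that share a sorting key when parts are merged in the background.
CREATE TABLE page_views_hourly
(
    Hour          DateTime,
    Country       LowCardinality(String),
    Views         UInt64,
    TotalDuration UInt64
)
ENGINE = SummingMergeTree
ORDER BY (Hour, Country);

-- The materialized view runs on each block inserted into page_views
-- and writes the aggregated rows into page_views_hourly.
CREATE MATERIALIZED VIEW page_views_hourly_mv TO page_views_hourly AS
SELECT
    toStartOfHour(EventTime) AS Hour,
    Country,
    count()       AS Views,
    sum(Duration) AS TotalDuration
FROM page_views
GROUP BY Hour, Country;

-- One INSERT statement carrying many rows, instead of one INSERT per row.
INSERT INTO page_views (EventTime, Country, OS, Duration) VALUES
    ('2025-01-01 10:05:00', 'DE', 'Linux',   120),
    ('2025-01-01 10:17:00', 'DE', 'macOS',    80),
    ('2025-01-01 11:02:00', 'US', 'Windows',  45);

-- Merges are asynchronous, so aggregate again at read time.
SELECT
    Hour,
    Country,
    sum(Views)         AS ViewCount,
    sum(TotalDuration) AS DurationSum
FROM page_views_hourly
GROUP BY Hour, Country
ORDER BY Hour, Country;
```

Writing the view `TO` an explicit target table keeps the rollup's schema and engine visible and lets the view be dropped or recreated without losing the aggregated data.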

**Don't**:

- **Don't use it for OLTP**: There are no full ACID transactions, and updates/deletes run as "mutations" (heavy asynchronous background processes).
- **Don't use standard joins for massive tables**: Use dictionaries, or use `JOIN` carefully (with the default hash join the right table must fit in RAM; otherwise use a distributed join).
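
As a sketch of the dictionary approach, a small dimension table can be loaded into an in-memory dictionary and looked up with `dictGet` instead of being joined. The tables `user_plans` and `sessions`, the dictionary name, and the refresh interval are all assumptions for illustration. Depending on server configuration, the `CLICKHOUSE` source may also need credentials.

```sql
-- Hypothetical dimension table: one row per user.
CREATE TABLE user_plans
(
    UserID UInt64,
    Plan   String
)
ENGINE = MergeTree
ORDER BY UserID;

-- In-memory dictionary built from the dimension table,
-- reloaded every 5 to 10 minutes.
CREATE DICTIONARY user_plans_dict
(
    UserID UInt64,
    Plan   String DEFAULT 'unknown'
)
PRIMARY KEY UserID
SOURCE(CLICKHOUSE(TABLE 'user_plans'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);

-- Hypothetical large fact table.
CREATE TABLE sessions
(
    EventTime DateTime,
    UserID    UInt64
)
ENGINE = MergeTree
ORDER BY (UserID, EventTime);

-- Key lookup per row instead of a JOIN against user_plans.
SELECT
    dictGet('user_plans_dict', 'Plan', UserID) AS PlanName,
    count() AS Sessions
FROM sessions
GROUP BY PlanName;
```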

## References

- [ClickHouse Documentation](https://clickhouse.com/docs)