realtime-analytics
Build real-time analytics pipelines from scratch. Use when someone asks to "set up analytics", "build a dashboard", "track events in real time", "ClickHouse analytics", "event ingestion pipeline", or "live metrics". Covers event schema design, ingestion services with batching, ClickHouse table optimization, aggregation queries, and dashboard wiring.
Best use case
realtime-analytics is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using realtime-analytics can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/realtime-analytics/SKILL.md` inside your project
- Restart your AI agent — it will auto-discover the skill
How realtime-analytics Compares
| Feature / Agent | realtime-analytics | Standard Approach |
|---|---|---|
| Platform Support | Claude Code, Cursor, Codex | Limited / varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Single file (`SKILL.md`) | N/A |
Frequently Asked Questions
What does this skill do?
Build real-time analytics pipelines from scratch. Use when someone asks to "set up analytics", "build a dashboard", "track events in real time", "ClickHouse analytics", "event ingestion pipeline", or "live metrics". Covers event schema design, ingestion services with batching, ClickHouse table optimization, aggregation queries, and dashboard wiring.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Real-Time Analytics
## Overview
This skill enables AI agents to build self-hosted, real-time analytics systems. It covers the full pipeline from event ingestion through storage to query and visualization, using ClickHouse as the analytical database for sub-second query performance at scale.
## Instructions
### Event Schema Design
1. Every event must have these base fields:
- `event_name` — LowCardinality(String) for efficient storage
- `timestamp` — DateTime64(3) for millisecond precision
- `session_id` — String, client-generated UUID
- `user_id` — Nullable(String) for anonymous tracking
- `device_type` — LowCardinality(String): desktop, mobile, tablet
- `country_code` — LowCardinality(FixedString(2))
- `properties` — String containing JSON for event-specific data
2. ClickHouse table optimization rules:
- Use `MergeTree()` engine, partition by `toYYYYMM(date)`
- ORDER BY should start with the most filtered column (usually `event_name`)
- Add TTL for automatic data expiration (default 90 days)
- Use `LowCardinality()` for any string column with fewer than 10,000 distinct values
### Ingestion Service
1. Build as a stateless HTTP service accepting `POST /events` with JSON array body.
2. Validate incoming events: reject if `event_name` or `timestamp` is missing.
3. Buffer events in memory. Flush when either condition is met:
- Buffer reaches 1,000 events
- 2 seconds have elapsed since last flush
4. Use ClickHouse's `INSERT ... FORMAT JSONEachRow` for batch inserts.
5. On flush failure, retry 3 times with exponential backoff, then write to a dead-letter file.
6. Expose `GET /health` returning: `{ "buffer_size": N, "last_flush": "ISO timestamp", "status": "ok" }`.
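The validation rule in step 2 can be sketched as plain functions, independent of any HTTP framework. `validateEvent` and `partitionEvents` are illustrative names, not from a specific library:

```javascript
// Reject any event missing the two required fields (step 2).
function validateEvent(event) {
  if (!event || typeof event !== 'object') return false;
  if (!event.event_name || !event.timestamp) return false;
  return true;
}

// Split an incoming POST /events batch into accepted and rejected
// events, so the handler can buffer one and report the other.
function partitionEvents(events) {
  const accepted = [];
  const rejected = [];
  for (const e of events) {
    (validateEvent(e) ? accepted : rejected).push(e);
  }
  return { accepted, rejected };
}
```

The HTTP handler would then push `accepted` onto the in-memory buffer and return the count of rejected events to the client.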
### Aggregation Queries
Write queries as named `.sql` files. Common dashboard panels:
**Active users (last N minutes):**
```sql
SELECT count(DISTINCT session_id) AS active_users
FROM events
WHERE timestamp > now() - INTERVAL 5 MINUTE;
```
**Funnel analysis (two-step):**
```sql
SELECT
toStartOfHour(timestamp) AS hour,
countIf(event_name = 'signup_started') AS started,
countIf(event_name = 'signup_completed') AS completed,
round(completed / started * 100, 1) AS conversion_pct
FROM events
WHERE date = today()
GROUP BY hour
ORDER BY hour;
```
**Top features by unique users:**
```sql
SELECT
JSONExtractString(properties, 'feature') AS feature,
count(DISTINCT user_id) AS unique_users
FROM events
WHERE event_name = 'feature_used' AND date = today()
GROUP BY feature
ORDER BY unique_users DESC
LIMIT 10;
```
**Error rate per minute:**
```sql
SELECT
toStartOfMinute(timestamp) AS minute,
count() AS error_count
FROM events
WHERE event_name = 'error_occurred'
AND timestamp > now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute;
```
### Dashboard API Layer
1. Create one REST endpoint per dashboard panel (e.g., `GET /api/dashboard/active-users`).
2. Each endpoint runs its SQL query and returns JSON.
3. Add a `time_range` query param: `1h`, `6h`, `24h`, `7d`. Adjust the WHERE clause accordingly.
4. Cache results for 3-5 seconds to avoid hammering ClickHouse on concurrent dashboard loads.
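Steps 3 and 4 can be sketched as two small helpers: a `time_range` to SQL-interval mapping, and a short-lived in-memory cache so concurrent dashboard loads share one ClickHouse query. Names and the 4-second TTL are illustrative assumptions:

```javascript
// Map the time_range query param to a ClickHouse interval (step 3).
const RANGES = {
  '1h': 'INTERVAL 1 HOUR',
  '6h': 'INTERVAL 6 HOUR',
  '24h': 'INTERVAL 24 HOUR',
  '7d': 'INTERVAL 7 DAY',
};

function rangeClause(timeRange) {
  const interval = RANGES[timeRange] || RANGES['1h']; // default to 1h
  return `timestamp > now() - ${interval}`;
}

// Tiny TTL cache (step 4): recompute only after the entry expires.
const CACHE_TTL_MS = 4000;
const cache = new Map(); // key -> { value, expires }

async function cached(key, compute) {
  const hit = cache.get(key);
  if (hit && hit.expires > Date.now()) return hit.value;
  const value = await compute();
  cache.set(key, { value, expires: Date.now() + CACHE_TTL_MS });
  return value;
}
```

Each panel endpoint would call `cached(panelName + timeRange, runQuery)` so repeated loads within a few seconds hit the cache instead of ClickHouse.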
### Docker Compose Setup
Provide a `docker-compose.yml` with:
- ClickHouse service (pinned version, volume for data)
- Ingestion service (builds from Dockerfile)
- Dashboard frontend (builds from Dockerfile or served via nginx)
- An init container or entrypoint script that creates the events table
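A minimal sketch of that compose file; image tags, ports, and directory names are assumptions to adapt, and the init SQL is mounted into the ClickHouse image's startup-script directory rather than run from a separate container:

```yaml
services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.3   # pin a real version
    volumes:
      - clickhouse-data:/var/lib/clickhouse
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql  # creates events table
    ports:
      - "8123:8123"
  ingestion:
    build: ./ingestion
    depends_on:
      - clickhouse
    ports:
      - "8080:8080"
  dashboard:
    build: ./dashboard
    ports:
      - "3000:3000"
volumes:
  clickhouse-data:
```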
## Examples
### Example 1 — Schema for an e-commerce app
**Input:** "Design analytics events for an e-commerce checkout flow."
**Output:**
```sql
-- Core events: product_viewed, add_to_cart, checkout_started,
-- payment_submitted, order_completed, order_failed
CREATE TABLE events (
event_name LowCardinality(String),
timestamp DateTime64(3),
session_id String,
user_id Nullable(String),
device_type LowCardinality(String),
country_code LowCardinality(FixedString(2)),
properties String,
date Date DEFAULT toDate(timestamp)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (event_name, timestamp)
TTL date + INTERVAL 180 DAY;
```
Properties JSON per event:
- `product_viewed`: `{"product_id": "p-42", "category": "electronics", "price_cents": 4999}`
- `add_to_cart`: `{"product_id": "p-42", "quantity": 1}`
- `payment_submitted`: `{"method": "card", "amount_cents": 9998}`
### Example 2 — Ingestion buffer logic
**Input:** "Show the buffer flush logic in Node.js."
**Output:**
```js
const FLUSH_INTERVAL_MS = 2000;
const FLUSH_SIZE = 1000;
let buffer = [];
async function addEvents(events) {
buffer.push(...events);
if (buffer.length >= FLUSH_SIZE) await flush();
}
setInterval(async () => {
if (buffer.length > 0) await flush();
}, FLUSH_INTERVAL_MS);
async function flush() {
  // Drain the buffer atomically so concurrent calls don't double-insert.
  const batch = buffer.splice(0, buffer.length);
  if (batch.length === 0) return;
  await clickhouse.insert({
    table: 'events',
    values: batch, // array of event objects, serialized as JSONEachRow
    format: 'JSONEachRow',
  });
}
```
## Guidelines
- **Partition wisely.** Monthly partitions work for most apps. Daily only if you ingest > 100M events/day.
- **Do not use Kafka unless asked.** For under 50K events/second, direct HTTP ingestion with in-memory buffering is simpler and sufficient.
- **Always add TTL.** Unbounded analytical tables grow fast. Default to 90 days; let the user override.
- **Test with realistic volume.** Generate synthetic events to validate the pipeline handles expected throughput before going live.
- **Avoid SELECT \*.** Always specify columns in aggregation queries to minimize I/O.
Related Skills
realtime-database
When the user needs to design database schemas and queries optimized for real-time applications. Use when the user mentions "chat database," "message storage," "real-time sync," "message history," "unread count," "cursor pagination," "event sourcing," or "live data." Handles schema design for messaging, activity feeds, notifications, and collaborative apps with efficient pagination and sync. For WebSocket transport, see websocket-builder.
product-analytics
Expert guidance for product analytics, helping product teams define metrics, build funnels, analyze retention, run A/B tests, and make data-driven decisions. Applies frameworks for North Star metrics, pirate metrics (AARRR), cohort analysis, and experiment design.
openai-realtime
Build voice-enabled AI applications with the OpenAI Realtime API. Use when a user asks to implement real-time voice conversations, stream audio with WebSockets, build voice assistants, or integrate OpenAI audio capabilities.
analytics-tracking
When the user wants to set up, improve, or audit analytics tracking and measurement. Also use when the user mentions "set up tracking," "GA4," "Google Analytics," "conversion tracking," "event tracking," "UTM parameters," "tag manager," "GTM," "analytics implementation," or "tracking plan." For A/B test measurement, see ab-test-setup.
zustand
You are an expert in Zustand, the small, fast, and scalable state management library for React. You help developers manage global state without boilerplate using Zustand's hook-based stores, selectors for performance, middleware (persist, devtools, immer), computed values, and async actions — replacing Redux complexity with a simple, un-opinionated API in under 1KB.
zoho
Integrate and automate Zoho products. Use when a user asks to work with Zoho CRM, Zoho Books, Zoho Desk, Zoho Projects, Zoho Mail, or Zoho Creator, build custom integrations via Zoho APIs, automate workflows with Deluge scripting, sync data between Zoho apps and external systems, manage leads and deals, automate invoicing, build custom Zoho Creator apps, set up webhooks, or manage Zoho organization settings. Covers Zoho CRM, Books, Desk, Projects, Creator, and cross-product integrations.
zod
You are an expert in Zod, the TypeScript-first schema declaration and validation library. You help developers define schemas that validate data at runtime AND infer TypeScript types at compile time — eliminating the need to write types and validators separately. Used for API input validation, form validation, environment variables, config files, and any data boundary.
zipkin
Deploy and configure Zipkin for distributed tracing and request flow visualization. Use when a user needs to set up trace collection, instrument Java/Spring or other services with Zipkin, analyze service dependencies, or configure storage backends for trace data.
zig
Expert guidance for Zig, the systems programming language focused on performance, safety, and readability. Helps developers write high-performance code with compile-time evaluation, seamless C interop, no hidden control flow, and no garbage collector. Zig is used for game engines, operating systems, networking, and as a C/C++ replacement.
zed
Expert guidance for Zed, the high-performance code editor built in Rust with native collaboration, AI integration, and GPU-accelerated rendering. Helps developers configure Zed, create custom extensions, set up collaborative editing sessions, and integrate AI assistants for productive coding.
zeabur
Expert guidance for Zeabur, the cloud deployment platform that auto-detects frameworks, builds and deploys applications with zero configuration, and provides managed services like databases and message queues. Helps developers deploy full-stack applications with automatic scaling and one-click marketplace services.
zapier
Automate workflows between apps with Zapier. Use when a user asks to connect apps without code, automate repetitive tasks, sync data between services, or build no-code integrations between SaaS tools.