nosql-expert

Expert guidance for distributed NoSQL databases (Cassandra, DynamoDB). Focuses on mental models, query-first modeling, single-table design, and avoiding hot partitions in high-scale systems.

6 stars

bynetbarros

View on GitHub Installation ↓

Best use case

nosql-expert is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Expert guidance for distributed NoSQL databases (Cassandra, DynamoDB). Focuses on mental models, query-first modeling, single-table design, and avoiding hot partitions in high-scale systems.

Teams using nosql-expert should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/nosql-expert/SKILL.md --create-dirs "https://raw.githubusercontent.com/netbarros/psique/main/.codex/skills/nosql-expert/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/nosql-expert/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How nosql-expert Compares

Feature / Agent	nosql-expert	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Expert guidance for distributed NoSQL databases (Cassandra, DynamoDB). Focuses on mental models, query-first modeling, single-table design, and avoiding hot partitions in high-scale systems.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# NoSQL Expert Patterns (Cassandra & DynamoDB)

## Overview

This skill provides professional mental models and design patterns for **distributed wide-column and key-value stores** (specifically Apache Cassandra and Amazon DynamoDB).

Unlike SQL (where you model data entities), or document stores (like MongoDB), these distributed systems require you to **model your queries first**.

## When to Use

- **Designing for Scale**: Moving beyond simple single-node databases to distributed clusters.
- **Technology Selection**: Evaluating or using **Cassandra**, **ScyllaDB**, or **DynamoDB**.
- **Performance Tuning**: Troubleshooting "hot partitions" or high latency in existing NoSQL systems.
- **Microservices**: Implementing "database-per-service" patterns where highly optimized reads are required.

## The Mental Shift: SQL vs. Distributed NoSQL

| Feature | SQL (Relational) | Distributed NoSQL (Cassandra/DynamoDB) |
| :--- | :--- | :--- |
| **Data modeling** | Model Entities + Relationships | Model **Queries** (Access Patterns) |
| **Joins** | CPU-intensive, at read time | **Pre-computed** (Denormalized) at write time |
| **Storage cost** | Expensive (minimize duplication) | Cheap (duplicate data for read speed) |
| **Consistency** | ACID (Strong) | **BASE (Eventual)** / Tunable |
| **Scalability** | Vertical (Bigger machine) | **Horizontal** (More nodes/shards) |

> **The Golden Rule:** In SQL, you design the data model to answer *any* query. In NoSQL, you design the data model to answer *specific* queries efficiently.

## Core Design Patterns

### 1. Query-First Modeling (Access Patterns)

You typically cannot "add a query later" without migration or creating a new table/index.

**Process:**
1.  **List all Entities** (User, Order, Product).
2.  **List all Access Patterns** ("Get User by Email", "Get Orders by User sorted by Date").
3.  **Design Table(s)** specifically to serve those patterns with a single lookup.

### 2. The Partition Key is King

Data is distributed across physical nodes based on the **Partition Key (PK)**.
-   **Goal:** Even distribution of data and traffic.
-   **Anti-Pattern:** Using a low-cardinality PK (e.g., `status="active"` or `gender="m"`) creates **Hot Partitions**, limiting throughput to a single node's capacity.
-   **Best Practice:** Use high-cardinality keys (User IDs, Device IDs, Composite Keys).

### 3. Clustering / Sort Keys

Within a partition, data is sorted on disk by the **Clustering Key (Cassandra)** or **Sort Key (DynamoDB)**.
-   This allows for efficient **Range Queries** (e.g., `WHERE user_id=X AND date > Y`).
-   It effectively pre-sorts your data for specific retrieval requirements.

### 4. Single-Table Design (Adjacency Lists)

*Primary use: DynamoDB (but concepts apply elsewhere)*

Storing multiple entity types in one table to enable pre-joined reads.

| PK (Partition) | SK (Sort) | Data Fields... |
| :--- | :--- | :--- |
| `USER#123` | `PROFILE` | `{ name: "Ian", email: "..." }` |
| `USER#123` | `ORDER#998` | `{ total: 50.00, status: "shipped" }` |
| `USER#123` | `ORDER#999` | `{ total: 12.00, status: "pending" }` |

-   **Query:** `PK="USER#123"`
-   **Result:** Fetches User Profile AND all Orders in **one network request**.

### 5. Denormalization & Duplication

Don't be afraid to store the same data in multiple tables to serve different query patterns.
-   **Table A:** `users_by_id` (PK: uuid)
-   **Table B:** `users_by_email` (PK: email)

*Trade-off: You must manage data consistency across tables (often using eventual consistency or batch writes).*

## Specific Guidance

### Apache Cassandra / ScyllaDB

-   **Primary Key Structure:** `((Partition Key), Clustering Columns)`
-   **No Joins, No Aggregates:** Do not try to `JOIN` or `GROUP BY`. Pre-calculate aggregates in a separate counter table.
-   **Avoid `ALLOW FILTERING`:** If you see this in production, your data model is wrong. It implies a full cluster scan.
-   **Writes are Cheap:** Inserts and Updates are just appends to the LSM tree. Don't worry about write volume as much as read efficiency.
-   **Tombstones:** Deletes are expensive markers. Avoid high-velocity delete patterns (like queues) in standard tables.

### AWS DynamoDB

-   **GSI (Global Secondary Index):** Use GSIs to create alternative views of your data (e.g., "Search Orders by Date" instead of by User).
    -   *Note:* GSIs are eventually consistent.
-   **LSI (Local Secondary Index):** Sorts data differently *within* the same partition. Must be created at table creation time.
-   **WCU / RCU:** Understand capacity modes. Single-table design helps optimize consumed capacity units.
-   **TTL:** Use Time-To-Live attributes to automatically expire old data (free delete) without creating tombstones.

## Expert Checklist

Before finalizing your NoSQL schema:

-   [ ] **Access Pattern Coverage:** Does every query pattern map to a specific table or index?
-   [ ] **Cardinality Check:** Does the Partition Key have enough unique values to spread traffic evenly?
-   [ ] **Split Partition Risk:** For any single partition (e.g., a single user's orders), will it grow indefinitely? (If > 10GB, you need to "shard" the partition, e.g., `USER#123#2024-01`).
-   [ ] **Consistency Requirement:** Can the application tolerate eventual consistency for this read pattern?

## Common Anti-Patterns

❌ **Scatter-Gather:** Querying *all* partitions to find one item (Scan).
❌ **Hot Keys:** Putting all "Monday" data into one partition.
❌ **Relational Modeling:** Creating `Author` and `Book` tables and trying to join them in code. (Instead, embed Book summaries in Author, or duplicate Author info in Books).

Related Skills

typescript-expert

from netbarros/psique

TypeScript and JavaScript expert with deep knowledge of type-level programming, performance optimization, monorepo management, migration strategies, and modern tooling.

threat-modeling-expert

from netbarros/psique

Expert in threat modeling methodologies, security architecture review, and risk assessment. Masters STRIDE, PASTA, attack trees, and security requirement extraction. Use for security architecture r...

swiftui-expert-skill

from netbarros/psique

Write, review, or improve SwiftUI code following best practices for state management, view composition, performance, modern APIs, Swift concurrency, and iOS 26+ Liquid Glass adoption. Use when buil...

service-mesh-expert

from netbarros/psique

Expert service mesh architect specializing in Istio, Linkerd, and cloud-native networking patterns. Masters traffic management, security policies, observability integration, and multi-cluster mesh con

prisma-expert

from netbarros/psique

Prisma ORM expert for schema design, migrations, query optimization, relations modeling, and database operations. Use PROACTIVELY for Prisma schema issues, migration problems, query performance, re...

nestjs-expert

from netbarros/psique

Nest.js framework expert specializing in module architecture, dependency injection, middleware, guards, interceptors, testing with Jest/Supertest, TypeORM/Mongoose integration, and Passport.js auth...

n8n-mcp-tools-expert

from netbarros/psique

Expert guide for using n8n-mcp MCP tools effectively. Use when searching for nodes, validating configurations, accessing templates, managing workflows, or using any n8n-mcp tool. Provides tool sele...

mermaid-expert

from netbarros/psique

Create Mermaid diagrams for flowcharts, sequences, ERDs, and architectures. Masters syntax for all diagram types and styling.

laravel-expert

from netbarros/psique

Senior Laravel Engineer role for production-grade, maintainable, and idiomatic Laravel solutions. Focuses on clean architecture, security, performance, and modern standards (Laravel 10/11+).

kotlin-coroutines-expert

from netbarros/psique

Expert patterns for Kotlin Coroutines and Flow, covering structured concurrency, error handling, and testing.

flutter-expert

from netbarros/psique

Master Flutter development with Dart 3, advanced widgets, and multi-platform deployment.

docker-expert

from netbarros/psique

Docker containerization expert with deep knowledge of multi-stage builds, image optimization, container security, Docker Compose orchestration, and production deployment patterns. Use PROACTIVELY f...