modeling-nosql-data

Use this skill when you need to work with NoSQL data modeling. It provides NoSQL database design guidance and automation. Trigger with phrases like "model NoSQL data", "design document structure", or "optimize NoSQL schema".

1,868 stars

byjeremylongshore

View on GitHub Installation ↓

Best use case

modeling-nosql-data is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using modeling-nosql-data should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/modeling-nosql-data/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/database/nosql-data-modeler/skills/modeling-nosql-data/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/modeling-nosql-data/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How modeling-nosql-data Compares

Feature / Agent	modeling-nosql-data	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

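What does this skill do?

It designs data models for NoSQL databases (MongoDB, DynamoDB, Redis, and Cassandra), starting from application access patterns and query requirements.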

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

ChatGPT vs Claude for Agent Skills

Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.

SKILL.md Source

# NoSQL Data Modeler

## Overview

Design data models for NoSQL databases including MongoDB (document), DynamoDB (key-value/wide-column), Redis (key-value), and Cassandra (wide-column). Unlike relational modeling where normalization drives design, NoSQL modeling starts from access patterns and query requirements, then shapes the data to serve those patterns efficiently.

## Prerequisites

- `mongosh`, `aws dynamodb` CLI, `redis-cli`, or `cqlsh` installed depending on target database
- Documented list of application access patterns (read/write queries the application performs)
- Expected data volumes (document count, average document size, growth rate)
- Read/write ratio and latency requirements for each access pattern
- Understanding of consistency requirements (strong vs. eventual consistency)

## Instructions

1. Catalog all application access patterns as a table with columns: pattern name, query description, frequency (queries/sec), latency requirement, and data fields accessed. This drives every modeling decision.

2. For MongoDB document modeling, apply the embedding vs. referencing decision framework:
   - **Embed** when: data is always accessed together, child data has no independent lifecycle, cardinality is bounded (1:few), and updates are infrequent.
   - **Reference** when: data has independent access patterns, cardinality is unbounded (1:many/many:many), child documents are large, or data is shared across parents.

3. Design document schemas that match query patterns. If the application needs "all orders for a customer with line items," embed line items inside the order document. If the application needs "all products across all orders," use references to a products collection.

4. For DynamoDB, design the partition key and sort key to support the primary access pattern with a single-table design. Use composite sort keys (e.g., `ORDER#2024-01-15#12345`) for hierarchical data. Plan GSIs (Global Secondary Indexes) for secondary access patterns, keeping total GSI count under 5.

5. Evaluate denormalization trade-offs: duplicating data across documents reduces read latency but increases write complexity and storage. Denormalize data that changes rarely (user names, product categories) but reference data that changes frequently (prices, inventory counts).

6. Handle one-to-many relationships by choosing between embedding (small arrays), child referencing (parent stores child IDs), or parent referencing (child stores parent ID). For unbounded one-to-many, always use parent referencing to avoid document size limits (16MB in MongoDB).

7. Model many-to-many relationships using an array of references in each document or a dedicated junction collection. For DynamoDB, use adjacency list patterns with inverted GSIs.

8. Plan for schema evolution by using schema versioning fields (`schemaVersion: 2`), writing migration scripts that update documents in batches, and ensuring application code handles both old and new document shapes during rollout.

9. Validate the model against access patterns by running sample queries with `explain()` in MongoDB or examining consumed capacity units in DynamoDB. Verify that primary access patterns require only single-partition reads.

10. Document the final data model with sample documents, index definitions, and the access pattern mapping that justifies each modeling decision.
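
The embed-versus-reference criteria in step 2 can be written down as a small decision helper. This is a minimal sketch, not a definitive rule: the `Relationship` fields, the `recommend` function, and the `"review"` outcome for mixed signals are illustrative assumptions and not part of any MongoDB API.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    """Traits of a parent/child relationship (hypothetical field names)."""
    accessed_together: bool
    child_has_own_lifecycle: bool
    bounded_cardinality: bool  # True for 1:few, False for unbounded 1:many
    frequent_updates: bool
    large_children: bool = False
    shared_across_parents: bool = False


def recommend(rel: Relationship) -> str:
    """Return "embed", "reference", or "review" for mixed signals."""
    # Any single one of these is treated as enough to prefer a reference.
    if (
        not rel.accessed_together
        or not rel.bounded_cardinality
        or rel.large_children
        or rel.shared_across_parents
    ):
        return "reference"
    # Embedding is only suggested when every embed criterion holds.
    if rel.child_has_own_lifecycle or rel.frequent_updates:
        return "review"
    return "embed"


# Shipping addresses on a user profile: small, bounded, read with the parent.
addresses = Relationship(True, False, True, False)
# Product reviews: read independently and growing without bound.
reviews = Relationship(False, True, False, False)
print(recommend(addresses), recommend(reviews))
```

The helper deliberately returns `"review"` instead of forcing a choice when the embed criteria are only partly met, since those cases usually depend on the access pattern catalog from step 1.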
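
For the adjacency list pattern in step 7, the following sketch models a many-to-many relationship in memory. It assumes a hypothetical students-and-courses domain and uses a plain list in place of a DynamoDB table; the inverted index is simulated by swapping the roles of the `PK` and `SK` attributes, which is the idea behind an inverted GSI.

```python
# Each item is one edge of the many-to-many relationship.
# Key formats and attribute names are illustrative assumptions.
ITEMS = [
    {"PK": "STUDENT#1", "SK": "COURSE#math"},
    {"PK": "STUDENT#1", "SK": "COURSE#art"},
    {"PK": "STUDENT#2", "SK": "COURSE#math"},
]


def query_table(items, pk):
    """Stand-in for a base-table query: all sort keys under one partition key."""
    return sorted(item["SK"] for item in items if item["PK"] == pk)


def query_inverted_index(items, sk):
    """Stand-in for an inverted-GSI query: the index is keyed by SK, so the
    lookup returns every PK that points at the given SK."""
    return sorted(item["PK"] for item in items if item["SK"] == sk)


print(query_table(ITEMS, "STUDENT#1"))
print(query_inverted_index(ITEMS, "COURSE#math"))
```

One set of edge items answers both "which courses does this student take" and "which students take this course", so no junction table needs to be kept in sync separately.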
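
The schema evolution approach in step 8 can be sketched as an upgrade function that accepts both old and new document shapes, plus a batching helper for the migration script. The version 1 to version 2 change shown here (splitting a `name` field into `firstName` and `lastName`) is a hypothetical example, and documents without a `schemaVersion` field are assumed to be version 1.

```python
def upgrade(doc: dict) -> dict:
    """Return the document in the version 2 shape, leaving newer docs untouched."""
    version = doc.get("schemaVersion", 1)  # assumption: missing field means v1
    if version == 1:
        first, _, last = doc["name"].partition(" ")
        upgraded = {key: value for key, value in doc.items() if key != "name"}
        upgraded.update({"firstName": first, "lastName": last, "schemaVersion": 2})
        return upgraded
    return doc


def batches(docs: list, size: int):
    """Yield fixed-size slices so a migration can update documents in batches."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


old_doc = {"_id": 1, "name": "Ada Lovelace"}
new_doc = {"_id": 2, "firstName": "Alan", "lastName": "Turing", "schemaVersion": 2}
for batch in batches([old_doc, new_doc], size=1):
    print([upgrade(doc) for doc in batch])
```

Application code can call the same `upgrade` function on read during rollout, so documents that the batch migration has not reached yet are still handled.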
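
The `explain()` check in step 9 can be automated by walking the winning plan and looking for a collection scan. This sketch assumes the classic `queryPlanner.winningPlan` layout with nested `stage`, `inputStage`, and `inputStages` fields; some server versions nest the plan differently, so adjust the lookup to the output you actually see. The two sample plans are hand-written and trimmed, not captured from a real server.

```python
def plan_stages(plan: dict) -> list:
    """Collect stage names from a winning plan, outermost stage first."""
    stages = [plan.get("stage")]
    if "inputStage" in plan:
        stages += plan_stages(plan["inputStage"])
    for child in plan.get("inputStages", []):
        stages += plan_stages(child)
    return stages


def uses_collection_scan(explain_output: dict) -> bool:
    """True when any stage of the winning plan is a full collection scan."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    return "COLLSCAN" in plan_stages(winning)


# Hand-written, trimmed examples of explain output (assumed shapes).
SAMPLE_INDEXED = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "category_1_price_1"},
        }
    }
}
SAMPLE_SCAN = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}

print(uses_collection_scan(SAMPLE_INDEXED), uses_collection_scan(SAMPLE_SCAN))
```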

## Output

- **Data model diagrams** showing document/collection structure, embedded vs. referenced relationships
- **Sample documents** in JSON format for each collection/table with realistic data
- **Index definitions** including compound indexes, partial indexes, and TTL indexes
- **Access pattern mapping** table linking each query to its supporting collection and index
- **Migration scripts** for evolving schemas from existing relational models to NoSQL
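
The access pattern catalog from step 1 and the access pattern mapping listed above can share one structure. The sketch below is one possible shape, with hypothetical pattern names, collections, and indexes; it renders the mapping as a table and flags patterns that have no supporting index yet.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AccessPattern:
    name: str
    query: str
    frequency_qps: float
    latency_ms: int
    fields: Tuple[str, ...]
    collection: str
    index: Optional[str]  # None marks a pattern without a supporting index


# Hypothetical entries for an order-management application.
MAPPING = [
    AccessPattern("get_order", "Fetch one order with line items", 200.0, 20,
                  ("orderId", "items"), "orders", "_id"),
    AccessPattern("list_customer_orders", "Recent orders for a customer", 50.0, 50,
                  ("customerId", "createdAt"), "orders", "customerId_1_createdAt_-1"),
    AccessPattern("sales_by_product", "Aggregate sales per product", 0.01, 5000,
                  ("productId", "quantity"), "orders", None),
]


def unindexed(mapping):
    """Names of patterns that still need an index decision."""
    return [pattern.name for pattern in mapping if pattern.index is None]


def as_markdown(mapping) -> str:
    """Render the mapping as a markdown table for the final documentation."""
    lines = [
        "| Pattern | Queries/sec | Latency (ms) | Collection | Index |",
        "|---|---|---|---|---|",
    ]
    for p in mapping:
        lines.append(
            f"| {p.name} | {p.frequency_qps} | {p.latency_ms} "
            f"| {p.collection} | {p.index or 'MISSING'} |"
        )
    return "\n".join(lines)


print(as_markdown(MAPPING))
print("Needs an index decision:", unindexed(MAPPING))
```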

## Error Handling

| Error | Cause | Solution |
|-------|-------|---------|
| Document exceeds 16MB size limit (MongoDB) | Unbounded array growth from embedding too many child documents | Switch from embedding to referencing; use the bucket pattern to chunk large arrays into fixed-size sub-documents |
| Hot partition in DynamoDB | Partition key with low cardinality causes uneven distribution | Add a random suffix or use a composite key; distribute writes across partitions with write sharding |
| High read latency on referenced documents | Too many round trips to resolve references (N+1 query problem) | Denormalize frequently accessed reference data; use `$lookup` aggregation for server-side joins; batch reference resolution |
| Inconsistent denormalized data | Write to source succeeds but denormalized copies not updated | Implement change streams (MongoDB) or DynamoDB Streams to propagate updates; use transactional writes where supported |
| Query requires full collection scan | Missing index on query filter fields | Create compound indexes matching query predicates and sort order; use `explain()` to verify index usage |
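
Two of the fixes in the table above, write sharding and the bucket pattern, can be sketched without a database. Note one deliberate substitution: instead of the random suffix mentioned in the table, this sketch derives the suffix from a hash of the item ID, so a single item can be located again without querying every shard. The function names, the key format, and the default of 10 shards are assumptions for illustration.

```python
import hashlib


def sharded_key(base_key: str, item_id: str, shard_count: int = 10) -> str:
    """Spread writes for one logical key across shard_count partition keys."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % shard_count
    return f"{base_key}#{shard}"


def all_shard_keys(base_key: str, shard_count: int = 10) -> list:
    """Every partition key a reader must query to see the whole logical key."""
    return [f"{base_key}#{n}" for n in range(shard_count)]


def to_buckets(parent_id: str, values: list, bucket_size: int) -> list:
    """Chunk a large array into fixed-size bucket documents (bucket pattern)."""
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    buckets = []
    for start in range(0, len(values), bucket_size):
        chunk = values[start:start + bucket_size]
        buckets.append({
            "parentId": parent_id,
            "bucket": start // bucket_size,
            "count": len(chunk),
            "values": chunk,
        })
    return buckets


print(sharded_key("EVENT#2024-01-15", "evt-1"))
print(to_buckets("sensor-1", list(range(5)), bucket_size=2))
```

The trade-off of write sharding is on the read side: collecting everything under the logical key means querying each key from `all_shard_keys` and merging the results.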

## Examples

**E-commerce product catalog in MongoDB**: Products embed variant arrays (size, color, price) since variants are always accessed with the product. Reviews reference the product by ID since reviews are accessed independently and grow unboundedly. A compound index on `{category: 1, price: 1}` supports filtered browsing.
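
As a rough illustration of the product catalog example, the sketch below writes the documents as plain Python dictionaries. All field values are invented, and the top-level `price` field is an assumption added so that the `{category: 1, price: 1}` index has a field to cover. The index is shown as a list of (field, direction) pairs, the form PyMongo's `create_index` accepts, but no database call is made here.

```python
# Product with embedded variants: read together, small and bounded.
PRODUCTS = [
    {
        "_id": "prod-1",
        "name": "Trail Jacket",
        "category": "outerwear",
        "price": 12900,  # assumed base price in cents, covered by the index
        "variants": [
            {"size": "M", "color": "green", "price": 12900},
            {"size": "L", "color": "green", "price": 13400},
        ],
    },
    {"_id": "prod-2", "name": "Rain Shell", "category": "outerwear",
     "price": 8900, "variants": []},
    {"_id": "prod-3", "name": "Wool Socks", "category": "accessories",
     "price": 1500, "variants": []},
]

# Reviews reference the product by ID instead of being embedded.
REVIEWS = [
    {"_id": "rev-1", "productId": "prod-1", "rating": 5},
    {"_id": "rev-2", "productId": "prod-1", "rating": 3},
    {"_id": "rev-3", "productId": "prod-2", "rating": 4},
]

# Compound index definition matching {category: 1, price: 1}.
CATEGORY_PRICE_INDEX = [("category", 1), ("price", 1)]


def reviews_for(product_id, reviews):
    """Stand-in for a query on the reviews collection by productId."""
    return [r["_id"] for r in reviews if r["productId"] == product_id]


def browse(products, category, max_price):
    """Stand-in for filtered browsing: equality on category, range and sort on price."""
    hits = [p for p in products if p["category"] == category and p["price"] <= max_price]
    return [p["_id"] for p in sorted(hits, key=lambda p: p["price"])]


print(reviews_for("prod-1", REVIEWS))
print(browse(PRODUCTS, "outerwear", 13000))
```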

**Social media feed in DynamoDB single-table design**: Partition key is `USER#userId`, sort key is `POST#timestamp` for user timeline queries. A GSI with partition key `HASHTAG#tag` and sort key `timestamp` supports hashtag feeds. User profile data uses sort key `PROFILE` on the same partition.
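
The key layout in the social media feed example can be tried out in memory before any table exists. This is a sketch under stated assumptions: the `GSI1PK` and `GSI1SK` attribute names are hypothetical, each post carries a single hashtag for simplicity, and list filtering stands in for DynamoDB queries. Items without the GSI attributes, such as the profile item, simply do not appear in the index lookup.

```python
def post_item(user_id, timestamp, text, hashtag):
    """Build one post item with base-table keys and hypothetical GSI keys."""
    return {
        "PK": f"USER#{user_id}",
        "SK": f"POST#{timestamp}",
        "GSI1PK": f"HASHTAG#{hashtag}",
        "GSI1SK": timestamp,
        "text": text,
    }


TABLE = [
    {"PK": "USER#7", "SK": "PROFILE", "displayName": "Ada"},
    post_item("7", "2024-01-15T09:00:00Z", "hello", "intro"),
    post_item("7", "2024-01-16T12:30:00Z", "nosql notes", "nosql"),
    post_item("8", "2024-01-16T08:00:00Z", "modeling", "nosql"),
]


def timeline(items, user_id):
    """One partition, sort keys beginning with POST#, newest first."""
    pk = f"USER#{user_id}"
    posts = [i for i in items if i["PK"] == pk and i["SK"].startswith("POST#")]
    return sorted(posts, key=lambda i: i["SK"], reverse=True)


def profile(items, user_id):
    """Single-item lookup on the same partition using the PROFILE sort key."""
    pk = f"USER#{user_id}"
    return next(i for i in items if i["PK"] == pk and i["SK"] == "PROFILE")


def hashtag_feed(items, tag):
    """Stand-in for a GSI query: only items carrying GSI1PK are indexed."""
    key = f"HASHTAG#{tag}"
    hits = [i for i in items if i.get("GSI1PK") == key]
    return sorted(hits, key=lambda i: i["GSI1SK"])


print([p["text"] for p in timeline(TABLE, "7")])
print([p["text"] for p in hashtag_feed(TABLE, "nosql")])
```

ISO 8601 timestamps in a fixed format sort lexicographically in time order, which is why they work as sort key components here.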

**IoT sensor data in Cassandra**: Partition key is `sensor_id`, clustering column is `timestamp` in descending order. Each partition holds one sensor's readings, ordered by time. TTL of 90 days automatically expires old readings. Materialized views (an experimental feature that recent Cassandra releases disable by default) or separate query-specific tables support queries by location and sensor type.
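
The read behavior of the sensor table can be approximated in plain Python to check that the layout answers the main query, "latest readings for one sensor". This is an in-memory stand-in and not Cassandra code: the row shape and the `latest_readings` helper are assumptions, and the 90-day TTL is modeled as a filter at read time.

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(days=90)


def latest_readings(rows, sensor_id, now, limit=10):
    """Rows of one partition (sensor_id), newest first, skipping expired rows."""
    live = [
        row for row in rows
        if row["sensor_id"] == sensor_id and now - row["timestamp"] < TTL
    ]
    return sorted(live, key=lambda row: row["timestamp"], reverse=True)[:limit]


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


ROWS = [
    {"sensor_id": "s1", "timestamp": utc(2024, 1, 1), "value": 19.0},   # older than TTL
    {"sensor_id": "s1", "timestamp": utc(2024, 4, 1), "value": 20.0},
    {"sensor_id": "s1", "timestamp": utc(2024, 4, 29), "value": 21.5},
    {"sensor_id": "s2", "timestamp": utc(2024, 4, 29), "value": 7.2},
]

now = utc(2024, 4, 30)
print([row["value"] for row in latest_readings(ROWS, "s1", now)])
```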

## Resources

- MongoDB data modeling patterns: https://www.mongodb.com/docs/manual/data-modeling/
- DynamoDB single-table design: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql.html
- Cassandra data modeling guide: https://cassandra.apache.org/doc/latest/cassandra/data_modeling/
- Redis data structures: https://redis.io/docs/data-types/
- NoSQL design patterns catalog: https://www.mongodb.com/docs/manual/applications/data-models/

Related Skills

generating-test-data

1868

from jeremylongshore/claude-code-plugins-plus-skills

Generate realistic test data including edge cases and boundary conditions. Use when creating realistic fixtures or edge case test data. Trigger with phrases like "generate test data", "create fixtures", or "setup test database".

managing-database-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Run database tests, including fixtures, transactions, and rollback management. Use when performing specialized testing. Trigger with phrases like "test the database", "run database tests", or "validate data integrity".

encrypting-and-decrypting-data

1868

from jeremylongshore/claude-code-plugins-plus-skills

Validate encryption implementations and cryptographic practices. Use when reviewing data security measures. Trigger with 'check encryption', 'validate crypto', or 'review security keys'.

scanning-for-data-privacy-issues

1868

from jeremylongshore/claude-code-plugins-plus-skills

Scan for data privacy issues and sensitive information exposure. Use when reviewing data handling practices. Trigger with 'scan privacy issues', 'check sensitive data', or 'validate data protection'.

windsurf-data-handling

1868

from jeremylongshore/claude-code-plugins-plus-skills

Control what code and data Windsurf AI can access and process in your workspace. Use when handling sensitive data, implementing data exclusion patterns, or ensuring compliance with privacy regulations in Windsurf environments. Trigger with phrases like "windsurf data privacy", "windsurf PII", "windsurf GDPR", "windsurf compliance", "codeium data", "windsurf telemetry".

webflow-data-handling

1868

from jeremylongshore/claude-code-plugins-plus-skills

Implement Webflow data handling — CMS content delivery patterns, PII redaction in form submissions, GDPR/CCPA compliance for ecommerce data, and data retention policies. Trigger with phrases like "webflow data", "webflow PII", "webflow GDPR", "webflow data retention", "webflow privacy", "webflow CCPA", "webflow forms data".

vercel-data-handling

1868

from jeremylongshore/claude-code-plugins-plus-skills

Implement data handling, PII protection, and GDPR/CCPA compliance for Vercel deployments. Use when handling sensitive data in serverless functions, implementing data redaction, or ensuring privacy compliance on Vercel. Trigger with phrases like "vercel data", "vercel PII", "vercel GDPR", "vercel data retention", "vercel privacy", "vercel compliance".

veeva-data-handling

1868

from jeremylongshore/claude-code-plugins-plus-skills

Veeva Vault data handling for enterprise operations. Use when implementing advanced Veeva Vault patterns. Trigger: "veeva data handling".

vastai-data-handling

1868

from jeremylongshore/claude-code-plugins-plus-skills

Manage training data and model artifacts securely on Vast.ai GPU instances. Use when transferring data to instances, managing checkpoints, or implementing secure data lifecycle on rented hardware. Trigger with phrases like "vastai data", "vastai upload data", "vastai checkpoints", "vastai data security", "vastai artifacts".

twinmind-data-handling

1868

from jeremylongshore/claude-code-plugins-plus-skills

Handle TwinMind meeting data with GDPR compliance: transcript storage, memory vault management, data export, and deletion policies. Use when implementing data handling or managing TwinMind meeting AI operations. Trigger with phrases like "twinmind data handling".

supabase-data-handling

1868

from jeremylongshore/claude-code-plugins-plus-skills

Implement GDPR/CCPA compliance with Supabase: RLS for data isolation, user deletion via auth.admin.deleteUser(), data export via SQL, PII column management, backup/restore workflows, and retention policies. Use when handling sensitive data, implementing right-to-deletion, configuring data retention, or auditing PII in Supabase database columns. Trigger: "supabase GDPR", "supabase data handling", "supabase PII", "supabase compliance", "supabase data retention", "supabase delete user", "supabase data export".

speak-data-handling

1868

from jeremylongshore/claude-code-plugins-plus-skills

Handle student audio data, assessment records, and learning progress with GDPR/COPPA compliance. Use when implementing data handling or managing Speak language learning platform operations. Trigger with phrases like "speak data handling".