hbase

Apache HBase is a wide-column store built on Hadoop. Use it for big data workloads.

by G1Joshi

Best use case

hbase is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using hbase should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/hbase/SKILL.md --create-dirs "https://raw.githubusercontent.com/G1Joshi/Agent-Skills/main/skills/databases/hbase/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/hbase/SKILL.md inside your project.
3. Restart your AI agent. It will auto-discover the skill.

How hbase Compares

Feature / Agent	hbase	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

It packages guidance on Apache HBase, the wide-column store built on Hadoop: when to use it, a shell quick start, core concepts, and best practices.

Where can I find the source code?

The source is in the G1Joshi/Agent-Skills repository on GitHub, at skills/databases/hbase/SKILL.md.

SKILL.md Source

# Apache HBase

HBase is the Hadoop database. It is a distributed, scalable, big data store. It provides random, real-time read/write access to your Big Data.

## When to Use

- **Hadoop Ecosystem**: Deep integration with HDFS, Hive, Spark.
- **Petabyte Scale**: Serving billions of rows with low latency.
- **Random Access**: When you need random reads and writes on data stored in HDFS, whose files are otherwise write-once, read-many (WORM).

## Quick Start

HBase is accessed through the Java API or the HBase shell. The commands below are HBase shell commands:

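```bash
# Run these inside the HBase shell (start it with `hbase shell`).
# Create table 'users' with two column families, 'info' and 'data'.
create 'users', 'info', 'data'
# Write one cell: row 'row1', column 'info:name'.
put 'users', 'row1', 'info:name', 'Alice'
get 'users', 'row1'
```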

## Core Concepts

### Column Families

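Columns are grouped into column families. `info:name` and `info:email` are two columns (qualifiers `name` and `email`) in the `info` family. All columns in a family are stored together on disk. Families are declared in the table schema, while qualifiers can be added freely at write time.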
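
To make the layout concrete, the sketch below models rows as nested Python dictionaries: row key, then column family, then qualifier. It only illustrates the logical model and is not HBase client code. The `put` and `get` helpers, the `FAMILIES` set, and the email value are hypothetical, and cell timestamps are left out.

```python
from collections import defaultdict

# Toy model of the logical layout: row key -> column family -> qualifier -> value.
# Real HBase also versions each cell by timestamp; that is omitted here.
FAMILIES = {"info", "data"}  # families are fixed by the table schema

table = defaultdict(lambda: defaultdict(dict))


def put(row, column, value):
    """Store a value under a 'family:qualifier' column name."""
    family, qualifier = column.split(":", 1)
    if family not in FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    table[row][family][qualifier] = value


def get(row):
    """Return all cells of a row as {'family:qualifier': value}."""
    return {
        f"{family}:{qualifier}": value
        for family, cells in table.get(row, {}).items()
        for qualifier, value in cells.items()
    }


put("row1", "info:name", "Alice")
put("row1", "info:email", "alice@example.com")  # new qualifier, no schema change
print(get("row1"))
```

Writing to an undeclared family such as `other:x` raises an error in this model, while a new qualifier inside `info` is accepted without any schema change.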

### Region Servers

HBase scales by splitting each table into regions, which are contiguous ranges of sorted row keys, and distributing those regions across RegionServers.
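
As a rough illustration of how a row key maps to a region, the sketch below looks up a key in a sorted list of split points. The split points and server names are made up for the example, and real HBase tracks region locations in its own metadata rather than in a static list.

```python
import bisect

# Hypothetical split points: each region covers [start_key, next_split).
# The first region starts at the empty key, the last one is open-ended.
SPLIT_POINTS = ["g", "n", "t"]
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs4"]  # one region per server here


def region_index(row_key):
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(SPLIT_POINTS, row_key)


def locate(row_key):
    """Return the (hypothetical) server hosting the region for row_key."""
    return REGION_SERVERS[region_index(row_key)]


for key in ["alice", "g", "mallory", "zed"]:
    print(key, "->", locate(key))
```

Because keys are sorted, neighbouring keys land in the same region, which is what makes range reads cheap and sequential writes a hotspot risk.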

### WAL & MemStore

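Each write is first appended to the write-ahead log (WAL) on disk for durability, then applied to the MemStore in RAM. When the MemStore reaches its flush threshold, its contents are written to HDFS as a new immutable HFile.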
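
The write path can be sketched with a toy in-memory model. The `ToyRegion` class below is hypothetical and heavily simplified: it counts cells instead of bytes for the flush threshold, keeps the "WAL" and "HFiles" as Python lists, and ignores compaction and recovery.

```python
class ToyRegion:
    """Simplified write path: WAL append, MemStore update, flush to HFile."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # cells, not bytes (simplified)
        self.wal = []        # stands in for the on-disk write-ahead log
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # immutable, sorted files (on HDFS in real HBase)

    def put(self, row, column, value):
        self.wal.append((row, column, value))   # 1. log for durability
        self.memstore[(row, column)] = value    # 2. buffer in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore out as one sorted, immutable file."""
        if self.memstore:
            self.hfiles.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, row, column):
        """Check the MemStore first, then HFiles from newest to oldest."""
        key = (row, column)
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for cell_key, value in hfile:
                if cell_key == key:
                    return value
        return None


region = ToyRegion(flush_threshold=2)
region.put("row1", "info:name", "Alice")
region.put("row2", "info:name", "Bob")      # reaches the threshold, triggers a flush
region.put("row1", "info:name", "Alicia")   # newer value stays in the MemStore
print(len(region.hfiles), region.get("row1", "info:name"))
```

Reads consult the MemStore before the flushed files, so the most recent write wins even though older values still exist in an earlier file.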

## Best Practices (2025)

**Do**:

- **Design Row Keys carefully**: Row keys determine sort order and which region a row lands in. Monotonically increasing keys (timestamps, sequence IDs) send every write to the same region, a problem known as "hotspotting". Prefix keys with a salt or hash to spread the load.
- **Pre-split Regions**: Don't start with a single region. Pre-split based on your known key distribution.
- **Use Phoenix**: Apache Phoenix provides a SQL layer over HBase, so tables can be queried with SQL through JDBC.
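
The row key and pre-split advice can be combined in a small sketch. It assumes a fixed number of salt buckets (`NUM_BUCKETS`, chosen here arbitrarily) and a made-up `event-<id>` key format. The helper names are hypothetical and this is one possible salting scheme, not a prescribed one.

```python
import hashlib

NUM_BUCKETS = 8  # assumed number of salt buckets; tune to your region count


def salted_key(row_key: str) -> str:
    """Prefix the key with a stable two-digit bucket derived from its hash."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS
    return f"{bucket:02d}-{row_key}"


def split_points() -> list[str]:
    """Boundaries between buckets, usable as pre-split keys at table creation."""
    return [f"{bucket:02d}" for bucket in range(1, NUM_BUCKETS)]


# Sequential keys are spread across buckets instead of one hot region.
for event_id in range(1000, 1005):
    print(salted_key(f"event-{event_id}"))
print(split_points())
```

The resulting split points could be passed to the shell's `create` command through its `SPLITS` option. The trade-off is that a range scan over salted keys needs one scan per bucket, so salting suits write-heavy tables more than scan-heavy ones.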

**Don't**:

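- **Don't use for small data**: Running HDFS, ZooKeeper, and HBase carries significant operational overhead. It pays off only at roughly terabyte scale and above.
- **Don't scan excessively**: A full table scan reads every region and is slow and expensive. Bound scans with start and stop rows or filters, and run whole-table processing as a batch job (for example MapReduce or Spark).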
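
To show why bounded scans are cheap, the sketch below treats a table as a sorted list of row keys and reads only the slice between a start and stop row. The key format (`user#<name>#<date>`) and the helper names are assumptions made for the example; a real scan would go through the HBase client.

```python
import bisect

# Row keys are stored in sorted order, so a key range is one contiguous slice.
ROWS = sorted([
    "user#alice#2025-01-03",
    "user#alice#2025-02-11",
    "user#bob#2025-01-07",
    "user#carol#2025-03-01",
])


def scan(start_row, stop_row):
    """Return keys in [start_row, stop_row), like a scan with start and stop rows."""
    lo = bisect.bisect_left(ROWS, start_row)
    hi = bisect.bisect_left(ROWS, stop_row)
    return ROWS[lo:hi]


def prefix_scan(prefix):
    """Scan all keys sharing a prefix by bumping its last character for the stop row."""
    stop = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return scan(prefix, stop)


print(prefix_scan("user#alice#"))
print(scan("user#b", "user#c"))
```

A bounded scan touches only the rows in its range, whereas an unbounded scan walks the whole list, which in HBase means every region of the table.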

## References

- [Apache HBase Reference Guide](https://hbase.apache.org/book.html)