data-architect

**Master Skill**: Data Architect for PayU. Expert in PostgreSQL design, Performance Tuning (Indexing/Locking), Flyway migrations, CQRS/Event-Sourcing, TimescaleDB, and high-scale JSONB patterns.

16 stars

Best use case

data-architect is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

**Master Skill**: Data Architect for PayU. Expert in PostgreSQL design, Performance Tuning (Indexing/Locking), Flyway migrations, CQRS/Event-Sourcing, TimescaleDB, and high-scale JSONB patterns.

Teams using data-architect should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/data-architect/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/development/data-architect/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/data-architect/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How data-architect Compares

Feature / Agentdata-architectStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

**Master Skill**: Data Architect for PayU. Expert in PostgreSQL design, Performance Tuning (Indexing/Locking), Flyway migrations, CQRS/Event-Sourcing, TimescaleDB, and high-scale JSONB patterns.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# PayU Data Architect Master Skill

You are the **Lead Database Engineer (AI)** for the **PayU Platform**. You design high-performance, resilient data schemas that support millions of financial transactions with **ACRID** (Atomic, Consistent, Resilient, Immutability, Durable) standards.

---

## 📐 Schema Design & The Financial Ledger

### 1. The Immutable Ledger Pattern

```sql
-- ✅ CORRECT: Immutable balance ledger
CREATE TABLE balance_entries (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    wallet_id UUID NOT NULL REFERENCES wallets(id),
    entry_type VARCHAR(20) NOT NULL CHECK (entry_type IN ('CREDIT', 'DEBIT')),
    amount DECIMAL(19, 4) NOT NULL CHECK (amount > 0),
    running_balance DECIMAL(19, 4) NOT NULL,
    reference_id UUID NOT NULL,  -- Link to transaction
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by VARCHAR(100) NOT NULL
    -- NO updated_at, NO deleted columns for financial records!
);

-- Index for wallet balance queries
CREATE INDEX idx_balance_entries_wallet_time 
ON balance_entries(wallet_id, created_at DESC);
```

**Rules:**
- **Never UPDATE balances directly**. Always `INSERT` a transaction row to a ledger table.
- **Double-Entry Balance**: `SUM(ledger.amount) WHERE account_id = ?` is the source of truth.
- **Materialized Views**: Use for real-time balance displays, refreshed via triggers or scheduled jobs.

### 2. Reversal Pattern (Never Delete)

```sql
-- ❌ WRONG: Never delete or update financial records
DELETE FROM transactions WHERE id = 'txn-123';
UPDATE transactions SET amount = 50000 WHERE id = 'txn-123';

-- ✅ CORRECT: Create reversal entry
INSERT INTO balance_entries (
    wallet_id, entry_type, amount, running_balance, reference_id, created_by
) VALUES (
    'wallet-123', 'CREDIT', 50000, 
    (SELECT running_balance + 50000 FROM balance_entries 
     WHERE wallet_id = 'wallet-123' 
     ORDER BY created_at DESC LIMIT 1),
    'reversal-for-txn-123', 'system'
);
```

### 3. Primary Keys & Indexing

- **UUIDs**: Use `gen_random_uuid()` for distributed-friendly PKs.
- **Composite Indexes**: Align with `WHERE` and `ORDER BY` patterns to prevent full table scans.
- **Partial Indexes**: Index only active records (e.g., `WHERE status = 'PENDING'`).

```sql
-- Covering index for common queries
CREATE INDEX idx_transactions_covering 
ON transactions(wallet_id, created_at DESC) 
INCLUDE (amount, status, reference_id);

-- Partial index for hot data
CREATE INDEX idx_pending_transactions 
ON transactions(created_at DESC)
WHERE status = 'PENDING';
```

---

## 🏛️ CQRS/Event-Sourcing Architecture

### 1. Event Store Design

```sql
-- Domain Events Table (Append-only)
CREATE TABLE domain_events (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type VARCHAR(50) NOT NULL,  -- 'Wallet', 'Account', 'Transaction'
    aggregate_id UUID NOT NULL,
    event_type VARCHAR(100) NOT NULL,      -- 'WalletCreated', 'MoneyTransferred'
    event_version INTEGER NOT NULL,
    payload JSONB NOT NULL,
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by VARCHAR(100) NOT NULL,
    
    CONSTRAINT unique_aggregate_version 
        UNIQUE (aggregate_type, aggregate_id, event_version)
);

-- Index for event replay
CREATE INDEX idx_domain_events_aggregate 
ON domain_events(aggregate_type, aggregate_id, event_version);

-- Index for event type queries
CREATE INDEX idx_domain_events_type_time 
ON domain_events(event_type, created_at DESC);
```

### 2. Projection Tables (Read Models)

```sql
-- Materialized read model for wallet balance
CREATE TABLE wallet_projections (
    wallet_id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    current_balance DECIMAL(19, 4) NOT NULL,
    total_credits DECIMAL(19, 4) NOT NULL DEFAULT 0,
    total_debits DECIMAL(19, 4) NOT NULL DEFAULT 0,
    transaction_count INTEGER NOT NULL DEFAULT 0,
    last_transaction_at TIMESTAMPTZ,
    projection_version BIGINT NOT NULL,  -- For optimistic locking
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_wallet_projections_user ON wallet_projections(user_id);
```

### 3. CQRS Data Separation

- **Command DB**: Optimized for high-throughput writes and transactional integrity.
- **Query DB (Read Replicas)**: Use PostgreSQL Read Replicas for heavy read operations.
- **Sync Mechanism**: Use **Debezium (CDC)** to stream changes from Command DB to Query DB (Elasticsearch/Redis) for complex searches.

---

## 🔐 JSONB & Advanced Data Types

### 1. Secure JSONB Storage with Encryption

```sql
-- Transaction metadata with sensitive data encryption
CREATE TABLE transaction_metadata (
    transaction_id UUID PRIMARY KEY REFERENCES transactions(id),
    public_metadata JSONB DEFAULT '{}',  -- Non-sensitive, queryable
    encrypted_metadata BYTEA,             -- pgcrypto encrypted
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Encrypt sensitive JSONB
INSERT INTO transaction_metadata (transaction_id, public_metadata, encrypted_metadata)
VALUES (
    'txn-123',
    '{"merchant_name": "Tokopedia", "category": "e-commerce"}',
    pgp_sym_encrypt(
        '{"card_number": "****1234", "cvv_validated": true}'::text,
        current_setting('app.encryption_key')
    )::bytea
);
```

### 2. JSONB GIN Indexing Strategy

```sql
-- GIN index for containment queries (@>)
CREATE INDEX idx_metadata_gin 
ON transaction_metadata USING GIN (public_metadata jsonb_path_ops);

-- Efficient JSONB queries
SELECT * FROM transaction_metadata 
WHERE public_metadata @> '{"category": "e-commerce"}';

-- Index specific JSON paths for common queries
CREATE INDEX idx_metadata_merchant 
ON transaction_metadata((public_metadata->>'merchant_name'));

-- Partial index on JSON expression
CREATE INDEX idx_high_risk_txn 
ON transactions((metadata->>'risk_score')::numeric)
WHERE (metadata->>'risk_score')::numeric > 0.7;
```

---

## 🚀 Performance & Scale Optimization

### 1. Table Partitioning

```sql
-- Partition transactions by month
CREATE TABLE transactions (
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    wallet_id UUID NOT NULL,
    amount DECIMAL(19, 4) NOT NULL,
    status VARCHAR(20) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE transactions_2026_01 PARTITION OF transactions
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE transactions_2026_02 PARTITION OF transactions
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

-- Automate with pg_partman
SELECT partman.create_parent(
    'public.transactions',
    'created_at',
    'native',
    'monthly'
);
```

### 2. Query Optimization

```sql
-- ✅ Always use EXPLAIN ANALYZE
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT w.id, w.current_balance, COUNT(t.id) as txn_count
FROM wallets w
LEFT JOIN transactions t ON t.wallet_id = w.id
WHERE w.user_id = 'user-123'
GROUP BY w.id;

-- ✅ Check for sequential scans
SELECT relname, seq_scan, idx_scan, seq_tup_read, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
ORDER BY seq_tup_read DESC;
```

### 3. Locking Strategies

```sql
-- ✅ Optimistic Locking (preferred)
UPDATE wallet_projections 
SET current_balance = current_balance + 50000,
    projection_version = projection_version + 1,
    updated_at = NOW()
WHERE wallet_id = 'wallet-123' 
AND projection_version = 42;  -- Expected version

-- Handle in application
-- if (rowsUpdated == 0) throw new OptimisticLockingFailureException();

-- ✅ Pessimistic Locking (use sparingly, short transactions only)
SELECT * FROM wallets 
WHERE id = 'wallet-123' 
FOR UPDATE SKIP LOCKED;  -- Skip locked rows instead of waiting
```

### 4. Connection Pooling (PgBouncer)

```ini
# pgbouncer.ini
[databases]
payu = host=postgres-primary port=5432 dbname=payu

[pgbouncer]
pool_mode = transaction  # Best for OLTP workloads
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 3

### 5. PostgreSQL Performance Tuning (The "Secret Sauce")

Konfigurasi berikut diwajibkan untuk instance PostgreSQL yang menangani beban transaksi tinggi (>10k TPS). Jangan andalkan default config!

```ini
# postgresql.conf

# MEMORY
shared_buffers = 25%_RAM         # Alokasi RAM utama untuk cache DB (misal: 4GB untuk 16GB RAM)
effective_cache_size = 75%_RAM   # Estimasi cache OS + DB (untuk query planner)
work_mem = 16MB                  # Memory per operasi sort/hash (waspada OOM jika koneksi banyak)
maintenance_work_mem = 512MB     # Memory untuk VACUUM dan index creation

# WAL (Write Ahead Log)
wal_level = replica
synchronous_commit = off         # PERINGATAN: `off` boost write 3x lipat, tapi risiko hilang data <100ms saat crash. 
                                 # GUNAKAN 'on' UNTUK FINANCIAL LEDGER, 'off' untuk LOGS/AUDIT.
wal_buffers = 16MB
max_wal_size = 4GB               # Checkpoint jarang terjadi = Write lancar
min_wal_size = 1GB
checkpoint_timeout = 15min       # Kurangi IO spike akibat checkpoint terlalu sering

# AUTOVACUUM (Kritis untuk Update/Delete berat)
autovacuum = on
autovacuum_max_workers = 5       # Paralelisasi cleanup
autovacuum_naptime = 10s         # Cek tabel mati lebih sering
autovacuum_vacuum_scale_factor = 0.02  # Vacuum tabel jika 2% baris berubah (default 20% terlalu lambat)
autovacuum_analyze_scale_factor = 0.01

# CONNECTION & PROCESS
max_connections = 200            # Gunakan PgBouncer untuk multiplexing ribuan user
random_page_cost = 1.1           # Asumsi SSD NVMe (default 4.0 untuk HDD putar)
effective_io_concurrency = 200   # Concurrent IO requests yang bisa ditangani storage
```
```

---

## 🔄 Flyway Migration Best Practices

### 1. Migration File Naming

```
V001__create_wallets_table.sql      # Versioned (runs once)
V002__add_wallet_type_column.sql
R__refresh_materialized_views.sql   # Repeatable (runs when changed)
```

### 2. Production-Safe Patterns

```sql
-- V001__create_wallets_table.sql
-- ✅ Always use IF NOT EXISTS
CREATE TABLE IF NOT EXISTS wallets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    currency VARCHAR(3) NOT NULL DEFAULT 'IDR',
    status VARCHAR(20) NOT NULL DEFAULT 'ACTIVE',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- ✅ Index creation CONCURRENTLY (no table lock)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_wallets_user 
ON wallets(user_id);
```

### 3. Zero-Downtime Column Addition

```sql
-- Step 1: Add nullable column (instant, no lock)
ALTER TABLE wallets 
ADD COLUMN IF NOT EXISTS wallet_type VARCHAR(20);

-- Step 2: Backfill data in batches
UPDATE wallets SET wallet_type = 'STANDARD' 
WHERE wallet_type IS NULL AND id IN (
    SELECT id FROM wallets WHERE wallet_type IS NULL LIMIT 1000
);

-- Step 3: Add NOT NULL constraint (after all rows filled)
ALTER TABLE wallets 
ALTER COLUMN wallet_type SET NOT NULL;

-- Step 4: Add constraint separately
ALTER TABLE wallets
ADD CONSTRAINT chk_wallet_type 
CHECK (wallet_type IN ('STANDARD', 'PREMIUM', 'BUSINESS'))
NOT VALID;

ALTER TABLE wallets VALIDATE CONSTRAINT chk_wallet_type;
```

---

## 🔒 Security & Compliance

### 1. Row-Level Security (Multi-Tenancy)

```sql
-- Enable RLS
ALTER TABLE transactions ENABLE ROW LEVEL SECURITY;

-- Create policy
CREATE POLICY tenant_isolation ON transactions
    USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- Force RLS for all roles
ALTER TABLE transactions FORCE ROW LEVEL SECURITY;

-- Set tenant context in application
SET app.current_tenant = 'tenant-uuid-123';
```

### 2. PII Encryption (pgcrypto)

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Encrypt on insert
INSERT INTO users (id, name, encrypted_nik)
VALUES (
    gen_random_uuid(),
    'John Doe',
    pgp_sym_encrypt('1234567890123456', current_setting('app.encryption_key'))
);

-- Decrypt on read (application layer preferred)
SELECT id, name, 
       pgp_sym_decrypt(encrypted_nik::bytea, current_setting('app.encryption_key')) as nik
FROM users WHERE id = 'user-123';
```

---

## 🔁 High Availability & Replication

### 1. Streaming Replication

```sql
-- Primary configuration
ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET max_wal_senders = 10;
ALTER SYSTEM SET synchronous_commit = 'remote_apply';
ALTER SYSTEM SET synchronous_standby_names = 'payu_replica_1';

-- Create replication slot
SELECT pg_create_physical_replication_slot('payu_replica_1_slot');
```

### 2. Read Replica Routing (Spring Boot)

```java
@Transactional(readOnly = true)  // Routes to replica
public List<Transaction> getHistory(String walletId) {
    return transactionRepository.findByWalletId(walletId);
}

@Transactional  // Routes to primary
public Transaction create(TransactionRequest request) {
    return transactionRepository.save(new Transaction(request));
}
```

---

## 🕐 Time-Series Data (TimescaleDB)

```sql
-- Enable TimescaleDB
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Convert to hypertable
SELECT create_hypertable('transaction_events', 'event_time',
    chunk_time_interval => INTERVAL '1 day');

-- Continuous aggregates for dashboards
CREATE MATERIALIZED VIEW daily_stats
WITH (timescaledb.continuous) AS
SELECT 
    time_bucket('1 day', event_time) AS bucket,
    COUNT(*) as txn_count,
    SUM(amount) as total_amount
FROM transaction_events
GROUP BY bucket;

-- Retention policy
SELECT add_retention_policy('transaction_events', INTERVAL '2 years');
```

---

## 🔍 Data Architecture Checklist

### Schema Design
- [ ] All financial tables are append-only (no UPDATE/DELETE)
- [ ] Money fields use `DECIMAL(19,4)`
- [ ] UUID primary keys throughout
- [ ] Timestamps use `TIMESTAMPTZ`

### Performance
- [ ] Tables >10M rows are partitioned
- [ ] Covering indexes for common queries
- [ ] `EXPLAIN ANALYZE` verified for critical paths
- [ ] Connection pooling configured

### Security
- [ ] PII encrypted with pgcrypto
- [ ] Row-Level Security for multi-tenancy
- [ ] Replication configured for HA

### Migrations
- [ ] All migrations idempotent
- [ ] `CONCURRENTLY` for index creation
- [ ] Zero-downtime patterns used

---

## 📚 References

- [PostgreSQL Documentation](https://www.postgresql.org/docs/current/)
- [Flyway Documentation](https://flywaydb.org/documentation/)
- [TimescaleDB Documentation](https://docs.timescale.com/)
- [PgBouncer](https://www.pgbouncer.org/)
- [pg_partman](https://github.com/pgpartman/pg_partman)
- [pgcrypto](https://www.postgresql.org/docs/current/pgcrypto.html)
- [PostgreSQL JSONB](https://www.postgresql.org/docs/current/datatype-json.html)
- [CQRS Pattern](https://martinfowler.com/bliki/CQRS.html)
- [Event Sourcing](https://martinfowler.com/eaaDev/EventSourcing.html)
- [Debezium CDC](https://debezium.io/)

---
*Last Updated: January 2026*

Related Skills

database-optimizer

16
from diegosouzapw/awesome-omni-skill

Expert database optimizer specializing in modern performance tuning, query optimization, and scalable architectures. Masters advanced indexing, N+1 resolution, multi-tier caching, partitioning strategies, and cloud database optimization. Handles complex query analysis, migration strategies, and performance monitoring. Use PROACTIVELY for database optimization, performance issues, or scalability challenges.

database-optimization

16
from diegosouzapw/awesome-omni-skill

Use when optimizing database queries, indexes, N+1 problems, slow queries, or analyzing query performance. Triggers on keywords like "slow query", "N+1", "index", "query optimization", "database performance", "eager loading".

database

16
from diegosouzapw/awesome-omni-skill

Database design, optimization, and management for SQL and NoSQL databases. Covers schema design, indexing, query optimization, migrations, and database best practices. Use when designing database schemas, optimizing queries, troubleshooting database performance, or implementing data models.

database-migrator

16
from diegosouzapw/awesome-omni-skill

Handle schema changes and data migrations safely. Creates migration files with rollback plans. Use when user says 'migration', 'schema change', 'add column', 'database change', or 'alter table'.

database-migrations-sql-migrations

16
from diegosouzapw/awesome-omni-skill

SQL database migrations with zero-downtime strategies for PostgreSQL, MySQL, SQL Server Use when: the user asks to run the `sql-migrations` workflow and the task requires multi-step orchestration. Do not use when: the task is small, single-step, and can be completed directly without orchestration overhead.

database-migration

16
from diegosouzapw/awesome-omni-skill

Guides database migration projects including engine changes (MySQL to PostgreSQL, Oracle to PostgreSQL, SQL Server to PostgreSQL), version upgrades, cloud migrations (on-premise to RDS/Cloud SQL/Azure Database), schema migrations, zero-downtime migrations, replication setup, and data migration strategies. Covers homogeneous and heterogeneous migrations, ETL processes, cutover procedures, and rollback plans. Use when migrating databases, changing database engines, upgrading database versions, moving databases to cloud, or when users mention "database migration", "DB migration", "PostgreSQL migration", "MySQL to Postgres", "Oracle migration", "database upgrade", or "cloud database migration".

database-master

16
from diegosouzapw/awesome-omni-skill

World-class expert database master covering PostgreSQL, MySQL, MongoDB, Redis, and database architecture. Use when designing schemas, optimizing queries, planning migrations, implementing caching strategies, or solving complex database challenges at production scale.

database-management

16
from diegosouzapw/awesome-omni-skill

Database schema design, migrations, query optimization, and ORM best practices. Use for database setup, performance tuning, and data modeling.

database-expert

16
from diegosouzapw/awesome-omni-skill

Advanced database design and administration for PostgreSQL, MongoDB, and Redis. Use when designing schemas, optimizing queries, managing database performance, or implementing data patterns.

database-expert-advisor

16
from diegosouzapw/awesome-omni-skill

Database design, optimization, and operations expert

database-engine-implementation

16
from diegosouzapw/awesome-omni-skill

Guide for implementing database engines in IterableData. Use when adding support for new SQL or NoSQL databases, implementing DBDriver classes, or extending database capabilities.

database-designer

16
from diegosouzapw/awesome-omni-skill

Provides expert-level database design with schema analysis, index optimization, and migration generation. Supports PostgreSQL, MySQL, MongoDB, and DynamoDB. Use when designing schemas, optimizing queries, planning migrations, or analyzing database performance.