openduck-distributed-duckdb

OpenDuck — open-source distributed DuckDB with differential storage, hybrid dual execution, and transparent remote database attach

Best use case

openduck-distributed-duckdb is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using openduck-distributed-duckdb should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/openduck-distributed-duckdb/SKILL.md --create-dirs "https://raw.githubusercontent.com/Aradotso/trending-skills/main/skills/openduck-distributed-duckdb/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/openduck-distributed-duckdb/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How openduck-distributed-duckdb Compares

| Feature / Agent | openduck-distributed-duckdb | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

OpenDuck — open-source distributed DuckDB with differential storage, hybrid dual execution, and transparent remote database attach

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# OpenDuck Distributed DuckDB

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection

OpenDuck is an open-source implementation of distributed DuckDB featuring differential storage (append-only immutable layers via Postgres + object store), hybrid dual execution (single queries split across local and remote workers), and transparent remote database attach via `ATTACH 'openduck:mydb'`. It is architecturally inspired by MotherDuck but uses a fully open protocol (gRPC + Arrow IPC).

---

## Architecture Overview

```
DuckDB client (openduck extension)
  └─ ATTACH 'openduck:mydb?endpoint=...' AS cloud
       └─ gRPC + Arrow IPC
            └─ Gateway (Rust)
                 ├─ auth / routing / plan splitting
                 ├─ Worker 1 (embedded DuckDB)
                 └─ Worker N (embedded DuckDB)
                      ├─ Postgres (metadata)
                      └─ Object store (sealed layers)
```

**Key concepts:**
- `OpenDuckCatalog` / `OpenDuckTableEntry` — remote tables appear as first-class DuckDB catalog entries
- Hybrid execution — gateway labels operators `LOCAL` or `REMOTE`, inserts `Bridge` operators at boundaries
- Differential storage — immutable sealed layers, snapshot isolation, one write path, many readers
- Protocol — only 2 gRPC RPCs defined in `proto/openduck/v1/execution.proto`

---

## Repository Layout

```
crates/
  exec-gateway/     # auth, routing, hybrid plan splitting
  exec-worker/      # embedded DuckDB, Arrow IPC streaming
  exec-proto/       # protobuf/tonic codegen
  openduck-cli/     # unified CLI (serve|gateway|worker)
  diff-*/           # differential storage (layers, metadata, FUSE)

extensions/
  openduck/         # DuckDB C++ extension (StorageExtension + Catalog)

clients/
  python/           # pip-installable openduck wrapper

proto/
  openduck/v1/      # execution.proto
```

---

## Installation & Build

### Prerequisites

- Rust toolchain (stable)
- C++ build tools, `vcpkg`, `bison` (macOS: `brew install bison`)
- DuckDB development headers (handled by the extension Makefile)

### 1. Build the Rust backend

```bash
git clone https://github.com/CITGuru/openduck
cd openduck
cargo build --workspace
```

### 2. Build the DuckDB C++ extension

```bash
cd extensions/openduck
make
# Output:
# extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension
```

### 3. Install the Python client (optional)

```bash
pip install -e clients/python
```

---

## Running the Server

```bash
# Required env vars
export OPENDUCK_TOKEN=your-secret-token

# Start all-in-one (gateway + worker)
cargo run -p openduck-cli -- serve -d mydb -t $OPENDUCK_TOKEN

# Or run gateway and worker separately
cargo run -p openduck-cli -- gateway --port 7878
cargo run -p openduck-cli -- worker --gateway http://localhost:7878
```

---

## Connecting from Python

### Via openduck wrapper (recommended)

```python
import openduck  # auto-detects extension from build tree or OPENDUCK_EXTENSION_PATH

con = openduck.connect("mydb")  # uses OPENDUCK_TOKEN env var
con.sql("SELECT 1 AS x").show()
con.sql("SELECT * FROM cloud.users LIMIT 10").show()
```

### Via raw DuckDB SDK

```python
import duckdb
import os

ext_path = os.environ["OPENDUCK_EXTENSION_PATH"]
# e.g. extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension

con = duckdb.connect(config={"allow_unsigned_extensions": "true"})
con.execute(f"LOAD '{ext_path}';")
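# Read the token first: escaped quotes inside an f-string expression are a syntax error
token = os.environ["OPENDUCK_TOKEN"]
con.execute(
    "ATTACH 'openduck:mydb"
    "?endpoint=http://localhost:7878"
    f"&token={token}' AS cloud;"
)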

# Query remote table
con.sql("SELECT * FROM cloud.users LIMIT 10").show()

# Hybrid query: local table joined with remote table
# (assumes a database has been attached AS local; a plain in-memory catalog is named memory)
con.sql("""
    SELECT l.product_id, l.name, r.total_sales
    FROM local.products l
    JOIN cloud.sales r ON l.product_id = r.product_id
    WHERE r.total_sales > 1000
""").show()
```

### Environment variables

```bash
export OPENDUCK_TOKEN=your-secret-token
export OPENDUCK_EXTENSION_PATH=extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension
export OPENDUCK_ENDPOINT=http://localhost:7878   # default
```

---

## Connecting from the CLI

```bash
duckdb -unsigned -c "
  LOAD 'extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension';
  ATTACH 'openduck:mydb?endpoint=http://localhost:7878&token=${OPENDUCK_TOKEN}' AS cloud;
  SHOW ALL TABLES;
  SELECT * FROM cloud.users LIMIT 5;
"
```

---

## Connecting from Rust

```rust
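use duckdb::{Config, Connection, Result};

fn main() -> Result<()> {
    // allow_unsigned_extensions has to be set when the database is opened;
    // DuckDB refuses to change it on a running instance.
    let config = Config::default().allow_unsigned_extensions()?;
    let conn = Connection::open_in_memory_with_flags(config)?;
    let ext_path = std::env::var("OPENDUCK_EXTENSION_PATH").expect("OPENDUCK_EXTENSION_PATH not set");
    let token = std::env::var("OPENDUCK_TOKEN").expect("OPENDUCK_TOKEN not set");

    // Load the extension, then attach the remote database as `cloud`
    conn.execute_batch(&format!(r#"
        LOAD '{ext_path}';
        ATTACH 'openduck:mydb?endpoint=http://localhost:7878&token={token}' AS cloud;
    "#))?;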

    let mut stmt = conn.prepare("SELECT * FROM cloud.users LIMIT 10")?;
    let rows = stmt.query_map([], |row| {
        Ok(row.get::<_, String>(0)?)
    })?;

    for row in rows {
        println!("{}", row?);
    }
    Ok(())
}
```

---

## Hybrid Execution Pattern

Hybrid execution happens automatically — the gateway splits the logical plan:

```
[LOCAL]  HashJoin(l.id = r.id)
  [LOCAL]  Scan(products)       ← runs on your machine
  [LOCAL]  Bridge(REMOTE→LOCAL)
    [REMOTE] Scan(sales)        ← runs on remote worker
```
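The labelling and bridge insertion shown above can be sketched in a few lines of Python. This is a toy model, not the gateway's actual planner: the `Op` class, the `REMOTE_TABLES` set, and the rule "an operator runs remotely only if all of its inputs do" are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical: names of tables that live on the remote worker.
REMOTE_TABLES = {"sales"}


@dataclass
class Op:
    name: str
    children: List["Op"] = field(default_factory=list)
    side: str = ""  # "LOCAL" or "REMOTE", filled in by label()


def label(op: Op) -> Op:
    """Label scans by table location; any other operator is REMOTE
    only when it has inputs and all of them are REMOTE."""
    for child in op.children:
        label(child)
    if op.name.startswith("Scan("):
        table = op.name[len("Scan("):-1]
        op.side = "REMOTE" if table in REMOTE_TABLES else "LOCAL"
    else:
        all_remote = bool(op.children) and all(c.side == "REMOTE" for c in op.children)
        op.side = "REMOTE" if all_remote else "LOCAL"
    return op


def insert_bridges(op: Op) -> Op:
    """Wrap each child whose side differs from its parent in a Bridge."""
    wrapped = []
    for child in op.children:
        child = insert_bridges(child)
        if child.side != op.side:
            child = Op(f"Bridge({child.side}→{op.side})", [child], op.side)
        wrapped.append(child)
    op.children = wrapped
    return op


def render(op: Op, depth: int = 0) -> List[str]:
    """Return the plan as indented text lines, one operator per line."""
    lines = ["  " * depth + f"[{op.side}] {op.name}"]
    for child in op.children:
        lines.extend(render(child, depth + 1))
    return lines


plan = Op("HashJoin(l.id = r.id)", [Op("Scan(products)"), Op("Scan(sales)")])
print("\n".join(render(insert_bridges(label(plan)))))
```

Run as written, the sketch prints the same four-line tree as the plan above, with the remote scan wrapped in a `Bridge(REMOTE→LOCAL)` node.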

Write queries naturally — the extension handles routing:

```python
# This single query runs across two engines transparently
con.sql("""
    SELECT
        p.category,
        SUM(s.amount) AS revenue
    FROM local.products p          -- local table
    JOIN cloud.sales s             -- remote table
      ON p.id = s.product_id
    GROUP BY p.category
    ORDER BY revenue DESC
""").show()
```

---

## Differential Storage

Differential storage is managed server-side. Key properties:
- **Append-only sealed layers** stored in object storage (S3-compatible)
- **Postgres** stores layer metadata and snapshot pointers
- **Snapshot isolation** — readers always see a consistent view
- **One serialized write path** — many concurrent readers

From a client perspective it is fully transparent. DuckDB sees normal table semantics.
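To make the layer model concrete, here is a minimal in-memory sketch of append-only sealed layers with snapshot reads. It is an illustration only: the `LayerStore` class and its methods are hypothetical, a Python list stands in for both the Postgres metadata and the object store, and key/value pairs stand in for table data.

```python
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional


class LayerStore:
    """Toy model: every write seals one immutable layer; a snapshot is
    simply the number of layers that were sealed when it was taken."""

    def __init__(self) -> None:
        self._layers: List[Mapping[str, Optional[str]]] = []

    def seal(self, changes: Dict[str, Optional[str]]) -> int:
        """Append one read-only layer and return the new snapshot id.
        A value of None marks the key as deleted."""
        self._layers.append(MappingProxyType(dict(changes)))
        return len(self._layers)

    def read(self, key: str, snapshot: int) -> Optional[str]:
        """Resolve a key as of a snapshot by scanning the visible
        layers from newest to oldest."""
        for layer in reversed(self._layers[:snapshot]):
            if key in layer:
                return layer[key]
        return None


store = LayerStore()
s1 = store.seal({"a": "1", "b": "2"})
s2 = store.seal({"a": "10", "b": None})  # overwrite a, delete b

# A reader pinned to s1 is unaffected by the later write.
print(store.read("a", s1), store.read("a", s2))  # prints: 1 10
```

Because layers are never modified after sealing, a reader holding an older snapshot id keeps seeing the same data while the single writer appends new layers.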

---

## ATTACH URL Reference

```
openduck:<database_name>?endpoint=<url>&token=<token>
```

| Parameter  | Default                    | Description                        |
|------------|----------------------------|------------------------------------|
| `endpoint` | `http://localhost:7878`    | Gateway URL                        |
| `token`    | `$OPENDUCK_TOKEN` env var  | Auth token matching server config  |

Examples:
```sql
-- Local dev
ATTACH 'openduck:mydb?token=dev-token' AS cloud;

-- Remote server, explicit endpoint
ATTACH 'openduck:mydb?endpoint=https://my-server.example.com&token=prod-token' AS cloud;

-- Alias: od: also works
ATTACH 'od:mydb?endpoint=http://localhost:7878&token=dev-token' AS cloud;
```
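When the ATTACH statement is assembled in application code, a small helper keeps the URL format in one place. The sketch below is a hypothetical convenience function, not part of the openduck package. It omits parameters that are not supplied, so the defaults in the table apply, and it rejects values containing `'`, `&`, or `?` instead of assuming how the extension decodes them.

```python
from typing import Optional


def attach_sql(
    database: str,
    alias: str = "cloud",
    endpoint: Optional[str] = None,
    token: Optional[str] = None,
) -> str:
    """Build an ATTACH statement in the openduck:<db>?endpoint=...&token=... form."""
    params = []
    for key, value in (("endpoint", endpoint), ("token", token)):
        if value is None:
            continue  # leave it out so the documented default is used
        if any(ch in value for ch in "'&?"):
            raise ValueError(f"{key} contains a character that would break the URL")
        params.append(f"{key}={value}")
    uri = f"openduck:{database}"
    if params:
        uri += "?" + "&".join(params)
    return f"ATTACH '{uri}' AS {alias};"


print(attach_sql("mydb", token="dev-token"))
# prints: ATTACH 'openduck:mydb?token=dev-token' AS cloud;
```

The resulting string can be passed to `con.execute(...)` in place of a hand-written ATTACH statement.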

---

## DuckLake Integration

OpenDuck and DuckLake operate at different layers and complement each other:

```python
import duckdb, os

ext_path = os.environ["OPENDUCK_EXTENSION_PATH"]
token = os.environ["OPENDUCK_TOKEN"]

con = duckdb.connect(config={"allow_unsigned_extensions": "true"})
con.execute(f"LOAD '{ext_path}';")

# Attach DuckLake catalog (managed by remote worker backed by DuckLake)
con.execute(f"ATTACH 'openduck:lakehouse?endpoint=http://localhost:7878&token={token}' AS lh;")

# Query DuckLake tables transparently via OpenDuck transport
con.sql("SELECT * FROM lh.events WHERE event_date = today()").show()

# Hybrid: local scratch data joined with remote DuckLake table
con.sql("""
    SELECT l.session_id, r.user_email
    FROM memory.sessions l
    JOIN lh.users r ON l.user_id = r.id
""").show()
```

---

## Protocol Reference

The wire protocol is intentionally minimal. See `proto/openduck/v1/execution.proto`:

- **`ExecuteQuery`** — send SQL, receive a query handle
- **`StreamResults`** — stream Arrow IPC record batches back to client

Any gRPC service implementing these two RPCs is a valid OpenDuck backend. You can replace the Rust gateway with a custom implementation in any language.
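The two-step flow (submit a query, then stream its results) can be illustrated without any gRPC machinery. The class below is a plain-Python stand-in under loud assumptions: the method names mirror the RPC names, but the handle format, the `batch_size` parameter, and the use of lists of dicts instead of Arrow IPC record batches are all hypothetical. The real message definitions live in `execution.proto`.

```python
import itertools
from typing import Dict, Iterator, List

Row = Dict[str, object]


class InMemoryBackend:
    """Toy backend that mimics the ExecuteQuery / StreamResults split."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self._tables = tables
        self._pending: Dict[str, List[Row]] = {}
        self._ids = itertools.count(1)

    def execute_query(self, sql: str) -> str:
        """Stand-in for ExecuteQuery: accept SQL and return a query handle.
        The toy planner only understands SELECT * FROM <table>."""
        table = sql.strip().rstrip(";").split()[-1]
        if table not in self._tables:
            raise KeyError(f"unknown table: {table}")
        handle = f"q-{next(self._ids)}"
        self._pending[handle] = list(self._tables[table])
        return handle

    def stream_results(self, handle: str, batch_size: int = 2) -> Iterator[List[Row]]:
        """Stand-in for StreamResults: yield the rows in fixed-size batches."""
        rows = self._pending.pop(handle)
        for start in range(0, len(rows), batch_size):
            yield rows[start:start + batch_size]


backend = InMemoryBackend({"users": [{"id": 1}, {"id": 2}, {"id": 3}]})
handle = backend.execute_query("SELECT * FROM users")
for batch in backend.stream_results(handle):
    print(handle, batch)
```

Separating submission from streaming lets the client start consuming batches as they arrive instead of waiting for the full result.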

---

## Common Patterns

### Check which tables are available remotely

```sql
-- After ATTACH ... AS cloud
SHOW ALL TABLES;
-- An attached database shows up as a catalog, so filter on table_catalog
SELECT table_name FROM information_schema.tables WHERE table_catalog = 'cloud';
```

### Write to a remote table

```sql
INSERT INTO cloud.events SELECT * FROM read_parquet('local_dump.parquet');
```

### Create a remote table from local data

```sql
CREATE TABLE cloud.new_table AS SELECT * FROM local_csv LIMIT 0;
INSERT INTO cloud.new_table SELECT * FROM local_csv;
```

### Export remote query result to local Parquet

```python
con.sql("SELECT * FROM cloud.large_table WHERE region = 'us-east'") \
   .write_parquet("output/us_east.parquet")
```

---

## Troubleshooting

### `Extension is not trusted` / signature error

```python
# Always set allow_unsigned_extensions before loading
con = duckdb.connect(config={"allow_unsigned_extensions": "true"})
```

Or in CLI:
```bash
duckdb -unsigned
```

### `LOAD` fails — extension not found

```bash
# Set the env var to the exact built path
export OPENDUCK_EXTENSION_PATH="$(pwd)/extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension"
ls -la "$OPENDUCK_EXTENSION_PATH"   # confirm it exists
```
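The same check can be done from Python before calling `LOAD`, which turns a vague load failure into an explicit error. This is a sketch: `resolve_extension_path` is a hypothetical helper, and falling back to the build-tree path under `repo_root` is an assumption about where the script is run from.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Build output path taken from the build instructions above.
DEFAULT_RELATIVE_PATH = (
    "extensions/openduck/build/release/extension/openduck/openduck.duckdb_extension"
)


def resolve_extension_path(
    env: Optional[Mapping[str, str]] = None, repo_root: str = "."
) -> Path:
    """Return the extension path from OPENDUCK_EXTENSION_PATH, or the
    default build location, and fail early if the file is missing."""
    env = os.environ if env is None else env
    configured = env.get("OPENDUCK_EXTENSION_PATH")
    candidate = Path(configured) if configured else Path(repo_root) / DEFAULT_RELATIVE_PATH
    candidate = candidate.expanduser().resolve()
    if not candidate.is_file():
        raise FileNotFoundError(f"OpenDuck extension not found at {candidate}")
    return candidate


try:
    print(resolve_extension_path())
except FileNotFoundError as err:
    print(err)  # build the extension or fix OPENDUCK_EXTENSION_PATH
```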

### Connection refused to gateway

```bash
# Verify server is running
cargo run -p openduck-cli -- serve -d mydb -t $OPENDUCK_TOKEN
# Default port is 7878 — check firewall / port binding
curl http://localhost:7878/health
```
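A plain TCP probe distinguishes "nothing is listening" from problems higher up the stack, and it does not depend on any particular HTTP route existing. The sketch below uses only the standard library. `gateway_reachable` is a hypothetical helper name, and the defaults mirror the port used throughout this document.

```python
import socket


def gateway_reachable(host: str = "localhost", port: int = 7878, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False


print("gateway reachable:", gateway_reachable())
```

If this reports `False` while the server process is running, the likely causes are a different bind address or port, or a firewall rule.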

### Token mismatch / auth failure

```bash
# Server token and client token must match exactly
export OPENDUCK_TOKEN=same-value-on-both-sides
# Server: cargo run -p openduck-cli -- serve -d mydb -t $OPENDUCK_TOKEN
# Client: ATTACH '...&token=same-value-on-both-sides' AS cloud;
```

### Build fails on macOS — bison version

```bash
brew install bison
export PATH="$(brew --prefix bison)/bin:$PATH"
cd extensions/openduck && make
```

### Extension version mismatch with DuckDB

The extension must be built against the same DuckDB version as the Python package:

```bash
python -c "import duckdb; print(duckdb.__version__)"
# Ensure the extension Makefile targets the same version
# Check extensions/openduck/Makefile for DUCKDB_VERSION
```
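The comparison can also be scripted. The sketch below assumes the Makefile assigns the version on a single line such as `DUCKDB_VERSION ?= v1.2.0`. That format, the example version number, and both helper names are illustrative assumptions, so adjust the pattern to whatever the Makefile actually contains.

```python
import re
from typing import Optional


def makefile_duckdb_version(makefile_text: str) -> Optional[str]:
    """Extract the value of DUCKDB_VERSION (=, := or ?= assignment),
    without a leading 'v'. Returns None if the variable is absent."""
    match = re.search(
        r"^\s*DUCKDB_VERSION\s*[:?]?=\s*(\S+)", makefile_text, re.MULTILINE
    )
    return match.group(1).lstrip("v") if match else None


def versions_match(makefile_text: str, installed_version: str) -> bool:
    """Compare the Makefile's DuckDB version with the installed package version."""
    return makefile_duckdb_version(makefile_text) == installed_version.lstrip("v")


example_makefile = "EXT_NAME=openduck\nDUCKDB_VERSION ?= v1.2.0\n"  # made-up content
print(versions_match(example_makefile, "1.2.0"))  # prints: True
```

In practice the first argument would be the text of `extensions/openduck/Makefile` and the second `duckdb.__version__`.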

---

## OpenDuck vs Alternatives

| Feature | OpenDuck | Arrow Flight SQL | DuckLake |
|---|---|---|---|
| Remote attach UX | `ATTACH 'openduck:db'` | Separate driver | `ATTACH 'ducklake:...'` |
| Hybrid execution | ✅ split plan | ❌ full remote | ❌ |
| DuckDB catalog integration | ✅ native | ❌ | ✅ |
| Protocol RPCs | 2 | ~15 | N/A |
| Differential storage | ✅ | ❌ | via Parquet layers |
| Self-hosted | ✅ | ✅ | ✅ |
