MotherDuck — Serverless DuckDB in the Cloud

## Overview

25 stars

Best use case

MotherDuck — Serverless DuckDB in the Cloud is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

## Overview

Teams using MotherDuck — Serverless DuckDB in the Cloud should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/motherduck/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/TerminalSkills/skills/motherduck/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/motherduck/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How MotherDuck — Serverless DuckDB in the Cloud Compares

Feature / Agent	MotherDuck — Serverless DuckDB in the Cloud	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

## Overview

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# MotherDuck — Serverless DuckDB in the Cloud


## Overview


MotherDuck, the serverless analytics platform built on DuckDB that combines local and cloud query execution. Helps developers run SQL analytics on cloud-hosted data, share datasets, and build hybrid local-cloud data pipelines using DuckDB's familiar interface.


## Instructions

### Connection and Setup

```python
# Connect to MotherDuck from Python
import duckdb

# Authenticate (opens browser for first-time auth)
conn = duckdb.connect("md:")

# Or with explicit token
conn = duckdb.connect("md:?motherduck_token=your_token_here")

# Create a cloud database
conn.sql("CREATE DATABASE IF NOT EXISTS analytics")
conn.sql("USE analytics")

# Cloud databases persist — accessible from any machine
```

### Hybrid Queries (Local + Cloud)

```python
# Query that combines local files with cloud tables
import duckdb

conn = duckdb.connect("md:analytics")

# Upload a local Parquet file to MotherDuck
conn.sql("""
  CREATE OR REPLACE TABLE events AS
  SELECT * FROM read_parquet('local_data/events_2026_q1.parquet')
""")

# Query cloud data alongside local files
result = conn.sql("""
  SELECT
    e.event_type,
    COUNT(*) AS event_count,
    AVG(e.duration_ms) AS avg_duration,
    u.plan AS user_plan
  FROM analytics.events e                              -- Cloud table
  JOIN read_parquet('local_data/users.parquet') u      -- Local file
    ON e.user_id = u.id
  WHERE e.created_at >= '2026-01-01'
  GROUP BY 1, 4
  ORDER BY 2 DESC
""").df()

print(result)
```

### Data Loading

```python
# Load data from various sources into MotherDuck
conn = duckdb.connect("md:analytics")

# From CSV files (local or URL)
conn.sql("""
  CREATE OR REPLACE TABLE sales AS
  SELECT * FROM read_csv('https://data.example.com/sales_2026.csv',
    header=true, auto_detect=true)
""")

# From Parquet files on S3
conn.sql("""
  CREATE OR REPLACE TABLE events AS
  SELECT * FROM read_parquet('s3://my-bucket/events/*.parquet',
    hive_partitioning=true)
""")

# From JSON files
conn.sql("""
  CREATE OR REPLACE TABLE logs AS
  SELECT * FROM read_json_auto('logs/*.json')
""")

# Incremental loading — append new data
conn.sql("""
  INSERT INTO events
  SELECT * FROM read_parquet('s3://my-bucket/events/2026-03/*.parquet')
  WHERE event_id NOT IN (SELECT event_id FROM events)
""")
```

### Sharing Databases

```python
# Share a database with team members
conn = duckdb.connect("md:")

# Create a shared database
conn.sql("CREATE DATABASE team_analytics")
conn.sql("USE team_analytics")

# Load data
conn.sql("""
  CREATE TABLE monthly_metrics AS
  SELECT
    date_trunc('month', created_at) AS month,
    COUNT(DISTINCT user_id) AS mau,
    SUM(revenue) AS total_revenue,
    COUNT(*) AS total_events
  FROM analytics.events
  GROUP BY 1
  ORDER BY 1
""")

# Share with specific users (via MotherDuck UI or SQL)
conn.sql("GRANT SELECT ON DATABASE team_analytics TO 'colleague@company.com'")

# Recipients connect and query immediately:
# conn = duckdb.connect("md:team_analytics")
# conn.sql("SELECT * FROM monthly_metrics").df()
```

### Node.js / TypeScript Integration

```typescript
// src/analytics/motherduck.ts — MotherDuck in a Node.js application
import duckdb from "duckdb-async";

const db = await duckdb.Database.create("md:analytics", {
  motherduck_token: process.env.MOTHERDUCK_TOKEN!,
});

// Run analytics queries from your API
async function getMonthlyRevenue(months: number = 12) {
  const result = await db.all(`
    SELECT
      date_trunc('month', created_at) AS month,
      SUM(amount) AS revenue,
      COUNT(DISTINCT user_id) AS paying_users
    FROM orders
    WHERE created_at >= current_date - interval '${months} months'
      AND status = 'completed'
    GROUP BY 1
    ORDER BY 1
  `);
  return result;
}

// Use in an API endpoint
app.get("/api/analytics/revenue", async (req, res) => {
  const data = await getMonthlyRevenue(parseInt(req.query.months) || 12);
  res.json(data);
});
```

### Scheduled Transformations

```python
# scripts/daily_transform.py — Run as a cron job to update materialized tables
import duckdb

conn = duckdb.connect("md:analytics")

# Daily aggregation — rebuild summary tables
conn.sql("""
  CREATE OR REPLACE TABLE daily_kpis AS
  SELECT
    DATE(created_at) AS date,
    COUNT(DISTINCT user_id) AS dau,
    COUNT(DISTINCT CASE WHEN event_type = 'signup' THEN user_id END) AS signups,
    COUNT(DISTINCT CASE WHEN event_type = 'purchase' THEN user_id END) AS purchasers,
    SUM(CASE WHEN event_type = 'purchase' THEN amount ELSE 0 END) AS revenue,
    COUNT(*) AS total_events
  FROM events
  WHERE created_at >= current_date - interval '90 days'
  GROUP BY 1
  ORDER BY 1
""")

# Cohort analysis table
conn.sql("""
  CREATE OR REPLACE TABLE cohort_retention AS
  WITH first_seen AS (
    SELECT user_id, MIN(DATE(created_at)) AS cohort_date
    FROM events
    GROUP BY 1
  )
  SELECT
    f.cohort_date,
    DATE_DIFF('week', f.cohort_date, DATE(e.created_at)) AS week_number,
    COUNT(DISTINCT e.user_id) AS active_users
  FROM events e
  JOIN first_seen f ON e.user_id = f.user_id
  WHERE f.cohort_date >= current_date - interval '12 weeks'
  GROUP BY 1, 2
""")

print("✅ Daily transformations complete")
```

## Installation

```bash
# Python
pip install duckdb

# Node.js
npm install duckdb-async

# CLI (DuckDB CLI with MotherDuck support)
# Download from https://motherduck.com/docs/getting-started/
```


## Examples


### Example 1: Setting up an evaluation pipeline for a RAG application

**User request:**

```
I have a RAG chatbot that answers questions from our docs. Set up Motherduck to evaluate answer quality.
```

The agent creates an evaluation suite with appropriate metrics (faithfulness, relevance, answer correctness), configures test datasets from real user questions, runs baseline evaluations, and sets up CI integration so evaluations run on every prompt or retrieval change.

### Example 2: Comparing model performance across prompts

**User request:**

```
We're testing GPT-4o vs Claude on our customer support prompts. Set up a comparison with Motherduck.
```

The agent creates a structured experiment with the existing prompt set, configures both model providers, defines scoring criteria specific to customer support (accuracy, tone, completeness), runs the comparison, and generates a summary report with statistical significance indicators.


## Guidelines

1. **Hybrid execution** — MotherDuck pushes queries to the cloud OR runs locally depending on where data lives; leverage both
2. **Parquet for everything** — Store raw data as Parquet files on S3; query them directly without loading into tables
3. **Materialized tables for dashboards** — Pre-compute daily/weekly aggregates; don't recalculate on every dashboard load
4. **Share databases, not exports** — Instead of emailing CSVs, share a MotherDuck database; recipients get live, queryable data
5. **Use DuckDB locally for development** — Write and test queries locally with DuckDB; deploy the same SQL to MotherDuck
6. **Token management** — Store `MOTHERDUCK_TOKEN` in environment variables; never commit tokens to code
7. **Incremental loads** — For large datasets, append new data with `INSERT INTO ... WHERE NOT IN`; avoid full reloads
8. **Cost awareness** — MotherDuck charges by query compute; pre-aggregate expensive queries into summary tables

Related Skills

optimizing-cloud-costs

from ComeOnOliver/skillshub

Execute use when you need to work with cloud cost optimization. This skill provides cost analysis and optimization with comprehensive guidance and automation. Trigger with phrases like "optimize costs", "analyze spending", or "reduce costs".

Google Cloud Agent SDK Master

from ComeOnOliver/skillshub

Execute automatic activation for all google cloud agent development kit (adk) Use when appropriate context detected. Trigger with relevant phrases based on skill purpose.

cloudwatch-alarm-creator

from ComeOnOliver/skillshub

Cloudwatch Alarm Creator - Auto-activating skill for AWS Skills. Triggers on: cloudwatch alarm creator, cloudwatch alarm creator Part of the AWS Skills skill category.

cloudfront-distribution-setup

from ComeOnOliver/skillshub

Cloudfront Distribution Setup - Auto-activating skill for AWS Skills. Triggers on: cloudfront distribution setup, cloudfront distribution setup Part of the AWS Skills skill category.

cloudformation-template-creator

from ComeOnOliver/skillshub

Cloudformation Template Creator - Auto-activating skill for AWS Skills. Triggers on: cloudformation template creator, cloudformation template creator Part of the AWS Skills skill category.

cloud-tasks-queue-setup

from ComeOnOliver/skillshub

Cloud Tasks Queue Setup - Auto-activating skill for GCP Skills. Triggers on: cloud tasks queue setup, cloud tasks queue setup Part of the GCP Skills skill category.

cloud-sql-instance-setup

from ComeOnOliver/skillshub

Cloud Sql Instance Setup - Auto-activating skill for GCP Skills. Triggers on: cloud sql instance setup, cloud sql instance setup Part of the GCP Skills skill category.

cloud-security-posture

from ComeOnOliver/skillshub

Cloud Security Posture - Auto-activating skill for Security Advanced. Triggers on: cloud security posture, cloud security posture Part of the Security Advanced skill category.

cloud-scheduler-job-creator

from ComeOnOliver/skillshub

Cloud Scheduler Job Creator - Auto-activating skill for GCP Skills. Triggers on: cloud scheduler job creator, cloud scheduler job creator Part of the GCP Skills skill category.

cloud-run-service-config

from ComeOnOliver/skillshub

Cloud Run Service Config - Auto-activating skill for GCP Skills. Triggers on: cloud run service config, cloud run service config Part of the GCP Skills skill category.

cloud-monitoring-alert

from ComeOnOliver/skillshub

Cloud Monitoring Alert - Auto-activating skill for GCP Skills. Triggers on: cloud monitoring alert, cloud monitoring alert Part of the GCP Skills skill category.

cloud-logging-sink-setup

from ComeOnOliver/skillshub

Cloud Logging Sink Setup - Auto-activating skill for GCP Skills. Triggers on: cloud logging sink setup, cloud logging sink setup Part of the GCP Skills skill category.