duckdb

DuckDB analytical database for OLAP workloads. Use for embedded analytics.

by G1Joshi

Best use case

duckdb is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using duckdb should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/duckdb/SKILL.md --create-dirs "https://raw.githubusercontent.com/G1Joshi/Agent-Skills/main/skills/databases/duckdb/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/duckdb/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How duckdb Compares

| Feature / Agent | duckdb | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

DuckDB analytical database for OLAP workloads. Use for embedded analytics.

Where can I find the source code?

The source code is in the G1Joshi/Agent-Skills repository on GitHub, under skills/databases/duckdb.

SKILL.md Source

# DuckDB

DuckDB is often described as "SQLite for Analytics". It is an in-process SQL OLAP database: it runs inside your application process, needs no separate server, and runs analytical queries directly against local files (Parquet, CSV, JSON).

## When to Use

- **Local Analytics**: Analyze millions of rows on your laptop in seconds.
- **Data Engineering**: Process data in Python/R pipelines, as an alternative to Pandas for SQL-style transformations.
- **Serverless Data Lake**: Query Parquet files on S3 directly from a function such as AWS Lambda, without a running warehouse.

## Quick Start (Python)

```python
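import duckdb

# Query a local CSV file directly; the format is inferred from the extension
duckdb.sql("SELECT avg(price) FROM 'sales.csv' WHERE region = 'US'").show()

# Query a Parquet file on S3 via the httpfs extension
# (private buckets also need S3 credentials configured)
duckdb.sql("INSTALL httpfs")
duckdb.sql("LOAD httpfs")
duckdb.sql("SELECT count(*) FROM 's3://my-bucket/data.parquet'").show()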
```

## Core Concepts

### Vectorized Execution

Traditional row-oriented databases process one row at a time. DuckDB processes batches of column values (vectors), which amortizes per-row overhead and suits CPU caches and SIMD instructions.
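
The difference can be sketched in plain Python. This is a toy illustration, not DuckDB's engine (which is written in C++): the function names are hypothetical, and the batch size of 2048 is used only to mirror DuckDB's default vector size. Both functions compute the same filtered sum, but the second works on fixed-size slices of each column rather than on whole row objects.

```python
# Toy illustration only; DuckDB's real engine is far more sophisticated.
VECTOR_SIZE = 2048  # assumed batch size, chosen to mirror DuckDB's default


def sum_row_at_a_time(rows):
    """Row-oriented: visit every row object and pick out fields one by one."""
    total = 0.0
    for row in rows:
        if row["region"] == "US":
            total += row["price"]
    return total


def sum_vectorized(price_col, region_col, vector_size=VECTOR_SIZE):
    """Column-oriented: process fixed-size slices of each column in tight loops."""
    total = 0.0
    for start in range(0, len(price_col), vector_size):
        prices = price_col[start:start + vector_size]
        regions = region_col[start:start + vector_size]
        # First pass builds a selection of matching positions,
        # second pass aggregates only the selected values.
        selected = [i for i, region in enumerate(regions) if region == "US"]
        total += sum(prices[i] for i in selected)
    return total


# Same data in two layouts: a list of row dicts and two parallel columns.
rows = [{"price": float(i), "region": "US" if i % 2 == 0 else "EU"} for i in range(10_000)]
price_col = [r["price"] for r in rows]
region_col = [r["region"] for r in rows]

row_total = sum_row_at_a_time(rows)
vec_total = sum_vectorized(price_col, region_col)
print(row_total, vec_total)
```

In pure Python the batch version is not meaningfully faster; the point is the shape of the loops, which in a compiled engine run over contiguous arrays of a single type.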

### Universal Format Reader

DuckDB can query CSV, JSON, Parquet, and Arrow data, as well as SQLite and Postgres tables, as if they were local tables. SQLite and Postgres access goes through the corresponding extensions.

### Zero Dependencies

DuckDB ships as a single binary (CLI) or library with no external dependencies.

## Best Practices (2025)

**Do**:

- **Use Parquet**: Parquet is columnar and compressed, so DuckDB can read only the columns and row groups a query needs.
- **Replace Pandas**: For datasets larger than RAM, DuckDB can spill intermediate results to disk, where Pandas typically fails with an out-of-memory error.
- **Use explicitly typed SQL**: DuckDB’s SQL dialect closely follows PostgreSQL's, so explicit types and casts behave as most SQL users expect.

**Don't**:

- **Don't use for Multi-User OLTP**: Only one process can open a database file for writing at a time, so DuckDB is a poor fit for many concurrent writers. Use Postgres for that. Use DuckDB for analysis.

## References

- [DuckDB Documentation](https://duckdb.org/docs/)