add-datalake-consumer

Adds an event consumer that writes to Azure Data Lake (Parquet), following the BI_SALES_RISK plan. Creates events/consumers/[Name]DataLakeCollector.ts, which subscribes to RabbitMQ, builds Parquet rows, and writes them to /path_prefix/year=YYYY/month=MM/day=DD/. Use when adding a DataLakeCollector in logging or similar “event to Data Lake” pipelines.

by diegosouzapw

Best use case

add-datalake-consumer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using add-datalake-consumer can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/add-datalake-consumer/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/data-ai/add-datalake-consumer/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/add-datalake-consumer/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How add-datalake-consumer Compares

Feature / Agent	add-datalake-consumer	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

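What does this skill do?

It adds an event consumer that subscribes to RabbitMQ and writes events to Azure Data Lake as Parquet, under date-partitioned paths of the form /path_prefix/year=YYYY/month=MM/day=DD/.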

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Add Data Lake Consumer

Event consumer that subscribes to RabbitMQ and writes to **Azure Data Lake** (Parquet). Pattern: logging’s DataLakeCollector for `risk.evaluated` (BI_SALES_RISK_IMPLEMENTATION_PLAN §3.5, §9.1). **BI Sales Risk:** Paths and Parquet columns MUST match `documentation/requirements/BI_SALES_RISK_DATA_LAKE_LAYOUT.md` (§2.1 risk.evaluated, §2.2 ml_outcomes, §4 config).

## 1. Consumer

**Path:** `src/events/consumers/[Name]DataLakeCollector.ts`

- `EventConsumer` with `queue`, `exchange: coder_events`, `bindings`: e.g. `['risk.evaluated','ml.prediction.completed','opportunity.updated','forecast.generated']`.
- Handler: map event to row. For risk.evaluated use columns in Data Lake Layout §2.1. Build path: `{path_prefix}/year={YYYY}/month={MM}/day={DD}/...` (Layout §1).
- Write with `@azure/storage-blob` (BlockBlob). For Parquet output, pair it with `parquetjs` (or Arrow) to encode rows before upload. Buffer or batch by time or count if needed.
- Config: `data_lake.connection_string`, `data_lake.container`, `data_lake.path_prefix` (e.g. `/risk_evaluations`).
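
The path building and count-based batching described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `buildPartitionPath` and `RowBuffer` are hypothetical names, the use of UTC for the partition date is an assumption, and the Parquet encoding and blob upload are left to the caller-supplied `flush` callback.

```typescript
/**
 * Builds `{path_prefix}/year=YYYY/month=MM/day=DD/` for a given date.
 * Assumption: partitions are keyed on the UTC date of the event.
 */
function buildPartitionPath(pathPrefix: string, date: Date): string {
  const yyyy = String(date.getUTCFullYear()).padStart(4, "0");
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  const prefix = pathPrefix.replace(/\/+$/, ""); // drop trailing slashes
  return `${prefix}/year=${yyyy}/month=${mm}/day=${dd}/`;
}

/**
 * Hypothetical count-based buffer: collects rows and hands them to
 * `flush` once `maxRows` is reached. `flush` is where a real collector
 * would encode the batch as Parquet and upload it.
 */
class RowBuffer<Row> {
  private rows: Row[] = [];
  private readonly maxRows: number;
  private readonly flush: (batch: Row[]) => Promise<void>;

  constructor(maxRows: number, flush: (batch: Row[]) => Promise<void>) {
    this.maxRows = maxRows;
    this.flush = flush;
  }

  /** Adds one row and flushes when the buffer is full. */
  async add(row: Row): Promise<void> {
    this.rows.push(row);
    if (this.rows.length >= this.maxRows) {
      await this.drain();
    }
  }

  /** Flushes whatever is buffered; call on shutdown or on a timer. */
  async drain(): Promise<void> {
    if (this.rows.length === 0) return;
    const batch = this.rows;
    this.rows = []; // swap first so rows added during flush are not lost
    await this.flush(batch);
  }
}
```

A time-based flush can be layered on top by calling `drain()` from a timer, which keeps the buffer itself free of scheduling concerns.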

## 2. Config

**config/default.yaml:**
```yaml
data_lake:
  connection_string: ${DATA_LAKE_CONNECTION_STRING}
  container: ${DATA_LAKE_CONTAINER:-risk}
  path_prefix: ${DATA_LAKE_PATH_PREFIX:-/risk_evaluations}

rabbitmq:
  url: ${RABBITMQ_URL}
  exchange: coder_events
  queue: [module]_data_lake
  bindings:
    - risk.evaluated
    - ml.prediction.completed
    # ...
```
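
The `${NAME}` and `${NAME:-default}` placeholders in the YAML above follow the shell convention of falling back to a default when the variable is unset or empty. The source does not show how the project's config loader resolves them, so the following is only a sketch of that expansion under that assumption; `expandPlaceholders` is a hypothetical helper.

```typescript
type Env = Record<string, string | undefined>;

/**
 * Expands `${NAME}` and `${NAME:-default}` placeholders in a config value.
 * Throws when a variable is unset or empty and no default is given.
 */
function expandPlaceholders(value: string, env: Env): string {
  return value.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}/g,
    (_match: string, name: string, fallback: string | undefined) => {
      const resolved = env[name];
      if (resolved !== undefined && resolved !== "") return resolved;
      if (fallback !== undefined) return fallback;
      throw new Error(`Missing required environment variable: ${name}`);
    },
  );
}
```

In a Node.js service, `process.env` would typically be passed as `env`, so that a missing `DATA_LAKE_CONNECTION_STRING` fails at startup rather than on the first write.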

**config/schema.json:** add `data_lake` with `connection_string`, `container`, `path_prefix`.
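
As a rough runtime counterpart to the schema entry, the check below verifies that the three `data_lake` keys are present. It is a sketch only: `validateDataLakeConfig` is a hypothetical helper, and requiring non-empty strings is an assumption rather than something the source specifies.

```typescript
// The three keys named in the source for the `data_lake` section.
const REQUIRED_DATA_LAKE_KEYS = ["connection_string", "container", "path_prefix"] as const;

/**
 * Returns a list of problems found in the `data_lake` section.
 * An empty list means the section passed the check.
 */
function validateDataLakeConfig(section: unknown): string[] {
  if (typeof section !== "object" || section === null) {
    return ["data_lake must be an object"];
  }
  const record = section as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of REQUIRED_DATA_LAKE_KEYS) {
    const value = record[key];
    // Assumption: every key must be a non-empty string.
    if (typeof value !== "string" || value.length === 0) {
      errors.push(`data_lake.${key} must be a non-empty string`);
    }
  }
  return errors;
}
```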

## 3. Server

In `server.ts`: `await dataLakeCollector.start()` after RabbitMQ connect.
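
The ordering can be sketched as below. The `Connectable` and `Startable` interfaces and the `startServer` function are hypothetical stand-ins, since the project's actual RabbitMQ client and collector classes are not shown in the source.

```typescript
// Hypothetical minimal interfaces standing in for the real classes.
interface Connectable {
  connect(): Promise<void>;
}
interface Startable {
  start(): Promise<void>;
}

/** Connects to RabbitMQ first, then starts the Data Lake collector. */
async function startServer(rabbit: Connectable, dataLakeCollector: Startable): Promise<void> {
  await rabbit.connect(); // the broker connection must be ready first
  await dataLakeCollector.start(); // only then begin consuming events
}
```

Awaiting each step in sequence means a failed broker connection rejects before the collector tries to subscribe.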

## 4. Checklist

- [ ] Consumer in `events/consumers/`, subscribe to RabbitMQ (no Azure Service Bus)
- [ ] Path: `{path_prefix}/year=.../month=.../day=.../`; format: Parquet
- [ ] Config: `data_lake.*` and schema; `rabbitmq` queue and bindings
- [ ] Start collector in server
