Data Catalog Enricher

Enriches data catalog entries with automated metadata

509 stars

Best use case

Data Catalog Enricher is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Data Catalog Enricher should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/data-catalog-enricher/SKILL.md --create-dirs "https://raw.githubusercontent.com/a5c-ai/babysitter/main/library/specializations/data-engineering-analytics/skills/data-catalog-enricher/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/data-catalog-enricher/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How Data Catalog Enricher Compares

| Feature / Agent | Data Catalog Enricher | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Enriches data catalog entries with automated metadata

Where can I find the source code?

The source code is in the a5c-ai/babysitter repository on GitHub, under library/specializations/data-engineering-analytics/skills/data-catalog-enricher/.

SKILL.md Source

# Data Catalog Enricher

## Overview

Enriches data catalog entries with automated metadata. This skill enhances data discoverability and governance through intelligent metadata augmentation.

## Capabilities

- Automated tag suggestion
- Business glossary term matching
- Owner/steward recommendation
- Usage pattern analysis
- Data classification (sensitivity, PII)
- Quality score integration
- Lineage enrichment
- Search optimization

## Input Schema

```json
{
  "catalogEntry": "object",
  "dataProfile": "object",
  "existingGlossary": "object",
  "organizationContext": "object"
}
```
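
The schema above names four top-level keys and types each one as an object, but does not define their inner structure. As a minimal sketch, assuming all four keys are expected (the schema does not say whether any are optional), a pre-flight shape check might look like this. The helper `validate_input` and every nested field in `sample_input` are hypothetical and serve only as illustration.

```python
# The four top-level keys named in the Input Schema.
EXPECTED_KEYS = (
    "catalogEntry",
    "dataProfile",
    "existingGlossary",
    "organizationContext",
)


def validate_input(payload: dict) -> list:
    """Return a list of problems; an empty list means the shape matches."""
    problems = []
    for key in EXPECTED_KEYS:
        if key not in payload:
            problems.append(f"missing key: {key}")
        elif not isinstance(payload[key], dict):
            problems.append(f"{key} must be an object")
    return problems


# Hypothetical sample: the nested fields are NOT defined by the skill.
sample_input = {
    "catalogEntry": {"name": "orders", "description": ""},
    "dataProfile": {"columns": {"customer_email": {"type": "string"}}},
    "existingGlossary": {"Customer": "A party that places orders"},
    "organizationContext": {"teams": ["sales-analytics"]},
}

print(validate_input(sample_input))
```

Running the check before invoking the skill catches a missing or mistyped section early, instead of finding out from an incomplete enrichment result.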

## Output Schema

```json
{
  "enrichedEntry": "object",
  "suggestedTags": ["string"],
  "glossaryMatches": ["object"],
  "classificationResults": "object",
  "ownerSuggestions": ["string"]
}
```
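
Because the output is produced by an agent, a caller may want to confirm that it matches the declared shape before writing anything back to a catalog. The following is a sketch under that assumption; `check_output` is a hypothetical helper, and it checks only the types the schema states (objects, and lists of strings or objects), not the contents.

```python
# Top-level container types taken from the Output Schema.
EXPECTED_TYPES = {
    "enrichedEntry": dict,
    "suggestedTags": list,
    "glossaryMatches": list,
    "classificationResults": dict,
    "ownerSuggestions": list,
}

# Element types for the list-valued fields.
ELEMENT_TYPES = {
    "suggestedTags": (str, "strings"),
    "glossaryMatches": (dict, "objects"),
    "ownerSuggestions": (str, "strings"),
}


def check_output(result: dict) -> list:
    """Return a list of shape problems; an empty list means the shape matches."""
    problems = []
    for key, kind in EXPECTED_TYPES.items():
        if not isinstance(result.get(key), kind):
            problems.append(f"{key} should be a {kind.__name__}")
    for key, (kind, label) in ELEMENT_TYPES.items():
        value = result.get(key)
        if isinstance(value, list) and not all(isinstance(v, kind) for v in value):
            problems.append(f"{key} should contain only {label}")
    return problems
```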

## Target Processes

- Data Catalog
- Data Lineage Mapping
- Data Quality Framework

## Usage Guidelines

1. Provide existing catalog entry for enrichment
2. Include data profile for classification analysis
3. Supply business glossary for term matching
4. Add organization context for owner recommendations
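
The four steps above pair each input with the enrichment it supports. A minimal sketch of assembling a request along those lines is shown below. The function `build_request`, its parameter names, and the choice to send an empty object for a missing input are all assumptions; the skill does not state how absent inputs are handled.

```python
# Which enrichment each optional input supports, per the guidelines above.
SUPPORTS = {
    "dataProfile": "classification analysis",
    "existingGlossary": "term matching",
    "organizationContext": "owner recommendations",
}


def build_request(catalog_entry, data_profile=None, glossary=None, org_context=None):
    """Assemble the input payload and list enrichments that lack their input."""
    payload = {
        "catalogEntry": catalog_entry,
        # Assumption: an empty object stands in for an input that was not supplied.
        "dataProfile": data_profile or {},
        "existingGlossary": glossary or {},
        "organizationContext": org_context or {},
    }
    unsupported = [feature for key, feature in SUPPORTS.items() if not payload[key]]
    return payload, unsupported


payload, unsupported = build_request(
    {"name": "orders"},  # hypothetical catalog entry
    glossary={"Customer": "A party that places orders"},
)
print(unsupported)
```

Reporting the unsupported enrichments up front makes it clear why, for example, owner suggestions may come back empty when no organization context was provided.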

## Best Practices

- Regularly update glossary matches as the glossary evolves
- Validate PII classifications with data stewards
- Integrate quality scores from quality framework
- Maintain consistent tagging taxonomy
- Review and approve automated classifications
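
The review-and-approve practice can be enforced with a simple gate that holds automated classifications until a data steward signs off. This is a sketch only: the `Classification` record, its fields, and the label values are hypothetical, since the skill leaves the structure of `classificationResults` unspecified.

```python
from dataclasses import dataclass


@dataclass
class Classification:
    """Hypothetical record for one automated classification."""
    column: str
    label: str              # e.g. "PII"; label values are illustrative
    approved: bool = False
    approved_by: str = ""


def approve(item: Classification, steward: str) -> Classification:
    """Return a new record carrying the steward's sign-off."""
    return Classification(item.column, item.label, approved=True, approved_by=steward)


def publishable(items):
    """Only approved classifications may be written back to the catalog."""
    return [c for c in items if c.approved]


def pending_review(items):
    """Classifications still waiting for a steward."""
    return [c for c in items if not c.approved]
```

Keeping approval as an explicit field means unreviewed PII labels never reach the catalog by default, and the record shows who validated each one.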

Related Skills

All of the following are from a5c-ai/babysitter (509 stars).

  • structured-data: JSON-LD schema markup and validation.
  • CVE/CWE Database Skill: CVE and CWE database querying and management
  • error-code-catalog: Manage and document SDK error codes and messages
  • test-data-generation: Synthetic test data generation and management using Faker.js and similar tools. Generate realistic test data, create data factories, implement database seeding, and manage test data anonymization.
  • iOS Persistence (Core Data/Realm): Specialized skill for iOS local data persistence solutions
  • Room Database: Expert skill for Android Room persistence library
  • metadata-standards-implementation: Apply Dublin Core, METS, MODS, and other metadata schemas for digital collections and archival materials
  • health-data-integration: Facilitate interoperability between health IT systems including EHR, HIE, and clinical decision support through HL7, FHIR, and other healthcare data standards
  • data-versioning-manager: Skill for managing data versions and provenance
  • data-encoder: Classical data encoding skill for quantum machine learning applications
  • root-data-analyzer: ROOT/CERN data analysis skill for high-energy physics data processing, histogramming, and statistical analysis
  • bluesky-data-collection: Bluesky experimental orchestration skill for scan automation, data collection, and metadata management