croissant_expert

Specialized in the MLCommons Croissant metadata specification. Can generate, validate, and serialize dataset metadata into compliant JSON-LD.

7 stars

Best use case

croissant_expert is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using croissant_expert should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/croissant_expert/SKILL.md --create-dirs "https://raw.githubusercontent.com/codata/croissant-toolkit/main/.gemini/skills/croissant_expert/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/croissant_expert/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How croissant_expert Compares

| Feature / Agent | croissant_expert | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Specialized in the MLCommons Croissant metadata specification. Can generate, validate, and serialize dataset metadata into compliant JSON-LD.

Where can I find the source code?

The source code is in the codata/croissant-toolkit repository on GitHub.

SKILL.md Source

# Croissant Expert Skill

The Croissant Expert skill provides the core logic for working with the [MLCommons Croissant](https://mlcommons.org/croissant) specification. It takes dataset descriptions and turns them into spec-compliant JSON-LD metadata files.

Croissant files are stored locally in `./data/croissant/` as JSON-LD files.

## Tools

### 1. Serialize to Croissant JSON-LD
Transforms a structured metadata JSON into the final Croissant format.

**Usage:**
```bash
# Standard serialization
python3 croissant_expert/scripts/serialize.py <INPUT_METADATA_JSON> [OUTPUT_JSON_LD]

# Serialization with Intelligent NLP enrichment
# Automatically detects creators, locations, and dates from the description.
python3 croissant_expert/scripts/serialize.py <INPUT_METADATA_JSON> --nlp
```
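
If the script is driven from Python rather than a shell, the two invocations above could be assembled as follows. This is a minimal sketch: `build_serialize_command` is a hypothetical helper, not part of the toolkit, and passing an output path together with `--nlp` is an assumption, since the usage above only shows them separately.

```python
from typing import List, Optional

# Script path as given in the usage section above.
SCRIPT = "croissant_expert/scripts/serialize.py"


def build_serialize_command(
    input_json: str,
    output_jsonld: Optional[str] = None,
    nlp: bool = False,
) -> List[str]:
    """Build the argv list for serialize.py (hypothetical helper)."""
    cmd = ["python3", SCRIPT, input_json]
    if output_jsonld is not None:
        # Optional positional output path, as in the first usage form.
        cmd.append(output_jsonld)
    if nlp:
        # Assumption: --nlp may be combined with an output path.
        cmd.append("--nlp")
    return cmd


standard_cmd = build_serialize_command("metadata.json", "out.jsonld")
nlp_cmd = build_serialize_command("metadata.json", nlp=True)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so that a non-zero exit from the script raises an exception.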

**Metadata Schema:**
The input JSON should follow this structure:
- `name`: String
- `description`: String
- `url`: String
- `license`: String
- `distribution`: List of `FileObject` or `FileSet`
- `recordSet`: List of `RecordSet` with `fields` and `source` information.
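
As an illustration of the structure above, the following sketch builds a minimal input document and writes it to disk. Only the six top-level keys come from the schema listed here; the keys inside each `distribution` and `recordSet` entry (`contentUrl`, `encodingFormat`, `dataType`, and so on), the example values, and the file name `metadata.json` are assumptions, so check them against what `serialize.py` actually expects.

```python
import json
from pathlib import Path

# Top-level keys follow the schema listed above.
# Everything nested inside distribution/recordSet is an assumed shape.
metadata = {
    "name": "example-dataset",
    "description": "A small example dataset used to illustrate the input schema.",
    "url": "https://example.org/datasets/example",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "distribution": [
        {
            "@type": "FileObject",
            "name": "data.csv",
            "contentUrl": "https://example.org/datasets/example/data.csv",
            "encodingFormat": "text/csv",
        }
    ],
    "recordSet": [
        {
            "@type": "RecordSet",
            "name": "records",
            "fields": [
                {
                    "name": "id",
                    "dataType": "Integer",
                    # Assumed way of pointing a field at a column of data.csv.
                    "source": {"fileObject": "data.csv", "column": "id"},
                }
            ],
        }
    ],
}

# Write the input file that would be passed as <INPUT_METADATA_JSON>.
input_path = Path("metadata.json")
input_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
```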

## Capabilities
- **Spec Interpretation**: Access to the latest MLCommons Croissant standard.
- **JSON-LD Generation**: Deep understanding of `@context`, `@type`, and linked data principles.
- **Validation-Ready**: Output files are designed to pass the official Croissant validator.
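
Before handing a generated file to the official validator, a quick structural pre-check can catch obviously incomplete output. The sketch below is not the Croissant validator and does not implement the specification. `precheck_jsonld` is a hypothetical helper, and the set of keys it checks is an assumption based on the capabilities and schema described above.

```python
from typing import Any, Dict, List

# Assumed minimum: JSON-LD keywords plus the dataset name.
REQUIRED_KEYS = ("@context", "@type", "name")
# Keys that the input schema above describes as lists.
LIST_KEYS = ("distribution", "recordSet")


def precheck_jsonld(doc: Dict[str, Any]) -> List[str]:
    """Return a list of structural problems; an empty list means none were found."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in doc]
    for key in LIST_KEYS:
        if key in doc and not isinstance(doc[key], list):
            problems.append(f"{key} should be a list")
    # Assumption: the top-level type is a schema.org Dataset, e.g. "sc:Dataset".
    doc_type = doc.get("@type")
    if isinstance(doc_type, str) and not doc_type.endswith("Dataset"):
        problems.append("@type does not look like a Dataset")
    return problems


ok_doc = {"@context": {}, "@type": "sc:Dataset", "name": "example", "distribution": []}
bad_doc = {"name": "example", "recordSet": {}}
ok_problems = precheck_jsonld(ok_doc)
bad_problems = precheck_jsonld(bad_doc)
```

For files stored under `./data/croissant/`, the same function can be applied to each document after loading it with `json.load`.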

Related Skills

orchestrator_expert

7
from codata/croissant-toolkit

Orchestrator agent that has comprehensive knowledge and command over all available skills in this toolkit to create complex workflows.

neo4j_expert

7
from codata/croissant-toolkit

Store and query Croissant datasets in a Neo4j Graph Database for relational discovery and semantic search.

telegram_expert

7
from codata/croissant-toolkit

Send results and notifications to Telegram channels or users.

ro-crate-expert

7
from codata/croissant-toolkit

Specialized in creating RO-Crate packages from Dataverse metadata, with integrated ODRL-based DID (Decentralized Identifier) attribution and provenance via the ro-crate-py library.

📊 Presentation Expert Skill

7
from codata/croissant-toolkit

The **Presentation Expert** is responsible for transforming complex research data, metadata, and insights into high-impact presentation decks.

obsidian_expert

7
from codata/croissant-toolkit

Convert Croissant datasets into structured Obsidian Markdown notes with frontmatter and semantic tags.

nlp_expert

7
from codata/croissant-toolkit

Extract named entities (persons, organizations, dates, locations) from text and provide them in structured JSON-LD format.

walker

7
from codata/croissant-toolkit

Deep crawl functionality that extracts and visits internal links from a webpage.

youtuber

7
from codata/croissant-toolkit

Search for videos on YouTube based on specific keywords. Get list of videos with title, description, and URL.

wizard

7
from codata/croissant-toolkit

The ultimate data integrator. Orchestrates transcription, translation, NLP analysis, and Croissant serialization into a single automated pipeline.

unf

7
from codata/croissant-toolkit

Universal Numeric Fingerprint (UNF) generator. For strings, it splits into words and sorts them alphabetically to provide order-invariant fingerprints. Supports dataframes and files too.

translator

7
from codata/croissant-toolkit

Recognize the language of input content or video scripts and translate them precisely into English using Gemini 3.