wizard

The ultimate data integrator. Orchestrates transcription, translation, NLP analysis, and Croissant serialization into a single automated pipeline.

7 stars

Best use case

wizard is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using wizard should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/wizard/SKILL.md --create-dirs "https://raw.githubusercontent.com/codata/croissant-toolkit/main/.gemini/skills/wizard/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/wizard/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How wizard Compares

| Feature / Agent | wizard | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

The ultimate data integrator. Orchestrates transcription, translation, NLP analysis, and Croissant serialization into a single automated pipeline.

Where can I find the source code?

The source code is on GitHub in the codata/croissant-toolkit repository. The skill definition is at .gemini/skills/wizard/SKILL.md on the main branch.

SKILL.md Source

# Wizard Skill

The Wizard skill is the high-level conductor of the Croissant Toolkit. It automates the entire journey from a raw data source (like a YouTube video or a foreign-language document) to a fully compliant and enriched Croissant JSON-LD file.

## Workflow
1.  **Acquisition**: Fetches content via `Transcriber` (for videos) or reads local files.
2.  **Harmonization**: Passes content through the `Translator` to ensure it's in English and refined.
3.  **Intelligence**: Uses the `NLP Expert` (via the Croissant Expert integration) to automatically detect creators, locations, and dates.
4.  **Finalization**: Generates the standardized `Croissant JSON-LD` file in `./data/croissant/`.
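
The four stages above can be read as a chain of functions that pass one object along. The sketch below is illustrative only and is not the toolkit's implementation: `Draft`, `acquire`, `harmonize`, `analyze`, `finalize`, and `run_pipeline` are hypothetical names, the stage bodies are stand-ins for the real skills, and the output is a placeholder JSON record rather than a validated Croissant document. The file-naming scheme is also an assumption.

```python
import json
import re
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Draft:
    """Content plus metadata handed from stage to stage (hypothetical type)."""
    name: str
    text: str
    detected: dict = field(default_factory=dict)


def acquire(content: str, name: str) -> Draft:
    # Stand-in for Transcriber / file reading: takes the text as given.
    return Draft(name=name, text=content)


def harmonize(draft: Draft) -> Draft:
    # Stand-in for Translator: only collapses whitespace, no translation.
    draft.text = " ".join(draft.text.split())
    return draft


def analyze(draft: Draft) -> Draft:
    # Stand-in for NLP Expert: leaves the three detected fields empty.
    draft.detected = {"creators": [], "locations": [], "dates": []}
    return draft


def finalize(draft: Draft, out_dir: Path) -> Path:
    # Writes a placeholder JSON record, NOT a validated Croissant document.
    out_dir.mkdir(parents=True, exist_ok=True)
    slug = re.sub(r"[^a-z0-9]+", "_", draft.name.lower()).strip("_") or "dataset"
    out_path = out_dir / f"{slug}.json"
    record = {"name": draft.name, "description": draft.text, "detected": draft.detected}
    out_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return out_path


def run_pipeline(content: str, name: str, out_dir: Path = Path("data/croissant")) -> Path:
    # Acquisition -> Harmonization -> Intelligence -> Finalization.
    draft = acquire(content, name)
    for stage in (harmonize, analyze):
        draft = stage(draft)
    return finalize(draft, out_dir)
```

Keeping each stage as a function that takes and returns the same object makes it straightforward to swap a stand-in for a call into the real skill without touching the others.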

## Tools

### 1. The Automated Pipeline
Runs the full integration flow on any provided content or URL.

**Usage:**
```bash
# Process a YouTube video into a Croissant file
python3 wizard/scripts/wizard.py "https://www.youtube.com/watch?v=VIDEO_ID" "My Dataset Name"

# Process a local file
python3 wizard/scripts/wizard.py ./data/my_notes.txt "Notes Dataset"

# Process raw text
python3 wizard/scripts/wizard.py "Long description of my dataset..." "My Dataset"
```
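
For several inputs, the same command line can be assembled from Python. This is a minimal sketch under stated assumptions: `build_command` and `run_batch` are hypothetical helpers, the script path is copied from the usage examples above, and `check=True` assumes that `wizard.py` exits with a non-zero status on failure, which the source does not confirm.

```python
import subprocess
import sys

# Path copied from the usage examples; adjust to your checkout.
WIZARD_SCRIPT = "wizard/scripts/wizard.py"


def build_command(source: str, dataset_name: str) -> list[str]:
    # Mirrors: python3 wizard/scripts/wizard.py <source> <dataset name>
    return [sys.executable, WIZARD_SCRIPT, source, dataset_name]


def run_batch(jobs: list[tuple[str, str]], dry_run: bool = True) -> list[list[str]]:
    """Build one command per (source, dataset_name) pair and optionally run them."""
    commands = [build_command(source, name) for source, name in jobs]
    if not dry_run:
        for cmd in commands:
            # Assumption: a failed run returns a non-zero exit code.
            subprocess.run(cmd, check=True)
    return commands
```

Passing arguments as a list avoids shell quoting problems with dataset names that contain spaces. With `dry_run=True` (the default here) nothing is executed, so the commands can be inspected first.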

## Capabilities
- **Multi-Skill Orchestration**: Zero-config coordination between Transcriber, Translator, NLP, and Croissant skills.
- **Smart Detection**: Automatically handles YouTube URLs vs. local files vs. raw strings.
- **Auto-Enrichment**: Always applies NLP analysis to maximize metadata quality.

Related Skills

walker

7
from codata/croissant-toolkit

Deep crawl functionality that extracts and visits internal links from a webpage.

orchestrator_expert

7
from codata/croissant-toolkit

Orchestrator agent that has comprehensive knowledge and command over all available skills in this toolkit to create complex workflows.

neo4j_expert

7
from codata/croissant-toolkit

Store and query Croissant datasets in a Neo4j Graph Database for relational discovery and semantic search.

youtuber

7
from codata/croissant-toolkit

Search for videos on YouTube based on specific keywords. Get list of videos with title, description, and URL.

unf

7
from codata/croissant-toolkit

Universal Numeric Fingerprint (UNF) generator. For strings, it splits into words and sorts them alphabetically to provide order-invariant fingerprints. Supports dataframes and files too.

translator

7
from codata/croissant-toolkit

Recognize the language of input content or video scripts and translate them precisely into English using Gemini 3.

transcriber

7
from codata/croissant-toolkit

Fetch and store transcripts from YouTube videos for deep content analysis.

telegram_expert

7
from codata/croissant-toolkit

Send results and notifications to Telegram channels or users.

rohub

7
from codata/croissant-toolkit

Deposit research objects and add semantic annotations to the RO-Hub portal using the rohub library.

ro-crate-expert

7
from codata/croissant-toolkit

Specialized in creating RO-Crate packages from Dataverse metadata, with integrated ODRL-based DID (Decentralized Identifier) attribution and provenance via the ro-crate-py library.

📊 Presentation Expert Skill

7
from codata/croissant-toolkit

The **Presentation Expert** is responsible for transforming complex research data, metadata, and insights into high-impact presentation decks.

photograph

7
from codata/croissant-toolkit

Captures visual snapshots (screenshots) of web pages and records screen sessions (video).