nlp_expert

Extract named entities (persons, organizations, dates, locations) from text and provide them in structured JSON-LD format.

7 stars

by codata

Best use case

nlp_expert is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using nlp_expert can expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/nlp_expert/SKILL.md --create-dirs "https://raw.githubusercontent.com/codata/croissant-toolkit/main/.gemini/skills/nlp_expert/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/nlp_expert/SKILL.md inside your project
Restart your AI agent so that it auto-discovers the skill

How nlp_expert Compares

Feature / Agent	nlp_expert	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Extract named entities (persons, organizations, dates, locations) from text and provide them in structured JSON-LD format.

Where can I find the source code?

The source code is in the codata/croissant-toolkit repository on GitHub.

SKILL.md Source

# NLP Expert Skill

The NLP Expert skill uses Gemini 3 to perform advanced Named Entity Recognition (NER). It identifies key entities within dataset descriptions, transcripts, or web results and maps them to standard Schema.org types in JSON-LD.

This is critical for establishing provenance, identifying dataset creators, and mapping geographic and temporal coverage in the Croissant specification.
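As a rough illustration of how extracted entities could feed dataset metadata, the sketch below maps an entity list onto the schema.org `creator` and `spatialCoverage` properties of a `Dataset`. The function name `entities_to_dataset_fields` is hypothetical and not part of the toolkit. Treating every person or organization as a creator is a simplifying assumption that a real pipeline would need to confirm. Dates are left out because the example output does not show how they are represented.

```python
import json


def entities_to_dataset_fields(item_list):
    """Map an ItemList of entities to a partial schema.org Dataset fragment.

    Hypothetical helper: every Person/Organization becomes a candidate
    creator and every Place becomes spatial coverage (an assumption).
    """
    fields = {}
    for item in item_list.get("itemListElement", []):
        kind = item.get("@type")
        name = item.get("name")
        if not name:
            continue  # skip entities without a usable name
        if kind in ("Person", "Organization"):
            fields.setdefault("creator", []).append({"@type": kind, "name": name})
        elif kind == "Place":
            fields.setdefault("spatialCoverage", []).append(
                {"@type": "Place", "name": name}
            )
    return fields


# Entity list in the same shape as the skill's example output.
sample = {
    "@context": "https://schema.org/",
    "@type": "ItemList",
    "itemListElement": [
        {"@type": "Person", "name": "Sergei Bodrov"},
        {"@type": "Place", "name": "Moscow"},
    ],
}

print(json.dumps(entities_to_dataset_fields(sample), indent=2))
```

The resulting fragment would still need review before being merged into a Croissant record, since an extracted name is not necessarily a dataset author.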

## Tools

### 1. Extract Named Entities
Analyzes text or a file and returns all detected entities as JSON-LD. Results are stored in `./data/nlp/`.

**Usage:**
```bash
# Process raw text
python3 nlp_expert/scripts/extract_entities.py "Sergei Bodrov was born in Moscow in 1971."

# Process a file (e.g., a transcript)
python3 nlp_expert/scripts/extract_entities.py data/transcripts/6cWcZ2G53gE.txt
```
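To run the extractor over several inputs, a small wrapper along these lines may help. It is a minimal sketch: `build_command` and `run_batch` are hypothetical names, the script path is taken from the usage above and assumed to be relative to the current directory, and the wrapper assumes the script signals failure with a non-zero exit code, which the source does not confirm.

```python
import subprocess
import sys

# Script path as shown in the usage examples (assumed relative to the cwd).
SCRIPT = "nlp_expert/scripts/extract_entities.py"


def build_command(text_or_path):
    """Build the argument list for one extraction run."""
    # sys.executable stands in for `python3` so the same interpreter is reused.
    return [sys.executable, SCRIPT, text_or_path]


def run_batch(inputs):
    """Run the extractor once per input and return the inputs that failed.

    Assumption: a non-zero exit code means the run failed.
    """
    failed = []
    for item in inputs:
        result = subprocess.run(build_command(item))
        if result.returncode != 0:
            failed.append(item)
    return failed
```

For example, `run_batch(["data/transcripts/6cWcZ2G53gE.txt"])` would return an empty list if the run succeeded under the assumption above.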

**Example Output (JSON-LD):**
```jsonld
{
  "@context": "https://schema.org/",
  "@type": "ItemList",
  "itemListElement": [
    {
      "@type": "Person",
      "name": "Sergei Bodrov"
    },
    {
      "@type": "Place",
      "name": "Moscow"
    }
  ]
}
```
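Downstream code can read this output with any JSON parser. The sketch below groups entity names by their `@type`; the helper name `group_entities_by_type` is hypothetical, and the input is the example output above embedded as a string.

```python
import json
from collections import defaultdict

# The example output shown above, embedded as a string.
SAMPLE = """
{
  "@context": "https://schema.org/",
  "@type": "ItemList",
  "itemListElement": [
    {"@type": "Person", "name": "Sergei Bodrov"},
    {"@type": "Place", "name": "Moscow"}
  ]
}
"""


def group_entities_by_type(doc):
    """Return a dict mapping each entity @type to the list of its names."""
    grouped = defaultdict(list)
    for item in doc.get("itemListElement", []):
        # Fall back to the generic schema.org "Thing" if no type is given.
        grouped[item.get("@type", "Thing")].append(item.get("name"))
    return dict(grouped)


grouped = group_entities_by_type(json.loads(SAMPLE))
print(grouped)  # {'Person': ['Sergei Bodrov'], 'Place': ['Moscow']}
```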

Related Skills

orchestrator_expert

from codata/croissant-toolkit

Orchestrator agent that has comprehensive knowledge and command over all available skills in this toolkit to create complex workflows.

neo4j_expert

from codata/croissant-toolkit

Store and query Croissant datasets in a Neo4j Graph Database for relational discovery and semantic search.

telegram_expert

from codata/croissant-toolkit

Send results and notifications to Telegram channels or users.

ro-crate-expert

from codata/croissant-toolkit

Specialized in creating RO-Crate packages from Dataverse metadata, with integrated ODRL-based DID (Decentralized Identifier) attribution and provenance via the ro-crate-py library.

📊 Presentation Expert Skill

from codata/croissant-toolkit

The **Presentation Expert** is responsible for transforming complex research data, metadata, and insights into high-impact presentation decks.

obsidian_expert

from codata/croissant-toolkit

Convert Croissant datasets into structured Obsidian Markdown notes with frontmatter and semantic tags.

croissant_expert

from codata/croissant-toolkit

Specialized in the MLCommons Croissant metadata specification. Can generate, validate, and serialize dataset metadata into compliant JSON-LD.

walker

from codata/croissant-toolkit

Deep crawl functionality that extracts and visits internal links from a webpage.

youtuber

from codata/croissant-toolkit

Search for videos on YouTube based on specific keywords. Get list of videos with title, description, and URL.

wizard

from codata/croissant-toolkit

The ultimate data integrator. Orchestrates transcription, translation, NLP analysis, and Croissant serialization into a single automated pipeline.

unf

from codata/croissant-toolkit

Universal Numeric Fingerprint (UNF) generator. For strings, it splits into words and sorts them alphabetically to provide order-invariant fingerprints. Supports dataframes and files too.

translator

from codata/croissant-toolkit

Recognize the language of input content or video scripts and translate them precisely into English using Gemini 3.