unf

Universal Numeric Fingerprint (UNF) generator. For strings, it splits into words and sorts them alphabetically to provide order-invariant fingerprints. Supports dataframes and files too.

7 stars

Best use case

unf is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using unf should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/unf/SKILL.md --create-dirs "https://raw.githubusercontent.com/codata/croissant-toolkit/main/.gemini/skills/unf/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/unf/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How unf Compares

Feature / Agent          | unf           | Standard Approach
Platform Support         | Not specified | Limited / Varies
Context Awareness        | High          | Baseline
Installation Complexity  | Unknown       | N/A

Frequently Asked Questions

What does this skill do?

Universal Numeric Fingerprint (UNF) generator. For strings, it splits into words and sorts them alphabetically to provide order-invariant fingerprints. Supports dataframes and files too.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# ♾️ UNF Skill

The **UNF (Universal Numeric Fingerprint)** skill provides a robust mechanism for generating consistent data identifiers. Unlike traditional file hashes (such as MD5 or SHA-256), a UNF is **format-agnostic**: the same data values produce the same hash whether they are stored in CSV, Parquet, SAS, or Stata.
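The difference between hashing a file's bytes and hashing its values can be illustrated with the standard library alone. The sketch below is not the UNF algorithm: `file_hash` and `value_hash` are hypothetical helpers, and the canonical form used here (one field per line) is an assumption made purely for illustration.

```python
import csv
import hashlib
import io


def file_hash(text: str) -> str:
    """Hash the raw bytes, as a traditional file checksum would."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def value_hash(text: str, delimiter: str) -> str:
    """Hash the parsed values instead of the bytes (illustrative stand-in, not UNF)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    # Assumed canonical form: one field per line, rows separated by a blank line.
    canonical = "\n\n".join("\n".join(row) for row in rows)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same table serialized two ways: comma-separated and tab-separated.
comma_version = "id,temp\n1,20.5\n2,21.0\n"
tab_version = "id\ttemp\n1\t20.5\n2\t21.0\n"

# Byte-level hashes differ because the serializations differ.
print(file_hash(comma_version) == file_hash(tab_version))  # False

# Value-level hashes agree because the parsed values are identical.
print(value_hash(comma_version, ",") == value_hash(tab_version, "\t"))  # True
```

A real UNF additionally normalizes numeric precision and encodings as described in the UNF v6 guide, which this sketch does not attempt.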

## 🌟 Key Features

1.  **Semantic Hashing**: For strings, it splits content into words and sorts them alphabetically. This ensures that "temperature is celsius" and "celsius is temperature" produce the same UNF.
2.  **Vector Hashing**: Fingerprint entire data columns (Polars Series).
3.  **Format Invariance**: Identical data in different file formats (e.g., CSV vs. Parquet) yields the same UNF.
4.  **Column-Order Invariance**: Dataframes with reordered columns produce the same hash.
5.  **Dataverse Alignment**: Designed for parity with the canonical Dataverse UNF implementation.
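The two order-invariance properties in the list above can be sketched as follows. This is a minimal illustration, not the skill's implementation: the function names are hypothetical, plain SHA-256 hex digests stand in for real UNF values, and the choice to ignore column names is an assumption of the sketch.

```python
import hashlib


def word_order_invariant_digest(text: str) -> str:
    """Split on whitespace, sort the words, then hash the sorted sequence."""
    words = sorted(text.split())  # case-sensitive sort; no other normalization
    canonical = " ".join(words)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def column_order_invariant_digest(columns: dict[str, list[str]]) -> str:
    """Hash each column's values, sort the per-column digests, hash the result."""
    per_column = sorted(
        hashlib.sha256("\n".join(values).encode("utf-8")).hexdigest()
        for values in columns.values()  # column names are ignored in this sketch
    )
    return hashlib.sha256("".join(per_column).encode("utf-8")).hexdigest()


# Word order does not affect the string digest.
print(
    word_order_invariant_digest("temperature is celsius")
    == word_order_invariant_digest("celsius is temperature")
)  # True

# Column order does not affect the table digest.
table = {"id": ["1", "2"], "temp": ["20.5", "21.0"]}
reordered = {"temp": ["20.5", "21.0"], "id": ["1", "2"]}
print(column_order_invariant_digest(table) == column_order_invariant_digest(reordered))  # True
```

Sorting before hashing is what removes the dependence on order in both cases. Row order within a column still matters in this sketch.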

## 🛠️ Components

- `unf_hash.py`: CLI tool to compute a UNF for a string or file.
- `dartfx-unf`: High-performance Python implementation using the Polars engine.

## 🚀 Usage

### Hash a simple string
```bash
python3 .gemini/skills/unf/scripts/unf_hash.py "Data for fingerprinting"
```

### Hash a data file (CSV, Parquet, etc.)
```bash
python3 .gemini/skills/unf/scripts/unf_hash.py data/dataset.csv
```

### Get a detailed JSON report
```bash
python3 .gemini/skills/unf/scripts/unf_hash.py --json data/dataset.parquet
```
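To call the CLI from Python rather than from a shell, a thin wrapper along the following lines may be enough. It is a sketch under stated assumptions: `build_unf_command` and `run_unf` are hypothetical helper names, the script path and the `--json` flag are taken from the commands above, and the output is returned as raw text because the report's structure is not documented here.

```python
import subprocess
import sys
from pathlib import Path

# Script location as used in the usage examples; adjust to your checkout.
SCRIPT = Path(".gemini/skills/unf/scripts/unf_hash.py")


def build_unf_command(target: str, json_report: bool = False) -> list[str]:
    """Assemble the argument list for unf_hash.py (hypothetical helper)."""
    cmd = [sys.executable, str(SCRIPT)]
    if json_report:
        cmd.append("--json")
    cmd.append(target)  # either a literal string or a path to a data file
    return cmd


def run_unf(target: str, json_report: bool = False) -> str:
    """Run the CLI and return its standard output as text."""
    result = subprocess.run(
        build_unf_command(target, json_report),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


# Only attempt a real run when the script is actually present.
if __name__ == "__main__" and SCRIPT.exists():
    print(run_unf("Data for fingerprinting"))
```

Passing the arguments as a list avoids shell quoting problems when the input string contains spaces.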

## 📐 Specification
- **Version**: UNF v6
- **Reference**: [Dataverse UNF v6 Guide](https://guides.dataverse.org/en/latest/developers/unf/unf-v6.html)
- **Engine**: [dartfx-unf](https://github.com/DataArtifex/dartfx-unf)

Related Skills

walker

7
from codata/croissant-toolkit

Deep crawl functionality that extracts and visits internal links from a webpage.

orchestrator_expert

7
from codata/croissant-toolkit

Orchestrator agent that has comprehensive knowledge and command over all available skills in this toolkit to create complex workflows.

neo4j_expert

7
from codata/croissant-toolkit

Store and query Croissant datasets in a Neo4j Graph Database for relational discovery and semantic search.

youtuber

7
from codata/croissant-toolkit

Search for videos on YouTube based on specific keywords. Get list of videos with title, description, and URL.

wizard

7
from codata/croissant-toolkit

The ultimate data integrator. Orchestrates transcription, translation, NLP analysis, and Croissant serialization into a single automated pipeline.

translator

7
from codata/croissant-toolkit

Recognize the language of input content or video scripts and translate them precisely into English using Gemini 3.

transcriber

7
from codata/croissant-toolkit

Fetch and store transcripts from YouTube videos for deep content analysis.

telegram_expert

7
from codata/croissant-toolkit

Send results and notifications to Telegram channels or users.

rohub

7
from codata/croissant-toolkit

Deposit research objects and add semantic annotations to the RO-Hub portal using the rohub library.

ro-crate-expert

7
from codata/croissant-toolkit

Specialized in creating RO-Crate packages from Dataverse metadata, with integrated ODRL-based DID (Decentralized Identifier) attribution and provenance via the ro-crate-py library.

📊 Presentation Expert Skill

7
from codata/croissant-toolkit

The **Presentation Expert** is responsible for transforming complex research data, metadata, and insights into high-impact presentation decks.

photograph

7
from codata/croissant-toolkit

Captures visual snapshots (screenshots) of web pages and records screen sessions (video).