unf
Universal Numeric Fingerprint (UNF) generator. For strings, it splits into words and sorts them alphabetically to provide order-invariant fingerprints. Supports dataframes and files too.
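The order-invariance described above can be illustrated with a minimal sketch. This is not the dartfx-unf implementation or the UNF v6 algorithm. The function name `order_invariant_fingerprint`, the whitespace split, the per-word terminator, and the 128-bit truncation are all assumptions chosen for illustration.

```python
import base64
import hashlib


def order_invariant_fingerprint(text: str) -> str:
    """Hypothetical helper: hash the sorted words of `text` so word order does not matter."""
    # Split on whitespace and sort alphabetically, as the description states.
    words = sorted(text.split())
    digest = hashlib.sha256()
    for word in words:
        # Terminate each word so ["ab", "c"] and ["a", "bc"] hash differently.
        digest.update(word.encode("utf-8") + b"\n\x00")
    # Assumption: keep the first 128 bits and base64-encode them for a compact identifier.
    return base64.b64encode(digest.digest()[:16]).decode("ascii")


a = order_invariant_fingerprint("temperature is celsius")
b = order_invariant_fingerprint("celsius is temperature")
c = order_invariant_fingerprint("temperature is kelvin")
print(a == b)  # True: same words, different order
print(a == c)  # False: different words
```

Sorting before hashing is what makes the result independent of word order. The real tool may also normalize case, punctuation, or Unicode, which this sketch does not attempt.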
Best use case
unf is best used when an AI agent workflow needs to check repeatedly whether two strings, dataframes, or files hold the same values, regardless of word order, column order, or file format.
Teams using unf should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/unf/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How unf Compares
| Feature / Agent | unf | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Universal Numeric Fingerprint (UNF) generator. For strings, it splits into words and sorts them alphabetically to provide order-invariant fingerprints. Supports dataframes and files too.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# ♾️ UNF Skill

The **UNF (Universal Numeric Fingerprint)** skill provides a robust mechanism for generating consistent data identifiers. Unlike traditional file hashes (like MD5 or SHA-256), a UNF is **format-agnostic**, meaning the same data values will produce the same hash regardless of whether they are stored in CSV, Parquet, SAS, or Stata.

## 🌟 Key Features

1. **Semantic Hashing**: For strings, it splits content into words and sorts them alphabetically. This ensures that "temperature is celsius" and "celsius is temperature" produce the same UNF.
2. **Vector Hashing**: Fingerprint entire data columns (Polars Series).
3. **Format Invariance**: Identical data in different file formats (e.g., CSV vs. Parquet) yields the same UNF.
4. **Column-Order Invariance**: Dataframes with reordered columns produce the same hash.
5. **Dataverse Alignment**: Designed for parity with the canonical Dataverse UNF implementation.

## 🛠️ Components

- `unf_hash.py`: CLI tool to compute a UNF for a string or file.
- `dartfx-unf`: High-performance Python implementation using the Polars engine.

## 🚀 Usage

### Hash a simple string

```bash
python3 .gemini/skills/unf/scripts/unf_hash.py "Data for fingerprinting"
```

### Hash a data file (CSV, Parquet, etc.)

```bash
python3 .gemini/skills/unf/scripts/unf_hash.py data/dataset.csv
```

### Get a detailed JSON report

```bash
python3 .gemini/skills/unf/scripts/unf_hash.py --json data/dataset.parquet
```

## 📐 Specification

- **Version**: UNF v6
- **Reference**: [Dataverse UNF v6 Guide](https://guides.dataverse.org/en/latest/developers/unf/unf-v6.html)
- **Engine**: [dartfx-unf](https://github.com/DataArtifex/dartfx-unf)
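One way to picture the column-order invariance listed under Key Features is to fingerprint each column separately, sort the per-column fingerprints, and hash the sorted list. The sketch below uses plain Python dictionaries instead of Polars, and `str(value)` stands in for real numeric normalization. The helper names `column_fingerprint` and `table_fingerprint` are hypothetical and not part of dartfx-unf. It illustrates the idea only and does not produce real UNF v6 values.

```python
import hashlib


def column_fingerprint(values: list) -> str:
    """Hypothetical helper: hash one column's values in row order."""
    digest = hashlib.sha256()
    for value in values:
        # Assumption: str() is a placeholder for proper value normalization.
        digest.update(str(value).encode("utf-8") + b"\n\x00")
    return digest.hexdigest()


def table_fingerprint(table: dict) -> str:
    """Hypothetical helper: combine column hashes so column order is irrelevant."""
    # Sorting the per-column hashes removes any dependence on column position.
    # Column names are ignored in this sketch; only the values contribute.
    column_hashes = sorted(column_fingerprint(col) for col in table.values())
    return column_fingerprint(column_hashes)


original = {"id": [1, 2, 3], "temp": [20.5, 21.0, 19.8]}
reordered = {"temp": [20.5, 21.0, 19.8], "id": [1, 2, 3]}
changed = {"id": [1, 2, 3], "temp": [20.5, 21.0, 19.9]}

print(table_fingerprint(original) == table_fingerprint(reordered))  # True
print(table_fingerprint(original) == table_fingerprint(changed))    # False
```

Row order within a column still affects the result here, because each column is hashed in sequence. Only the order of the columns is factored out.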
Related Skills
walker
Deep crawl functionality that extracts and visits internal links from a webpage.
orchestrator_expert
Orchestrator agent that has comprehensive knowledge and command over all available skills in this toolkit to create complex workflows.
neo4j_expert
Store and query Croissant datasets in a Neo4j Graph Database for relational discovery and semantic search.
youtuber
Search for videos on YouTube based on specific keywords. Get list of videos with title, description, and URL.
wizard
The ultimate data integrator. Orchestrates transcription, translation, NLP analysis, and Croissant serialization into a single automated pipeline.
translator
Recognize the language of input content or video scripts and translate them precisely into English using Gemini 3.
transcriber
Fetch and store transcripts from YouTube videos for deep content analysis.
telegram_expert
Send results and notifications to Telegram channels or users.
rohub
Deposit research objects and add semantic annotations to the RO-Hub portal using the rohub library.
ro-crate-expert
Specialized in creating RO-Crate packages from Dataverse metadata, with integrated ODRL-based DID (Decentralized Identifier) attribution and provenance via the ro-crate-py library.
📊 Presentation Expert Skill
The Presentation Expert is responsible for transforming complex research data, metadata, and insights into high-impact presentation decks.
photograph
Captures visual snapshots (screenshots) of web pages and records screen sessions (video).