hugging-face-dataset-viewer

Query Hugging Face datasets through the Dataset Viewer API for splits, rows, search, filters, and parquet links.

31,392 stars
Complexity: easy

About this skill

This skill lets AI agents perform read-only exploration and extraction from Hugging Face datasets using the Dataset Viewer API. It provides a structured way to access dataset metadata (such as configurations and splits), preview initial rows, paginate through content, and run text searches within datasets. Agents can also retrieve direct links to parquet files for heavier processing. This is useful for tasks that need on-demand data inspection or validation without downloading entire datasets, and it gives data scientists, researchers, and developers quick programmatic access to Hugging Face's large collection of public datasets.

Best use case

  • Exploring unknown or newly encountered Hugging Face datasets to understand their content and structure.
  • Validating the availability and configurations of a specific dataset before deeper analysis.
  • Previewing initial dataset content to quickly grasp its schema, data types, and typical entries.
  • Retrieving specific data rows for quick insights, examples, or debugging data-related issues.
  • Searching for keywords or patterns within dataset content to find relevant information.
  • Obtaining direct links to parquet files for programmatic download or integration into external data pipelines.

The AI agent will successfully retrieve requested dataset information, such as available splits, configurations, initial rows, paginated content, search results matching specific queries, or direct parquet file links. This enables the agent to provide accurate, on-demand data insights to the user without needing to manually browse the Hugging Face website or download full datasets.

Practical example

Example input

Can you show me the first 5 rows of the 'squad' dataset, specifically for the 'plain_text' configuration and 'validation' split? Also, list all available splits for this dataset.

Example output

```json
{
  "message": "Successfully retrieved splits and first 5 rows for the 'squad' dataset.",
  "splits": [
    {"config": "plain_text", "split": "train"},
    {"config": "plain_text", "split": "validation"}
  ],
  "first_rows": [
    {
      "id": "56be4db0acb8001400a502ee",
      "title": "University_of_Michigan",
      "context": "The University of Michigan is a public research university in Ann Arbor, Michigan.",
      "question": "In what city is the University of Michigan?",
      "answers": {"text": ["Ann Arbor"], "answer_start": [57]}
    },
    {
      "id": "56be4db0acb8001400a502ef",
      "title": "University_of_Michigan",
      "context": "It is the state's oldest university and the flagship institution of the University of Michigan system.",
      "question": "What is the flagship institution of the University of Michigan system?",
      "answers": {"text": ["University of Michigan"], "answer_start": [80]}
    }
    // ... (up to 5 rows)
  ]
}
```
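For reference, the API calls behind an answer like this would resemble the following sketch, assuming `curl` and `jq` are available (the `squad` dataset also resolves under `rajpurkar/squad` on the Hub):

```shell
# List all config/split pairs for the dataset.
curl -s "https://datasets-server.huggingface.co/splits?dataset=squad" | jq '.splits'

# Preview the validation split and keep only the first 5 row payloads.
curl -s "https://datasets-server.huggingface.co/first-rows?dataset=squad&config=plain_text&split=validation" \
  | jq '[.rows[:5][] | .row]'
```

The `/first-rows` response wraps each entry as `{row_idx, row, truncated_cells}`, so the `jq` filter unwraps `.row` to get the bare records shown above.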

When to use this skill

  • Use this skill when your AI agent needs to perform read-only exploration of a Hugging Face dataset, retrieve specific data points, validate dataset structure, or search for content directly through the Dataset Viewer API. It is ideal for agents requiring quick access and inspection of public datasets without full download or complex setup.

When not to use this skill

  • This skill is not suitable for modifying Hugging Face datasets, uploading new data, training machine learning models directly (though it can provide the data), or performing complex, custom data transformations that require more than simple filtering or pagination. It is strictly for read-only data access and exploration.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/hugging-face-dataset-viewer/SKILL.md --create-dirs "https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/plugins/antigravity-awesome-skills-claude/skills/hugging-face-dataset-viewer/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/hugging-face-dataset-viewer/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How hugging-face-dataset-viewer Compares

Feature / Agent          hugging-face-dataset-viewer   Standard Approach
Platform Support         Claude                        Limited / Varies
Context Awareness        High                          Baseline
Installation Complexity  easy                          N/A

Frequently Asked Questions

What does this skill do?

Query Hugging Face datasets through the Dataset Viewer API for splits, rows, search, filters, and parquet links.

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Hugging Face Dataset Viewer

## When to Use

Use this skill when you need read-only exploration of, or extraction from, a Hugging Face dataset through the Dataset Viewer API.

## Core workflow

1. Optionally validate dataset availability with `/is-valid`.
2. Resolve `config` + `split` with `/splits`.
3. Preview with `/first-rows`.
4. Paginate content with `/rows` using `offset` and `length` (max 100).
5. Use `/search` for text matching and `/filter` for row predicates.
6. Retrieve parquet links via `/parquet` and totals/metadata via `/size` and `/statistics`.
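The workflow above can be sketched end-to-end with `curl` and `jq` (a hypothetical example using the public `stanfordnlp/imdb` dataset; substitute your own):

```shell
BASE="https://datasets-server.huggingface.co"
DATASET="stanfordnlp/imdb"   # example dataset

# 1. Check the dataset is viewable.
curl -s "$BASE/is-valid?dataset=$DATASET"

# 2. Resolve the first available config/split pair from /splits.
CS=$(curl -s "$BASE/splits?dataset=$DATASET" \
  | jq -r '.splits[0] | "\(.config) \(.split)"')
CONFIG=${CS% *}; SPLIT=${CS#* }

# 3. Preview a few rows.
curl -s "$BASE/first-rows?dataset=$DATASET&config=$CONFIG&split=$SPLIT" \
  | jq '.rows[:3]'
```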

## Defaults

- Base URL: `https://datasets-server.huggingface.co`
- Default API method: `GET`
- Query params should be URL-encoded.
- `offset` is 0-based.
- `length` max is usually `100` for row-like endpoints.
- Gated/private datasets require `Authorization: Bearer <HF_TOKEN>`.
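A request against a gated dataset with a URL-encoded parameter might look like this. The `encode` helper is a hypothetical convenience built on `jq`'s `@uri` filter, the repo name is made up, and `HF_TOKEN` is assumed to be set in the environment:

```shell
# Percent-encode a single query value (space -> %20, "/" -> %2F, etc.).
encode() { jq -rn --arg v "$1" '$v|@uri'; }

# Only attempt the call when a token is actually available.
if [ -n "${HF_TOKEN:-}" ]; then
  curl -s -H "Authorization: Bearer $HF_TOKEN" \
    "https://datasets-server.huggingface.co/search?dataset=my-org/my-private-set&config=default&split=train&query=$(encode 'hello world')&offset=0&length=10"
fi
```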

## Dataset Viewer

- `Validate dataset`: `/is-valid?dataset=<namespace/repo>`
- `List subsets and splits`: `/splits?dataset=<namespace/repo>`
- `Preview first rows`: `/first-rows?dataset=<namespace/repo>&config=<config>&split=<split>`
- `Paginate rows`: `/rows?dataset=<namespace/repo>&config=<config>&split=<split>&offset=<int>&length=<int>`
- `Search text`: `/search?dataset=<namespace/repo>&config=<config>&split=<split>&query=<text>&offset=<int>&length=<int>`
- `Filter with predicates`: `/filter?dataset=<namespace/repo>&config=<config>&split=<split>&where=<predicate>&orderby=<sort>&offset=<int>&length=<int>`
- `List parquet shards`: `/parquet?dataset=<namespace/repo>`
- `Get size totals`: `/size?dataset=<namespace/repo>`
- `Get column statistics`: `/statistics?dataset=<namespace/repo>&config=<config>&split=<split>`
- `Get Croissant metadata (if available)`: `/croissant?dataset=<namespace/repo>`

Pagination pattern:

```bash
curl "https://datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=0&length=100"
curl "https://datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=100&length=100"
```

When pagination is partial, use response fields such as `num_rows_total`, `num_rows_per_page`, and `partial` to drive continuation logic.
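A minimal continuation loop along those lines might look like this (a sketch only; it reads `num_rows_total` once and is capped at three pages to stay cheap):

```shell
BASE="https://datasets-server.huggingface.co"
Q="dataset=stanfordnlp/imdb&config=plain_text&split=train"
LENGTH=100

# Read the total once, then advance offset until exhausted (or the cap hits).
total=$(curl -s "$BASE/rows?$Q&offset=0&length=1" | jq -r '.num_rows_total')
offset=0
while [ "$offset" -lt "${total:-0}" ] && [ "$offset" -lt 300 ]; do
  curl -s "$BASE/rows?$Q&offset=$offset&length=$LENGTH" | jq '.rows | length'
  offset=$((offset + LENGTH))
done
```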

Search/filter notes:

- `/search` matches string columns (full-text style behavior is internal to the API).
- `/filter` requires predicate syntax in `where` and optional sort in `orderby`.
- Keep filtering and searches read-only and side-effect free.
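Hypothetical `/search` and `/filter` calls are shown below. The predicate quoting (double-quoted column names) follows the Dataset Viewer convention as I understand it, and `curl -G --data-urlencode` handles the URL encoding:

```shell
BASE="https://datasets-server.huggingface.co"

# Text search over string columns.
curl -s -G "$BASE/search" \
  --data-urlencode "dataset=stanfordnlp/imdb" \
  --data-urlencode "config=plain_text" \
  --data-urlencode "split=train" \
  --data-urlencode "query=terrible" \
  --data-urlencode "length=5" | jq '.num_rows_total'

# Row predicate passed via where; note the double-quoted column name.
curl -s -G "$BASE/filter" \
  --data-urlencode "dataset=stanfordnlp/imdb" \
  --data-urlencode "config=plain_text" \
  --data-urlencode "split=train" \
  --data-urlencode 'where="label" = 0' \
  --data-urlencode "length=5" | jq '.rows | length'
```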

## Querying Datasets

Use `npx parquetlens` with Hub parquet alias paths for SQL querying.

Parquet alias shape:

```text
hf://datasets/<namespace>/<repo>@~parquet/<config>/<split>/<shard>.parquet
```

Derive `<config>`, `<split>`, and `<shard>` from Dataset Viewer `/parquet`:

```bash
curl -s "https://datasets-server.huggingface.co/parquet?dataset=cfahlgren1/hub-stats" \
  | jq -r '.parquet_files[] | "hf://datasets/\(.dataset)@~parquet/\(.config)/\(.split)/\(.filename)"'
```

Run SQL query:

```bash
npx -y -p parquetlens -p @parquetlens/sql parquetlens \
  "hf://datasets/<namespace>/<repo>@~parquet/<config>/<split>/<shard>.parquet" \
  --sql "SELECT * FROM data LIMIT 20"
```

### SQL export

- CSV: `--sql "COPY (SELECT * FROM data LIMIT 1000) TO 'export.csv' (FORMAT CSV, HEADER, DELIMITER ',')"`
- JSON: `--sql "COPY (SELECT * FROM data LIMIT 1000) TO 'export.json' (FORMAT JSON)"`
- Parquet: `--sql "COPY (SELECT * FROM data LIMIT 1000) TO 'export.parquet' (FORMAT PARQUET)"`

## Creating and Uploading Datasets

Use one of these flows depending on dependency constraints.

Zero local dependencies (Hub UI):

- Create dataset repo in browser: `https://huggingface.co/new-dataset`
- Upload parquet files in the repo "Files and versions" page.
- Verify shards appear in Dataset Viewer:

```bash
curl -s "https://datasets-server.huggingface.co/parquet?dataset=<namespace>/<repo>"
```

Low dependency CLI flow (`npx @huggingface/hub` / `hfjs`):

- Set auth token:

```bash
export HF_TOKEN=<your_hf_token>
```

- Upload parquet folder to a dataset repo (auto-creates repo if missing):

```bash
npx -y @huggingface/hub upload datasets/<namespace>/<repo> ./local/parquet-folder data
```

- Upload as private repo on creation:

```bash
npx -y @huggingface/hub upload datasets/<namespace>/<repo> ./local/parquet-folder data --private
```

After upload, call `/parquet` to discover `<config>/<split>/<shard>` values for querying with `@~parquet`.

Related Skills

All from sickn33/antigravity-awesome-skills:

  • hugging-face-vision-trainer (Computer Vision): Train or fine-tune vision models on Hugging Face Jobs for detection, classification, and SAM or SAM2 segmentation.
  • hugging-face-trackio (Machine Learning): Track ML experiments with Trackio using Python logging, alerts, and CLI metric retrieval.
  • hugging-face-tool-builder (Developer Tools): Create reusable command-line scripts and utilities for the Hugging Face API, allowing chaining, piping, and intermediate processing where helpful; can access the API directly as well as use the hf command-line tool.
  • hugging-face-papers (Text Analysis): Read and analyze Hugging Face paper pages or arXiv papers with markdown and papers API metadata.
  • hugging-face-paper-publisher (AI Research Publishing): Publish and manage research papers on Hugging Face Hub. Supports creating paper pages, linking papers to models/datasets, claiming authorship, and generating professional markdown-based research articles.
  • hugging-face-model-trainer (AI Development & Self-Improvement): Train or fine-tune TRL language models on Hugging Face Jobs, including SFT, DPO, GRPO, and GGUF export.
  • hugging-face-jobs (Machine Learning): Run workloads on Hugging Face Jobs with managed CPUs, GPUs, TPUs, secrets, and Hub persistence.
  • hugging-face-evaluation (Model Management): Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from the Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.
  • hugging-face-datasets (Data Management): Create and manage datasets on Hugging Face Hub. Supports initializing repos, defining configs/system prompts, streaming row updates, and SQL-based dataset querying/transformation. Designed to work alongside the HF MCP server for comprehensive dataset workflows.
  • hugging-face-community-evals (Model Evaluation & MLOps): Run local evaluations for Hugging Face Hub models with inspect-ai or lighteval.
  • hugging-face-cli (Machine Learning): Use the Hugging Face Hub CLI (`hf`) to download, upload, and manage models, datasets, and Spaces.
  • code-reviewer (Developer Tools): Elite code review expert specializing in modern AI-powered code review.