embedding-visualizer

Explore high-dimensional text embeddings visually. Use when you need to: upload texts and compute embeddings, reduce dimensions with UMAP or t-SNE, explore semantic clusters in 2D or 3D scatter plots, find nearest neighbors by text query, label clusters, or compare two datasets side-by-side. Triggers include "visualize embeddings", "explore semantic clusters", "find similar texts", "embedding visualization", "UMAP plot", or any task requiring interactive exploration of text embedding spaces.

Best use case

embedding-visualizer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using embedding-visualizer should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/embedding-visualizer/SKILL.md --create-dirs "https://raw.githubusercontent.com/heldernoid/agentic-build-templates/main/projects/ai-llm-tools/embedding-visualizer/skills/embedding-visualizer/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/embedding-visualizer/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How embedding-visualizer Compares

| Feature / Agent | embedding-visualizer | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

Where can I find the source code?

The source code is on GitHub in the heldernoid/agentic-build-templates repository, under projects/ai-llm-tools/embedding-visualizer.

SKILL.md Source

# embedding-visualizer

Self-hosted tool for exploring text embeddings visually. Upload texts, compute embeddings via an OpenAI-compatible API, reduce to 2D or 3D with UMAP or t-SNE, and explore semantic clusters interactively.

## When to use

- Exploring semantic structure in a text corpus
- Discovering topic clusters in documents
- Finding nearest neighbors by semantic similarity
- Comparing two corpora side-by-side
- Labeling semantic clusters for downstream analysis
- Debugging embedding quality visually

## Prerequisites

- Node.js 20+
- pnpm
- `EMBED_ENCRYPTION_KEY` set (32-byte hex)
- An OpenAI-compatible embedding API key
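
The list above does not say how to produce the key. As a minimal sketch, 32 random bytes can be read from `/dev/urandom` and hex-encoded into 64 characters (`openssl rand -hex 32` gives the same shape where OpenSSL is installed). The function names `generate_key` and `is_valid_key` are hypothetical helpers, not part of the tool.

```bash
# Read 32 random bytes and hex-encode them (64 hex characters).
generate_key() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Hypothetical check: succeed only if the argument is exactly 64 hex characters.
is_valid_key() {
  [ "${#1}" -eq 64 ] || return 1
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
  return 0
}

key="$(generate_key)"
if is_valid_key "$key"; then
  # Paste this line into .env
  echo "EMBED_ENCRYPTION_KEY=$key"
fi
```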

## Quick Start

```bash
# Install dependencies
pnpm install

# Copy and edit environment
cp .env.example .env
# Set EMBED_ENCRYPTION_KEY in .env

# Start the server
pnpm start

# In another terminal, start the dashboard
pnpm --filter dashboard dev
```

Server: http://localhost:4100
Dashboard: http://localhost:4101

## Uploading Texts

### File upload (curl)

```bash
# Create a dataset
curl -X POST http://localhost:4100/api/datasets \
  -H 'Content-Type: application/json' \
  -d '{"name": "My Dataset", "description": "Optional description"}'
# returns { "id": "ds_abc123" }

# Upload a file into it, using the id returned above
curl -X POST http://localhost:4100/api/datasets/ds_abc123/upload \
  -F 'file=@reviews.txt'
```

Accepted file formats:
- `.txt` - one text per line
- `.csv` - first column used as text
- `.json` - array of strings
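
To make the three formats concrete, the sketch below writes the same three texts once per format. The sample texts, the extra rating column in the CSV, and the temporary directory are all illustrative. The source does not say whether a CSV header row is expected, so none is written here; check that against your deployment.

```bash
# Write hypothetical sample files into a scratch directory.
sample_dir="$(mktemp -d)"

# .txt: one text per line
printf '%s\n' 'Great battery life' 'Screen cracked in a week' 'Fast shipping' \
  > "$sample_dir/reviews.txt"

# .csv: only the first column is read as text; the rating column is extra data
printf '%s\n' '"Great battery life",5' '"Screen cracked in a week",1' '"Fast shipping",4' \
  > "$sample_dir/reviews.csv"

# .json: a flat array of strings
printf '%s\n' '["Great battery life", "Screen cracked in a week", "Fast shipping"]' \
  > "$sample_dir/reviews.json"

echo "Sample files written to $sample_dir"
```

Any of these can then be passed to the upload endpoint with `-F "file=@$sample_dir/reviews.txt"`.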

### Paste texts (JSON body)

```bash
curl -X POST http://localhost:4100/api/datasets/ds_abc123/embed \
  -H 'Content-Type: application/json' \
  -d '{"texts": ["First text", "Second text", "Third text"]}'
```

## Running a Projection

```bash
# Start UMAP 2D projection
curl -X POST http://localhost:4100/api/datasets/ds_abc123/project \
  -H 'Content-Type: application/json' \
  -d '{"method": "umap", "dims": 2, "params": {"n_neighbors": 15, "min_dist": 0.1}}'
# returns { "projection_id": "proj_xyz789" }

# Poll status
curl http://localhost:4100/api/datasets/ds_abc123/projection
# returns { "status": "running" } or { "status": "completed" }
```
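
Polling by hand gets tedious for larger datasets, so the status check above can be wrapped in a loop. This is a sketch under assumptions: the function names, the 2-second interval, and the 30-attempt limit are illustrative, and only the two documented statuses are handled. Any other status simply runs into the timeout. The status is matched as a substring to avoid depending on a JSON parser.

```bash
# Fetch the projection status JSON for a dataset (endpoint shown above).
fetch_status() {
  curl -s "http://localhost:4100/api/datasets/$1/projection"
}

# Poll until the status is "completed", or give up after max_tries checks.
# Arguments: dataset id, max tries (default 30), seconds between checks (default 2).
wait_for_projection() {
  local dataset_id="$1" max_tries="${2:-30}" interval="${3:-2}"
  local try=1 body
  while [ "$try" -le "$max_tries" ]; do
    body="$(fetch_status "$dataset_id")"
    case "$body" in
      *'"status"'*'"completed"'*)
        echo "completed after $try checks"
        return 0
        ;;
    esac
    sleep "$interval"
    try=$((try + 1))
  done
  echo "timed out after $max_tries checks" >&2
  return 1
}

# Usage (requires a running server): wait_for_projection ds_abc123
```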

## Nearest Neighbor Search

```bash
# Search by text query
curl -X POST http://localhost:4100/api/nearest \
  -H 'Content-Type: application/json' \
  -d '{"dataset_id": "ds_abc123", "text": "excellent product quality", "k": 10}'

# Search by existing point ID
curl -X POST http://localhost:4100/api/nearest \
  -H 'Content-Type: application/json' \
  -d '{"dataset_id": "ds_abc123", "point_id": "pt_8f3c2a1", "k": 10}'
```

Response:
```json
[
  { "id": "pt_...", "content": "...", "label": "Positive", "similarity": 0.94 },
  ...
]
```

## Label Management

```bash
# List labels in a dataset
curl http://localhost:4100/api/datasets/ds_abc123/labels

# Assign a label to points
curl -X POST http://localhost:4100/api/datasets/ds_abc123/labels \
  -H 'Content-Type: application/json' \
  -d '{"label": "Positive", "point_ids": ["pt_8f3c2a1", "pt_3d7e9b2"]}'

# Delete a label (unassigns from all points)
curl -X DELETE http://localhost:4100/api/datasets/ds_abc123/labels/Positive
```

## Export

```bash
# Export as JSON
curl "http://localhost:4100/api/datasets/ds_abc123/export?format=json" \
  -o dataset.json

# Export as CSV
curl "http://localhost:4100/api/datasets/ds_abc123/export?format=csv" \
  -o dataset.csv

# Export only labeled points
curl "http://localhost:4100/api/datasets/ds_abc123/export?format=json&label=Positive" \
  -o positive.json
```

## API Reference

| Endpoint | Method | Description |
|---|---|---|
| `/api/datasets` | GET | List all datasets |
| `/api/datasets` | POST | Create dataset |
| `/api/datasets/:id` | GET | Get dataset details |
| `/api/datasets/:id` | DELETE | Delete dataset |
| `/api/datasets/:id/upload` | POST | Upload text file |
| `/api/datasets/:id/embed` | POST | Embed pasted texts |
| `/api/datasets/:id/points` | GET | All points with coords |
| `/api/datasets/:id/project` | POST | Start projection |
| `/api/datasets/:id/projection` | GET | Projection status |
| `/api/datasets/:id/labels` | GET | List labels |
| `/api/datasets/:id/labels` | POST | Assign label |
| `/api/datasets/:id/labels/:label` | DELETE | Delete label |
| `/api/nearest` | POST | Nearest neighbor search |
| `/api/settings` | GET | Get settings |
| `/api/settings` | PATCH | Update settings |
| `/health` | GET | Health check |

## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `EMBED_PORT` | Server port | 4100 |
| `EMBED_DASHBOARD_PORT` | Dashboard dev port | 4101 |
| `EMBED_DATA_DIR` | SQLite and uploads directory | ~/.embedding-visualizer |
| `EMBED_ENCRYPTION_KEY` | 32-byte hex for API key encryption | required |
| `EMBED_MODEL` | Default embedding model | text-embedding-3-small |
| `EMBED_BASE_URL` | Embedding API base URL | https://api.openai.com/v1 |
| `EMBED_BATCH_SIZE` | Texts per API call | 100 |
| `EMBED_MAX_FILE_SIZE` | Max upload size in bytes | 10485760 |
| `EMBED_LOG_LEVEL` | debug, info, warn, error | info |
| `EMBED_DEV` | Dev mode | 0 |
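
Two of the numeric settings are easier to reason about with a little arithmetic. The sketch below shows where the `EMBED_MAX_FILE_SIZE` default comes from and how `EMBED_BATCH_SIZE` translates into a number of API calls. The helper names are hypothetical, and the call count assumes each batch is one request, as the "Texts per API call" description suggests.

```bash
# 10485760 bytes is 10 * 1024 * 1024 (10 MiB).
echo $((10 * 1024 * 1024))            # prints 10485760

# Hypothetical helper: convert MiB to the byte value the variable expects.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Hypothetical helper: API calls needed for N texts at a given batch size
# (ceiling division).
batches_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: raise the upload limit to 25 MiB for the current shell session.
EMBED_MAX_FILE_SIZE="$(mib_to_bytes 25)"
export EMBED_MAX_FILE_SIZE
echo "$EMBED_MAX_FILE_SIZE"           # prints 26214400

batches_needed 1000 100               # prints 10
batches_needed 1000 20                # prints 50
```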

## Troubleshooting

### Embedding fails with 401

The API key is not set or has expired. Go to Settings in the dashboard and update the key.

### Projection is slow

UMAP and t-SNE run server-side in Node.js. For 1,000+ points, expect 10-60 seconds. Reduce `n_neighbors` or `n_iter` to speed up the projection.
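
As a sketch of the `n_neighbors` adjustment, the helper below builds the same UMAP request body used in Running a Projection with a smaller value. The helper name `umap_payload` and the value 5 are illustrative; the right setting depends on the dataset.

```bash
# Hypothetical helper: print a 2D UMAP request body with the given n_neighbors.
umap_payload() {
  printf '{"method": "umap", "dims": 2, "params": {"n_neighbors": %d, "min_dist": 0.1}}\n' "$1"
}

umap_payload 5

# To send it (requires a running server):
# curl -X POST http://localhost:4100/api/datasets/ds_abc123/project \
#   -H 'Content-Type: application/json' \
#   -d "$(umap_payload 5)"
```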

### File upload rejected

Check that the file is a supported format (.txt, .csv, .json) and under the `EMBED_MAX_FILE_SIZE` limit (10 MB by default). Files with a non-text MIME type are rejected.

### Rate limiting (429 errors)

Reduce `EMBED_BATCH_SIZE` to 20-50 and retry. The client applies exponential backoff automatically.
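
For readers unfamiliar with exponential backoff, the sketch below shows the general pattern: each retry waits twice as long as the previous one, up to a cap. The base delay of 1 second and the 30-second cap are assumptions for illustration. The source does not document the schedule the client actually uses.

```bash
# Hypothetical helper: seconds to wait before retry number $1 (1-based),
# doubling from a base delay and capped at a maximum.
backoff_delay() {
  local attempt="$1" base="${2:-1}" cap="${3:-30}"
  local delay=$(( base * (1 << (attempt - 1)) ))
  if [ "$delay" -gt "$cap" ]; then
    delay="$cap"
  fi
  echo "$delay"
}

# Print the schedule for the first six retries: 1, 2, 4, 8, 16, 30 seconds.
for attempt in 1 2 3 4 5 6; do
  echo "retry $attempt: wait $(backoff_delay "$attempt")s"
done
```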
