embedding-visualizer
Explore high-dimensional text embeddings visually. Use when you need to: upload texts and compute embeddings, reduce dimensions with UMAP or t-SNE, explore semantic clusters in 2D or 3D scatter plots, find nearest neighbors by text query, label clusters, or compare two datasets side-by-side. Triggers include "visualize embeddings", "explore semantic clusters", "find similar texts", "embedding visualization", "UMAP plot", or any task requiring interactive exploration of text embedding spaces.
Best use case
embedding-visualizer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using embedding-visualizer should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/embedding-visualizer/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
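The manual steps can be sketched as a few shell commands. This is a minimal sketch that assumes SKILL.md has already been downloaded into the project root; the `skill_dir` variable is only a local helper name.

```bash
# Target directory, taken from the installation steps above.
skill_dir=".claude/skills/embedding-visualizer"
mkdir -p "$skill_dir"

# Assumes SKILL.md has already been downloaded into the current directory.
if [ -f SKILL.md ]; then
  mv SKILL.md "$skill_dir/SKILL.md"
else
  echo "SKILL.md not found in $(pwd); download it first" >&2
fi
```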
How embedding-visualizer Compares
| Feature / Agent | embedding-visualizer | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It lets you explore high-dimensional text embeddings visually: upload texts and compute embeddings, reduce dimensions with UMAP or t-SNE, explore semantic clusters in 2D or 3D scatter plots, find nearest neighbors by text query, label clusters, and compare two datasets side-by-side.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# embedding-visualizer
Self-hosted tool for exploring text embeddings visually. Upload texts, compute embeddings via an OpenAI-compatible API, reduce to 2D or 3D with UMAP or t-SNE, and explore semantic clusters interactively.
## When to use
- Exploring semantic structure in a text corpus
- Discovering topic clusters in documents
- Finding nearest neighbors by semantic similarity
- Comparing two corpora side-by-side
- Labeling semantic clusters for downstream analysis
- Debugging embedding quality visually
## Prerequisites
- Node.js 20+
- pnpm
- `EMBED_ENCRYPTION_KEY` set (32-byte hex)
- An OpenAI-compatible embedding API key
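A 32-byte value written in hex is 64 hex characters. One possible way to generate such a key is sketched below; it assumes the server accepts plain lowercase hex with no prefix, which the prerequisites do not spell out.

```bash
# Produce 32 random bytes encoded as 64 lowercase hex characters.
if command -v openssl >/dev/null 2>&1; then
  key="$(openssl rand -hex 32)"
else
  # Fallback that only needs standard POSIX utilities.
  key="$(head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n')"
fi

# Print a line that can be pasted into .env.
echo "EMBED_ENCRYPTION_KEY=${key}"
```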
## Quick Start
```bash
# Install dependencies
pnpm install
# Copy and edit environment
cp .env.example .env
# Set EMBED_ENCRYPTION_KEY in .env
# Start the server
pnpm start
# In another terminal, start the dashboard
pnpm --filter dashboard dev
```
Server: http://localhost:4100
Dashboard: http://localhost:4101
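Before sending requests, it can help to wait until the server answers. The sketch below polls the `/health` endpoint listed in the API reference. The function name `wait_for_server` and the `EMBED_URL` override are hypothetical, and since the shape of the health response is not documented, only the HTTP status is checked.

```bash
# Wait until the server answers on /health.
# Usage: wait_for_server [max_attempts] [interval_seconds]
wait_for_server() {
  local max_attempts="${1:-30}"
  local interval="${2:-1}"
  # EMBED_URL is a hypothetical override, not a documented variable.
  local base_url="${EMBED_URL:-http://localhost:4100}"
  local attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors; the response body is discarded.
    if curl -fs -o /dev/null "${base_url}/health"; then
      echo "server is up"
      return 0
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done

  echo "server did not respond after ${max_attempts} attempts" >&2
  return 1
}
```

Calling `wait_for_server 30 1` right after `pnpm start` would retry once per second for up to 30 attempts.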
## Uploading Texts
### File upload (curl)
```bash
# Create a dataset
curl -X POST http://localhost:4100/api/datasets \
-H 'Content-Type: application/json' \
-d '{"name": "My Dataset", "description": "Optional description"}'
# returns { "id": "ds_abc123" }
# Upload a file into that dataset
curl -X POST http://localhost:4100/api/datasets/ds_abc123/upload \
-F 'file=@reviews.txt'
```
Accepted file formats:
- `.txt` - one text per line
- `.csv` - first column used as text
- `.json` - array of strings
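To make the three formats concrete, the sketch below writes a tiny sample file in each one. The review texts and the second CSV column are invented for illustration, and whether the CSV parser expects a header row is not stated here, so the sample omits one.

```bash
# .txt: one text per line
printf '%s\n' 'Great battery life' 'Screen cracked after a week' > reviews.txt

# .csv: the first column is used as the text; the rating column is illustrative
printf '%s\n' '"Great battery life",5' '"Screen cracked after a week",1' > reviews.csv

# .json: a single array of strings
printf '%s\n' '["Great battery life", "Screen cracked after a week"]' > reviews.json
```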
### Paste texts (JSON body)
```bash
curl -X POST http://localhost:4100/api/datasets/ds_abc123/embed \
-H 'Content-Type: application/json' \
-d '{"texts": ["First text", "Second text", "Third text"]}'
```
## Running a Projection
```bash
# Start UMAP 2D projection
curl -X POST http://localhost:4100/api/datasets/ds_abc123/project \
-H 'Content-Type: application/json' \
-d '{"method": "umap", "dims": 2, "params": {"n_neighbors": 15, "min_dist": 0.1}}'
# returns { "projection_id": "proj_xyz789" }
# Poll status
curl http://localhost:4100/api/datasets/ds_abc123/projection
# returns { "status": "running" } or { "status": "completed" }
```
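The status check above can be wrapped in a polling loop. This is a sketch under a few assumptions: the function name `wait_for_projection` and the `EMBED_URL` override are hypothetical, only the two documented status values are handled, and any other value is treated as an error because failure states are not documented.

```bash
# Poll the projection status endpoint until it reports "completed".
# Usage: wait_for_projection <dataset_id> [interval_seconds] [max_attempts]
wait_for_projection() {
  local dataset_id="$1"
  local interval="${2:-2}"
  local max_attempts="${3:-60}"
  # EMBED_URL is a hypothetical override, not a documented variable.
  local base_url="${EMBED_URL:-http://localhost:4100}"
  local attempt=1
  local body
  local status

  while [ "$attempt" -le "$max_attempts" ]; do
    body="$(curl -fsS "${base_url}/api/datasets/${dataset_id}/projection")" || return 1
    # Extract the "status" value without requiring jq.
    status="$(printf '%s\n' "$body" | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')"
    case "$status" in
      completed) echo "projection completed"; return 0 ;;
      running)   sleep "$interval" ;;
      *)         echo "unexpected status: ${status:-none}" >&2; return 1 ;;
    esac
    attempt=$((attempt + 1))
  done

  echo "timed out after ${max_attempts} attempts" >&2
  return 1
}
```

For example, `wait_for_projection ds_abc123 2 60` would check every two seconds for up to two minutes.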
## Nearest Neighbor Search
```bash
# Search by text query
curl -X POST http://localhost:4100/api/nearest \
-H 'Content-Type: application/json' \
-d '{"dataset_id": "ds_abc123", "text": "excellent product quality", "k": 10}'
# Search by existing point ID
curl -X POST http://localhost:4100/api/nearest \
-H 'Content-Type: application/json' \
-d '{"dataset_id": "ds_abc123", "point_id": "pt_8f3c2a1", "k": 10}'
```
Response:
```json
[
{ "id": "pt_...", "content": "...", "label": "Positive", "similarity": 0.94 },
...
]
```
## Label Management
```bash
# List labels in a dataset
curl http://localhost:4100/api/datasets/ds_abc123/labels
# Assign a label to points
curl -X POST http://localhost:4100/api/datasets/ds_abc123/labels \
-H 'Content-Type: application/json' \
-d '{"label": "Positive", "point_ids": ["pt_8f3c2a1", "pt_3d7e9b2"]}'
# Delete a label (unassigns from all points)
curl -X DELETE http://localhost:4100/api/datasets/ds_abc123/labels/Positive
```
## Export
```bash
# Export as JSON
curl "http://localhost:4100/api/datasets/ds_abc123/export?format=json" \
-o dataset.json
# Export as CSV
curl "http://localhost:4100/api/datasets/ds_abc123/export?format=csv" \
-o dataset.csv
# Export only labeled points
curl "http://localhost:4100/api/datasets/ds_abc123/export?format=json&label=Positive" \
-o positive.json
```
## API Reference
| Endpoint | Method | Description |
|---|---|---|
| `/api/datasets` | GET | List all datasets |
| `/api/datasets` | POST | Create dataset |
| `/api/datasets/:id` | GET | Get dataset details |
| `/api/datasets/:id` | DELETE | Delete dataset |
| `/api/datasets/:id/upload` | POST | Upload text file |
| `/api/datasets/:id/embed` | POST | Embed pasted texts |
| `/api/datasets/:id/points` | GET | All points with coords |
| `/api/datasets/:id/project` | POST | Start projection |
| `/api/datasets/:id/projection` | GET | Projection status |
| `/api/datasets/:id/labels` | GET | List labels |
| `/api/datasets/:id/labels` | POST | Assign label |
| `/api/datasets/:id/labels/:label` | DELETE | Delete label |
| `/api/nearest` | POST | Nearest neighbor search |
| `/api/settings` | GET | Get settings |
| `/api/settings` | PATCH | Update settings |
| `/health` | GET | Health check |
## Environment Variables
| Variable | Description | Default |
|---|---|---|
| `EMBED_PORT` | Server port | 4100 |
| `EMBED_DASHBOARD_PORT` | Dashboard dev port | 4101 |
| `EMBED_DATA_DIR` | SQLite and uploads directory | ~/.embedding-visualizer |
| `EMBED_ENCRYPTION_KEY` | 32-byte hex for API key encryption | required |
| `EMBED_MODEL` | Default embedding model | text-embedding-3-small |
| `EMBED_BASE_URL` | Embedding API base URL | https://api.openai.com/v1 |
| `EMBED_BATCH_SIZE` | Texts per API call | 100 |
| `EMBED_MAX_FILE_SIZE` | Max upload size in bytes | 10485760 |
| `EMBED_LOG_LEVEL` | debug, info, warn, error | info |
| `EMBED_DEV` | Dev mode | 0 |
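As an illustration, a `.env` that spells out the defaults from the table might look like the sketch below. The key value is a placeholder that must be replaced, and whether `~` is expanded in `EMBED_DATA_DIR` depends on how the server loads the file, which is not documented here.

```bash
# Sample .env; every value except the key is the documented default.
EMBED_PORT=4100
EMBED_DASHBOARD_PORT=4101
EMBED_DATA_DIR=~/.embedding-visualizer
# Required, no default: replace with your own 32-byte hex key.
EMBED_ENCRYPTION_KEY=replace_with_64_hex_characters
EMBED_MODEL=text-embedding-3-small
EMBED_BASE_URL=https://api.openai.com/v1
EMBED_BATCH_SIZE=100
EMBED_MAX_FILE_SIZE=10485760
EMBED_LOG_LEVEL=info
EMBED_DEV=0
```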
## Troubleshooting
### Embedding fails with 401
The API key is not set or has expired. Go to Settings in the dashboard and update the key.
### Projection is slow
UMAP and t-SNE run server-side in Node.js. For datasets of 1,000 or more points, expect a projection to take 10-60 seconds. Reduce `n_neighbors` or `n_iter` to speed it up.
### File upload rejected
Check the file is a supported format (.txt, .csv, .json) and under 10 MB. Files with a non-text MIME type are rejected.
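A local pre-flight check can catch the extension and size problems before the upload is attempted. The sketch below is an assumption-laden helper: `check_upload` is a hypothetical name, it reuses `EMBED_MAX_FILE_SIZE` locally with the documented default, and it does not reproduce the server's MIME type check.

```bash
# Check a file against the documented upload limits before sending it.
# Usage: check_upload <path>
check_upload() {
  local file="$1"
  # Reuses the server-side variable name locally; falls back to the documented default.
  local max_bytes="${EMBED_MAX_FILE_SIZE:-10485760}"
  local size

  # Only the three documented extensions are accepted.
  case "$file" in
    *.txt|*.csv|*.json) ;;
    *) echo "unsupported extension: $file" >&2; return 1 ;;
  esac

  [ -f "$file" ] || { echo "not a regular file: $file" >&2; return 1; }

  # wc -c reports the size in bytes; tr strips padding some platforms add.
  size="$(wc -c < "$file" | tr -d ' ')"
  if [ "$size" -gt "$max_bytes" ]; then
    echo "too large: $size bytes (limit $max_bytes)" >&2
    return 1
  fi

  echo "ok: $file ($size bytes)"
}
```

Running `check_upload reviews.txt` before the `curl -F` call gives a quick local answer for the two most common rejection causes.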
### Rate limiting (429 errors)
Reduce `EMBED_BATCH_SIZE` to 20-50 and retry. The client applies exponential backoff automatically.