umap-projection
Run UMAP or t-SNE dimensionality reduction on embedding datasets. Use when you need to project high-dimensional vectors to 2D or 3D, tune projection parameters, compare UMAP vs t-SNE results, or poll projection status. Triggers include "project embeddings", "run UMAP", "t-SNE projection", "reduce dimensions", "projection parameters", or any need to configure or trigger dimensionality reduction in embedding-visualizer.
Best use case
umap-projection is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using umap-projection should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/umap-projection/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How umap-projection Compares
| Feature / Agent | umap-projection | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# umap-projection
Configure and run UMAP or t-SNE dimensionality reduction projections in embedding-visualizer.
## How Projection Works
1. The server reads all raw embedding vectors for the dataset from SQLite.
2. The selected algorithm (UMAP or t-SNE) runs server-side in a Node.js async worker.
3. Resulting 2D or 3D coordinates are stored back into the `points` table.
4. Status is polled via `GET /api/datasets/:id/projection`.
Projections do not replace the raw vectors. You can re-project with different parameters at any time.
## Starting a Projection
```bash
curl -X POST http://localhost:4100/api/datasets/ds_abc123/project \
  -H 'Content-Type: application/json' \
  -d '{
    "method": "umap",
    "dims": 2,
    "params": {
      "n_neighbors": 15,
      "min_dist": 0.1,
      "metric": "cosine"
    }
  }'
```
Response:
```json
{ "projection_id": "proj_xyz789", "status": "pending" }
```
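The same request can be sent from a script. The following is a minimal Python sketch, assuming the endpoint and body shape shown in the curl example; the helper names `build_projection_body` and `start_projection` are hypothetical and not part of embedding-visualizer.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4100"  # host and port taken from the curl example


def build_projection_body(method, dims=2, params=None):
    """Build the JSON body for POST /api/datasets/:id/project (hypothetical helper)."""
    if dims not in (2, 3):
        raise ValueError("dims must be 2 or 3")
    # Copy params so later changes by the caller do not alter the body.
    return {"method": method, "dims": dims, "params": dict(params or {})}


def start_projection(dataset_id, body, base_url=BASE_URL):
    """POST the body and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/api/datasets/{dataset_id}/project",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `start_projection("ds_abc123", build_projection_body("umap", 2, {"n_neighbors": 15, "min_dist": 0.1, "metric": "cosine"}))` against a running server should return a response shaped like the one shown above.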
## Polling Status
```bash
curl http://localhost:4100/api/datasets/ds_abc123/projection
```
Possible responses over the projection lifecycle (one object per poll):
```json
{ "status": "pending", "projection_id": "proj_xyz789" }
{ "status": "running", "projection_id": "proj_xyz789", "started_at": "2024-12-18T14:22:01Z" }
{ "status": "completed", "projection_id": "proj_xyz789", "completed_at": "2024-12-18T14:22:44Z" }
{ "status": "failed", "error": "umap-js: insufficient data points" }
```
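The lifecycle above lends itself to a simple polling loop. This is a sketch only: the interval, the timeout, and the helper names `wait_for_projection` and `http_fetcher` are assumptions, while the status values and the endpoint come from the examples above.

```python
import json
import time
import urllib.request


def http_fetcher(dataset_id, base_url="http://localhost:4100"):
    """Return a zero-argument callable that fetches the current projection status."""
    def fetch():
        url = f"{base_url}/api/datasets/{dataset_id}/projection"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    return fetch


def wait_for_projection(fetch_status, interval_s=2.0, timeout_s=600.0, sleep=time.sleep):
    """Poll until the status is "completed" or "failed" (hypothetical helper)."""
    waited = 0.0
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "completed":
            return status
        if state == "failed":
            raise RuntimeError(status.get("error", "projection failed"))
        if waited >= timeout_s:
            raise TimeoutError(f"projection still {state!r} after {waited:.0f}s")
        # "pending" and "running" both mean: wait and ask again.
        sleep(interval_s)
        waited += interval_s
```

A typical call would be `wait_for_projection(http_fetcher("ds_abc123"))`. Passing the fetcher in as a callable keeps the loop testable without a running server.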
## UMAP Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `n_neighbors` | integer | 15 | Local neighborhood size. Higher = more global structure preserved. Range: 2-200. |
| `min_dist` | float | 0.1 | Minimum distance between points in low-dim space. Lower = tighter clusters. Range: 0-1. |
| `metric` | string | "cosine" | Distance metric. Use "cosine" for text embeddings. Options: cosine, euclidean, manhattan. |
| `n_components` | integer | 2 | Output dimensions. 2 or 3. |
| `random_state` | integer | 42 | Seed for reproducibility. |
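The defaults and ranges in this table can be checked on the client before a request is sent. The sketch below assumes the table is authoritative; whether the server applies the same checks is not stated here, and `resolve_umap_params` is a hypothetical helper.

```python
UMAP_DEFAULTS = {
    "n_neighbors": 15,
    "min_dist": 0.1,
    "metric": "cosine",
    "n_components": 2,
    "random_state": 42,
}
UMAP_METRICS = ("cosine", "euclidean", "manhattan")


def resolve_umap_params(overrides=None):
    """Merge overrides onto the documented defaults and check the documented ranges."""
    overrides = overrides or {}
    unknown = set(overrides) - set(UMAP_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown UMAP parameters: {sorted(unknown)}")
    params = {**UMAP_DEFAULTS, **overrides}
    if not 2 <= params["n_neighbors"] <= 200:
        raise ValueError("n_neighbors must be in the range 2-200")
    if not 0 <= params["min_dist"] <= 1:
        raise ValueError("min_dist must be in the range 0-1")
    if params["metric"] not in UMAP_METRICS:
        raise ValueError(f"metric must be one of {UMAP_METRICS}")
    if params["n_components"] not in (2, 3):
        raise ValueError("n_components must be 2 or 3")
    return params
```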
### UMAP Parameter Guidelines
**Tight, well-separated clusters:**
```json
{ "n_neighbors": 10, "min_dist": 0.05 }
```
**Broad, spread-out layout showing global structure:**
```json
{ "n_neighbors": 50, "min_dist": 0.3 }
```
**Large dataset (1000+ points):**
```json
{ "n_neighbors": 30, "min_dist": 0.1 }
```
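These three guidelines can be kept as named presets so an agent picks one by intent. This is an illustrative sketch; the preset names (`tight`, `global`, `large`) and the helper `umap_params_for` are made up here, and only the numeric values come from the guidelines above.

```python
UMAP_PRESETS = {
    "tight": {"n_neighbors": 10, "min_dist": 0.05},   # tight, well-separated clusters
    "global": {"n_neighbors": 50, "min_dist": 0.3},   # broad layout, global structure
    "large": {"n_neighbors": 30, "min_dist": 0.1},    # large dataset (1000+ points)
}


def umap_params_for(goal, metric="cosine"):
    """Return the "params" object for a named preset (hypothetical helper)."""
    if goal not in UMAP_PRESETS:
        raise ValueError(f"unknown preset {goal!r}; choose from {sorted(UMAP_PRESETS)}")
    # Copy the preset so callers cannot mutate the shared table.
    return {**UMAP_PRESETS[goal], "metric": metric}
```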
## t-SNE Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `perplexity` | integer | 30 | Balance between local and global aspects. Typical range: 5-50. |
| `learning_rate` | integer | 200 | Learning rate for optimization. Typical range: 10-1000. |
| `n_iter` | integer | 1000 | Number of optimization iterations. Higher = better but slower. |
| `n_components` | integer | 2 | Output dimensions. 2 or 3. |
### t-SNE Parameter Guidelines
**Small dataset (< 200 points):**
```json
{ "perplexity": 10, "n_iter": 1000 }
```
**Large dataset (500-2000 points):**
```json
{ "perplexity": 50, "n_iter": 1500 }
```
**Very tight clusters:**
```json
{ "perplexity": 5, "learning_rate": 100 }
```
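The t-SNE guidelines can be expressed as a small selection function. This sketch assumes that dataset sizes outside the two documented ranges (200-499 points, or more than 2000) fall back to the table defaults; that fallback and the helper name `tsne_params_for` are assumptions rather than documented behavior.

```python
TSNE_DEFAULTS = {"perplexity": 30, "learning_rate": 200, "n_iter": 1000}


def tsne_params_for(n_points, very_tight=False):
    """Pick t-SNE parameters from the guidelines above (hypothetical helper)."""
    params = dict(TSNE_DEFAULTS)
    if very_tight:
        # "Very tight clusters" guideline takes priority over dataset size.
        params.update({"perplexity": 5, "learning_rate": 100})
    elif n_points < 200:
        params.update({"perplexity": 10, "n_iter": 1000})
    elif 500 <= n_points <= 2000:
        params.update({"perplexity": 50, "n_iter": 1500})
    # Any other size keeps the defaults (an assumption, see the note above).
    return params
```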
## UMAP vs t-SNE Comparison
| Property | UMAP | t-SNE |
|---|---|---|
| Speed | Faster (seconds for 1000 pts) | Slower (10-60s for 1000 pts) |
| Global structure | Better preserved | Often distorted |
| Local structure | Good | Very good |
| Reproducibility | Yes (with random_state) | Yes (with random_state) |
| Best for | Most cases | Small datasets, very tight clusters |
For text embeddings, UMAP with cosine metric is usually the best starting point.
## Retrieve Projected Points
After the projection completes, retrieve the projected coordinates:
```bash
curl "http://localhost:4100/api/datasets/ds_abc123/points"
```
Response (array):
```json
[
  {
    "id": "pt_8f3c2a1",
    "content": "Absolutely love this product...",
    "label": "Positive",
    "x2d": 0.4821,
    "y2d": -0.1243,
    "x3d": 0.48,
    "y3d": -0.12,
    "z3d": 0.31
  }
]
```
Fields `x2d`, `y2d` are populated after a 2D projection. `x3d`, `y3d`, `z3d` are populated after a 3D projection. Both can coexist.
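Because the 2D and 3D fields are filled independently, client code should check which set is present before plotting. The sketch below assumes an unpopulated coordinate is either `null` or absent in the JSON, which is not confirmed here; `coords_2d` and `coords_3d` are hypothetical helpers.

```python
def coords_2d(points):
    """Return (id, x, y) tuples for points that have a 2D projection."""
    return [
        (p["id"], p["x2d"], p["y2d"])
        for p in points
        if p.get("x2d") is not None and p.get("y2d") is not None
    ]


def coords_3d(points):
    """Return (id, x, y, z) tuples for points that have a 3D projection."""
    return [
        (p["id"], p["x3d"], p["y3d"], p["z3d"])
        for p in points
        if all(p.get(k) is not None for k in ("x3d", "y3d", "z3d"))
    ]
```

Given the parsed response array from `/api/datasets/:id/points`, an empty result from `coords_3d` would suggest that no 3D projection has been run yet.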
## Common Issues
### "insufficient data points"
UMAP requires at least `n_neighbors + 1` points. If your dataset has fewer than 16 points with default settings, reduce `n_neighbors` to `dataset_size - 1`.
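The rule above amounts to capping `n_neighbors` at `dataset_size - 1`. A minimal sketch, assuming the documented lower bound of 2 for `n_neighbors` (so at least 3 points are needed); `safe_n_neighbors` is a hypothetical helper name.

```python
def safe_n_neighbors(requested, dataset_size):
    """Cap n_neighbors so that dataset_size >= n_neighbors + 1 holds."""
    if dataset_size < 3:
        # The smallest documented n_neighbors is 2, which needs 3 points.
        raise ValueError("UMAP needs at least 3 points with n_neighbors >= 2")
    return min(requested, dataset_size - 1)
```

For example, with the default `n_neighbors` of 15 and a 10-point dataset, the helper returns 9.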
### Projection takes too long
Reduce `n_iter` (t-SNE) or `n_neighbors` (UMAP). For datasets over 5,000 points expect several minutes.
### Clusters look random / no structure visible
Try the cosine metric (recommended for text embeddings). If you are already using cosine, the embeddings may not contain clear semantic clusters; try a different dataset or embedding model.
### Re-running projection overwrites previous coordinates
Yes, running a new projection updates `x2d`, `y2d` (or `x3d`, `y3d`, `z3d`) for all points in the dataset. The previous projection record is kept in the `projections` table for history, but the coordinates in `points` are replaced.