umap-projection

Run UMAP or t-SNE dimensionality reduction on embedding datasets. Use when you need to project high-dimensional vectors to 2D or 3D, tune projection parameters, compare UMAP vs t-SNE results, or poll projection status. Triggers include "project embeddings", "run UMAP", "t-SNE projection", "reduce dimensions", "projection parameters", or any need to configure or trigger dimensionality reduction in embedding-visualizer.

Best use case

umap-projection is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using umap-projection should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/umap-projection/SKILL.md --create-dirs "https://raw.githubusercontent.com/heldernoid/agentic-build-templates/main/projects/ai-llm-tools/embedding-visualizer/skills/umap-projection/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/umap-projection/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How umap-projection Compares

| Feature / Agent | umap-projection | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

SKILL.md Source

# umap-projection

Configure and run UMAP or t-SNE dimensionality reduction projections in embedding-visualizer.

## How Projection Works

1. The server reads all raw embedding vectors for the dataset from SQLite.
2. The selected algorithm (UMAP or t-SNE) runs server-side in a Node.js async worker.
3. Resulting 2D or 3D coordinates are stored back into the `points` table.
4. Status is polled via `GET /api/datasets/:id/projection`.

Projections do not replace the raw vectors. You can re-project with different parameters at any time.

## Starting a Projection

```bash
curl -X POST http://localhost:4100/api/datasets/ds_abc123/project \
  -H 'Content-Type: application/json' \
  -d '{
    "method": "umap",
    "dims": 2,
    "params": {
      "n_neighbors": 15,
      "min_dist": 0.1,
      "metric": "cosine"
    }
  }'
```

Response:
```json
{ "projection_id": "proj_xyz789", "status": "pending" }
```
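
The same request can be issued from a script. The following is a minimal Python sketch of the curl call above, using only the standard library. The helper names `build_projection_request` and `start_projection` are hypothetical, and the base URL and dataset ID are the example values used on this page. The `method` value is passed through unchecked, since only `"umap"` is shown here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4100"  # example host and port from the curl call


def build_projection_request(dataset_id, method="umap", dims=2, params=None):
    """Return (url, body_bytes) for POST /api/datasets/:id/project."""
    if dims not in (2, 3):
        raise ValueError("dims must be 2 or 3")
    url = f"{BASE_URL}/api/datasets/{dataset_id}/project"
    body = {"method": method, "dims": dims, "params": params or {}}
    return url, json.dumps(body).encode("utf-8")


def start_projection(dataset_id, **kwargs):
    """Send the request and return the parsed JSON response."""
    url, data = build_projection_request(dataset_id, **kwargs)
    request = urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting the payload construction from the network call keeps the former easy to check without a running server.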

## Polling Status

```bash
curl http://localhost:4100/api/datasets/ds_abc123/projection
```

Responses during lifecycle:
```json
{ "status": "pending", "projection_id": "proj_xyz789" }
{ "status": "running", "projection_id": "proj_xyz789", "started_at": "2024-12-18T14:22:01Z" }
{ "status": "completed", "projection_id": "proj_xyz789", "completed_at": "2024-12-18T14:22:44Z" }
{ "status": "failed", "error": "umap-js: insufficient data points" }
```
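
A polling loop over these states might look like the sketch below. It assumes `completed` and `failed` are the only terminal states, as in the lifecycle above. The function name `wait_for_projection`, the `fetch_status` callable, and the interval and timeout defaults are all hypothetical choices, not part of embedding-visualizer.

```python
import time


def wait_for_projection(fetch_status, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until the projection completes, fails, or the timeout elapses.

    fetch_status is a zero-argument callable that returns the parsed JSON
    from GET /api/datasets/:id/projection as a dict.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "completed":
            return status
        if state == "failed":
            raise RuntimeError(status.get("error", "projection failed"))
        if waited >= timeout:
            raise TimeoutError(f"projection still {state!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Passing the fetcher in as a callable means the loop can be exercised with canned responses before it is pointed at a real server.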

## UMAP Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `n_neighbors` | integer | 15 | Local neighborhood size. Higher = more global structure preserved. Range: 2-200. |
| `min_dist` | float | 0.1 | Minimum distance between points in low-dim space. Lower = tighter clusters. Range: 0-1. |
| `metric` | string | "cosine" | Distance metric. Use "cosine" for text embeddings. Options: cosine, euclidean, manhattan. |
| `n_components` | integer | 2 | Output dimensions. 2 or 3. |
| `random_state` | integer | 42 | Seed for reproducibility. |
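
The ranges in this table can be checked on the client before a request is sent. The sketch below is a hypothetical helper, `validate_umap_params`, that merges user values over the documented defaults and rejects out-of-range values. Whether the server applies the same defaults or checks is an assumption, not something this page confirms.

```python
UMAP_DEFAULTS = {
    "n_neighbors": 15,
    "min_dist": 0.1,
    "metric": "cosine",
    "n_components": 2,
    "random_state": 42,
}
UMAP_METRICS = ("cosine", "euclidean", "manhattan")


def validate_umap_params(params):
    """Return params merged over the documented defaults, or raise ValueError."""
    unknown = set(params) - set(UMAP_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown UMAP parameters: {sorted(unknown)}")
    merged = {**UMAP_DEFAULTS, **params}
    if not 2 <= merged["n_neighbors"] <= 200:
        raise ValueError("n_neighbors must be between 2 and 200")
    if not 0 <= merged["min_dist"] <= 1:
        raise ValueError("min_dist must be between 0 and 1")
    if merged["metric"] not in UMAP_METRICS:
        raise ValueError(f"metric must be one of {UMAP_METRICS}")
    if merged["n_components"] not in (2, 3):
        raise ValueError("n_components must be 2 or 3")
    return merged
```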

### UMAP Parameter Guidelines

**Tight, well-separated clusters:**
```json
{ "n_neighbors": 10, "min_dist": 0.05 }
```

**Broad, spread-out layout showing global structure:**
```json
{ "n_neighbors": 50, "min_dist": 0.3 }
```

**Large dataset (1000+ points):**
```json
{ "n_neighbors": 30, "min_dist": 0.1 }
```

## t-SNE Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `perplexity` | integer | 30 | Balance between local and global aspects. Typical range: 5-50. |
| `learning_rate` | integer | 200 | Learning rate for optimization. Typical range: 10-1000. |
| `n_iter` | integer | 1000 | Number of optimization iterations. Higher = better but slower. |
| `n_components` | integer | 2 | Output dimensions. 2 or 3. |

### t-SNE Parameter Guidelines

**Small dataset (< 200 points):**
```json
{ "perplexity": 10, "n_iter": 1000 }
```

**Large dataset (500-2000 points):**
```json
{ "perplexity": 50, "n_iter": 1500 }
```

**Very tight clusters:**
```json
{ "perplexity": 5, "learning_rate": 100 }
```
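
The size-based guidelines above can be encoded as a small lookup. This is only a sketch: `suggest_tsne_params` is a hypothetical name, and falling back to the table defaults for sizes the guidelines do not cover (200 to 499 points, or more than 2000) is an assumption.

```python
TSNE_DEFAULTS = {
    "perplexity": 30,
    "learning_rate": 200,
    "n_iter": 1000,
    "n_components": 2,
}


def suggest_tsne_params(n_points):
    """Pick t-SNE parameters from the dataset size using the guidelines above."""
    params = dict(TSNE_DEFAULTS)
    if n_points < 200:
        # Small dataset: lower perplexity, default iteration count.
        params.update({"perplexity": 10, "n_iter": 1000})
    elif 500 <= n_points <= 2000:
        # Large dataset: higher perplexity and more iterations.
        params.update({"perplexity": 50, "n_iter": 1500})
    # Sizes outside both ranges keep the documented defaults (assumption).
    return params
```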

## UMAP vs t-SNE Comparison

| Property | UMAP | t-SNE |
|---|---|---|
| Speed | Faster (seconds for 1000 pts) | Slower (10-60s for 1000 pts) |
| Global structure | Better preserved | Often distorted |
| Local structure | Good | Very good |
| Reproducibility | Yes (with random_state) | Yes (with random_state) |
| Best for | Most cases | Small datasets, very tight clusters |

For text embeddings, UMAP with cosine metric is usually the best starting point.

## Retrieve Projected Points

After the projection completes, retrieve the projected coordinates:

```bash
curl "http://localhost:4100/api/datasets/ds_abc123/points"
```

Response (array):
```json
[
  {
    "id": "pt_8f3c2a1",
    "content": "Absolutely love this product...",
    "label": "Positive",
    "x2d": 0.4821,
    "y2d": -0.1243,
    "x3d": 0.48,
    "y3d": -0.12,
    "z3d": 0.31
  }
]
```

Fields `x2d`, `y2d` are populated after a 2D projection. `x3d`, `y3d`, `z3d` are populated after a 3D projection. Both can coexist.
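
For plotting, the response usually needs to be reduced to bare coordinates. The sketch below pulls them out of the parsed array; `extract_coords` is a hypothetical helper, and it assumes that points not yet projected in the requested dimensionality have the relevant fields missing or set to null.

```python
def extract_coords(points, dims=2):
    """Return (id, label, coords) tuples for points projected in `dims` dimensions."""
    if dims not in (2, 3):
        raise ValueError("dims must be 2 or 3")
    keys = ("x2d", "y2d") if dims == 2 else ("x3d", "y3d", "z3d")
    rows = []
    for point in points:
        coords = tuple(point.get(key) for key in keys)
        if any(value is None for value in coords):
            continue  # assumption: unprojected points have missing or null fields
        rows.append((point["id"], point.get("label"), coords))
    return rows
```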

## Common Issues

### "insufficient data points"

UMAP requires at least `n_neighbors + 1` points. If your dataset has fewer than 16 points with default settings, reduce `n_neighbors` to `dataset_size - 1`.
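
The adjustment can be computed before the request is sent. In this sketch, `fit_n_neighbors` is a hypothetical helper; the three-point floor follows from combining the `n_neighbors + 1` rule with the documented minimum `n_neighbors` of 2.

```python
def fit_n_neighbors(dataset_size, requested=15):
    """Clamp n_neighbors so that the dataset has at least n_neighbors + 1 points."""
    if dataset_size < 3:
        # n_neighbors cannot go below 2, so at least 3 points are needed.
        raise ValueError("UMAP needs at least 3 points")
    return min(requested, dataset_size - 1)
```

For a 10-point dataset with default settings this yields `n_neighbors = 9`; for 16 or more points the requested value is left unchanged.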

### Projection takes too long

Reduce `n_iter` (t-SNE) or `n_neighbors` (UMAP). For datasets over 5,000 points expect several minutes.

### Clusters look random / no structure visible

Try the cosine metric (recommended for text embeddings). If you are already using cosine, the embeddings may not contain clear semantic clusters; try a different dataset or embedding model.

### Re-running projection overwrites previous coordinates

This is expected. Running a new projection updates `x2d`, `y2d` (or `x3d`, `y3d`, `z3d`) for all points in the dataset. The previous projection record is kept in the `projections` table for history, but the coordinates in `points` are replaced.
