topology-data-analysis

Topological data analysis: persistent homology, Mapper, and TDA tools

191 stars

Best use case

topology-data-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using topology-data-analysis should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/topology-data-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/wentorai/research-plugins/main/skills/domains/math/topology-data-analysis/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/topology-data-analysis/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How topology-data-analysis Compares

Feature / Agent	topology-data-analysis	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Topological data analysis: persistent homology, Mapper, and TDA tools

Where can I find the source code?

The source is on GitHub in the wentorai/research-plugins repository, at skills/domains/math/topology-data-analysis/SKILL.md.

SKILL.md Source

# Topological Data Analysis

A skill for applying topological data analysis (TDA) methods to research data. Covers persistent homology, Vietoris-Rips complexes, persistence diagrams, the Mapper algorithm, and vectorization methods for integrating topological features into machine learning pipelines.

## Core Concepts

### Simplicial Complexes from Data

TDA extracts topological features (connected components, loops, voids) from data by building simplicial complexes at multiple scales:

| Complex | Construction | Computational Cost |
|---------|-------------|-------------------|
| Vietoris-Rips | Edge if distance < epsilon | Up to O(n^(d+1)) d-simplices |
| Cech | Ball intersection (exact) | Computationally expensive |
| Alpha | Delaunay-based (exact in low dim) | Efficient in R^2, R^3 |
| Cubical | Grid-based (for images) | Linear in pixels |
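
To make the Vietoris-Rips row concrete, here is a minimal standard-library sketch of the construction at a single scale. The function name `rips_complex` and the use of `<=` for the distance test are assumptions made for illustration (conventions differ between `<` and `<=`). The brute-force enumeration is only practical for very small inputs.

```python
from itertools import combinations
from math import dist

def rips_complex(points, epsilon, max_dim=2):
    """Return the simplices (as sorted index tuples) of the Vietoris-Rips
    complex at scale epsilon, grouped by dimension up to max_dim."""
    n = len(points)
    # An edge joins two points whose distance is at most epsilon.
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if dist(points[i], points[j]) <= epsilon
    }
    simplices = {0: [(i,) for i in range(n)], 1: sorted(close)}
    # Clique rule: a k-simplex is included when all of its edges are present.
    for k in range(2, max_dim + 1):
        simplices[k] = [
            s for s in combinations(range(n), k + 1)
            if all(pair in close for pair in combinations(s, 2))
        ]
    return simplices

# Corners of a unit square: sides have length 1, diagonals sqrt(2).
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
at_side = rips_complex(square, epsilon=1.0)  # 4 edges, no triangles: a loop
at_diag = rips_complex(square, epsilon=1.5)  # 6 edges, 4 triangles: loop filled
```

At `epsilon=1.0` only the four sides are present, so the square forms a loop. At `epsilon=1.5` the diagonals appear and every triple of corners becomes a triangle, which fills the loop in.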

### Filtration and Persistence

```
Scale epsilon:  0.1    0.3    0.5    0.7    1.0
                |------|------|------|------|
Components:     10     6      3      2      1
  (H0 features born at 0, die at merging scale)

Loops:          0      0      1      2      0
  (H1 features born when loop forms, die when filled)

A feature that persists across a wide range of scales is more likely to reflect genuine structure in the data, while short-lived features are usually treated as sampling noise.
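
The H0 part of this picture can be computed by hand. The sketch below is a standard-library illustration, not how Ripser works internally. It sorts all pairwise distances and merges components with union-find, recording a death each time two components join. The function name `h0_persistence` is hypothetical.

```python
from itertools import combinations
from math import dist, inf

def h0_persistence(points):
    """Return (birth, death) pairs for the connected components of a
    Vietoris-Rips filtration, using union-find over edges sorted by length."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j        # two components merge here,
            pairs.append((0.0, length))    # so one of them dies at this scale
    pairs.append((0.0, inf))               # the last component never dies
    return pairs

# Two tight pairs of points, far apart from each other.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
bars = h0_persistence(points)
# bars == [(0.0, 1.0), (0.0, 1.0), (0.0, 9.0), (0.0, inf)]
```

The two bars of length 1 come from points merging within each pair. The bar of length 9 records the two clusters joining, and the infinite bar is the single component that remains.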

## Persistent Homology with Ripser

### Computing Persistence Diagrams

```python
import numpy as np
from ripser import ripser
from persim import plot_diagrams

def compute_persistence(point_cloud: np.ndarray,
                         max_dim: int = 2,
                         max_edge: float = 2.0) -> dict:
    """
    Compute persistent homology of a point cloud.
    point_cloud: (n_points, n_dimensions) array
    max_dim: maximum homology dimension to compute
    max_edge: maximum edge length in Rips complex
    Returns persistence diagrams for each dimension.
    """
    result = ripser(
        point_cloud,
        maxdim=max_dim,
        thresh=max_edge,
    )

    diagrams = result["dgms"]
    summary = {}

    for dim, dgm in enumerate(diagrams):
        # Keep finite bars only; death = inf marks a feature still alive at `thresh` (always one H0 component)
        finite = dgm[dgm[:, 1] < np.inf] if len(dgm) > 0 else dgm
        lifetimes = finite[:, 1] - finite[:, 0] if len(finite) > 0 else np.array([])

        summary[f"H{dim}"] = {
            "n_features": len(finite),
            "max_persistence": float(lifetimes.max()) if len(lifetimes) > 0 else 0,
            "mean_persistence": float(lifetimes.mean()) if len(lifetimes) > 0 else 0,
            "birth_death_pairs": finite.tolist(),
        }

    return summary

# Example: torus point cloud
def sample_torus(n=1000, R=3.0, r=1.0, noise=0.1):
    """Sample points from a torus in R^3."""
    theta = np.random.uniform(0, 2 * np.pi, n)
    phi = np.random.uniform(0, 2 * np.pi, n)
    x = (R + r * np.cos(phi)) * np.cos(theta) + np.random.normal(0, noise, n)
    y = (R + r * np.cos(phi)) * np.sin(theta) + np.random.normal(0, noise, n)
    z = r * np.sin(phi) + np.random.normal(0, noise, n)
    return np.column_stack([x, y, z])

torus = sample_torus(500)
persistence = compute_persistence(torus, max_dim=2)
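# Expected for a well-sampled torus (Betti numbers 1, 2, 1):
#   H0: one component that never dies (death = inf, so it is not counted in the summary),
#   H1: 2 prominent loops (the two fundamental cycles),
#   H2: 1 prominent void (the cavity).
# Features still alive at max_edge also get death = inf and are dropped from the
# summary; raise max_edge if a prominent loop appears to be missing.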
```

## Persistence Vectorization

### Converting Persistence to Feature Vectors

To use topological features in machine learning, persistence diagrams must be vectorized:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class PersistenceStatistics(BaseEstimator, TransformerMixin):
    """
    Extract statistical features from persistence diagrams.
    Produces a fixed-length feature vector from variable-length diagrams.
    """

    def __init__(self, max_dim: int = 1):
        self.max_dim = max_dim

    def fit(self, X, y=None):
        return self

    def transform(self, diagrams_list: list) -> np.ndarray:
        features = []
        for diagrams in diagrams_list:
            row = []
            for dim in range(self.max_dim + 1):
                dgm = diagrams[dim]
                lifetimes = dgm[:, 1] - dgm[:, 0]
                lifetimes = lifetimes[np.isfinite(lifetimes)]

                if len(lifetimes) == 0:
                    row.extend([0, 0, 0, 0, 0, 0])
                else:
                    row.extend([
                        len(lifetimes),              # count
                        np.sum(lifetimes),            # total persistence
                        np.max(lifetimes),            # max persistence
                        np.mean(lifetimes),           # mean persistence
                        np.std(lifetimes),            # std persistence
                        np.sum(lifetimes ** 2),       # sum of squared lifetimes (not an entropy)
                    ])
            features.append(row)
        return np.array(features)
```
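
Persistence entropy is another fixed-length summary that is often used alongside counts and sums of lifetimes. The following is a minimal standard-library sketch, assuming a diagram is given as a list of `(birth, death)` pairs. The helper name `persistence_entropy` is hypothetical and infinite bars are skipped.

```python
from math import isfinite, log

def persistence_entropy(diagram):
    """Shannon entropy of the normalized finite lifetimes of one diagram."""
    lifetimes = [d - b for b, d in diagram if isfinite(d) and d > b]
    total = sum(lifetimes)
    if total == 0:
        return 0.0  # empty diagram or no finite bars
    probs = [life / total for life in lifetimes]
    return -sum(p * log(p) for p in probs)

# Four equal bars give the maximum entropy for four bars, log(4).
equal_bars = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
# One dominant bar plus short ones gives an entropy close to zero.
one_dominant = [(0.0, 10.0), (0.0, 0.01), (0.0, 0.01), (0.0, float("inf"))]

h_equal = persistence_entropy(equal_bars)
h_dominant = persistence_entropy(one_dominant)
```

A low value indicates that a few bars carry most of the total persistence. A value near the logarithm of the number of bars indicates lifetimes of similar length.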

### Persistence Images

```python
def persistence_image(diagram: np.ndarray, resolution: int = 20,
                       sigma: float = 0.1,
                       weight_fn=None) -> np.ndarray:
    """
    Compute a persistence image from a persistence diagram.
    Transforms birth-death pairs into a stable, fixed-size representation.
    """
    if weight_fn is None:
        weight_fn = lambda birth, persistence: persistence

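    # Keep finite pairs only: an infinite death would make the grid range infinite
    diagram = diagram[np.isfinite(diagram[:, 1])]
    if len(diagram) == 0:
        return np.zeros((resolution, resolution))

    # Transform to birth-persistence coordinates
    births = diagram[:, 0]
    persistences = diagram[:, 1] - diagram[:, 0]

    # Create grid. The range is fitted to this diagram, so images of different
    # diagrams are only comparable if they are computed on a shared range.
    x_range = np.linspace(births.min() - sigma, births.max() + sigma, resolution)
    y_range = np.linspace(0, persistences.max() + sigma, resolution)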
    xx, yy = np.meshgrid(x_range, y_range)

    image = np.zeros((resolution, resolution))

    for b, p in zip(births, persistences):
        if not np.isfinite(p):
            continue
        w = weight_fn(b, p)
        gaussian = w * np.exp(-((xx - b)**2 + (yy - p)**2) / (2 * sigma**2))
        image += gaussian

    return image
```

## The Mapper Algorithm

### Constructing Mapper Graphs

Mapper provides a compressed topological summary of high-dimensional data:

```python
import numpy as np
import kmapper as km
from sklearn.cluster import DBSCAN

def run_mapper(data: np.ndarray, lens_fn=None, n_cubes: int = 10,
               overlap: float = 0.3) -> dict:
    """
    Run the Mapper algorithm to produce a simplicial complex
    summarizing the shape of the data.
    data: (n_samples, n_features) array
    lens_fn: filter function (default: first two PCA components)
    """
    mapper = km.KeplerMapper(verbose=0)

    # Compute lens (filter function)
    if lens_fn is None:
        from sklearn.decomposition import PCA
        lens = mapper.fit_transform(data, projection=PCA(n_components=2))
    else:
        lens = lens_fn(data)

    # Build the Mapper graph
    graph = mapper.map(
        lens, data,
        cover=km.Cover(n_cubes=n_cubes, perc_overlap=overlap),
        # eps is in the units of `data`; 0.5 is only sensible for rescaled features
        clusterer=DBSCAN(eps=0.5, min_samples=3),
    )

    # Summary statistics
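    n_nodes = len(graph["nodes"])
    # Count each undirected edge once, however the adjacency lists store it
    n_edges = len({
        frozenset((a, b))
        for a, nbrs in graph["links"].items()
        for b in nbrs
    })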

    return {
        "n_nodes": n_nodes,
        "n_edges": n_edges,
        "node_sizes": [len(v) for v in graph["nodes"].values()],
        "graph": graph,
    }
```

### Mapper Parameters

| Parameter | Effect | Guidance |
|-----------|--------|---------|
| Filter function | Projects data to low dimensions | PCA, eccentricity, density |
| Number of intervals | Controls resolution of cover | 10-30 typical |
| Overlap percentage | Controls connectivity | 20-50%, higher = more edges |
| Clustering algorithm | Groups points within intervals | DBSCAN, single-linkage |
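
To illustrate how the number of intervals and the overlap interact, here is a small standard-library sketch of Mapper with a one-dimensional filter. It is not KeplerMapper's implementation. The names `interval_cover`, `cluster` and `mapper_1d` are hypothetical, and the overlap definition (fraction of each widened interval shared with its neighbour) is an assumption that may differ from a given library's.

```python
from math import dist

def interval_cover(lo, hi, n_intervals, overlap):
    """Split [lo, hi] into equal intervals, widened so that neighbours
    share a fraction `overlap` of their width."""
    base = (hi - lo) / n_intervals
    pad = (base / (1.0 - overlap) - base) / 2.0
    return [(lo + k * base - pad, lo + (k + 1) * base + pad)
            for k in range(n_intervals)]

def cluster(indices, points, eps):
    """Single-linkage clusters: connected components of the graph that
    joins points at distance <= eps."""
    clusters, unseen = [], set(indices)
    while unseen:
        stack = [unseen.pop()]
        component = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unseen if dist(points[i], points[j]) <= eps}
            unseen -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters

def mapper_1d(points, lens, n_intervals, overlap, eps):
    """Nodes are clusters inside each cover interval; an edge joins two
    nodes that share at least one data point."""
    nodes = []
    for a, b in interval_cover(min(lens), max(lens), n_intervals, overlap):
        members = [i for i, value in enumerate(lens) if a <= value <= b]
        nodes.extend(cluster(members, points, eps))
    edges = {(u, v)
             for u in range(len(nodes)) for v in range(u + 1, len(nodes))
             if nodes[u] & nodes[v]}
    return nodes, edges

# Ten points on a line; the filter is the x-coordinate.
points = [(float(x), 0.0) for x in range(10)]
lens = [p[0] for p in points]
nodes_no_overlap, edges_no_overlap = mapper_1d(points, lens, 2, 0.0, eps=1.5)
nodes_overlap, edges_overlap = mapper_1d(points, lens, 2, 0.3, eps=1.5)
```

Without overlap the two intervals share no points, so the graph has two disconnected nodes. With 30% overlap, points 4 and 5 fall into both intervals and the nodes are joined by an edge.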

## Stability and Statistical Significance

### Bottleneck and Wasserstein Distances

```python
import numpy as np
from persim import bottleneck, wasserstein

def compare_persistence_diagrams(dgm1: np.ndarray,
                                   dgm2: np.ndarray) -> dict:
    """
    Compare two persistence diagrams using standard TDA distances.
    """
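    bn_dist = bottleneck(dgm1, dgm2)
    # persim's wasserstein takes no `order` argument; it returns a single
    # Wasserstein matching cost for the two diagrams
    ws_dist = wasserstein(dgm1, dgm2)

    return {
        "bottleneck_distance": round(float(bn_dist), 6),
        "wasserstein_distance": round(float(ws_dist), 6),
    }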
```
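
For intuition about what the bottleneck distance measures, the sketch below computes it by brute force for tiny diagrams with finite deaths. It assumes the usual definition: points are matched to each other or to the diagonal, costs use the L-infinity norm, and the distance is the smallest achievable worst-case cost. The function name `bottleneck_bruteforce` is hypothetical and the factorial running time makes it unsuitable for real diagrams.

```python
from itertools import permutations

def bottleneck_bruteforce(dgm_a, dgm_b):
    """Exact bottleneck distance by trying every matching (tiny diagrams only)."""
    n, m = len(dgm_a), len(dgm_b)
    size = n + m  # each side is padded with diagonal slots for the other side
    if size == 0:
        return 0.0

    def cost(i, j):
        a_is_point, b_is_point = i < n, j < m
        if a_is_point and b_is_point:
            (b1, d1), (b2, d2) = dgm_a[i], dgm_b[j]
            return max(abs(b1 - b2), abs(d1 - d2))  # L-infinity distance
        if a_is_point:
            b1, d1 = dgm_a[i]
            return (d1 - b1) / 2  # distance from the point to the diagonal
        if b_is_point:
            b2, d2 = dgm_b[j]
            return (d2 - b2) / 2
        return 0.0  # diagonal slot matched to diagonal slot

    return min(
        max(cost(i, j) for i, j in enumerate(matching))
        for matching in permutations(range(size))
    )

clean = [(0.0, 4.0)]
noisy = [(0.0, 5.0), (1.0, 1.5)]  # the long bar shifted, plus one short bar
shift = bottleneck_bruteforce(clean, noisy)  # 1.0
```

The short bar `(1.0, 1.5)` is matched to the diagonal at a cost of 0.25, so the distance is set by the long bar moving from death 4 to death 5. A distance of this kind is insensitive to the addition of low-persistence features.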

### Permutation Test for Topological Features

```python
def permutation_test_persistence(data1: np.ndarray, data2: np.ndarray,
                                   n_permutations: int = 1000,
                                   dim: int = 1) -> dict:
    """
    Test whether two point clouds have significantly different
    topological features using a permutation test on Wasserstein distance.
    """
    import numpy as np
    from persim import wasserstein
    from ripser import ripser

    # Observed distance
    dgm1 = ripser(data1, maxdim=dim)["dgms"][dim]
    dgm2 = ripser(data2, maxdim=dim)["dgms"][dim]
    observed = wasserstein(dgm1, dgm2)

    # Permutation distribution
    combined = np.vstack([data1, data2])
    n1 = len(data1)
    perm_distances = []

    for _ in range(n_permutations):
        perm = np.random.permutation(len(combined))
        perm_d1 = combined[perm[:n1]]
        perm_d2 = combined[perm[n1:]]
        perm_dgm1 = ripser(perm_d1, maxdim=dim)["dgms"][dim]
        perm_dgm2 = ripser(perm_d2, maxdim=dim)["dgms"][dim]
        perm_distances.append(wasserstein(perm_dgm1, perm_dgm2))

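    # Add-one correction so the estimated p-value is never exactly zero
    n_exceed = int(np.sum(np.array(perm_distances) >= observed))
    p_value = (n_exceed + 1) / (n_permutations + 1)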

    return {
        "observed_distance": round(observed, 6),
        "p_value": round(p_value, 4),
        "significant_at_005": p_value < 0.05,
    }
```

## Tools and Libraries

- **Ripser / ripser.py**: Fast Vietoris-Rips persistence computation
- **GUDHI**: Comprehensive TDA library (C++ with Python bindings)
- **persim**: Persistence diagram distances and visualization
- **KeplerMapper**: Python Mapper algorithm implementation
- **giotto-tda**: TDA integrated with scikit-learn API
- **Dionysus 2**: Persistent homology and cohomology
- **scikit-tda**: Meta-package bundling ripser, persim, kepler-mapper, tadasets
