topology-data-analysis
Topological data analysis: persistent homology, Mapper, and TDA tools
Best use case
topology-data-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Topological data analysis: persistent homology, Mapper, and TDA tools
Teams using topology-data-analysis should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/topology-data-analysis/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How topology-data-analysis Compares
| Feature / Agent | topology-data-analysis | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Topological data analysis: persistent homology, Mapper, and TDA tools
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
Best AI Skills for ChatGPT
Find the best AI skills to adapt into ChatGPT workflows for research, writing, summarization, planning, and repeatable assistant tasks.
AI Agent for Product Research
Browse AI agent skills for product research, competitive analysis, customer discovery, and structured product decision support.
AI Agent for SaaS Idea Validation
Use AI agent skills for SaaS idea validation, market research, customer discovery, competitor analysis, and documenting startup hypotheses.
SKILL.md Source
# Topological Data Analysis
A skill for applying topological data analysis (TDA) methods to research data. Covers persistent homology, Vietoris-Rips complexes, persistence diagrams, the Mapper algorithm, and vectorization methods for integrating topological features into machine learning pipelines.
## Core Concepts
### Simplicial Complexes from Data
TDA extracts topological features (connected components, loops, voids) from data by building simplicial complexes at multiple scales:
| Complex | Construction | Computational Cost |
|---------|-------------|-------------------|
| Vietoris-Rips | Edge if distance < epsilon | O(n^d) for d-simplices |
| Cech | Ball intersection (exact) | Computationally expensive |
| Alpha | Delaunay-based (exact in low dim) | Efficient in R^2, R^3 |
| Cubical | Grid-based (for images) | Linear in pixels |
### Filtration and Persistence
```
Scale epsilon: 0.1 0.3 0.5 0.7 1.0
|------|------|------|------|------|
Components: 10 6 3 2 1
(H0 features born at 0, die at merging scale)
Loops: 0 0 1 2 0
(H1 features born when loop forms, die when filled)
```
A feature that persists across many scales is a genuine topological signal; short-lived features are noise.
## Persistent Homology with Ripser
### Computing Persistence Diagrams
```python
import numpy as np
from ripser import ripser
from persim import plot_diagrams
def compute_persistence(point_cloud: np.ndarray,
max_dim: int = 2,
max_edge: float = 2.0) -> dict:
"""
Compute persistent homology of a point cloud.
point_cloud: (n_points, n_dimensions) array
max_dim: maximum homology dimension to compute
max_edge: maximum edge length in Rips complex
Returns persistence diagrams for each dimension.
"""
result = ripser(
point_cloud,
maxdim=max_dim,
thresh=max_edge,
)
diagrams = result["dgms"]
summary = {}
for dim, dgm in enumerate(diagrams):
# Filter out infinite death times for H0
finite = dgm[dgm[:, 1] < np.inf] if len(dgm) > 0 else dgm
lifetimes = finite[:, 1] - finite[:, 0] if len(finite) > 0 else np.array([])
summary[f"H{dim}"] = {
"n_features": len(finite),
"max_persistence": float(lifetimes.max()) if len(lifetimes) > 0 else 0,
"mean_persistence": float(lifetimes.mean()) if len(lifetimes) > 0 else 0,
"birth_death_pairs": finite.tolist(),
}
return summary
# Example: torus point cloud
def sample_torus(n=1000, R=3.0, r=1.0, noise=0.1):
"""Sample points from a torus in R^3."""
theta = np.random.uniform(0, 2 * np.pi, n)
phi = np.random.uniform(0, 2 * np.pi, n)
x = (R + r * np.cos(phi)) * np.cos(theta) + np.random.normal(0, noise, n)
y = (R + r * np.cos(phi)) * np.sin(theta) + np.random.normal(0, noise, n)
z = r * np.sin(phi) + np.random.normal(0, noise, n)
return np.column_stack([x, y, z])
torus = sample_torus(500)
persistence = compute_persistence(torus, max_dim=2)
# Expected: H0 has 1 long-lived component,
# H1 has 2 prominent loops (the two fundamental cycles),
# H2 has 1 prominent void (the cavity)
```
## Persistence Vectorization
### Converting Persistence to Feature Vectors
To use topological features in machine learning, persistence diagrams must be vectorized:
```python
from sklearn.base import BaseEstimator, TransformerMixin
class PersistenceStatistics(BaseEstimator, TransformerMixin):
"""
Extract statistical features from persistence diagrams.
Produces a fixed-length feature vector from variable-length diagrams.
"""
def __init__(self, max_dim: int = 1):
self.max_dim = max_dim
def fit(self, X, y=None):
return self
def transform(self, diagrams_list: list) -> np.ndarray:
features = []
for diagrams in diagrams_list:
row = []
for dim in range(self.max_dim + 1):
dgm = diagrams[dim]
lifetimes = dgm[:, 1] - dgm[:, 0]
lifetimes = lifetimes[np.isfinite(lifetimes)]
if len(lifetimes) == 0:
row.extend([0, 0, 0, 0, 0, 0])
else:
row.extend([
len(lifetimes), # count
np.sum(lifetimes), # total persistence
np.max(lifetimes), # max persistence
np.mean(lifetimes), # mean persistence
np.std(lifetimes), # std persistence
np.sum(lifetimes ** 2), # persistence entropy proxy
])
features.append(row)
return np.array(features)
```
### Persistence Images
```python
def persistence_image(diagram: np.ndarray, resolution: int = 20,
sigma: float = 0.1,
weight_fn=None) -> np.ndarray:
"""
Compute a persistence image from a persistence diagram.
Transforms birth-death pairs into a stable, fixed-size representation.
"""
if weight_fn is None:
weight_fn = lambda birth, persistence: persistence
# Transform to birth-persistence coordinates
births = diagram[:, 0]
persistences = diagram[:, 1] - diagram[:, 0]
# Create grid
x_range = np.linspace(births.min() - sigma, births.max() + sigma, resolution)
y_range = np.linspace(0, persistences.max() + sigma, resolution)
xx, yy = np.meshgrid(x_range, y_range)
image = np.zeros((resolution, resolution))
for b, p in zip(births, persistences):
if not np.isfinite(p):
continue
w = weight_fn(b, p)
gaussian = w * np.exp(-((xx - b)**2 + (yy - p)**2) / (2 * sigma**2))
image += gaussian
return image
```
## The Mapper Algorithm
### Constructing Mapper Graphs
Mapper provides a compressed topological summary of high-dimensional data:
```python
import kmapper as km
from sklearn.cluster import DBSCAN
def run_mapper(data: np.ndarray, lens_fn=None, n_cubes: int = 10,
overlap: float = 0.3) -> dict:
"""
Run the Mapper algorithm to produce a simplicial complex
summarizing the shape of the data.
data: (n_samples, n_features) array
lens_fn: filter function (default: first two PCA components)
"""
mapper = km.KeplerMapper(verbose=0)
# Compute lens (filter function)
if lens_fn is None:
from sklearn.decomposition import PCA
lens = mapper.fit_transform(data, projection=PCA(n_components=2))
else:
lens = lens_fn(data)
# Build the Mapper graph
graph = mapper.map(
lens, data,
cover=km.Cover(n_cubes=n_cubes, perc_overlap=overlap),
clusterer=DBSCAN(eps=0.5, min_samples=3),
)
# Summary statistics
n_nodes = len(graph["nodes"])
n_edges = sum(len(v) for v in graph["links"].values()) // 2
return {
"n_nodes": n_nodes,
"n_edges": n_edges,
"node_sizes": [len(v) for v in graph["nodes"].values()],
"graph": graph,
}
```
### Mapper Parameters
| Parameter | Effect | Guidance |
|-----------|--------|---------|
| Filter function | Projects data to low dimensions | PCA, eccentricity, density |
| Number of intervals | Controls resolution of cover | 10-30 typical |
| Overlap percentage | Controls connectivity | 20-50%, higher = more edges |
| Clustering algorithm | Groups points within intervals | DBSCAN, single-linkage |
## Stability and Statistical Significance
### Bottleneck and Wasserstein Distances
```python
from persim import bottleneck, wasserstein
def compare_persistence_diagrams(dgm1: np.ndarray,
dgm2: np.ndarray) -> dict:
"""
Compare two persistence diagrams using standard TDA distances.
"""
bn_dist = bottleneck(dgm1, dgm2)
ws_dist = wasserstein(dgm1, dgm2, order=2)
return {
"bottleneck_distance": round(bn_dist, 6),
"wasserstein_2_distance": round(ws_dist, 6),
}
```
### Permutation Test for Topological Features
```python
def permutation_test_persistence(data1: np.ndarray, data2: np.ndarray,
n_permutations: int = 1000,
dim: int = 1) -> dict:
"""
Test whether two point clouds have significantly different
topological features using a permutation test on Wasserstein distance.
"""
from persim import wasserstein
# Observed distance
dgm1 = ripser(data1, maxdim=dim)["dgms"][dim]
dgm2 = ripser(data2, maxdim=dim)["dgms"][dim]
observed = wasserstein(dgm1, dgm2)
# Permutation distribution
combined = np.vstack([data1, data2])
n1 = len(data1)
perm_distances = []
for _ in range(n_permutations):
perm = np.random.permutation(len(combined))
perm_d1 = combined[perm[:n1]]
perm_d2 = combined[perm[n1:]]
perm_dgm1 = ripser(perm_d1, maxdim=dim)["dgms"][dim]
perm_dgm2 = ripser(perm_d2, maxdim=dim)["dgms"][dim]
perm_distances.append(wasserstein(perm_dgm1, perm_dgm2))
p_value = np.mean(np.array(perm_distances) >= observed)
return {
"observed_distance": round(observed, 6),
"p_value": round(p_value, 4),
"significant_at_005": p_value < 0.05,
}
```
## Tools and Libraries
- **Ripser / ripser.py**: Fast Vietoris-Rips persistence computation
- **GUDHI**: Comprehensive TDA library (C++ with Python bindings)
- **persim**: Persistence diagram distances and visualization
- **KeplerMapper**: Python Mapper algorithm implementation
- **giotto-tda**: TDA integrated with scikit-learn API
- **Dionysus 2**: Persistent homology and cohomology
- **scikit-tda**: Meta-package bundling ripser, persim, kepler-mapper, tadasetsRelated Skills
json-data-visualizer
Guide to JSON Crack for visualizing complex JSON data structures
datagen-research-guide
AI-driven multi-agent research assistant for end-to-end studies
data-collection-automation
Automate survey deployment, data collection, and pipeline management
database-comparison-guide
Compare major academic databases and when to use each for research
wikidata-api-guide
Query Wikidata SPARQL for scholarly metadata, authors, and entities
datacite-api
Resolve dataset DOIs and query research data metadata via DataCite
crossref-event-data-api
Track scholarly mentions across the web via Crossref Event Data
metadata-skills
24 metadata & bibliometrics skills. Trigger: DOI resolution, citation metrics, author disambiguation, bibliometrics. Design: metadata APIs and bibliometric analysis tools for scholarly records.
dataverse-api
Deposit and discover research datasets via Harvard Dataverse API
network-analysis-guide
Social network analysis methods, metrics, and visualization tools
ipums-microdata-api
Access harmonized census and survey microdata via the IPUMS API
astrophysics-data-guide
Astronomical data processing with Astropy, FITS files, and sky surveys