bio-metagenomics-visualization
Visualize metagenomic profiles using R (phyloseq, microbiome) and Python (matplotlib, seaborn). Create stacked bar plots, heatmaps, PCA plots, and diversity analyses. Use when creating publication-quality figures from MetaPhlAn, Bracken, or other taxonomic profiling output.
Best use case
bio-metagenomics-visualization is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using bio-metagenomics-visualization should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/bio-metagenomics-visualization/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How bio-metagenomics-visualization Compares
| Feature / Agent | bio-metagenomics-visualization | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
## Version Compatibility
Reference examples tested with: MetaPhlAn 4.1+, ggplot2 3.5+, matplotlib 3.8+, pandas 2.2+, phyloseq 1.46+, scanpy 1.10+, scikit-learn 1.4+, scipy 1.12+, seaborn 0.13+, vegan 2.6+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
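
The Python side of this check can be scripted. The following is a minimal sketch, assuming the minimum versions listed above and using only the standard library's `importlib.metadata`; the names `MINIMUMS`, `parse`, and `check_versions` are illustrative helpers, not part of any package.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the compatibility line above (Python packages only).
MINIMUMS = {
    "matplotlib": "3.8",
    "pandas": "2.2",
    "scikit-learn": "1.4",
    "scipy": "1.12",
    "seaborn": "0.13",
}


def parse(ver):
    """Turn '3.8.2' into (3, 8, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_versions(minimums):
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, parse(installed) >= parse(minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(MINIMUMS).items():
        status = "ok" if ok else "check"
        print(f"{name}: {installed or 'not installed'} [{status}]")
```

A package flagged `check` is either missing or older than the tested version, so its function signatures are worth inspecting before reusing the examples.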
# Metagenome Visualization
**"Visualize the taxonomic composition of my metagenomes"** → Create publication-quality figures (stacked bars, heatmaps, ordination plots) from taxonomic profiling output to compare community composition across samples.
- R: `phyloseq::plot_bar()`, `microbiome` package
- Python: `matplotlib`/`seaborn` with pandas for custom compositions
## Python - Stacked Bar Plot
```python
import pandas as pd
import matplotlib.pyplot as plt
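# comment='#' skips the '#mpa_...' database line that MetaPhlAn writes above the header
abundance = pd.read_csv('merged_abundance.txt', sep='\t', index_col=0, comment='#')
# Keep species-level rows; drop MetaPhlAn 4 SGB rows (t__), which also contain 's__'
is_species = abundance.index.str.contains('s__') & ~abundance.index.str.contains('t__')
abundance = abundance[is_species]
abundance.index = abundance.index.str.split('|').str[-1].str.replace('s__', '')
top_n = 10
top_species = abundance.sum(axis=1).nlargest(top_n).index
abundance_top = abundance.loc[top_species].copy()
# Collapse everything outside the top N into a single 'Other' row
abundance_top.loc['Other'] = abundance.drop(top_species).sum()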
abundance_top.T.plot(kind='bar', stacked=True, figsize=(12, 6), colormap='tab20')
plt.xlabel('Sample')
plt.ylabel('Relative Abundance (%)')
plt.title('Species Composition')
plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left')
plt.tight_layout()
plt.savefig('stacked_bar.png', dpi=300)
```
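
The species filter and the top-N-plus-Other grouping recur in several plots, so they can be factored into helpers. The sketch below assumes MetaPhlAn-style clade names (`k__...|s__...`, with an optional `|t__...` suffix for SGBs); `species_rows`, `top_n_with_other`, and the `demo` table are hypothetical names and made-up values used only for illustration.

```python
import pandas as pd


def species_rows(table):
    """Keep species-level clades and shorten their names.

    Rows with a t__ (SGB) suffix are dropped so each species appears once.
    """
    idx = table.index.to_series()
    is_species = idx.str.contains("s__", regex=False) & ~idx.str.contains("t__", regex=False)
    out = table.loc[is_species.values].copy()
    out.index = out.index.str.split("|").str[-1].str.replace("s__", "", regex=False)
    return out


def top_n_with_other(table, n=10):
    """Return the n most abundant rows plus an 'Other' row holding the remainder."""
    top = table.sum(axis=1).nlargest(n).index
    out = table.loc[top].copy()
    out.loc["Other"] = table.drop(index=top).sum()
    return out


# Tiny made-up table in MetaPhlAn's clade-name layout (values are illustrative only).
demo = pd.DataFrame(
    {"S1": [100.0, 60.0, 60.0, 30.0, 10.0], "S2": [100.0, 20.0, 20.0, 50.0, 30.0]},
    index=[
        "k__Bacteria",
        "k__Bacteria|s__Species_A",
        "k__Bacteria|s__Species_A|t__SGB1",
        "k__Bacteria|s__Species_B",
        "k__Bacteria|s__Species_C",
    ],
)
demo_top = top_n_with_other(species_rows(demo), n=2)
print(demo_top)
```

Because the remainder is kept as `Other`, each sample column still sums to its original species-level total, which keeps stacked bars comparable across samples.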
## Python - Heatmap
```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
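# comment='#' skips the '#mpa_...' database line that MetaPhlAn writes above the header
abundance = pd.read_csv('merged_abundance.txt', sep='\t', index_col=0, comment='#')
# Keep species-level rows; drop MetaPhlAn 4 SGB rows (t__), which also contain 's__'
is_species = abundance.index.str.contains('s__') & ~abundance.index.str.contains('t__')
abundance = abundance[is_species]
abundance.index = abundance.index.str.split('|').str[-1].str.replace('s__', '')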
top_species = abundance.sum(axis=1).nlargest(20).index
abundance_top = abundance.loc[top_species]
plt.figure(figsize=(12, 10))
sns.heatmap(abundance_top, cmap='YlOrRd', annot=False, cbar_kws={'label': 'Abundance (%)'})
plt.xlabel('Sample')
plt.ylabel('Species')
plt.title('Species Abundance Heatmap')
plt.tight_layout()
plt.savefig('heatmap.png', dpi=300)
```
## Python - PCA
```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
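abundance = pd.read_csv('merged_abundance.txt', sep='\t', index_col=0, comment='#')
# Use species-level rows only so higher ranks do not enter PCA as redundant features
is_species = abundance.index.str.contains('s__') & ~abundance.index.str.contains('t__')
abundance = abundance[is_species].T  # rows = samples, columns = species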
scaler = StandardScaler()
abundance_scaled = scaler.fit_transform(abundance)
pca = PCA(n_components=2)
pca_result = pca.fit_transform(abundance_scaled)
plt.figure(figsize=(8, 6))
plt.scatter(pca_result[:, 0], pca_result[:, 1])
for i, sample in enumerate(abundance.index):
    plt.annotate(sample, (pca_result[i, 0], pca_result[i, 1]))
plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]*100:.1f}%)')
plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]*100:.1f}%)')
plt.title('PCA of Sample Composition')
plt.savefig('pca.png', dpi=300)
```
## R - phyloseq Setup
**Goal:** Convert a MetaPhlAn merged abundance table into a phyloseq object for ecological analysis and visualization in R.
**Approach:** Filter to species-level rows, clean taxonomy names, build an OTU table and sample metadata data frame, and assemble into a phyloseq object.
```r
library(phyloseq)
library(ggplot2)
library(vegan)
# From MetaPhlAn merged table
abundance <- read.table('merged_abundance.txt', sep = '\t', header = TRUE, row.names = 1)
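# Filter to species level (exclude MetaPhlAn 4 SGB rows, which carry a t__ suffix)
is_species <- grepl('s__', rownames(abundance)) & !grepl('t__', rownames(abundance))
species <- abundance[is_species, ]
rownames(species) <- sapply(strsplit(rownames(species), '\\|'), tail, 1)
rownames(species) <- gsub('s__', '', rownames(species))
# Create phyloseq object
otu <- otu_table(as.matrix(species), taxa_are_rows = TRUE)
# Sample metadata (create or load). Group needs one entry per sample column;
# the four labels below are placeholders for a four-sample table.
# Sample IDs are kept as row names: a column literally named 'Sample' would
# collide with the Sample variable that plot_bar() builds.
metadata <- data.frame(
  Group = c('Control', 'Control', 'Treatment', 'Treatment'),
  row.names = colnames(species)
)
samp <- sample_data(metadata)
ps <- phyloseq(otu, samp)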
```
## R - Stacked Bar Plot
```r
library(phyloseq)
library(ggplot2)
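# Top taxa (head() avoids NA names when fewer than 10 taxa are present)
top_taxa <- head(names(sort(taxa_sums(ps), decreasing = TRUE)), 10)
ps_top <- prune_taxa(top_taxa, ps)
# Stacked bar. Without a tax_table, taxa names are exposed as the 'OTU' variable,
# so fill by 'OTU' (fill = 'Species' requires a tax_table with a Species rank).
plot_bar(ps_top, fill = 'OTU') +
  geom_bar(stat = 'identity', position = 'stack') +
  theme_minimal() +
  labs(x = 'Sample', y = 'Relative Abundance (%)', fill = 'Species') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))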
```
## R - Ordination (PCoA)
```r
library(phyloseq)
library(ggplot2)
# Bray-Curtis distance
ord <- ordinate(ps, method = 'PCoA', distance = 'bray')
# Plot ordination
plot_ordination(ps, ord, color = 'Group') +
geom_point(size = 4) +
stat_ellipse() +
theme_minimal() +
labs(title = 'PCoA of Sample Composition')
```
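
For Python-only workflows, the same ordination can be approximated with NumPy. This is a minimal sketch of classical PCoA on Bray-Curtis dissimilarities, not a reimplementation of `phyloseq::ordinate()`: the function names and the `demo` matrix are illustrative, and negative eigenvalues (which Bray-Curtis can produce) are simply clipped to zero, so the reported proportions may differ slightly from R's output.

```python
import numpy as np


def bray_curtis(matrix):
    """Pairwise Bray-Curtis dissimilarity for a samples x taxa array."""
    x = np.asarray(matrix, dtype=float)
    n = x.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            denom = (x[i] + x[j]).sum()
            d = np.abs(x[i] - x[j]).sum() / denom if denom > 0 else 0.0
            dist[i, j] = dist[j, i] = d
    return dist


def pcoa(dist, n_axes=2):
    """Classical PCoA: double-centre the squared distances, then eigendecompose."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gower)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = np.clip(eigvals, 0, None)  # assumption: drop negative eigenvalues
    coords = eigvecs[:, :n_axes] * np.sqrt(positive[:n_axes])
    explained = positive[:n_axes] / positive.sum()
    return coords, explained


# Made-up samples x taxa matrix: two similar pairs of samples.
demo = np.array([
    [50, 30, 20],
    [45, 35, 20],
    [5, 15, 80],
    [10, 10, 80],
])
dist = bray_curtis(demo)
coords, explained = pcoa(dist)
print(np.round(coords, 3))
print(np.round(explained, 3))
```

The `coords` array can be passed straight to `plt.scatter(coords[:, 0], coords[:, 1])`, with `explained` used for the axis labels in the same way as the PCA example.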
## R - Alpha Diversity
```r
library(phyloseq)
library(ggplot2)
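# Calculate diversity metrics. MetaPhlAn values are relative abundances, not counts:
# these three measures still compute (phyloseq may warn about missing singletons),
# but count-based estimators such as Chao1 or ACE are not appropriate here.
alpha_div <- estimate_richness(ps, measures = c('Shannon', 'Simpson', 'Observed'))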
# Add metadata
alpha_div$Group <- sample_data(ps)$Group
# Plot
ggplot(alpha_div, aes(x = Group, y = Shannon, fill = Group)) +
geom_boxplot() +
geom_jitter(width = 0.1) +
theme_minimal() +
labs(title = 'Alpha Diversity by Group', y = 'Shannon Index')
```
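
The three measures requested above are simple enough to compute directly in Python. The sketch below assumes a taxa x samples table and uses the common definitions (Shannon with natural log, Simpson as 1 minus the sum of squared proportions); `alpha_diversity` and the `demo` values are illustrative.

```python
import numpy as np
import pandas as pd


def alpha_diversity(table):
    """Per-sample Observed, Shannon and Simpson for a taxa x samples table."""
    rows = {}
    for sample in table.columns:
        values = table[sample].to_numpy(dtype=float)
        values = values[values > 0]  # absent taxa do not contribute
        p = values / values.sum()    # proportions within the sample
        rows[sample] = {
            "Observed": int(values.size),
            "Shannon": float(-(p * np.log(p)).sum()),
            "Simpson": float(1.0 - (p ** 2).sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Made-up table: S1 is perfectly even across four taxa, S2 has a single taxon.
demo = pd.DataFrame(
    {"S1": [25.0, 25.0, 25.0, 25.0], "S2": [100.0, 0.0, 0.0, 0.0]},
    index=["Taxon_A", "Taxon_B", "Taxon_C", "Taxon_D"],
)
result = alpha_diversity(demo)
print(result)
```

Because the function normalises each column to proportions, it gives the same answer for percentages and for raw counts.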
## R - Beta Diversity (PERMANOVA)
```r
library(phyloseq)
library(vegan)
# Get abundance matrix
abundance_matrix <- as(otu_table(ps), 'matrix')
if (taxa_are_rows(ps)) abundance_matrix <- t(abundance_matrix)
# Calculate Bray-Curtis distance
dist_bc <- vegdist(abundance_matrix, method = 'bray')
# PERMANOVA
groups <- sample_data(ps)$Group
permanova <- adonis2(dist_bc ~ groups, permutations = 999)
permanova
```
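
To see what the test measures, a one-way PERMANOVA can be sketched in NumPy. This is an illustration under stated assumptions, not a substitute for `adonis2` (no formula interface, covariates, or strata): `pseudo_f`, `permanova`, and the hand-written distance matrix are hypothetical, and the p-value comes from plain label shuffling.

```python
import numpy as np


def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a square distance matrix and group labels."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    sq = dist ** 2
    # Total sum of squares: squared distances over all pairs, divided by N
    ss_total = sq[np.triu_indices(n, k=1)].sum() / n
    # Within-group sum of squares: same quantity computed inside each group
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = sq[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) for a one-way design."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    observed = pseudo_f(dist, labels)
    hits = 0
    for _ in range(permutations):
        # Shuffle group labels across samples and recompute the statistic
        if pseudo_f(dist, rng.permutation(labels)) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Made-up distances: samples 0-1 and 2-3 are close, the two pairs are far apart.
demo_dist = np.array([
    [0.00, 0.05, 0.60, 0.60],
    [0.05, 0.00, 0.60, 0.60],
    [0.60, 0.60, 0.00, 0.05],
    [0.60, 0.60, 0.05, 0.00],
])
demo_groups = ["Control", "Control", "Treatment", "Treatment"]
f_stat, p_value = permanova(demo_dist, demo_groups)
print(f"pseudo-F = {f_stat:.1f}, p = {p_value:.3f}")
```

With only four samples in two groups there are just three distinct ways to split them, so the p-value cannot fall much below 1/3 however strong the separation. A four-sample design like the placeholder metadata above is therefore enough to exercise the code but not to detect a group effect.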
## Krona Chart
```bash
# From Kraken2 report (column 5 = taxonomy ID, column 3 = read count used as magnitude)
ktImportTaxonomy -t 5 -m 3 kraken_report.txt -o krona_chart.html
# From MetaPhlAn
metaphlan2krona.py -p profile.txt -k krona_profile.txt
ktImportText krona_profile.txt -o krona_metaphlan.html
```
## Key Packages
### Python
| Package | Purpose |
|---------|---------|
| matplotlib | General plotting |
| seaborn | Statistical visualizations |
| scikit-learn | PCA, clustering |
| scipy | Statistical tests |
### R
| Package | Purpose |
|---------|---------|
| phyloseq | Microbiome data handling |
| vegan | Community ecology |
| ggplot2 | Visualization |
| microbiome | Additional analyses |
## Related Skills
- kraken-classification - Generate input data
- metaphlan-profiling - Generate input data
- abundance-estimation - Process Kraken output
Related Skills
scientific-visualization
Create publication figures with matplotlib/seaborn/plotly. Multi-panel layouts, error bars, significance markers, colorblind-safe, export PDF/EPS/TIFF, for journal-ready scientific plots.
claw-metagenomics
Shotgun metagenomics profiling — taxonomy, resistome, and functional pathways
bio-tcr-bcr-analysis-repertoire-visualization
Create publication-quality visualizations of immune repertoire data including circos plots, clone tracking, diversity plots, and network graphs. Use when generating figures for repertoire comparisons, clonal dynamics, or V(D)J gene usage.
bio-spatial-transcriptomics-spatial-visualization
Visualize spatial transcriptomics data using Squidpy and Scanpy. Create tissue plots with gene expression, clusters, and annotations overlaid on histology images. Use when visualizing spatial expression patterns.
bio-pathway-enrichment-visualization
Visualize enrichment results using enrichplot package functions. Use when creating publication-quality figures from clusterProfiler results. Covers dotplot, barplot, cnetplot, emapplot, gseaplot2, ridgeplot, and treeplot.
bio-metagenomics-strain-tracking
Track bacterial strains using MASH, sourmash, fastANI, and inStrain. Compare genomes, detect contamination, and monitor strain-level variation. Use when needing sub-species resolution for outbreak tracking, transmission analysis, or within-host strain dynamics.
bio-metagenomics-metaphlan
Marker gene-based taxonomic profiling using MetaPhlAn 4. Provides accurate species-level relative abundances using clade-specific markers. Use when accurate taxonomic profiling is needed and computational resources are limited, or for comparison with HMP/other MetaPhlAn studies.
bio-metagenomics-kraken
Taxonomic classification of metagenomic reads using Kraken2. Fast k-mer based classification against RefSeq database. Use when performing initial taxonomic classification of shotgun metagenomic reads before abundance estimation with Bracken.
bio-metagenomics-functional-profiling
Profile functional potential of metagenomes using HUMAnN3 and similar tools. Use when obtaining pathway abundances, gene family counts, or functional annotations from metagenomic data.
bio-metagenomics-amr-detection
Detect antimicrobial resistance genes using AMRFinderPlus, ResFinder, and CARD. Screen isolates and metagenomes for resistance determinants. Use when characterizing resistance profiles in clinical isolates, surveillance samples, or metagenomic data.
bio-metagenomics-abundance
Species abundance estimation using Bracken with Kraken2 output. Redistributes reads from higher taxonomic levels to species for more accurate estimates. Use when accurate species-level abundances are needed from Kraken2 classification output.
bio-hi-c-analysis-hic-visualization
Visualize Hi-C contact matrices, TADs, loops, and genomic features using matplotlib, cooltools, and HiCExplorer. Create triangle plots, virtual 4C, and multi-track figures. Use when visualizing contact matrices or genomic features.