bio-microbiome-functional-prediction
Predict metagenome functional content from 16S rRNA marker gene data using PICRUSt2. Infer KEGG, MetaCyc, and EC abundances from ASV tables. Use when functional profiling is needed from 16S data without shotgun metagenomics sequencing.
Best use case
bio-microbiome-functional-prediction is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using bio-microbiome-functional-prediction should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/bio-microbiome-functional-prediction/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How bio-microbiome-functional-prediction Compares
| Feature / Agent | bio-microbiome-functional-prediction | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Predict metagenome functional content from 16S rRNA marker gene data using PICRUSt2. Infer KEGG, MetaCyc, and EC abundances from ASV tables. Use when functional profiling is needed from 16S data without shotgun metagenomics sequencing.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
## Version Compatibility
Reference examples tested with: Biostrings 2.70+, ggplot2 3.5+, pandas 2.2+, phyloseq 1.46+, scanpy 1.10+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Functional Prediction with PICRUSt2
**"Predict functional pathways from my 16S data"** → Infer metagenome functional content from marker gene (16S/ITS) ASV tables using phylogenetic placement and gene content prediction.
- CLI: `picrust2_pipeline.py -s seqs.fna -i table.biom -o output/`
## Prepare Input Files
```r
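library(phyloseq)
library(Biostrings)
ps <- readRDS('phyloseq_object.rds')
# Export ASV table (ASVs as rows, samples as columns) with an explicit ID column,
# so the header row lines up with the data columns
otu <- as(otu_table(ps), 'matrix')
if (!taxa_are_rows(ps)) otu <- t(otu)
otu_df <- data.frame(ASV_ID = rownames(otu), otu, check.names = FALSE)
write.table(otu_df, 'asv_table.tsv', sep = '\t', quote = FALSE, row.names = FALSE)
# Export ASV sequences as FASTA; IDs must match the ASV_ID column
seqs <- refseq(ps, errorIfNULL = FALSE)
if (is.null(seqs)) {
  # Assumes taxa names are the ASV sequences themselves (common for DADA2 output)
  seqs <- DNAStringSet(taxa_names(ps))
  names(seqs) <- taxa_names(ps)
}
writeXStringSet(seqs, 'asv_seqs.fasta')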
```
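Before running PICRUSt2, it helps to confirm that the FASTA headers and the table's first column use the same IDs, since a mismatch between the two files is an easy mistake to make. A minimal standard-library sketch follows. The function names are hypothetical, and it assumes `asv_table.tsv` has a header row with feature IDs in the first column.

```python
import csv


def fasta_ids(lines):
    """Collect sequence IDs (first word after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith('>'):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def table_ids(lines):
    """Collect feature IDs from the first column of TSV lines (header skipped)."""
    rows = csv.reader(lines, delimiter='\t')
    next(rows, None)  # skip the header row
    return {row[0] for row in rows if row}


def compare_ids(fasta_path, table_path):
    """Return IDs that appear in only one of the two PICRUSt2 input files."""
    with open(fasta_path) as fa, open(table_path, newline='') as tb:
        seq_ids, abun_ids = fasta_ids(fa), table_ids(tb)
    return {
        'only_in_fasta': sorted(seq_ids - abun_ids),
        'only_in_table': sorted(abun_ids - seq_ids),
    }
```

Calling `compare_ids('asv_seqs.fasta', 'asv_table.tsv')` should return two empty lists when the files agree.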
## Run PICRUSt2 Pipeline
```bash
# Full pipeline (place sequences, predict functions, metagenome inference)
picrust2_pipeline.py \
-s asv_seqs.fasta \
-i asv_table.tsv \
-o picrust2_output \
-p 4 \
--stratified \
--per_sequence_contrib
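# Key output files (gzip-compressed .tsv.gz in recent PICRUSt2 releases):
# - pathways_out/path_abun_unstrat.tsv.gz (MetaCyc pathways)
# - KO_metagenome_out/pred_metagenome_unstrat.tsv.gz (KEGG orthologs)
# - EC_metagenome_out/pred_metagenome_unstrat.tsv.gz (EC numbers)
# - marker_predicted_and_nsti.tsv.gz (16S copy numbers and per-ASV NSTI)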
```
## Step-by-Step Pipeline
**Goal:** Predict functional metagenome content from 16S ASVs using the full PICRUSt2 pipeline with explicit control over each step.
**Approach:** Place ASV sequences into a reference tree, predict gene content via hidden-state prediction, infer per-sample metagenome abundances, and reconstruct MetaCyc pathways.
```bash
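# 1. Place ASV sequences into the reference tree
place_seqs.py -s asv_seqs.fasta -o placed_seqs.tre -p 4
# 2. Hidden-state prediction of 16S copy number, plus NSTI (-n)
hsp.py -i 16S -t placed_seqs.tre -o marker_nsti_predicted.tsv -m pic -n
# 3. Predict gene family copy numbers per ASV (KO and EC)
hsp.py -i KO -t placed_seqs.tre -o KO_predicted.tsv -m pic
hsp.py -i EC -t placed_seqs.tre -o EC_predicted.tsv -m pic
# 4. Metagenome inference (normalize by 16S copy number, multiply by gene content)
metagenome_pipeline.py \
-i asv_table.tsv \
-m marker_nsti_predicted.tsv \
-f KO_predicted.tsv \
-o KO_metagenome_out \
--strat_out
metagenome_pipeline.py \
-i asv_table.tsv \
-m marker_nsti_predicted.tsv \
-f EC_predicted.tsv \
-o EC_metagenome_out \
--strat_out
# 5. MetaCyc pathway inference: the default mapping expects EC numbers, not KOs
# (recent releases write the contrib table gzip-compressed)
pathway_pipeline.py \
-i EC_metagenome_out/pred_metagenome_contrib.tsv.gz \
-o pathway_output \
-p 4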
```
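The arithmetic behind step 4 can be sketched in a few lines. This is a simplified illustration of the approach metagenome_pipeline.py takes (drop ASVs above the NSTI cutoff, divide counts by predicted 16S copy number, multiply by predicted gene copies, sum per sample), not PICRUSt2's own code. The dictionary layout and the `predict_metagenome` name are assumptions.

```python
def predict_metagenome(abundance, marker_copies, gene_copies, nsti, max_nsti=2.0):
    """Sketch of PICRUSt2-style metagenome inference.

    abundance:     {asv: {sample: read count}}
    marker_copies: {asv: predicted 16S copy number}
    gene_copies:   {asv: {gene_family: predicted copies per genome}}
    nsti:          {asv: NSTI value}
    Returns {sample: {gene_family: predicted abundance}}.
    """
    metagenome = {}
    for asv, per_sample in abundance.items():
        # Skip ASVs whose nearest reference genome is too distant
        if nsti[asv] > max_nsti:
            continue
        for sample, count in per_sample.items():
            # Convert read counts to approximate genome counts
            genomes = count / marker_copies[asv]
            sample_out = metagenome.setdefault(sample, {})
            for family, copies in gene_copies.get(asv, {}).items():
                sample_out[family] = sample_out.get(family, 0.0) + genomes * copies
    return metagenome
```

Dividing by 16S copy number first keeps taxa with many rRNA operons from inflating every gene family they carry.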
## Quality Control: NSTI
```python
import pandas as pd
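# NSTI = Nearest Sequenced Taxon Index: phylogenetic distance from each ASV
# to its nearest sequenced reference genome. Lower = more reliable prediction;
# metagenome_pipeline.py excludes ASVs with NSTI > 2 by default (--max_nsti)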
nsti = pd.read_csv('marker_nsti_predicted.tsv', sep='\t')
print(f'Mean NSTI: {nsti["metadata_NSTI"].mean():.3f}')
print(f'ASVs with NSTI > 2: {(nsti["metadata_NSTI"] > 2).sum()}')
```
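A per-ASV mean treats rare and dominant ASVs equally, so an abundance-weighted NSTI per sample is often more informative. PICRUSt2 also reports a per-sample weighted NSTI in the metagenome output directory; the sketch below only illustrates the idea using raw counts as weights, which may not match PICRUSt2's exact weighting. The `weighted_nsti` name and input layout are assumptions.

```python
def weighted_nsti(abundance, nsti):
    """Per-sample NSTI weighted by ASV abundance.

    abundance: {asv: {sample: count}}; nsti: {asv: NSTI value}.
    Samples with zero total counts are omitted.
    """
    totals, weighted = {}, {}
    for asv, per_sample in abundance.items():
        for sample, count in per_sample.items():
            totals[sample] = totals.get(sample, 0.0) + count
            weighted[sample] = weighted.get(sample, 0.0) + count * nsti[asv]
    return {s: weighted[s] / totals[s] for s in totals if totals[s] > 0}
```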
## Analyze Pathway Output
```r
library(ggplot2)
library(ALDEx2)
# read.delim reads .gz files directly; check.names = FALSE keeps sample IDs intact
pathways <- read.delim('picrust2_output/pathways_out/path_abun_unstrat.tsv.gz',
                       row.names = 1, check.names = FALSE)
metadata <- read.csv('sample_metadata.csv', row.names = 1)
# Relative abundance (for plotting and summaries, not as ALDEx2 input)
pathways_rel <- sweep(pathways, 2, colSums(pathways), '/')
# Differential pathway analysis: ALDEx2 expects features as rows, samples as
# columns, and integer counts, so round the predicted abundances
groups <- metadata[colnames(pathways), 'Group']
pathway_aldex <- aldex(round(pathways), groups, mc.samples = 128)
```
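The `sweep()` step produces proportions, but ALDEx2 works on log-ratios, which are better suited to compositional data. The sketch below shows only the centered log-ratio (CLR) transform for one sample's counts. ALDEx2 itself applies CLR to Monte Carlo instances drawn from a Dirichlet distribution rather than using a fixed pseudocount, so `pseudocount=0.5` here is an illustrative assumption.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's feature counts.

    Each value becomes log(count + pseudocount) minus the mean of those logs,
    so the transformed vector sums to zero.
    """
    if not counts:
        raise ValueError('counts must be non-empty')
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]
```

Because every value is expressed relative to the sample's geometric mean, CLR values are comparable across samples with different sequencing depths.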
## Add Pathway Descriptions
```bash
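# Map pathway IDs to human-readable names
add_descriptions.py \
-i picrust2_output/pathways_out/path_abun_unstrat.tsv.gz \
-m METACYC \
-o picrust2_output/pathways_out/path_abun_unstrat_descrip.tsv.gz
# For KO or EC tables, use -m KO or -m EC with the matching pred_metagenome_unstrat file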
```
## KEGG Module Analysis
```r
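# Analyze KEGG modules instead of individual KOs
ko_table <- read.delim('KO_metagenome_out/pred_metagenome_unstrat.tsv.gz',
                       row.names = 1, check.names = FALSE)
# Use KEGGREST for KO-to-module mapping (names like 'ko:K00844', values like 'md:M00001')
library(KEGGREST)
links <- keggLink('module', 'ko')
ko2mod <- data.frame(KO = sub('^ko:', '', names(links)),
                     module = sub('^md:', '', unname(links)))
ko2mod <- ko2mod[ko2mod$KO %in% rownames(ko_table), ]
# Crude module abundance: sum member KOs (ignores module completeness logic)
module_table <- rowsum(ko_table[ko2mod$KO, ], group = ko2mod$module)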
```
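Summed KO abundance says nothing about whether a module's members are actually present, so a simple coverage fraction can complement it. The sketch below assumes the tab-separated `ko:K00844<TAB>md:M00001` line format returned by the KEGG REST link operation; both function names are hypothetical. It ignores KEGG module definition logic (alternative and optional steps), so it is a rough proxy, not KEGG's module completeness.

```python
def parse_kegg_links(lines):
    """Group 'ko:K00844<TAB>md:M00001'-style lines into {module: set of KOs}."""
    modules = {}
    for line in lines:
        parts = line.strip().split('\t')
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        ko = parts[0].split(':')[-1]
        module = parts[1].split(':')[-1]
        modules.setdefault(module, set()).add(ko)
    return modules


def module_coverage(module_to_kos, detected_kos):
    """Fraction of each module's member KOs predicted as present.

    detected_kos: set of KO IDs with nonzero predicted abundance in a sample.
    """
    return {
        module: len(kos & detected_kos) / len(kos)
        for module, kos in module_to_kos.items()
        if kos
    }
```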
## Limitations
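- Predictions are inferred from the gene content of reference genomes near each ASV in the phylogeny, not measured from the sample itself
- Novel taxa (high NSTI) have unreliable predictions
- 16S resolution limits species-level accuracy
- Cannot capture strain-specific gene content, such as genes gained through horizontal gene transfer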
## Related Skills
- amplicon-processing - Generate ASV input
- metagenomics/functional-profiling - Direct shotgun-based profiling
- pathway-analysis/kegg-pathways - KEGG pathway enrichment
Related Skills
tooluniverse-immunotherapy-response-prediction
Predict patient response to immune checkpoint inhibitors (ICIs) using multi-biomarker integration. Given a cancer type, somatic mutations, and optional biomarkers (TMB, PD-L1, MSI status), performs systematic analysis across 11 phases covering TMB classification, neoantigen burden estimation, MSI/MMR assessment, PD-L1 evaluation, immune microenvironment profiling, mutation-based resistance/sensitivity prediction, clinical evidence retrieval, and multi-biomarker score integration. Generates a quantitative ICI Response Score (0-100), response likelihood tier, specific ICI drug recommendations with evidence, resistance risk factors, and a monitoring plan. Use when oncologists ask about immunotherapy eligibility, checkpoint inhibitor selection, or biomarker-guided ICI treatment decisions.
bio-structural-biology-modern-structure-prediction
Predict protein structures using modern ML models including AlphaFold3, ESMFold, Chai-1, and Boltz-1. Use when predicting structures for novel proteins, protein complexes, or when comparing predictions across multiple methods.
bio-structural-biology-alphafold-predictions
Access and analyze AlphaFold protein structure predictions. Use when predicted structures are needed for proteins without experimental structures, or for confidence scores (pLDDT).
bio-microbiome-taxonomy-assignment
Taxonomic classification of ASVs using reference databases like SILVA, GTDB, or UNITE. Covers naive Bayes classifiers (DADA2, IDTAXA) and exact matching approaches. Use when assigning taxonomy to ASVs after DADA2 amplicon processing.
bio-microbiome-qiime2-workflow
QIIME2 command-line workflow for 16S/ITS amplicon analysis. Alternative to DADA2/phyloseq R workflow with built-in provenance tracking. Use when preferring CLI over R, needing reproducible provenance, or working within QIIME2 ecosystem.
bio-microbiome-diversity-analysis
Alpha and beta diversity analysis for microbiome data. Calculate within-sample richness, evenness, and between-sample dissimilarity with phyloseq and vegan. Use when comparing community composition across samples or testing for group differences in microbiome structure.
bio-microbiome-differential-abundance
Differential abundance testing for microbiome data using compositionally-aware methods like ALDEx2, ANCOM-BC2, and MaAsLin2. Use when identifying taxa that differ between experimental groups while accounting for the compositional nature of microbiome data.
bio-microbiome-amplicon-processing
Amplicon sequence variant (ASV) inference from 16S rRNA or ITS amplicon sequencing using DADA2. Covers quality filtering, error learning, denoising, and chimera removal. Use when processing demultiplexed amplicon FASTQ files to generate an ASV table for downstream analysis.
bio-metagenomics-functional-profiling
Profile functional potential of metagenomes using HUMAnN3 and similar tools. Use when obtaining pathway abundances, gene family counts, or functional annotations from metagenomic data.
bio-immunoinformatics-neoantigen-prediction
Identify tumor neoantigens from somatic mutations using pVACtools for personalized cancer immunotherapy. Predict mutant peptides that bind patient HLA and may elicit T-cell responses. Use when identifying vaccine targets or checkpoint inhibitor response biomarkers from tumor sequencing data.
bio-immunoinformatics-mhc-binding-prediction
Predict peptide-MHC class I and II binding affinity using MHCflurry and NetMHCpan neural network models. Identify potential T-cell epitopes from protein sequences. Use when predicting MHC binding for vaccine design or neoantigen identification.
bio-immunoinformatics-epitope-prediction
Predict B-cell and T-cell epitopes using BepiPred, IEDB tools, and structure-based methods for vaccine and antibody design. Identify immunogenic regions in antigens. Use when designing vaccines, mapping antibody binding sites, or predicting immunogenic peptides.