bio-microbiome-functional-prediction

Predict metagenome functional content from 16S rRNA marker gene data using PICRUSt2. Infer KEGG, MetaCyc, and EC abundances from ASV tables. Use when functional profiling is needed from 16S data without shotgun metagenomics sequencing.


Best use case

bio-microbiome-functional-prediction is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using bio-microbiome-functional-prediction should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/bio-microbiome-functional-prediction/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-microbiome-functional-prediction/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/bio-microbiome-functional-prediction/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How bio-microbiome-functional-prediction Compares

| Feature / Agent | bio-microbiome-functional-prediction | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Predict metagenome functional content from 16S rRNA marker gene data using PICRUSt2. Infer KEGG, MetaCyc, and EC abundances from ASV tables. Use when functional profiling is needed from 16S data without shotgun metagenomics sequencing.

Where can I find the source code?

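The source is in the FreedomIntelligence/OpenClaw-Medical-Skills repository on GitHub, at skills/bio-microbiome-functional-prediction/SKILL.md (the same file fetched by the curl command in the Installation section).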

SKILL.md Source

## Version Compatibility

Reference examples tested with: Biostrings 2.70+, ggplot2 3.5+, pandas 2.2+, phyloseq 1.46+, scanpy 1.10+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
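The `pip show` check can also be run from inside Python with the standard-library `importlib.metadata` module (Python 3.8+). This is a minimal sketch; the helper name `installed_versions` is hypothetical, and R packages still need `packageVersion()`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Python packages from the compatibility list above
    for pkg, ver in installed_versions(["pandas", "scanpy"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```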

# Functional Prediction with PICRUSt2

**"Predict functional pathways from my 16S data"** → Infer metagenome functional content from marker gene (16S/ITS) ASV tables using phylogenetic placement and gene content prediction.
- CLI: `picrust2_pipeline.py -s seqs.fna -i table.biom -o output/`

## Prepare Input Files

```r
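library(phyloseq)
library(Biostrings)

ps <- readRDS('phyloseq_object.rds')

# Export ASV table with ASVs as rows and samples as columns
otu <- as(otu_table(ps), 'matrix')
if (!taxa_are_rows(ps)) otu <- t(otu)
# Write an explicit ID column so the header and data rows have the same width
write.table(data.frame(ASV_ID = rownames(otu), otu, check.names = FALSE),
            'asv_table.tsv', sep = '\t', quote = FALSE, row.names = FALSE)

# Export ASV sequences as FASTA; record names must match the table's ASV IDs
seqs <- refseq(ps, errorIfNULL = FALSE)
if (is.null(seqs)) {
  # Assumes taxa names are the ASV sequences themselves (DADA2 default)
  seqs <- DNAStringSet(taxa_names(ps))
  names(seqs) <- taxa_names(ps)
}
writeXStringSet(seqs, 'asv_seqs.fasta')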
```
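PICRUSt2 links sequences to abundance-table rows by ID, so every ID in the table's first column should appear as a FASTA header and vice versa. A minimal pre-flight check in Python; the function names are hypothetical, and the table is assumed to have a header row.

```python
import csv
import io


def fasta_ids(lines):
    """Record IDs: the first whitespace-delimited token after each '>'."""
    return {line[1:].split()[0] for line in lines
            if line.startswith(">") and line[1:].strip()}


def table_ids(tsv_text):
    """Feature IDs from the first column of a TSV that has a header row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    return {row[0] for row in reader if row}


def check_ids(fasta_lines, tsv_text):
    """Return (IDs only in the FASTA, IDs only in the table); both should be empty."""
    in_fasta, in_table = fasta_ids(fasta_lines), table_ids(tsv_text)
    return in_fasta - in_table, in_table - in_fasta
```

Usage: `check_ids(open('asv_seqs.fasta'), open('asv_table.tsv').read())`; two empty sets mean the IDs agree.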

## Run PICRUSt2 Pipeline

```bash
# Full pipeline (place sequences, predict functions, metagenome inference)
picrust2_pipeline.py \
    -s asv_seqs.fasta \
    -i asv_table.tsv \
    -o picrust2_output \
    -p 4 \
    --stratified \
    --per_sequence_contrib

# Output files (gzipped, inside picrust2_output/):
# - pathways_out/path_abun_unstrat.tsv.gz (MetaCyc pathways)
# - KO_metagenome_out/pred_metagenome_unstrat.tsv.gz (KEGG orthologs)
# - EC_metagenome_out/pred_metagenome_unstrat.tsv.gz (EC numbers)
# - marker_predicted_and_nsti.tsv.gz (16S copy number and NSTI per ASV)
```
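To inspect the unstratified tables in Python, pandas can read the gzipped files directly because it infers compression from the `.gz` suffix. A minimal sketch; the helper names are hypothetical, and it assumes the first column holds the feature IDs.

```python
import pandas as pd


def load_unstrat_table(path):
    """Read an unstratified PICRUSt2 table (features x samples) into a DataFrame.

    Gzip compression is inferred from a '.gz' suffix; the first column
    (feature IDs) becomes the index.
    """
    return pd.read_csv(path, sep="\t", index_col=0)


def summarize(table):
    """Quick sanity check: feature count, sample count, and per-sample totals."""
    return {
        "n_features": table.shape[0],
        "n_samples": table.shape[1],
        "sample_totals": table.sum(axis=0).to_dict(),
    }
```

For example, `summarize(load_unstrat_table('picrust2_output/KO_metagenome_out/pred_metagenome_unstrat.tsv.gz'))`. Stratified (contributional) tables are in long format and need different handling.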

## Step-by-Step Pipeline

**Goal:** Predict functional metagenome content from 16S ASVs using the full PICRUSt2 pipeline with explicit control over each step.

**Approach:** Place ASV sequences into a reference tree, predict gene content via hidden-state prediction, infer per-sample metagenome abundances, and reconstruct MetaCyc pathways.

```bash
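# 1. Place sequences in reference tree
place_seqs.py -s asv_seqs.fasta -o placed_seqs.tre -p 4

# 2. Hidden-state prediction of 16S copy number, plus NSTI per ASV (-n)
hsp.py -i 16S -t placed_seqs.tre -o marker_nsti_predicted.tsv -m pic -n

# 3. Predict gene families per ASV (EC for MetaCyc pathways, KO for KEGG)
hsp.py -i EC -t placed_seqs.tre -o EC_predicted.tsv -m pic
hsp.py -i KO -t placed_seqs.tre -o KO_predicted.tsv -m pic

# 4. Metagenome inference (outputs are written gzipped)
metagenome_pipeline.py \
    -i asv_table.tsv \
    -m marker_nsti_predicted.tsv \
    -f EC_predicted.tsv \
    -o EC_metagenome_out \
    --strat_out
metagenome_pipeline.py \
    -i asv_table.tsv \
    -m marker_nsti_predicted.tsv \
    -f KO_predicted.tsv \
    -o KO_metagenome_out \
    --strat_out

# 5. Pathway inference: by default pathway_pipeline.py regroups EC numbers
#    into MetaCyc reactions, so its input is the EC (not KO) prediction
pathway_pipeline.py \
    -i EC_metagenome_out/pred_metagenome_contrib.tsv.gz \
    -o pathways_out \
    -p 4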
```
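Step 4 is, at its core, a matrix product: each ASV's counts are divided by its predicted 16S copy number, then multiplied by that ASV's predicted gene copy numbers and summed over ASVs per sample. The toy sketch below shows only that arithmetic with pandas. It is not PICRUSt2's code and omits steps such as excluding ASVs above `--max_nsti`; the function name and inputs are hypothetical.

```python
import pandas as pd


def infer_metagenome(asv_counts, marker_copies, gene_copies):
    """Toy metagenome inference (the arithmetic only, not PICRUSt2's code).

    asv_counts:    DataFrame, ASVs x samples (read counts)
    marker_copies: Series, predicted 16S copy number per ASV
    gene_copies:   DataFrame, ASVs x gene families (predicted copies per genome)
    Returns a DataFrame of gene families x samples.
    """
    asvs = asv_counts.index
    # Divide each ASV's counts by its 16S copy number -> approximate genome counts
    genomes = asv_counts.div(marker_copies.loc[asvs], axis=0)
    # (families x ASVs) @ (ASVs x samples): sum of copies x genomes over ASVs
    return gene_copies.loc[asvs].T @ genomes
```

With ASV1 (20 reads, 2 16S copies) and ASV2 (10 reads, 1 copy) in one sample, each contributes 10 genomes, so a family present once in ASV1 and twice in ASV2 gets 1×10 + 2×10 = 30.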

## Quality Control: NSTI

```python
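import pandas as pd

# NSTI = Nearest Sequenced Taxon Index: phylogenetic distance from each ASV
# to its nearest reference genome. Lower = more reliable prediction.
# ASVs with NSTI > 2 are excluded from metagenome prediction by default
# (--max_nsti 2); this is a hard cutoff, not a quality target.
nsti = pd.read_csv('marker_nsti_predicted.tsv', sep='\t')
print(f'Mean NSTI: {nsti["metadata_NSTI"].mean():.3f}')
print(f'ASVs with NSTI > 2 (excluded): {(nsti["metadata_NSTI"] > 2).sum()}')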
```
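A plain mean over ASVs treats rare and dominant ASVs equally. A per-sample, abundance-weighted mean NSTI is usually more informative. The sketch below assumes raw ASV counts as weights; metagenome_pipeline.py also writes its own per-sample weighted NSTI table, which may be computed differently, so treat this as a cross-check. The function name is hypothetical.

```python
import pandas as pd


def weighted_nsti(asv_counts, nsti):
    """Abundance-weighted mean NSTI per sample.

    asv_counts: DataFrame, ASVs x samples; nsti: Series of NSTI per ASV.
    ASVs missing from `nsti` are ignored; empty samples yield NaN.
    """
    shared = asv_counts.index.intersection(nsti.index)
    counts = asv_counts.loc[shared]
    rel = counts.div(counts.sum(axis=0), axis=1)  # per-sample relative abundance
    return rel.mul(nsti.loc[shared], axis=0).sum(axis=0, min_count=1)
```

Inputs can come from `pd.read_csv('marker_nsti_predicted.tsv', sep='\t', index_col=0)['metadata_NSTI']` and `pd.read_csv('asv_table.tsv', sep='\t', index_col=0)`.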

## Analyze Pathway Output

```r
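library(ggplot2)
library(ALDEx2)

# PICRUSt2 writes gzipped tables; read.delim() decompresses them transparently
pathways <- read.delim('picrust2_output/pathways_out/path_abun_unstrat.tsv.gz',
                       row.names = 1, check.names = FALSE)
metadata <- read.csv('sample_metadata.csv', row.names = 1)

# Normalize to relative abundance (for plotting and summaries)
pathways_rel <- sweep(pathways, 2, colSums(pathways), '/')

# Differential pathway analysis with ALDEx2 (or a similar compositional method).
# aldex() expects features as rows, samples as columns, and integer counts,
# so pass the rounded abundance table, not its transpose.
groups <- as.character(metadata[colnames(pathways), 'Group'])
pathway_aldex <- aldex(round(pathways), groups, mc.samples = 128)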
```
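ALDEx2 applies a centered log-ratio (CLR) transform to Monte Carlo Dirichlet draws of the counts. For intuition, or as a quick exploratory transform in Python, a single CLR with a pseudocount is sketched below. This is a simplified stand-in, not ALDEx2; the pseudocount of 0.5 and the function name are assumptions.

```python
import numpy as np
import pandas as pd


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a features x samples table.

    Adds a pseudocount (an assumption, to handle zeros), takes logs, and
    subtracts each sample's mean log value, i.e. the log geometric mean.
    """
    logged = np.log(counts + pseudocount)
    return logged - logged.mean(axis=0)
```

Each sample's CLR values sum to zero. A per-pathway group comparison could follow, but this skips ALDEx2's modeling of sampling uncertainty.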

## Add Pathway Descriptions

```bash
# Map pathway IDs to names (adds a description column)
add_descriptions.py \
    -i picrust2_output/pathways_out/path_abun_unstrat.tsv.gz \
    -m METACYC \
    -o picrust2_output/pathways_out/path_abun_unstrat_descrip.tsv.gz
```
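add_descriptions.py places a description next to each feature ID. If the same step is needed inside a pandas workflow, a minimal sketch with a hypothetical ID-to-description dict follows; the real mapping files ship with PICRUSt2 and are not reproduced here.

```python
import pandas as pd


def add_description_column(table, id_to_desc):
    """Return a copy of a features x samples table with a leading 'description' column.

    IDs absent from `id_to_desc` get an empty string.
    """
    out = table.copy()
    out.insert(0, "description", [id_to_desc.get(i, "") for i in out.index])
    return out
```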

## KEGG Module Analysis

```r
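library(KEGGREST)

# Analyze KEGG modules instead of individual KOs
ko_table <- read.delim('KO_metagenome_out/pred_metagenome_unstrat.tsv.gz',
                       row.names = 1, check.names = FALSE)

# KO-to-module links: names are 'ko:K.....', values are 'md:M.....'
ko2mod <- keggLink('module', 'ko')
links <- data.frame(ko = sub('^ko:', '', names(ko2mod)),
                    module = sub('^md:', '', ko2mod))
links <- links[links$ko %in% rownames(ko_table), ]

# Sum KO abundances per module (a simple aggregate, not module completeness)
module_abund <- rowsum(ko_table[links$ko, ], links$module)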
```
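The same KO-to-module aggregation can be sketched in pandas, given (KO, module) link pairs in the `ko:`/`md:` prefixed form that the KEGG link service returns. Summing KOs is a crude abundance proxy, not KEGG module completeness. The function name and the toy IDs in the usage are hypothetical.

```python
import pandas as pd


def ko_to_modules(ko_table, links):
    """Sum KO abundances into KEGG modules.

    ko_table: DataFrame, KOs x samples, indexed by bare IDs ('Kxxxxx')
    links:    list of ('ko:Kxxxxx', 'md:Mxxxxx') pairs; a KO linked to
              several modules contributes to each of them.
    """
    df = pd.DataFrame(links, columns=["ko", "module"])
    df["ko"] = df["ko"].str.replace("ko:", "", regex=False)
    df["module"] = df["module"].str.replace("md:", "", regex=False)
    df = df[df["ko"].isin(ko_table.index)]
    # One row per (KO, module) link, relabeled by module, then summed per module
    per_link = ko_table.loc[df["ko"].to_numpy()].set_axis(df["module"].to_numpy(), axis=0)
    return per_link.groupby(level=0).sum()
```

With toy links K1→M1, K2→M1 and K2→M2, module M1 receives K1 + K2 and M2 receives K2 alone.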

## Limitations

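- Predictions are inferred from the gene content of phylogenetically nearby reference genomes, not measured; they estimate functional potential, not expression or activity
- Novel taxa (high NSTI) have unreliable predictions
- 16S resolution limits species- and strain-level accuracy
- Cannot capture genes gained by horizontal gene transfer that are absent from the nearest reference genomes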
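The NSTI limitation can be quantified per sample. With PICRUSt2's default `--max_nsti` of 2, ASVs above the cutoff are left out of prediction, so the share of each sample's reads they carry shows how much of the community the predicted profile does not represent. A minimal sketch; the function name is hypothetical.

```python
import pandas as pd


def excluded_read_fraction(asv_counts, nsti, max_nsti=2.0):
    """Per-sample fraction of reads from ASVs whose NSTI exceeds max_nsti.

    asv_counts: DataFrame, ASVs x samples; nsti: Series of NSTI per ASV.
    """
    high = nsti[nsti > max_nsti].index.intersection(asv_counts.index)
    return asv_counts.loc[high].sum(axis=0) / asv_counts.sum(axis=0)
```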

## Related Skills

- amplicon-processing - Generate ASV input
- metagenomics/functional-profiling - Direct shotgun-based profiling
- pathway-analysis/kegg-pathways - KEGG pathway enrichment

Related Skills

tooluniverse-immunotherapy-response-prediction

from FreedomIntelligence/OpenClaw-Medical-Skills

Predict patient response to immune checkpoint inhibitors (ICIs) using multi-biomarker integration. Given a cancer type, somatic mutations, and optional biomarkers (TMB, PD-L1, MSI status), performs systematic analysis across 11 phases covering TMB classification, neoantigen burden estimation, MSI/MMR assessment, PD-L1 evaluation, immune microenvironment profiling, mutation-based resistance/sensitivity prediction, clinical evidence retrieval, and multi-biomarker score integration. Generates a quantitative ICI Response Score (0-100), response likelihood tier, specific ICI drug recommendations with evidence, resistance risk factors, and a monitoring plan. Use when oncologists ask about immunotherapy eligibility, checkpoint inhibitor selection, or biomarker-guided ICI treatment decisions.

bio-structural-biology-modern-structure-prediction

from FreedomIntelligence/OpenClaw-Medical-Skills

Predict protein structures using modern ML models including AlphaFold3, ESMFold, Chai-1, and Boltz-1. Use when predicting structures for novel proteins, protein complexes, or when comparing predictions across multiple methods.

bio-structural-biology-alphafold-predictions

from FreedomIntelligence/OpenClaw-Medical-Skills

Access and analyze AlphaFold protein structure predictions. Use when predicted structures are needed for proteins without experimental structures, or for confidence scores (pLDDT).

bio-microbiome-taxonomy-assignment

from FreedomIntelligence/OpenClaw-Medical-Skills

Taxonomic classification of ASVs using reference databases like SILVA, GTDB, or UNITE. Covers naive Bayes classifiers (DADA2, IDTAXA) and exact matching approaches. Use when assigning taxonomy to ASVs after DADA2 amplicon processing.

bio-microbiome-qiime2-workflow

from FreedomIntelligence/OpenClaw-Medical-Skills

QIIME2 command-line workflow for 16S/ITS amplicon analysis. Alternative to DADA2/phyloseq R workflow with built-in provenance tracking. Use when preferring CLI over R, needing reproducible provenance, or working within QIIME2 ecosystem.

bio-microbiome-diversity-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Alpha and beta diversity analysis for microbiome data. Calculate within-sample richness, evenness, and between-sample dissimilarity with phyloseq and vegan. Use when comparing community composition across samples or testing for group differences in microbiome structure.

bio-microbiome-differential-abundance

from FreedomIntelligence/OpenClaw-Medical-Skills

Differential abundance testing for microbiome data using compositionally-aware methods like ALDEx2, ANCOM-BC2, and MaAsLin2. Use when identifying taxa that differ between experimental groups while accounting for the compositional nature of microbiome data.

bio-microbiome-amplicon-processing

from FreedomIntelligence/OpenClaw-Medical-Skills

Amplicon sequence variant (ASV) inference from 16S rRNA or ITS amplicon sequencing using DADA2. Covers quality filtering, error learning, denoising, and chimera removal. Use when processing demultiplexed amplicon FASTQ files to generate an ASV table for downstream analysis.

bio-metagenomics-functional-profiling

from FreedomIntelligence/OpenClaw-Medical-Skills

Profile functional potential of metagenomes using HUMAnN3 and similar tools. Use when obtaining pathway abundances, gene family counts, or functional annotations from metagenomic data.

bio-immunoinformatics-neoantigen-prediction

from FreedomIntelligence/OpenClaw-Medical-Skills

Identify tumor neoantigens from somatic mutations using pVACtools for personalized cancer immunotherapy. Predict mutant peptides that bind patient HLA and may elicit T-cell responses. Use when identifying vaccine targets or checkpoint inhibitor response biomarkers from tumor sequencing data.

bio-immunoinformatics-mhc-binding-prediction

from FreedomIntelligence/OpenClaw-Medical-Skills

Predict peptide-MHC class I and II binding affinity using MHCflurry and NetMHCpan neural network models. Identify potential T-cell epitopes from protein sequences. Use when predicting MHC binding for vaccine design or neoantigen identification.

bio-immunoinformatics-epitope-prediction

from FreedomIntelligence/OpenClaw-Medical-Skills

Predict B-cell and T-cell epitopes using BepiPred, IEDB tools, and structure-based methods for vaccine and antibody design. Identify immunogenic regions in antigens. Use when designing vaccines, mapping antibody binding sites, or predicting immunogenic peptides.