bio-microbiome-qiime2-workflow

QIIME2 command-line workflow for 16S/ITS amplicon analysis. Alternative to DADA2/phyloseq R workflow with built-in provenance tracking. Use when preferring CLI over R, needing reproducible provenance, or working within QIIME2 ecosystem.

1,802 stars

Best use case

bio-microbiome-qiime2-workflow is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

QIIME2 command-line workflow for 16S/ITS amplicon analysis. Alternative to DADA2/phyloseq R workflow with built-in provenance tracking. Use when preferring CLI over R, needing reproducible provenance, or working within QIIME2 ecosystem.

Teams using bio-microbiome-qiime2-workflow should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/bio-microbiome-qiime2-workflow/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-microbiome-qiime2-workflow/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/bio-microbiome-qiime2-workflow/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How bio-microbiome-qiime2-workflow Compares

Feature / Agentbio-microbiome-qiime2-workflowStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

QIIME2 command-line workflow for 16S/ITS amplicon analysis. Alternative to DADA2/phyloseq R workflow with built-in provenance tracking. Use when preferring CLI over R, needing reproducible provenance, or working within QIIME2 ecosystem.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

## Version Compatibility

Reference examples tested with: DADA2 1.30+, MAFFT 7.520+, QIIME2 2024.2+, phyloseq 1.46+, scanpy 1.10+, scikit-learn 1.4+

Before using code patterns, verify installed versions match. If versions differ:
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# QIIME2 Amplicon Workflow

**"Run my amplicon analysis through QIIME2"** → Process 16S/ITS amplicon data end-to-end using the QIIME2 CLI with built-in provenance tracking, from import through denoising, taxonomy, and diversity analysis.
- CLI: `qiime dada2 denoise-paired`, `qiime diversity core-metrics-phylogenetic`

## Import Data

```bash
# Import paired-end FASTQ with manifest
qiime tools import \
    --type 'SampleData[PairedEndSequencesWithQuality]' \
    --input-path manifest.tsv \
    --output-path demux.qza \
    --input-format PairedEndFastqManifestPhred33V2

# View demultiplexed summary
qiime demux summarize \
    --i-data demux.qza \
    --o-visualization demux.qzv
```

## Denoise with DADA2

```bash
# DADA2 denoising (creates ASV table + representative sequences)
qiime dada2 denoise-paired \
    --i-demultiplexed-seqs demux.qza \
    --p-trunc-len-f 240 \
    --p-trunc-len-r 160 \
    --p-trim-left-f 0 \
    --p-trim-left-r 0 \
    --p-max-ee-f 2 \
    --p-max-ee-r 2 \
    --p-n-threads 0 \
    --o-table table.qza \
    --o-representative-sequences rep-seqs.qza \
    --o-denoising-stats denoising-stats.qza

# View denoising stats
qiime metadata tabulate \
    --m-input-file denoising-stats.qza \
    --o-visualization denoising-stats.qzv
```

## Alternative: Deblur Denoising

```bash
# Quality filter first
qiime quality-filter q-score \
    --i-demux demux.qza \
    --o-filtered-sequences demux-filtered.qza \
    --o-filter-stats filter-stats.qza

# Deblur denoise
qiime deblur denoise-16S \
    --i-demultiplexed-seqs demux-filtered.qza \
    --p-trim-length 250 \
    --p-sample-stats \
    --o-representative-sequences rep-seqs-deblur.qza \
    --o-table table-deblur.qza \
    --o-stats deblur-stats.qza
```

## Taxonomy Assignment

```bash
# Download pre-trained classifier (SILVA 138)
# wget https://data.qiime2.org/2024.5/common/silva-138-99-nb-classifier.qza

# Classify with sklearn naive Bayes
qiime feature-classifier classify-sklearn \
    --i-classifier silva-138-99-nb-classifier.qza \
    --i-reads rep-seqs.qza \
    --o-classification taxonomy.qza

# Visualize taxonomy
qiime metadata tabulate \
    --m-input-file taxonomy.qza \
    --o-visualization taxonomy.qzv

# Taxonomic barplot
qiime taxa barplot \
    --i-table table.qza \
    --i-taxonomy taxonomy.qza \
    --m-metadata-file metadata.tsv \
    --o-visualization taxa-barplot.qzv
```

## Phylogenetic Tree

```bash
# Build phylogeny with MAFFT + FastTree
qiime phylogeny align-to-tree-mafft-fasttree \
    --i-sequences rep-seqs.qza \
    --o-alignment aligned-rep-seqs.qza \
    --o-masked-alignment masked-aligned-rep-seqs.qza \
    --o-tree unrooted-tree.qza \
    --o-rooted-tree rooted-tree.qza
```

## Diversity Analysis

```bash
# Core metrics (alpha + beta diversity)
qiime diversity core-metrics-phylogenetic \
    --i-phylogeny rooted-tree.qza \
    --i-table table.qza \
    --p-sampling-depth 10000 \
    --m-metadata-file metadata.tsv \
    --output-dir core-metrics-results

# Alpha diversity significance
qiime diversity alpha-group-significance \
    --i-alpha-diversity core-metrics-results/shannon_vector.qza \
    --m-metadata-file metadata.tsv \
    --o-visualization shannon-significance.qzv

# Beta diversity significance (PERMANOVA)
qiime diversity beta-group-significance \
    --i-distance-matrix core-metrics-results/weighted_unifrac_distance_matrix.qza \
    --m-metadata-file metadata.tsv \
    --m-metadata-column Group \
    --p-method permanova \
    --o-visualization weighted-unifrac-permanova.qzv
```

## Differential Abundance (ANCOM)

**Goal:** Identify taxa with significantly different abundances between groups using QIIME2's ANCOM implementation.

**Approach:** Collapse the feature table to genus level, add pseudocounts for log-ratio computation, and run ANCOM to test for differential abundance per taxon.

```bash
# Collapse to genus level
qiime taxa collapse \
    --i-table table.qza \
    --i-taxonomy taxonomy.qza \
    --p-level 6 \
    --o-collapsed-table table-l6.qza

# Add pseudocount and run ANCOM
qiime composition add-pseudocount \
    --i-table table-l6.qza \
    --o-composition-table comp-table-l6.qza

qiime composition ancom \
    --i-table comp-table-l6.qza \
    --m-metadata-file metadata.tsv \
    --m-metadata-column Group \
    --o-visualization ancom-l6.qzv
```

## Export to R/Python

```bash
# Export feature table to BIOM
qiime tools export \
    --input-path table.qza \
    --output-path exported

# Convert BIOM to TSV
biom convert \
    -i exported/feature-table.biom \
    -o feature-table.tsv \
    --to-tsv

# Export taxonomy
qiime tools export \
    --input-path taxonomy.qza \
    --output-path exported

# Export tree
qiime tools export \
    --input-path rooted-tree.qza \
    --output-path exported
```

## Manifest File Format

```
# manifest.tsv (tab-separated)
sample-id	forward-absolute-filepath	reverse-absolute-filepath
sample1	/path/to/sample1_R1.fastq.gz	/path/to/sample1_R2.fastq.gz
sample2	/path/to/sample2_R1.fastq.gz	/path/to/sample2_R2.fastq.gz
```

## Metadata File Format

```
# metadata.tsv (tab-separated)
sample-id	Group	Timepoint
sample1	Treatment	Day0
sample2	Control	Day0
sample3	Treatment	Day7
```

## Related Skills

- amplicon-processing - DADA2 R workflow alternative
- taxonomy-assignment - More database options
- diversity-analysis - phyloseq R alternative
- differential-abundance - ALDEx2/ANCOM-BC in R

Related Skills

protein-design-workflow

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

End-to-end guidance for protein design pipelines. Use this skill when: (1) Starting a new protein design project, (2) Need step-by-step workflow guidance, (3) Understanding the full design pipeline, (4) Planning compute resources and timelines, (5) Integrating multiple design tools. For tool selection, use binder-design. For QC thresholds, use protein-qc.

bio-read-qc-fastp-workflow

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

All-in-one read preprocessing with fastp including adapter trimming, quality filtering, deduplication, base correction, and HTML report generation. Use when preprocessing Illumina data and wanting a single fast tool instead of separate Cutadapt, Trimmomatic, and FastQC steps.

bio-microbiome-taxonomy-assignment

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Taxonomic classification of ASVs using reference databases like SILVA, GTDB, or UNITE. Covers naive Bayes classifiers (DADA2, IDTAXA) and exact matching approaches. Use when assigning taxonomy to ASVs after DADA2 amplicon processing.

bio-microbiome-functional-prediction

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Predict metagenome functional content from 16S rRNA marker gene data using PICRUSt2. Infer KEGG, MetaCyc, and EC abundances from ASV tables. Use when functional profiling is needed from 16S data without shotgun metagenomics sequencing.

bio-microbiome-diversity-analysis

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Alpha and beta diversity analysis for microbiome data. Calculate within-sample richness, evenness, and between-sample dissimilarity with phyloseq and vegan. Use when comparing community composition across samples or testing for group differences in microbiome structure.

bio-microbiome-differential-abundance

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Differential abundance testing for microbiome data using compositionally-aware methods like ALDEx2, ANCOM-BC2, and MaAsLin2. Use when identifying taxa that differ between experimental groups while accounting for the compositional nature of microbiome data.

bio-microbiome-amplicon-processing

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Amplicon sequence variant (ASV) inference from 16S rRNA or ITS amplicon sequencing using DADA2. Covers quality filtering, error learning, denoising, and chimera removal. Use when processing demultiplexed amplicon FASTQ files to generate an ASV table for downstream analysis.

zinc-database

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

zarr-python

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

xlsx

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.

writing-skills

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Use when creating new skills, editing existing skills, or verifying skills work before deployment

writing-plans

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Use when you have a spec or requirements for a multi-step task, before touching code