bio-microbiome-amplicon-processing
Amplicon sequence variant (ASV) inference from 16S rRNA or ITS amplicon sequencing using DADA2. Covers quality filtering, error learning, denoising, and chimera removal. Use when processing demultiplexed amplicon FASTQ files to generate an ASV table for downstream analysis.
Best use case
bio-microbiome-amplicon-processing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using bio-microbiome-amplicon-processing should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/bio-microbiome-amplicon-processing/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
## Version Compatibility
Reference examples tested with: DADA2 1.30+, cutadapt 4.4+
Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `cutadapt --version`
If code fails with errors such as "could not find function" or "unused argument", inspect the installed
package (for example with `args(function_name)`) and adapt the example to match the actual API rather than retrying.
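The version check described above can be sketched in base R. `has_min_version` is a hypothetical helper name, not part of DADA2; it returns `FALSE` when the package is missing or older than the stated minimum.

```r
# Hypothetical helper: TRUE only if `pkg` is installed at `min_version` or newer.
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= min_version
}

# Compare the installed DADA2 against the version these examples were tested with.
if (!has_min_version('dada2', '1.30')) {
  message('dada2 >= 1.30 not found; check ?filterAndTrim and ?dada before reusing the examples.')
}
```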
# Amplicon Processing with DADA2
**"Process my 16S amplicon data to get ASVs"** → Denoise amplicon sequencing reads into exact amplicon sequence variants (ASVs) through quality filtering, error model learning, and chimera removal.
- R: `dada2::filterAndTrim()` → `learnErrors()` → `dada()` → `removeBimeraDenovo()`
## Complete DADA2 Workflow
```r
library(dada2)
path <- 'raw_reads'
fnFs <- sort(list.files(path, pattern = '_R1_001.fastq.gz', full.names = TRUE))
fnRs <- sort(list.files(path, pattern = '_R2_001.fastq.gz', full.names = TRUE))
sample_names <- sapply(strsplit(basename(fnFs), '_'), `[`, 1)
# Quality profiles
plotQualityProfile(fnFs[1:2])
plotQualityProfile(fnRs[1:2])
```
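The sample names above come from the text before the first underscore in each file name, which assumes Illumina-style naming. A minimal sketch of a pairing check, using made-up file names, can catch mismatched forward and reverse files before any processing starts:

```r
# Hypothetical file names following the Illumina naming pattern used above.
fnFs <- c('raw_reads/S1_L001_R1_001.fastq.gz', 'raw_reads/S2_L001_R1_001.fastq.gz')
fnRs <- c('raw_reads/S1_L001_R2_001.fastq.gz', 'raw_reads/S2_L001_R2_001.fastq.gz')

# Sample name = text before the first underscore in the file name.
get_sample_name <- function(files) sapply(strsplit(basename(files), '_'), `[`, 1)

names_F <- get_sample_name(fnFs)
names_R <- get_sample_name(fnRs)

# Stop early if forward and reverse files do not pair up one-to-one.
stopifnot(length(fnFs) == length(fnRs),
          identical(names_F, names_R),
          anyDuplicated(names_F) == 0)
names_F
```

If sample identifiers themselves contain underscores, the split rule would need to be adapted to the naming scheme in use.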
## Quality Filtering and Trimming
```r
filtFs <- file.path('filtered', paste0(sample_names, '_F_filt.fastq.gz'))
filtRs <- file.path('filtered', paste0(sample_names, '_R_filt.fastq.gz'))
names(filtFs) <- sample_names
names(filtRs) <- sample_names
# Filter parameters depend on amplicon region and read length
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs,
                     truncLen = c(240, 160),  # Truncate forward/reverse reads to these lengths (choose from the quality profiles)
                     maxN = 0,                # No ambiguous bases
                     maxEE = c(2, 2),         # Max expected errors (forward, reverse)
                     truncQ = 2,              # Truncate at first Q <= 2
                     rm.phix = TRUE,          # Remove PhiX
                     compress = TRUE,
                     multithread = TRUE)
```
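The `maxEE` threshold is easier to choose once the expected-error calculation is visible. As a rough illustration, the expected errors of a read are the sum of its per-base error probabilities, `EE = sum(10^(-Q/10))`. The quality vector below is made up, and `expected_errors` is a hypothetical helper rather than a DADA2 function.

```r
# Expected errors for one read: sum of per-base error probabilities,
# where a Phred score Q corresponds to probability 10^(-Q/10).
expected_errors <- function(phred) sum(10^(-phred / 10))

# Hypothetical 4-base read with quality dropping toward the 3' end.
q <- c(30, 30, 20, 10)
ee <- expected_errors(q)
ee        # 0.001 + 0.001 + 0.01 + 0.1 = 0.112
ee <= 2   # this read would pass maxEE = 2
```

Because low-quality tails dominate the sum, truncating reads earlier with `truncLen` usually lets more reads pass the same `maxEE`.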
## Error Rate Learning
```r
errF <- learnErrors(filtFs, multithread = TRUE)
errR <- learnErrors(filtRs, multithread = TRUE)
# Visualize error rates
plotErrors(errF, nominalQ = TRUE)
```
## Sample Inference (Denoising)
```r
dadaFs <- dada(filtFs, err = errF, multithread = TRUE)
dadaRs <- dada(filtRs, err = errR, multithread = TRUE)
# Check results
dadaFs[[1]]
```
## Merge Paired Reads
```r
mergers <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, verbose = TRUE)
# Check merge success
head(mergers[[1]])
```
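Merging only succeeds if the truncated forward and reverse reads still overlap. The arithmetic can be sketched as below; the amplicon length is an assumed value for a V4-like region with primers removed, and `overlap_after_trunc` is a hypothetical helper. The threshold of 12 is the default `minOverlap` of `mergePairs()`.

```r
# Bases shared by forward and reverse reads after truncation.
# amplicon_len is an assumed value; use the expected length for your primers.
overlap_after_trunc <- function(trunc_len, amplicon_len) {
  sum(trunc_len) - amplicon_len
}

trunc_len <- c(240, 160)   # truncLen from the filtering step
amplicon_len <- 253        # assumption: V4-like amplicon, primers removed
min_overlap <- 12          # default minOverlap of mergePairs()

ov <- overlap_after_trunc(trunc_len, amplicon_len)
ov                         # 240 + 160 - 253 = 147
ov >= min_overlap
```

If most reads fail to merge, a likely cause is a `truncLen` that leaves too little overlap for the amplicon being sequenced.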
## Construct Sequence Table
```r
seqtab <- makeSequenceTable(mergers)
dim(seqtab)
# Check length distribution
table(nchar(getSequences(seqtab)))
```
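When the length distribution shows sequences far outside the expected amplicon size, they can be dropped by subsetting columns on sequence length. The sketch below uses a toy matrix in place of `seqtab`, and the 250 to 256 window is an assumed range that should be adjusted to the targeted region.

```r
# Toy sequence table: rows are samples, column names are the ASV sequences.
seqs <- c(strrep('A', 253), strrep('C', 254), strrep('G', 300))
seqtab_toy <- matrix(c(10L, 5L, 1L,
                       8L, 0L, 2L),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c('S1', 'S2'), seqs))

# Keep only ASVs inside the expected length window (assumed range).
keep <- nchar(colnames(seqtab_toy)) %in% 250:256
seqtab_len <- seqtab_toy[, keep, drop = FALSE]

# Length distribution after filtering.
table(nchar(colnames(seqtab_len)))
```

This kind of length filter is not appropriate for ITS data, where amplicon length varies naturally.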
## Remove Chimeras
```r
seqtab_nochim <- removeBimeraDenovo(seqtab, method = 'consensus',
                                    multithread = TRUE, verbose = TRUE)
# Fraction of reads retained after chimera removal
sum(seqtab_nochim) / sum(seqtab)
```
## Track Reads Through Pipeline
**Goal:** Generate a per-sample summary table showing how many reads survived each DADA2 processing step for quality assessment.
**Approach:** Extract read counts from each pipeline stage (filtering, denoising, merging, chimera removal) and combine into a single tracking matrix.
```r
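# Total reads represented by a dada or merger object
getN <- function(x) sum(getUniques(x))
# With a single sample, call getN(dadaFs) directly instead of sapply(dadaFs, getN)
track <- cbind(out, sapply(dadaFs, getN), sapply(dadaRs, getN),
               sapply(mergers, getN), rowSums(seqtab_nochim))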
colnames(track) <- c('input', 'filtered', 'denoisedF', 'denoisedR', 'merged', 'nonchim')
rownames(track) <- sample_names
track
```
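The tracking matrix is easier to read as percentages of the input. A minimal sketch with made-up counts follows; the 50% cut-off for flagging samples is an arbitrary assumption, not a DADA2 recommendation.

```r
# Toy tracking matrix with the same columns as `track` (made-up counts).
track_toy <- matrix(c(10000, 9000, 8800, 8700, 8000, 7600,
                      12000, 6000, 5900, 5800, 3000, 2900),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c('S1', 'S2'),
                                    c('input', 'filtered', 'denoisedF',
                                      'denoisedR', 'merged', 'nonchim')))

# Percentage of input reads remaining after each step (row-wise division).
pct <- round(100 * track_toy / track_toy[, 'input'], 1)
pct

# Flag samples that keep less than half of their input reads (assumed threshold).
low_yield <- rownames(pct)[pct[, 'nonchim'] < 50]
low_yield
```

A large drop at a single step points to the parameter to revisit: filtering losses suggest `truncLen` or `maxEE`, while merging losses suggest insufficient overlap.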
## ITS-Specific Processing
```r
# For ITS, use cutadapt to remove primers first (variable length amplicons);
# fnFs/fnRs should then point to the primer-trimmed files.
# Then skip truncLen (don't truncate ITS to fixed length)
out_its <- filterAndTrim(fnFs, filtFs, fnRs, filtRs,
                         maxN = 0, maxEE = c(2, 2), truncQ = 2,
                         minLen = 50,  # Discard reads shorter than 50 bases
                         rm.phix = TRUE, compress = TRUE, multithread = TRUE)
```
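The primer-removal step mentioned in the comment can be sketched by building a cutadapt argument vector in R. This is an illustration under assumptions: the primer pair shown (ITS1f / ITS2) and all file paths are placeholders, and `revcomp` is a hypothetical helper that only handles plain A/C/G/T primers. The flags used (`-g`, `-a`, `-G`, `-A`, `-n`, `-o`, `-p`) are standard cutadapt options for paired-end trimming.

```r
# Reverse complement for primers made of A/C/G/T only (no IUPAC codes).
revcomp <- function(s) {
  comp <- chartr('ACGT', 'TGCA', s)
  paste(rev(strsplit(comp, '')[[1]]), collapse = '')
}

# Assumed ITS1 primer pair (ITS1f / ITS2); replace with your own primers.
fwd <- 'CTTGGTCATTTAGAGGAAGTAA'
rev_primer <- 'GCTGCGTTCTTCATCGATGC'

# Hypothetical input and output paths for one sample.
args <- c('-g', fwd, '-a', revcomp(rev_primer),   # R1: forward primer at 5' end, read-through of reverse primer at 3' end
          '-G', rev_primer, '-A', revcomp(fwd),   # R2: reverse primer at 5' end, read-through of forward primer at 3' end
          '-n', '2',                              # allow up to two adapter removals per read
          '-o', 'cutadapt/S1_R1.fastq.gz',
          '-p', 'cutadapt/S1_R2.fastq.gz',
          'raw_reads/S1_R1.fastq.gz', 'raw_reads/S1_R2.fastq.gz')

# Uncomment to run; requires cutadapt on the PATH.
# system2('cutadapt', args = args)
```

The trimmed output files would then replace the raw files as input to `filterAndTrim()`.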
## Related Skills
- taxonomy-assignment - Assign taxonomy to ASVs
- read-qc/quality-reports - Pre-DADA2 quality assessment
- diversity-analysis - Analyze ASV table
Related Skills
tcga-bulk-data-preprocessing-with-omicverse
Guide Claude through ingesting TCGA sample sheets, expression archives, and clinical carts into omicverse, initialising survival metadata, and exporting annotated AnnData files.
single-cell-preprocessing-with-omicverse
Walk through omicverse's single-cell preprocessing tutorials to QC PBMC3k data, normalise counts, detect HVGs, and run PCA/embedding pipelines on CPU, CPU–GPU mixed, or GPU stacks.
post-processing
Extract, analyze, and visualize simulation output data. Use for field extraction, time series analysis, line profiles, statistical summaries, derived quantity computation, result comparison to references, and automated report generation from simulation results.
pdf-processing
Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
pdf-processing-pro
Production-ready PDF processing with forms, tables, OCR, validation, and batch operations. Use when working with complex PDF workflows in production environments, processing large volumes of PDFs, or requiring robust error handling and validation.
bio-spatial-transcriptomics-spatial-preprocessing
Quality control, filtering, normalization, and feature selection for spatial transcriptomics data. Calculate QC metrics, filter spots/cells, normalize counts, and identify highly variable genes. Use when filtering and normalizing spatial transcriptomics data.
bio-single-cell-preprocessing
Quality control, filtering, and normalization for single-cell RNA-seq using Seurat (R) and Scanpy (Python). Use for calculating QC metrics, filtering cells and genes, normalizing counts, identifying highly variable genes, and scaling data. Use when filtering, normalizing, and selecting features in single-cell data.
bio-ribo-seq-riboseq-preprocessing
Preprocess ribosome profiling data including adapter trimming, size selection, rRNA removal, and alignment. Use when preparing Ribo-seq reads for downstream analysis of translation.
bio-read-qc-umi-processing
Extract, process, and deduplicate reads using Unique Molecular Identifiers (UMIs) with umi_tools. Use when library prep includes UMIs and accurate molecule counting is needed, such as in single-cell RNA-seq, low-input RNA-seq, or targeted sequencing to distinguish PCR from biological duplicates.
bio-microbiome-taxonomy-assignment
Taxonomic classification of ASVs using reference databases like SILVA, GTDB, or UNITE. Covers naive Bayes classifiers (DADA2, IDTAXA) and exact matching approaches. Use when assigning taxonomy to ASVs after DADA2 amplicon processing.
bio-microbiome-qiime2-workflow
QIIME2 command-line workflow for 16S/ITS amplicon analysis. Alternative to DADA2/phyloseq R workflow with built-in provenance tracking. Use when preferring CLI over R, needing reproducible provenance, or working within QIIME2 ecosystem.
bio-microbiome-functional-prediction
Predict metagenome functional content from 16S rRNA marker gene data using PICRUSt2. Infer KEGG, MetaCyc, and EC abundances from ASV tables. Use when functional profiling is needed from 16S data without shotgun metagenomics sequencing.