bio-metabolomics-xcms-preprocessing
XCMS3 workflow for LC-MS/MS metabolomics preprocessing. Covers peak detection, retention time alignment, correspondence (grouping), and gap filling. Use when processing raw LC-MS data into a feature table for untargeted metabolomics.
Best use case
bio-metabolomics-xcms-preprocessing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using bio-metabolomics-xcms-preprocessing can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/bio-metabolomics-xcms-preprocessing/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How bio-metabolomics-xcms-preprocessing Compares
| Feature / Agent | bio-metabolomics-xcms-preprocessing | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
## Version Compatibility
Reference examples tested with: MSnbase 2.28+, xcms 4.0+
Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
If code fails with an error such as "could not find function" or "unused argument", introspect the installed
package and adapt the example to match the actual API rather than retrying.
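A minimal sketch of the version check described above, in base R. The helper name `meets_min_version` is hypothetical (it is not part of xcms or MSnbase), and it returns `NA` for a package that is not installed instead of raising an error.

```r
# Hypothetical helper: compare an installed package's version to a minimum.
meets_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NA)  # package not installed
  }
  utils::packageVersion(pkg) >= package_version(min_version)
}

# Minimum versions stated in this document
required <- c(xcms = '4.0.0', MSnbase = '2.28.0')
status <- vapply(names(required),
                 function(p) meets_min_version(p, required[[p]]),
                 logical(1))
print(status)
```

A `FALSE` or `NA` entry suggests checking `?function_name` for the installed version before reusing the code patterns below.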
# XCMS Metabolomics Preprocessing
Requires Bioconductor 3.18+ with xcms 4.0+ and MSnbase 2.28+.
## Load Raw Data
**Goal:** Import raw LC-MS files into R for downstream peak detection and alignment.
**Approach:** Read mzML/mzXML files into an OnDiskMSnExp object using MSnbase for memory-efficient access.
**"Process my raw LC-MS data into a feature table"** → Detect chromatographic peaks, align retention times across samples, group corresponding peaks, and fill missing values to produce a sample-by-feature intensity matrix.
```r
library(xcms)
library(MSnbase)
# Read mzML/mzXML files
raw_files <- list.files('raw_data', pattern = '\\.(mzML|mzXML)$', full.names = TRUE)
# Create OnDiskMSnExp object
raw_data <- readMSData(raw_files, mode = 'onDisk')
# Check data
raw_data
table(msLevel(raw_data))
```
## Define Sample Groups
**Goal:** Attach sample metadata (group labels, injection order) to the raw data object.
**Approach:** Create a data frame of sample information and assign it to the phenoData slot.
```r
# Sample metadata (group counts are examples; they must sum to length(raw_files))
sample_info <- data.frame(
  sample_name = basename(raw_files),
  sample_group = c(rep('Control', 5), rep('Treatment', 5), rep('QC', 3)),
  injection_order = seq_along(raw_files)
)
# Assign to phenoData
pData(raw_data) <- sample_info
```
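The group labels above are hard-coded for 13 files. As a hedged alternative, the labels can be derived from the file names. The sketch below assumes a hypothetical `<Group>_<replicate>.mzML` naming convention; `example_files` and `sample_info_sketch` are illustrative names, not part of the original workflow.

```r
# Hypothetical file names following a '<Group>_<replicate>.mzML' convention
example_files <- c('Control_01.mzML', 'Control_02.mzML',
                   'Treatment_01.mzML', 'QC_01.mzML')

# Take the text before the first underscore as the group label
groups <- sub('_.*$', '', basename(example_files))

sample_info_sketch <- data.frame(
  sample_name = basename(example_files),
  sample_group = groups,
  injection_order = seq_along(example_files),
  stringsAsFactors = FALSE
)

# Fail early if the metadata and the file list disagree
stopifnot(nrow(sample_info_sketch) == length(example_files))
print(table(sample_info_sketch$sample_group))
```

Deriving the labels this way keeps the metadata in step with the file list when samples are added or removed.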
## Peak Detection (Centroided)
**Goal:** Identify chromatographic peaks in centroided LC-MS data.
**Approach:** Use the CentWave algorithm which detects peaks by continuous wavelet transform on regions of interest defined by m/z and RT.
```r
# CentWave algorithm for centroided data
cwp <- CentWaveParam(
  peakwidth = c(5, 30),    # Expected peak width range in seconds
  ppm = 15,                # Max m/z deviation (ppm) across consecutive scans
  snthresh = 10,           # Signal-to-noise threshold
  prefilter = c(3, 1000),  # Keep ROIs with >= 3 scans at intensity >= 1000
  mzdiff = 0.01,           # Min m/z difference for peaks overlapping in RT
  noise = 1000,            # Ignore centroids below this intensity
  integrate = 1            # 1 = peak limits from the wavelet-filtered data
)
# Run peak detection
xdata <- findChromPeaks(raw_data, param = cwp)
# Summary
head(chromPeaks(xdata))
cat('Peaks found:', nrow(chromPeaks(xdata)), '\n')
```
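To make the `ppm` and `prefilter` settings above more concrete, here is a small base R sketch. Both helpers (`ppm_to_da`, `passes_prefilter`) are hypothetical illustrations of the arithmetic, not xcms functions, and the intensity trace is made up.

```r
# Absolute m/z tolerance (in Da) implied by a ppm value at a given m/z
ppm_to_da <- function(mz, ppm) mz * ppm / 1e6

# TRUE if `intensity` has at least k consecutive values >= threshold
passes_prefilter <- function(intensity, k, threshold) {
  runs <- rle(intensity >= threshold)
  any(runs$values & runs$lengths >= k)
}

# 15 ppm at m/z 500 corresponds to 0.0075 Da
print(ppm_to_da(500, 15))

# Made-up intensity trace with three consecutive scans above 1000
trace <- c(200, 1500, 1800, 1200, 300, 2500, 100)
print(passes_prefilter(trace, k = 3, threshold = 1000))
```

Because a ppm tolerance scales with m/z, the same `ppm` value gives a wider absolute window for heavier ions.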
## Peak Detection (Profile Data)
**Goal:** Detect peaks in profile (non-centroided) LC-MS data.
**Approach:** Use the MatchedFilter algorithm designed for continuum data, which convolves with a Gaussian model peak.
```r
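# MatchedFilter for profile/continuum data
mfp <- MatchedFilterParam(
  binSize = 0.1,   # m/z bin width
  fwhm = 30,       # Full width at half maximum of the model peak (seconds)
  snthresh = 10,   # Signal-to-noise threshold
  steps = 2,       # Number of bins merged before filtration
  mzdiff = 0.8     # Min m/z difference for peaks overlapping in RT
)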
xdata_profile <- findChromPeaks(raw_data, param = mfp)
```
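The idea of convolving a trace with a Gaussian model peak can be illustrated in base R. This is a simplified sketch on simulated data using a zero-mean Gaussian kernel; it is not the xcms implementation, which differs in detail, and all signal values are made up.

```r
# Simulated chromatographic trace: one Gaussian peak at 150 s on a flat
# baseline with a small fast ripple (all values are made up).
fwhm <- 30
sigma <- fwhm / (2 * sqrt(2 * log(2)))   # FWHM -> Gaussian sigma (~12.7 s)
rt <- seq(0, 300, by = 1)
signal <- 200 + 1000 * exp(-(rt - 150)^2 / (2 * sigma^2)) + 5 * sin(rt / 3)

# Zero-mean Gaussian kernel, so a constant baseline filters to ~0
half_width <- ceiling(3 * sigma)
offsets <- -half_width:half_width
kernel <- dnorm(offsets, sd = sigma)
kernel <- kernel - mean(kernel)

# Centered moving filter; the first and last half_width points are NA
filtered <- as.numeric(stats::filter(signal, kernel, sides = 2))
apex_rt <- rt[which.max(filtered)]
print(apex_rt)
```

The filtered trace should peak at or very near the simulated apex, because the filter response is largest where the data best match the kernel's width. This is why `fwhm` should roughly match the real chromatographic peak width.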
## Retention Time Alignment
**Goal:** Correct retention time drift across samples to enable peak correspondence.
**Approach:** Apply Obiwarp alignment, which uses dynamic time warping on binned m/z-RT profile data to compute sample-wise RT corrections.
```r
# Obiwarp alignment (recommended)
obp <- ObiwarpParam(
  binSize = 0.5,
  response = 1,
  distFun = 'cor_opt',
  gapInit = 0.3,
  gapExtend = 2.4
)
xdata <- adjustRtime(xdata, param = obp)
# Check alignment
plotAdjustedRtime(xdata)
```
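For intuition about dynamic time warping, the sketch below implements the classic textbook DTW distance on two toy intensity traces. It is an illustration only: Obiwarp works on full profile data with gap penalties and a configurable similarity function, and `dtw_distance` is a hypothetical helper, not an xcms function.

```r
# Classic dynamic time warping distance between two intensity traces.
dtw_distance <- function(x, y) {
  n <- length(x)
  m <- length(y)
  cost <- matrix(Inf, nrow = n + 1, ncol = m + 1)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      step_cost <- abs(x[i] - y[j])
      cost[i + 1, j + 1] <- step_cost +
        min(cost[i, j + 1], cost[i + 1, j], cost[i, j])
    }
  }
  cost[n + 1, m + 1]
}

# Same peak shape, eluting one scan later in sample_b (made-up values)
sample_a <- c(0, 1, 2, 1, 0, 0)
sample_b <- c(0, 0, 1, 2, 1, 0)

print(sum(abs(sample_a - sample_b)))     # pointwise difference: 4
print(dtw_distance(sample_a, sample_b))  # after warping: 0
```

The pointwise comparison penalizes the one-scan shift, while the warped comparison absorbs it. Retention time alignment exploits the same property.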
## Peak Correspondence (Grouping)
**Goal:** Group corresponding chromatographic peaks across samples into consensus features.
**Approach:** Use peak density-based grouping which models the RT distribution of peaks in m/z slices to identify features present across samples.
```r
# Group peaks across samples
pdp <- PeakDensityParam(
  sampleGroups = pData(xdata)$sample_group,
  bw = 5,             # RT bandwidth of the density kernel (seconds)
  minFraction = 0.5,  # Min fraction of samples in at least one group
  minSamples = 1,     # Min number of samples in at least one group
  binSize = 0.025     # Width of the m/z slices
)
xdata <- groupChromPeaks(xdata, param = pdp)
# Check feature definitions
featureDefinitions(xdata)
cat('Features:', nrow(featureDefinitions(xdata)), '\n')
```
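The effect of `minFraction` and `minSamples` can be sketched in base R. The helper `keep_feature` is hypothetical and follows the documented meaning of the two parameters (a feature is kept if at least one sample group has enough samples with a detected peak). It is not the xcms code path itself.

```r
# Hypothetical helper: keep a feature if at least one sample group has
# enough samples with a detected peak.
keep_feature <- function(has_peak, sample_groups,
                         min_fraction = 0.5, min_samples = 1) {
  detected <- tapply(has_peak, sample_groups, sum)
  group_size <- tapply(has_peak, sample_groups, length)
  any(detected >= min_fraction * group_size & detected >= min_samples)
}

groups <- c(rep('Control', 5), rep('Treatment', 5))

# Peak in 3 of 5 controls only: 3/5 = 0.6 >= 0.5, so the feature is kept
only_controls <- c(TRUE, TRUE, TRUE, FALSE, FALSE, rep(FALSE, 5))
print(keep_feature(only_controls, groups))

# Peak in 2 of 5 samples per group: 0.4 < 0.5 in both, so it is dropped
spread_thin <- c(TRUE, TRUE, FALSE, FALSE, FALSE,
                 TRUE, TRUE, FALSE, FALSE, FALSE)
print(keep_feature(spread_thin, groups))
```

Because the rule is applied per group, a feature present in only one condition survives, which matters for treatment-specific metabolites.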
## Gap Filling
**Goal:** Recover signal for features that were missed during initial peak detection in some samples.
**Approach:** Integrate intensity in the expected m/z-RT region for features with missing values using ChromPeakAreaParam.
```r
# Fill in missing peaks
fpp <- ChromPeakAreaParam()
xdata <- fillChromPeaks(xdata, param = fpp)
# Alternative: FillChromPeaksParam for more control over the integration region
# (pass as param = fpp2 instead of fpp in the call above)
fpp2 <- FillChromPeaksParam(
  expandMz = 0,
  expandRt = 0,
  ppm = 0
)
```
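One way to judge how much gap filling matters is to summarize missing values in the feature matrix. The sketch below uses a made-up toy matrix shaped like `featureValues()` output (features in rows, samples in columns). Applying the same summary to the real matrix before and after `fillChromPeaks` is left as an assumption about how you would use it.

```r
# Toy matrix shaped like featureValues() output: features in rows,
# samples in columns, NA where no peak was detected (values are made up).
toy_values <- matrix(
  c(1200,   NA, 1500,
      NA,   NA,  800,
    3000, 3100, 2900),
  nrow = 3, byrow = TRUE,
  dimnames = list(c('FT001', 'FT002', 'FT003'), c('s1', 's2', 's3'))
)

# Fraction of samples with a missing value, per feature
missing_per_feature <- rowMeans(is.na(toy_values))
print(round(missing_per_feature, 2))

# Overall share of missing cells in the matrix
overall_missing <- mean(is.na(toy_values))
print(overall_missing)
```

Features that remain mostly missing after filling are candidates for removal before statistical analysis.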
## Extract Feature Table
**Goal:** Generate a samples-by-features intensity matrix with m/z and RT annotations for downstream analysis.
**Approach:** Extract feature values and definitions from the processed XCMSnExp object and combine into an exportable table.
```r
# Get feature values (intensity matrix)
feature_values <- featureValues(xdata, method = 'maxint', value = 'into')
# Feature definitions (m/z, RT)
feature_defs <- featureDefinitions(xdata)
feature_defs <- as.data.frame(feature_defs)
feature_defs$feature_id <- rownames(feature_defs)
# Combine
feature_table <- cbind(feature_defs[, c('feature_id', 'mzmed', 'rtmed')], feature_values)
rownames(feature_table) <- feature_table$feature_id
# Save
write.csv(feature_table, 'feature_table.csv', row.names = FALSE)
```
## Quality Control
**Goal:** Assess preprocessing quality through TIC plots, peak counts, RT correction, and PCA.
**Approach:** Visualize total ion chromatograms, per-sample peak counts, RT adjustment, and PCA of the feature matrix.
```r
# TIC for each sample
tic <- chromatogram(raw_data, aggregationFun = 'sum')
plot(tic)
# Peak count per sample
peak_counts <- table(chromPeaks(xdata)[, 'sample'])
barplot(peak_counts, main = 'Peaks per sample')
# Check RT correction (map group labels to integer color codes)
par(mfrow = c(1, 2))
plotAdjustedRtime(xdata, col = as.integer(as.factor(pData(xdata)$sample_group)))
# PCA of features
library(pcaMethods)
log_values <- log2(feature_values + 1)
log_values[is.na(log_values)] <- 0
pca <- pca(t(log_values), nPcs = 3, method = 'ppca')
plotPcs(pca, col = as.factor(pData(xdata)$sample_group))
```
## CAMERA Annotation (Isotopes/Adducts)
**Goal:** Identify isotope patterns and adduct groups among detected peaks to reduce feature redundancy.
**Approach:** Use CAMERA to group peaks by RT correlation, assign isotope clusters, and annotate adduct types.
```r
library(CAMERA)
# Create CAMERA object
xsa <- xsAnnotate(as(xdata, 'xcmsSet'))
# Group by RT
xsa <- groupFWHM(xsa, perfwhm = 0.6)
# Find isotopes
xsa <- findIsotopes(xsa, mzabs = 0.01, ppm = 10)
# Find adducts
xsa <- findAdducts(xsa, polarity = 'positive')
# Get annotated peak list
camera_results <- getPeaklist(xsa)
```
## Export for MetaboAnalyst
**Goal:** Format the XCMS feature table for import into MetaboAnalyst web or R package.
**Approach:** Transpose the matrix, create M/Z-RT feature names, and prepend sample group information.
```r
# Format for MetaboAnalyst web or R package
export_data <- t(feature_values)
colnames(export_data) <- paste0('M', round(feature_defs$mzmed, 4), 'T', round(feature_defs$rtmed, 1))
# Add sample info
export_df <- data.frame(Sample = rownames(export_data), Group = pData(xdata)$sample_group, export_data)
write.csv(export_df, 'metaboanalyst_input.csv', row.names = FALSE)
```
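Rounding m/z and RT when building the `M...T...` names can produce duplicate column names for closely spaced features. The sketch below shows one hedged way to guard against that in base R. `make_feature_names` is a hypothetical helper and the values are made up.

```r
# Build 'M<mz>T<rt>' labels and make repeated labels unique.
make_feature_names <- function(mz, rt, mz_digits = 4, rt_digits = 1) {
  labels <- paste0('M', round(mz, mz_digits), 'T', round(rt, rt_digits))
  make.unique(labels, sep = '_')
}

# Made-up values: the first two features round to the same label
mz <- c(180.06341, 180.06339, 255.2325)
rt <- c(62.31, 62.29, 410.0)
feature_names <- make_feature_names(mz, rt)
print(feature_names)
```

The second occurrence of a repeated label gets a numeric suffix, so every column in the exported table stays uniquely addressable.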
## Related Skills
- metabolite-annotation - Identify metabolites
- normalization-qc - Normalize feature table
- statistical-analysis - Differential analysis
Related Skills
tooluniverse-metabolomics
Comprehensive metabolomics research skill for identifying metabolites, analyzing studies, and searching metabolomics databases. Integrates HMDB (220k+ metabolites), MetaboLights, Metabolomics Workbench, and PubChem. Use when asked to identify or annotate metabolites (HMDB IDs, chemical properties, pathways), retrieve metabolomics study information from MetaboLights (MTBLS*) or Metabolomics Workbench (ST*), search for studies by keywords or disease, or generate comprehensive metabolomics research reports.
tooluniverse-metabolomics-analysis
Analyze metabolomics data including metabolite identification, quantification, pathway analysis, and metabolic flux. Processes LC-MS, GC-MS, NMR data from targeted and untargeted experiments. Performs normalization, statistical analysis, pathway enrichment, metabolite-enzyme integration, and biomarker discovery. Use when analyzing metabolomics datasets, identifying differential metabolites, studying metabolic pathways, integrating with transcriptomics/proteomics, discovering metabolic biomarkers, performing flux balance analysis, or characterizing metabolic phenotypes in disease, drug response, or physiological conditions.
tcga-bulk-data-preprocessing-with-omicverse
Guide Claude through ingesting TCGA sample sheets, expression archives, and clinical carts into omicverse, initialising survival metadata, and exporting annotated AnnData files.
single-cell-preprocessing-with-omicverse
Walk through omicverse's single-cell preprocessing tutorials to QC PBMC3k data, normalise counts, detect HVGs, and run PCA/embedding pipelines on CPU, CPU–GPU mixed, or GPU stacks.
metabolomics-workbench-database
Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.
bio-spatial-transcriptomics-spatial-preprocessing
Quality control, filtering, normalization, and feature selection for spatial transcriptomics data. Calculate QC metrics, filter spots/cells, normalize counts, and identify highly variable genes. Use when filtering and normalizing spatial transcriptomics data.
bio-single-cell-preprocessing
Quality control, filtering, and normalization for single-cell RNA-seq using Seurat (R) and Scanpy (Python). Use for calculating QC metrics, filtering cells and genes, normalizing counts, identifying highly variable genes, and scaling data. Use when filtering, normalizing, and selecting features in single-cell data.
bio-ribo-seq-riboseq-preprocessing
Preprocess ribosome profiling data including adapter trimming, size selection, rRNA removal, and alignment. Use when preparing Ribo-seq reads for downstream analysis of translation.
bio-metabolomics-targeted-analysis
Targeted metabolomics analysis using MRM/SRM with standard curves. Covers absolute quantification, method validation, and quality assessment. Use when quantifying specific metabolites using calibration curves and internal standards.
bio-metabolomics-statistical-analysis
Statistical analysis for metabolomics data. Covers univariate testing, multivariate methods (PCA, PLS-DA), and biomarker discovery. Use when identifying differentially abundant metabolites or building classification models.
bio-metabolomics-pathway-mapping
Map metabolites to biological pathways using KEGG, Reactome, and MetaboAnalyst. Perform pathway enrichment and topology analysis. Use when interpreting metabolomics results in the context of biochemical pathways.
bio-metabolomics-normalization-qc
Quality control and normalization for metabolomics data. Covers QC-based correction, batch effect removal, and data transformation methods. Use when correcting technical variation in metabolomics data before statistical analysis.