bio-metabolomics-xcms-preprocessing

XCMS3 workflow for LC-MS/MS metabolomics preprocessing. Covers peak detection, retention time alignment, correspondence (grouping), and gap filling. Use when processing raw LC-MS data into a feature table for untargeted metabolomics.

1,802 stars

by FreedomIntelligence

Best use case

bio-metabolomics-xcms-preprocessing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-metabolomics-xcms-preprocessing can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/bio-metabolomics-xcms-preprocessing/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-metabolomics-xcms-preprocessing/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/bio-metabolomics-xcms-preprocessing/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How bio-metabolomics-xcms-preprocessing Compares

Feature / Agent	bio-metabolomics-xcms-preprocessing	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

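What does this skill do?

It provides an XCMS3 workflow for LC-MS metabolomics preprocessing: peak detection, retention time alignment, correspondence (grouping), and gap filling, producing a feature table for untargeted metabolomics.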

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

## Version Compatibility

Reference examples tested with: MSnbase 2.28+, xcms 4.0+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code fails with an error such as `could not find function` or `unused argument`,
inspect the installed package (`packageVersion('<pkg>')`, `?function_name`, `args(function_name)`)
and adapt the example to match the actual API rather than retrying.
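
The version check described above can be scripted. The following is a minimal sketch in base R; `check_min_versions()` is a hypothetical helper (not part of xcms or MSnbase), and the minimum versions are the ones stated in this document.

```r
# Minimum versions stated in this document
min_versions <- c(xcms = "4.0.0", MSnbase = "2.28.0")

# Hypothetical helper: returns TRUE/FALSE per package,
# or NA when the package is not installed.
check_min_versions <- function(min_versions) {
    vapply(names(min_versions), function(pkg) {
        if (!requireNamespace(pkg, quietly = TRUE)) {
            return(NA)
        }
        packageVersion(pkg) >= min_versions[[pkg]]
    }, logical(1))
}

status <- check_min_versions(min_versions)
print(status)
```

A `FALSE` or `NA` entry suggests checking parameter names with `?function_name` before running the examples below.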

# XCMS Metabolomics Preprocessing

Requires Bioconductor 3.18+ with xcms 4.0+ and MSnbase 2.28+.

## Load Raw Data

**Goal:** Import raw LC-MS files into R for downstream peak detection and alignment.

**Approach:** Read mzML/mzXML files into an OnDiskMSnExp object using MSnbase for memory-efficient access.

**"Process my raw LC-MS data into a feature table"** → Detect chromatographic peaks, align retention times across samples, group corresponding peaks, and fill missing values to produce a sample-by-feature intensity matrix.

```r
library(xcms)
library(MSnbase)

# Read mzML/mzXML files
raw_files <- list.files('raw_data', pattern = '\\.(mzML|mzXML)$', full.names = TRUE)

# Create OnDiskMSnExp object
raw_data <- readMSData(raw_files, mode = 'onDisk')

# Check data
raw_data
table(msLevel(raw_data))
```
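
Before calling `readMSData()`, it can help to fail early when the input directory is missing or contains no mzML/mzXML files. The sketch below uses only base R; `find_raw_files()` is a hypothetical helper, and the demonstration runs on empty placeholder files in a temporary directory rather than real data.

```r
# Hypothetical helper, not an xcms/MSnbase function.
find_raw_files <- function(dir) {
    if (!dir.exists(dir)) {
        stop("Directory not found: ", dir)
    }
    files <- list.files(dir, pattern = '\\.(mzML|mzXML)$', full.names = TRUE)
    if (length(files) == 0) {
        stop("No mzML/mzXML files found in: ", dir)
    }
    sort(files)
}

# Demonstration with empty placeholder files (not valid MS data)
demo_dir <- file.path(tempdir(), "raw_data_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("b.mzML", "a.mzXML", "notes.txt")))

demo_files <- find_raw_files(demo_dir)
print(basename(demo_files))
```

In the workflow above, the result of `find_raw_files('raw_data')` could take the place of the bare `list.files()` call.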

## Define Sample Groups

**Goal:** Attach sample metadata (group labels, injection order) to the raw data object.

**Approach:** Create a data frame of sample information and assign it to the phenoData slot.

```r
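# Sample metadata: one row per file, in the same order as raw_files.
# The group counts assume 13 files (5 Control, 5 Treatment, 3 QC);
# adjust them to match your study design.
sample_info <- data.frame(
    sample_name = basename(raw_files),
    sample_group = c(rep('Control', 5), rep('Treatment', 5), rep('QC', 3)),
    injection_order = seq_along(raw_files)
)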

# Assign to phenoData
pData(raw_data) <- sample_info
```
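
Hard-coded group counts break as soon as the number of files changes. One alternative is to derive the group from each file name. The sketch below assumes a hypothetical naming convention of `<Group>_<replicate>.mzML`; `infer_group()` is an illustrative helper and the file names are made up.

```r
# Hypothetical helper: takes everything before the first underscore
# as the group label. Assumes names like 'Control_01.mzML'.
infer_group <- function(file_names) {
    sub('_.*$', '', basename(file_names))
}

# Made-up file names for illustration
example_files <- c('Control_01.mzML', 'Treatment_01.mzML',
                   'QC_01.mzML', 'Control_02.mzML')

example_info <- data.frame(
    sample_name = example_files,
    sample_group = infer_group(example_files),
    injection_order = seq_along(example_files)
)
print(table(example_info$sample_group))
```

If your files follow a different convention, the regular expression in `infer_group()` would need to change accordingly.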

## Peak Detection (Centroided)

**Goal:** Identify chromatographic peaks in centroided LC-MS data.

**Approach:** Use the CentWave algorithm which detects peaks by continuous wavelet transform on regions of interest defined by m/z and RT.

```r
# CentWave algorithm for centroided data
cwp <- CentWaveParam(
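    peakwidth = c(5, 30),       # Expected peak width range in seconds
    ppm = 15,                    # Max m/z deviation between consecutive scans (ppm)
    snthresh = 10,               # Signal-to-noise threshold
    prefilter = c(3, 1000),      # Keep ROIs with >= 3 scans of intensity >= 1000
    mzdiff = 0.01,               # Min m/z distance between peaks overlapping in RT
    noise = 1000,                # Ignore centroids below this intensity
    integrate = 1                # 1 = integrate on filtered data, 2 = on raw data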
)

# Run peak detection
xdata <- findChromPeaks(raw_data, param = cwp)

# Summary
head(chromPeaks(xdata))
cat('Peaks found:', nrow(chromPeaks(xdata)), '\n')
```
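
The `ppm` setting is relative, so the absolute m/z tolerance grows with m/z. The base R sketch below converts a ppm value into Daltons at a few example m/z values; `ppm_to_da()` is a hypothetical helper for illustration and the m/z values are arbitrary.

```r
# Hypothetical helper: absolute tolerance (Da) for a relative ppm tolerance
ppm_to_da <- function(mz, ppm) {
    mz * ppm / 1e6
}

# Arbitrary example m/z values, using ppm = 15 as in the block above
mz_values <- c(100, 200, 500, 1000)
tolerance <- data.frame(
    mz = mz_values,
    tol_da = ppm_to_da(mz_values, ppm = 15)
)
print(tolerance)
```

Comparing these tolerances with the mass accuracy of your instrument is one way to judge whether `ppm = 15` is too strict or too loose.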

## Peak Detection (Profile Data)

**Goal:** Detect peaks in profile (non-centroided) LC-MS data.

**Approach:** Use the MatchedFilter algorithm designed for continuum data, which convolves with a Gaussian model peak.

```r
# MatchedFilter for profile/continuum data
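mfp <- MatchedFilterParam(
    binSize = 0.1,               # m/z bin width for the profile matrix
    fwhm = 30,                   # Full width at half maximum of the model peak (s)
    snthresh = 10,               # Signal-to-noise threshold
    steps = 2,                   # Number of adjacent bins to merge
    mzdiff = 0.8                 # Min m/z distance between peaks overlapping in RT
)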

xdata_profile <- findChromPeaks(raw_data, param = mfp)
```
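
To illustrate the idea of convolving a chromatogram with a Gaussian model peak, here is a simplified base R sketch on simulated data. It is not the xcms implementation, which differs in detail; the signal, noise level, and kernel width are all made up for the demonstration.

```r
set.seed(1)

# Convert full width at half maximum to the Gaussian sigma
fwhm_to_sigma <- function(fwhm) fwhm / (2 * sqrt(2 * log(2)))

# Simulated chromatogram: one Gaussian peak at 50 s plus random noise
rt <- 1:100
sigma <- fwhm_to_sigma(10)
signal <- 10 * exp(-(rt - 50)^2 / (2 * sigma^2)) + rnorm(length(rt), sd = 1)

# Gaussian kernel of the same width, normalised to sum to 1
kernel_rt <- -15:15
kernel <- exp(-kernel_rt^2 / (2 * sigma^2))
kernel <- kernel / sum(kernel)

# Centred moving filter; edge positions without full support become NA
filtered <- as.numeric(stats::filter(signal, kernel, sides = 2))

# Apex of the smoothed trace (which.max ignores NA values)
apex <- rt[which.max(filtered)]
print(apex)
```

Matching the kernel width to the expected peak width is what the `fwhm` parameter controls: a kernel much narrower than the real peaks suppresses less noise, while a much wider one smears neighbouring peaks together.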

## Retention Time Alignment

**Goal:** Correct retention time drift across samples to enable peak correspondence.

**Approach:** Apply Obiwarp alignment, which uses dynamic time warping on binned m/z-intensity profiles (bin width set by `binSize`) to compute sample-wise RT corrections.

```r
# Obiwarp alignment (recommended)
obp <- ObiwarpParam(
    binSize = 0.5,
    response = 1,
    distFun = 'cor_opt',
    gapInit = 0.3,
    gapExtend = 2.4
)

xdata <- adjustRtime(xdata, param = obp)

# Check alignment
plotAdjustedRtime(xdata)
```
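
Besides the plot, a numeric summary of how far each sample was shifted can flag problematic alignments. The sketch below works on plain vectors with toy values; `rt_shift_summary()` is a hypothetical helper. With a processed object, the inputs could presumably be taken from `rtime(xdata, adjusted = FALSE)`, `rtime(xdata, adjusted = TRUE)`, and `fromFile(xdata)`.

```r
# Hypothetical helper: per-sample median and maximum absolute RT shift
rt_shift_summary <- function(raw_rt, adjusted_rt, sample_index) {
    shift <- adjusted_rt - raw_rt
    data.frame(
        sample = sort(unique(sample_index)),
        median_shift = as.numeric(tapply(shift, sample_index, median)),
        max_abs_shift = as.numeric(tapply(abs(shift), sample_index, max))
    )
}

# Toy values: sample 1 is unchanged, sample 2 is shifted by up to 2 s
raw_rt <- c(10, 20, 30, 10, 20, 30)
adjusted_rt <- c(10, 20, 30, 11, 21.5, 32)
sample_index <- c(1, 1, 1, 2, 2, 2)

shift_summary <- rt_shift_summary(raw_rt, adjusted_rt, sample_index)
print(shift_summary)
```

Shifts much larger than the chromatographic peak width may indicate that the alignment parameters need revisiting.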

## Peak Correspondence (Grouping)

**Goal:** Group corresponding chromatographic peaks across samples into consensus features.

**Approach:** Use peak density-based grouping which models the RT distribution of peaks in m/z slices to identify features present across samples.

```r
# Group peaks across samples
pdp <- PeakDensityParam(
    sampleGroups = pData(xdata)$sample_group,
    bw = 5,                      # RT bandwidth
    minFraction = 0.5,           # Min fraction of samples
    minSamples = 1,              # Min samples per group
    binSize = 0.025              # m/z bin size
)

xdata <- groupChromPeaks(xdata, param = pdp)

# Check feature definitions
featureDefinitions(xdata)
cat('Features:', nrow(featureDefinitions(xdata)), '\n')
```
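
The `minFraction` setting is evaluated per sample group: a feature is kept when at least one group has peaks in that fraction of its samples. The base R sketch below illustrates this rule with made-up presence/absence vectors; `passes_min_fraction()` is a hypothetical helper, ignores `minSamples`, and is not how xcms implements the check internally.

```r
# Hypothetical helper: TRUE if any group reaches the required fraction
passes_min_fraction <- function(present, groups, min_fraction = 0.5) {
    fraction_per_group <- tapply(present, groups, mean)
    any(fraction_per_group >= min_fraction)
}

groups <- c(rep('Control', 5), rep('Treatment', 5), rep('QC', 3))

# Peak found in 3 of 5 Treatment samples only (fraction 0.6)
present_a <- c(rep(FALSE, 5), rep(TRUE, 3), rep(FALSE, 2), rep(FALSE, 3))

# Peak found in one sample of each group (best fraction is 1/3)
present_b <- c(TRUE, rep(FALSE, 4), TRUE, rep(FALSE, 4), TRUE, FALSE, FALSE)

print(passes_min_fraction(present_a, groups))
print(passes_min_fraction(present_b, groups))
```

This is why `sampleGroups` matters: a feature specific to one condition can survive even though it is absent from most samples overall.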

## Gap Filling

**Goal:** Recover signal for features that were missed during initial peak detection in some samples.

**Approach:** Integrate intensity in the expected m/z-RT region for features with missing values using ChromPeakAreaParam.

```r
# Fill in missing peaks
fpp <- ChromPeakAreaParam()
xdata <- fillChromPeaks(xdata, param = fpp)

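# Alternative: FillChromPeaksParam for more control over the integration region.
# Pass fpp2 as `param` to fillChromPeaks() in place of fpp; do not run both.
fpp2 <- FillChromPeaksParam(
    expandMz = 0,                # Fractional expansion of the m/z range
    expandRt = 0,                # Fractional expansion of the RT range
    ppm = 0                      # Additional m/z expansion in ppm
)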
```
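
To see how much gap filling recovered, compare the share of missing values per sample before and after. The sketch below uses toy matrices in base R; `missing_fraction()` is a hypothetical helper. With real data, the two matrices could presumably come from `featureValues(xdata, filled = FALSE)` and `featureValues(xdata)`.

```r
# Hypothetical helper: fraction of missing feature values per sample (column)
missing_fraction <- function(values) {
    colMeans(is.na(values))
}

# Toy feature matrices (features in rows, samples in columns)
dims <- list(c('FT1', 'FT2', 'FT3'), c('s1', 's2'))
before <- matrix(c(100, NA, 250, NA, 300, NA), nrow = 3, dimnames = dims)
after  <- matrix(c(100, 80, 250, 120, 300, NA), nrow = 3, dimnames = dims)

comparison <- data.frame(
    sample = colnames(before),
    missing_before = missing_fraction(before),
    missing_after = missing_fraction(after)
)
print(comparison)
```

Values that remain `NA` after filling had no signal in the expected region and usually need an explicit imputation or filtering decision downstream.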

## Extract Feature Table

**Goal:** Generate a samples-by-features intensity matrix with m/z and RT annotations for downstream analysis.

**Approach:** Extract feature values and definitions from the processed XCMSnExp object and combine into an exportable table.

```r
# Get feature values (intensity matrix)
feature_values <- featureValues(xdata, method = 'maxint', value = 'into')

# Feature definitions (m/z, RT)
feature_defs <- featureDefinitions(xdata)
feature_defs <- as.data.frame(feature_defs)
feature_defs$feature_id <- rownames(feature_defs)

# Combine
feature_table <- cbind(feature_defs[, c('feature_id', 'mzmed', 'rtmed')], feature_values)
rownames(feature_table) <- feature_table$feature_id

# Save
write.csv(feature_table, 'feature_table.csv', row.names = FALSE)
```

## Quality Control

**Goal:** Assess preprocessing quality through TIC plots, peak counts, RT correction, and PCA.

**Approach:** Visualize total ion chromatograms, per-sample peak counts, RT adjustment, and PCA of the feature matrix.

```r
# TIC for each sample
tic <- chromatogram(raw_data, aggregationFun = 'sum')
plot(tic)

# Peak count per sample
peak_counts <- table(chromPeaks(xdata)[, 'sample'])
barplot(peak_counts, main = 'Peaks per sample')

# Check RT correction (col must hold valid colors, so map group labels to integers)
group_cols <- as.integer(factor(pData(xdata)$sample_group))
plotAdjustedRtime(xdata, col = group_cols)

# PCA of features (result stored as pca_res to avoid masking pcaMethods::pca)
library(pcaMethods)
log_values <- log2(feature_values + 1)
log_values[is.na(log_values)] <- 0
pca_res <- pca(t(log_values), nPcs = 3, method = 'ppca')
plotPcs(pca_res, col = group_cols)
```

## CAMERA Annotation (Isotopes/Adducts)

**Goal:** Identify isotope patterns and adduct groups among detected peaks to reduce feature redundancy.

**Approach:** Use CAMERA to group co-eluting peaks by retention time, assign isotope clusters, and annotate adduct types.

```r
library(CAMERA)

# Create CAMERA object
xsa <- xsAnnotate(as(xdata, 'xcmsSet'))

# Group by RT
xsa <- groupFWHM(xsa, perfwhm = 0.6)

# Find isotopes
xsa <- findIsotopes(xsa, mzabs = 0.01, ppm = 10)

# Find adducts
xsa <- findAdducts(xsa, polarity = 'positive')

# Get annotated peak list
camera_results <- getPeaklist(xsa)
```
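
The isotope search rests on a simple mass relationship: a 13C isotope peak sits about 1.0033548 Da above the monoisotopic peak, divided by the charge. The base R sketch below checks this for a pair of m/z values. `is_isotope_pair()` is a hypothetical helper, the m/z values are examples, and adding `mzabs` and the ppm term into a single tolerance is an assumption for illustration rather than a description of CAMERA's internals.

```r
# Mass difference between 13C and 12C in Da
C13_C12_DIFF <- 1.0033548

# Hypothetical helper: does mz_iso look like the 13C peak of mz_mono?
is_isotope_pair <- function(mz_mono, mz_iso, charge = 1,
                            mzabs = 0.01, ppm = 10) {
    expected <- C13_C12_DIFF / charge
    # Assumption: absolute and relative tolerances are simply added
    tolerance <- mzabs + mz_iso * ppm / 1e6
    abs((mz_iso - mz_mono) - expected) <= tolerance
}

print(is_isotope_pair(180.0634, 181.0668))              # spacing close to 1.0034
print(is_isotope_pair(180.0634, 181.1000))              # spacing too large
print(is_isotope_pair(400.2000, 400.7017, charge = 2))  # half spacing for z = 2
```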

## Export for MetaboAnalyst

**Goal:** Format the XCMS feature table for import into MetaboAnalyst web or R package.

**Approach:** Transpose the matrix, create M/Z-RT feature names, and prepend sample group information.

```r
# Format for MetaboAnalyst web or R package
export_data <- t(feature_values)
colnames(export_data) <- paste0('M', round(feature_defs$mzmed, 4), 'T', round(feature_defs$rtmed, 1))

# Add sample info
export_df <- data.frame(Sample = rownames(export_data), Group = pData(xdata)$sample_group, export_data)

write.csv(export_df, 'metaboanalyst_input.csv', row.names = FALSE)
```
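
Feature names built from rounded m/z and RT can collide when two features are close together, which produces duplicate column names in the export. The sketch below is one possible way to guard against that in base R. `make_feature_names()` is a hypothetical helper; it uses fixed decimal places via `sprintf()`, which differs slightly from the `round()`-based names above, and the input values are made up.

```r
# Hypothetical helper: 'M<mz>T<rt>' names, made unique with a numeric suffix
make_feature_names <- function(mz, rt) {
    make.unique(sprintf('M%.4fT%.1f', mz, rt))
}

# Made-up values: the first two features collide after rounding
example_names <- make_feature_names(
    mz = c(180.0634, 180.0634, 255.2330),
    rt = c(65.12, 65.14, 412.7)
)
print(example_names)
```

Checking `anyDuplicated(colnames(export_data))` before writing the CSV is a quick way to see whether this matters for a given dataset.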

## Related Skills

- metabolite-annotation - Identify metabolites
- normalization-qc - Normalize feature table
- statistical-analysis - Differential analysis

Related Skills

All of the following are from FreedomIntelligence/OpenClaw-Medical-Skills (1802 stars).

- tooluniverse-metabolomics - Comprehensive metabolomics research skill for identifying metabolites, analyzing studies, and searching metabolomics databases. Integrates HMDB (220k+ metabolites), MetaboLights, Metabolomics Workbench, and PubChem. Use when asked to identify or annotate metabolites (HMDB IDs, chemical properties, pathways), retrieve metabolomics study information from MetaboLights (MTBLS*) or Metabolomics Workbench (ST*), search for studies by keywords or disease, or generate comprehensive metabolomics research reports.
- tooluniverse-metabolomics-analysis - Analyze metabolomics data including metabolite identification, quantification, pathway analysis, and metabolic flux. Processes LC-MS, GC-MS, NMR data from targeted and untargeted experiments. Performs normalization, statistical analysis, pathway enrichment, metabolite-enzyme integration, and biomarker discovery. Use when analyzing metabolomics datasets, identifying differential metabolites, studying metabolic pathways, integrating with transcriptomics/proteomics, discovering metabolic biomarkers, performing flux balance analysis, or characterizing metabolic phenotypes in disease, drug response, or physiological conditions.
- tcga-bulk-data-preprocessing-with-omicverse - Guide Claude through ingesting TCGA sample sheets, expression archives, and clinical carts into omicverse, initialising survival metadata, and exporting annotated AnnData files.
- single-cell-preprocessing-with-omicverse - Walk through omicverse's single-cell preprocessing tutorials to QC PBMC3k data, normalise counts, detect HVGs, and run PCA/embedding pipelines on CPU, CPU–GPU mixed, or GPU stacks.
- metabolomics-workbench-database - Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.
- bio-spatial-transcriptomics-spatial-preprocessing - Quality control, filtering, normalization, and feature selection for spatial transcriptomics data. Calculate QC metrics, filter spots/cells, normalize counts, and identify highly variable genes. Use when filtering and normalizing spatial transcriptomics data.
- bio-single-cell-preprocessing - Quality control, filtering, and normalization for single-cell RNA-seq using Seurat (R) and Scanpy (Python). Use for calculating QC metrics, filtering cells and genes, normalizing counts, identifying highly variable genes, and scaling data. Use when filtering, normalizing, and selecting features in single-cell data.
- bio-ribo-seq-riboseq-preprocessing - Preprocess ribosome profiling data including adapter trimming, size selection, rRNA removal, and alignment. Use when preparing Ribo-seq reads for downstream analysis of translation.
- bio-metabolomics-targeted-analysis - Targeted metabolomics analysis using MRM/SRM with standard curves. Covers absolute quantification, method validation, and quality assessment. Use when quantifying specific metabolites using calibration curves and internal standards.
- bio-metabolomics-statistical-analysis - Statistical analysis for metabolomics data. Covers univariate testing, multivariate methods (PCA, PLS-DA), and biomarker discovery. Use when identifying differentially abundant metabolites or building classification models.
- bio-metabolomics-pathway-mapping - Map metabolites to biological pathways using KEGG, Reactome, and MetaboAnalyst. Perform pathway enrichment and topology analysis. Use when interpreting metabolomics results in the context of biochemical pathways.
- bio-metabolomics-normalization-qc - Quality control and normalization for metabolomics data. Covers QC-based correction, batch effect removal, and data transformation methods. Use when correcting technical variation in metabolomics data before statistical analysis.