bio-multi-omics-mofa-integration

Multi-Omics Factor Analysis (MOFA2) for unsupervised integration of multiple data modalities. Identifies shared and view-specific sources of variation. Use when integrating RNA-seq, proteomics, methylation, or other omics to discover latent factors driving biological variation across modalities.

1,802 stars

Best use case

bio-multi-omics-mofa-integration is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Multi-Omics Factor Analysis (MOFA2) for unsupervised integration of multiple data modalities. Identifies shared and view-specific sources of variation. Use when integrating RNA-seq, proteomics, methylation, or other omics to discover latent factors driving biological variation across modalities.

Teams using bio-multi-omics-mofa-integration should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/bio-multi-omics-mofa-integration/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-multi-omics-mofa-integration/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/bio-multi-omics-mofa-integration/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How bio-multi-omics-mofa-integration Compares

Feature / Agentbio-multi-omics-mofa-integrationStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Multi-Omics Factor Analysis (MOFA2) for unsupervised integration of multiple data modalities. Identifies shared and view-specific sources of variation. Use when integrating RNA-seq, proteomics, methylation, or other omics to discover latent factors driving biological variation across modalities.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

## Version Compatibility

Reference examples tested with: scanpy 1.10+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# MOFA2 Integration

**"Find shared variation across my omics layers"** → Discover latent factors that capture shared and modality-specific sources of biological variation in an unsupervised manner.
- R: `MOFA2::create_mofa()` → `prepare_mofa()` → `run_mofa()`
- Python: `mofapy2` for training, `muon` for downstream

## Prepare Multi-Omics Data

**Goal:** Load and align multiple omics matrices into a consistent format for MOFA2 input.

**Approach:** Read each omics layer, intersect to common samples, transpose to features-by-samples orientation.

```r
library(MOFA2)
library(MultiAssayExperiment)

# Load individual omics matrices (samples x features)
rna <- as.matrix(read.csv('rnaseq_matrix.csv', row.names = 1))
protein <- as.matrix(read.csv('proteomics_matrix.csv', row.names = 1))
methylation <- as.matrix(read.csv('methylation_matrix.csv', row.names = 1))

# Ensure consistent sample names across views
common_samples <- Reduce(intersect, list(rownames(rna), rownames(protein), rownames(methylation)))
rna <- rna[common_samples, ]
protein <- protein[common_samples, ]
methylation <- methylation[common_samples, ]

# Transpose to features x samples (MOFA format)
data_list <- list(
    RNA = t(rna),
    Protein = t(protein),
    Methylation = t(methylation)
)
```

## Create and Train MOFA Model

**Goal:** Configure and train a MOFA2 model to discover shared and view-specific latent factors.

**Approach:** Set model and training options, then run variational inference to learn factor decomposition.

```r
# Create MOFA object
mofa <- create_mofa(data_list)

# View data overview
plot_data_overview(mofa)

# Set model options
model_opts <- get_default_model_options(mofa)
model_opts$num_factors <- 15  # Number of factors to learn

# Set training options
train_opts <- get_default_training_options(mofa)
train_opts$convergence_mode <- 'slow'
train_opts$seed <- 42

# Prepare and train
mofa <- prepare_mofa(mofa, model_options = model_opts, training_options = train_opts)
mofa <- run_mofa(mofa, outfile = 'mofa_model.hdf5')
```

## Analyze Factors

**Goal:** Quantify how much variance each factor explains per omics view and extract factor scores and loadings.

**Approach:** Plot variance decomposition and retrieve factor values (sample scores) and weights (feature loadings) as data frames.

```r
# Variance explained per factor per view
plot_variance_explained(mofa, max_r2 = 15)
plot_variance_explained(mofa, plot_total = TRUE)[[2]]

# Factor values (sample scores)
factors <- get_factors(mofa, as.data.frame = TRUE)

# Factor weights (feature loadings)
weights <- get_weights(mofa, as.data.frame = TRUE)
```

## Visualize Results

**Goal:** Generate publication-quality plots of factor values, feature weights, and factor correlations.

**Approach:** Use MOFA2 built-in plotting functions for scatter plots, heatmaps, and correlation matrices.

```r
# Scatter plot of factor values
plot_factor(mofa, factor = 1, color_by = 'Group')

# Heatmap of factor weights
plot_weights(mofa, view = 'RNA', factor = 1, nfeatures = 20)

# Top features per factor
plot_top_weights(mofa, view = 'RNA', factor = 1, nfeatures = 10)

# Correlation between factors
plot_factor_cor(mofa)
```

## Factor Interpretation

**Goal:** Identify biological pathways and gene sets associated with each MOFA factor.

**Approach:** Extract top-weighted features per factor and run gene set enrichment on factor weights.

```r
# Extract top features for pathway analysis
top_rna_factor1 <- get_weights(mofa, views = 'RNA', factors = 1, as.data.frame = TRUE)
top_rna_factor1 <- top_rna_factor1[order(abs(top_rna_factor1$value), decreasing = TRUE), ]
gene_list <- head(top_rna_factor1$feature, 100)

# Gene set enrichment on factor weights
library(MOFA2)
enrichment <- run_enrichment(mofa, feature.sets = msigdb_genesets,
                              view = 'RNA', factors = 1:5)
plot_enrichment(enrichment, factor = 1, max.pathways = 15)
```

## Add Sample Metadata

**Goal:** Annotate MOFA factors with clinical or experimental metadata for colored visualizations.

**Approach:** Load sample annotations and attach to the MOFA object, then plot factors colored by metadata variables.

```r
# Load sample annotations
metadata <- read.csv('sample_metadata.csv', row.names = 1)

# Add to MOFA object
samples_metadata(mofa) <- metadata[samples_names(mofa)[[1]], ]

# Color by metadata
plot_factor(mofa, factor = 1, color_by = 'Condition')
plot_factors(mofa, factors = 1:3, color_by = 'Condition')
```

## Multi-Group MOFA

**Goal:** Apply MOFA to batch- or group-structured multi-omics data and compare factor activity across groups.

**Approach:** Organize data as nested list by group, train multi-group MOFA, and visualize group-specific factor patterns.

```r
# For batch/group-structured data
data_list_grouped <- list(
    group1 = list(RNA = rna_g1, Protein = prot_g1),
    group2 = list(RNA = rna_g2, Protein = prot_g2)
)

mofa_grouped <- create_mofa(data_list_grouped)
mofa_grouped <- prepare_mofa(mofa_grouped)
mofa_grouped <- run_mofa(mofa_grouped)

# Compare factor activity across groups
plot_factor(mofa_grouped, factor = 1, group_by = 'group', color_by = 'group')
```

## MOFA+ for Single-Cell

**Goal:** Apply MOFA to single-cell multi-modal data (CITE-seq, Multiome) with stochastic inference for scalability.

**Approach:** Extract modality matrices from a Seurat object, create MOFA with stochastic training for large cell counts.

```r
# For single-cell multi-omics (CITE-seq, Multiome)
library(Seurat)

# Extract modalities from Seurat object
rna_mat <- GetAssayData(seurat_obj, assay = 'RNA', layer = 'data')
adt_mat <- GetAssayData(seurat_obj, assay = 'ADT', layer = 'data')

# Create MOFA with single-cell settings
mofa_sc <- create_mofa(list(RNA = rna_mat, ADT = adt_mat))
model_opts <- get_default_model_options(mofa_sc)
model_opts$num_factors <- 10

# Use stochastic inference for large datasets
train_opts <- get_default_training_options(mofa_sc)
train_opts$stochastic <- TRUE
```

## Export Results

**Goal:** Save MOFA factor scores, feature weights, and variance explained to CSV for downstream use.

**Approach:** Extract each result type as a data frame and write to disk.

```r
# Save factor values
factors_df <- get_factors(mofa, as.data.frame = TRUE)
write.csv(factors_df, 'mofa_factors.csv', row.names = FALSE)

# Save weights
weights_df <- get_weights(mofa, as.data.frame = TRUE)
write.csv(weights_df, 'mofa_weights.csv', row.names = FALSE)

# Save variance explained
var_exp <- get_variance_explained(mofa)
write.csv(var_exp$r2_per_factor, 'mofa_variance_explained.csv')
```

## Related Skills

- mixomics-analysis - Supervised multi-omics integration
- data-harmonization - Preprocess data before MOFA
- pathway-analysis/go-enrichment - Interpret MOFA factors
- single-cell/multimodal-integration - Single-cell multi-omics

Related Skills

tooluniverse-spatial-transcriptomics

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Analyze spatial transcriptomics data to map gene expression in tissue architecture. Supports 10x Visium, MERFISH, seqFISH, Slide-seq, and imaging-based platforms. Performs spatial clustering, domain identification, cell-cell proximity analysis, spatial gene expression patterns, tissue architecture mapping, and integration with single-cell data. Use when analyzing spatial transcriptomics datasets, studying tissue organization, identifying spatial expression patterns, mapping cell-cell interactions in tissue context, characterizing tumor microenvironment spatial structure, or integrating spatial and single-cell RNA-seq data for comprehensive tissue analysis.

tooluniverse-spatial-omics-analysis

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Computational analysis framework for spatial multi-omics data integration. Given spatially variable genes (SVGs), spatial domain annotations, tissue type, and disease context from spatial transcriptomics/proteomics experiments (10x Visium, MERFISH, DBiTplus, SLIDE-seq, etc.), performs comprehensive biological interpretation including pathway enrichment, cell-cell interaction inference, druggable target identification, immune microenvironment characterization, and multi-modal integration. Produces a detailed markdown report with Spatial Omics Integration Score (0-100), domain-by-domain characterization, and validation recommendations. Uses 70+ ToolUniverse tools across 9 analysis phases. Use when users ask about spatial transcriptomics analysis, spatial omics interpretation, tissue heterogeneity, spatial gene expression patterns, tumor microenvironment mapping, tissue zonation, or cell-cell communication from spatial data.

tooluniverse-proteomics-analysis

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Analyze mass spectrometry proteomics data including protein quantification, differential expression, post-translational modifications (PTMs), and protein-protein interactions. Processes MaxQuant, Spectronaut, DIA-NN, and other MS platform outputs. Performs normalization, statistical analysis, pathway enrichment, and integration with transcriptomics. Use when analyzing proteomics data, comparing protein abundance between conditions, identifying PTM changes, studying protein complexes, integrating protein and RNA data, discovering protein biomarkers, or conducting quantitative proteomics experiments.

tooluniverse-multiomic-disease-characterization

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Comprehensive multi-omics disease characterization integrating genomics, transcriptomics, proteomics, pathway, and therapeutic layers for systems-level understanding. Produces a detailed multi-omics report with quantitative confidence scoring (0-100), cross-layer gene concordance analysis, biomarker candidates, therapeutic opportunities, and mechanistic hypotheses. Uses 80+ ToolUniverse tools across 8 analysis layers. Use when users ask about disease mechanisms, multi-omics analysis, systems biology of disease, biomarker discovery, or therapeutic target identification from a disease perspective.

tooluniverse-multi-omics-integration

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Integrate and analyze multiple omics datasets (transcriptomics, proteomics, epigenomics, genomics, metabolomics) for systems biology and precision medicine. Performs cross-omics correlation, multi-omics clustering (MOFA+, NMF), pathway-level integration, and sample matching. Coordinates ToolUniverse skills for expression data (RNA-seq), epigenomics (methylation, ChIP-seq), variants (SNVs, CNVs), protein interactions, and pathway enrichment. Use when analyzing multi-omics datasets, performing integrative analysis, discovering multi-omics biomarkers, studying disease mechanisms across molecular layers, or conducting systems biology research that requires coordinated analysis of transcriptome, genome, epigenome, proteome, and metabolome data.

tooluniverse-metabolomics

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Comprehensive metabolomics research skill for identifying metabolites, analyzing studies, and searching metabolomics databases. Integrates HMDB (220k+ metabolites), MetaboLights, Metabolomics Workbench, and PubChem. Use when asked to identify or annotate metabolites (HMDB IDs, chemical properties, pathways), retrieve metabolomics study information from MetaboLights (MTBLS*) or Metabolomics Workbench (ST*), search for studies by keywords or disease, or generate comprehensive metabolomics research reports.

tooluniverse-metabolomics-analysis

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Analyze metabolomics data including metabolite identification, quantification, pathway analysis, and metabolic flux. Processes LC-MS, GC-MS, NMR data from targeted and untargeted experiments. Performs normalization, statistical analysis, pathway enrichment, metabolite-enzyme integration, and biomarker discovery. Use when analyzing metabolomics datasets, identifying differential metabolites, studying metabolic pathways, integrating with transcriptomics/proteomics, discovering metabolic biomarkers, performing flux balance analysis, or characterizing metabolic phenotypes in disease, drug response, or physiological conditions.

tooluniverse-epigenomics

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Production-ready genomics and epigenomics data processing for BixBench questions. Handles methylation array analysis (CpG filtering, differential methylation, age-related CpG detection, chromosome-level density), ChIP-seq peak analysis (peak calling, motif enrichment, coverage stats), ATAC-seq chromatin accessibility, multi-omics integration (expression + methylation correlation), and genome-wide statistics. Pure Python computation (pandas, scipy, numpy, pysam, statsmodels) plus ToolUniverse annotation tools (Ensembl, ENCODE, SCREEN, JASPAR, ReMap, RegulomeDB, ChIPAtlas). Supports BED, BigWig, methylation beta-value matrices, Illumina manifest files, and multi-sample clinical data. Use when processing methylation data, ChIP-seq peaks, ATAC-seq signals, or answering questions about CpG sites, differential methylation, chromatin accessibility, histone marks, or epigenomic statistics.

spatial-transcriptomics-tutorials-with-omicverse

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Guide users through omicverse's spatial transcriptomics tutorials covering preprocessing, deconvolution, and downstream modelling workflows across Visium, Visium HD, Stereo-seq, and Slide-seq datasets.

single-cell-multi-omics-integration

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Quick-reference sheet for OmicVerse tutorials spanning MOFA, GLUE pairing, SIMBA integration, TOSICA transfer, and StaVIA cartography.

protocolsio-integration

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Integration with protocols.io API for managing scientific protocols. This skill should be used when working with protocols.io to search, create, update, or publish protocols; manage protocol steps and materials; handle discussions and comments; organize workspaces; upload and manage files; or integrate protocols.io functionality into workflows. Applicable for protocol discovery, collaborative protocol development, experiment tracking, lab protocol management, and scientific documentation.

opentrons-integration

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Lab automation platform for Flex/OT-2 robots. Write Protocol API v2 protocols, liquid handling, hardware modules (heater-shaker, thermocycler), labware management, for automated pipetting workflows.