tooluniverse-multi-omics-integration
Integrate and analyze multiple omics datasets (transcriptomics, proteomics, epigenomics, genomics, metabolomics) for systems biology and precision medicine. Performs cross-omics correlation, multi-omics clustering (MOFA+, NMF), pathway-level integration, and sample matching. Coordinates ToolUniverse skills for expression data (RNA-seq), epigenomics (methylation, ChIP-seq), variants (SNVs, CNVs), protein interactions, and pathway enrichment. Use when analyzing multi-omics datasets, performing integrative analysis, discovering multi-omics biomarkers, studying disease mechanisms across molecular layers, or conducting systems biology research that requires coordinated analysis of transcriptome, genome, epigenome, proteome, and metabolome data.
Best use case
tooluniverse-multi-omics-integration is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using tooluniverse-multi-omics-integration should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/tooluniverse-multi-omics-integration/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How tooluniverse-multi-omics-integration Compares
| Feature / Agent | tooluniverse-multi-omics-integration | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
# Multi-Omics Integration
Coordinate and integrate multiple omics datasets for comprehensive systems biology analysis. This skill orchestrates specialized ToolUniverse skills to perform cross-omics correlation, multi-omics clustering, pathway-level integration, and unified interpretation across molecular layers.
## When to Use This Skill
**Triggers**:
- User has multiple omics datasets (RNA-seq + proteomics, methylation + expression, etc.)
- Requests for integrative multi-omics analysis
- Cross-omics correlation queries (e.g., "How does methylation affect expression?")
- Multi-omics biomarker discovery
- Systems biology questions requiring multiple molecular layers
- Precision medicine applications with multi-omics patient data
- Questions about molecular mechanisms across omics types
**Example Questions This Skill Solves**:
1. "Integrate RNA-seq and proteomics data to find genes with concordant changes"
2. "How does promoter methylation correlate with gene expression?"
3. "Perform multi-omics clustering to identify patient subtypes"
4. "Which pathways are dysregulated across transcriptome, proteome, and metabolome?"
5. "Find multi-omics biomarkers for disease classification"
6. "Correlate CNV with gene expression to identify dosage effects"
7. "Integrate GWAS variants, eQTLs, and expression data"
8. "Perform MOFA+ analysis on multi-omics cancer data"
---
## Core Capabilities
| Capability | Description |
|-----------|-------------|
| **Data Integration** | Match samples across omics, handle missing data, normalize scales |
| **Cross-Omics Correlation** | Correlate features across molecular layers (gene expression vs protein, methylation vs expression) |
| **Multi-Omics Clustering** | MOFA+, NMF, joint clustering to identify omics-driven subtypes |
| **Pathway Integration** | Combine omics evidence at pathway level for unified biological interpretation |
| **Biomarker Discovery** | Identify multi-omics signatures with improved predictive power |
| **Skill Coordination** | Orchestrate RNA-seq, epigenomics, variant-analysis, protein-interactions, gene-enrichment skills |
| **Visualization** | Circos plots, integrated heatmaps, network visualizations |
| **Reporting** | Unified multi-omics reports with cross-layer insights |
---
## Workflow Overview
```
Input: Multiple Omics Datasets
|
v
Phase 1: Data Loading & QC
|-- Load RNA-seq (expression matrix)
|-- Load proteomics (protein abundance)
|-- Load methylation (beta values or M-values)
|-- Load variants (CNV, SNV from VCF)
|-- Load metabolomics (metabolite abundance)
|-- Quality control per omics type
|
v
Phase 2: Sample Matching
|-- Match samples across omics by ID
|-- Identify common samples
|-- Handle batch effects
|-- Normalize sample identifiers
|
v
Phase 3: Feature Mapping
|-- Map features to common identifier space (genes, proteins, metabolites)
|-- Link CpG sites to genes (promoter, gene body)
|-- Map variants to genes
|-- Create unified feature matrix
|
v
Phase 4: Cross-Omics Correlation
|-- Gene expression vs protein abundance (translation efficiency)
|-- Promoter methylation vs expression (epigenetic regulation)
|-- CNV vs expression (dosage effect)
|-- eQTL variants vs expression (genetic regulation)
|-- Metabolite vs enzyme expression (metabolic flux)
|
v
Phase 5: Multi-Omics Clustering
|-- MOFA+ (Multi-Omics Factor Analysis) for latent factors
|-- NMF (Non-negative Matrix Factorization) for patient subtypes
|-- Joint clustering across omics
|-- Identify omics-specific vs shared variation
|
v
Phase 6: Pathway-Level Integration
|-- Aggregate omics to pathway level
|-- Score pathway dysregulation (combined evidence)
|-- Use ToolUniverse enrichment tools (Reactome, KEGG, GO)
|-- Identify driver pathways across omics
|
v
Phase 7: Biomarker Discovery
|-- Feature selection across omics
|-- Multi-omics signatures for classification
|-- Cross-validation and performance
|-- Interpretation and biological validation
|
v
Phase 8: Generate Integrated Report
|-- Summary statistics per omics
|-- Cross-omics correlation results
|-- Multi-omics clusters and subtypes
|-- Top dysregulated pathways
|-- Multi-omics biomarkers
|-- Biological interpretation
```
---
## Phase Details
### Phase 1: Data Loading & Quality Control
**Objective**: Load multiple omics datasets and perform quality control.
**Supported omics types**:
- **Transcriptomics**: RNA-seq count matrices, microarray
- **Proteomics**: Protein abundance (MS-based)
- **Epigenomics**: Methylation (450K, EPIC arrays, WGBS), ChIP-seq peaks
- **Genomics**: CNV, SNV, structural variants
- **Metabolomics**: Metabolite abundance (targeted, untargeted)
**Data formats**:
- Expression: CSV/TSV matrices, HDF5, AnnData (.h5ad)
- Proteomics: MaxQuant output, Spectronaut, DIA-NN
- Methylation: IDAT files, beta value matrices
- Variants: VCF, SEG files (CNV)
- Metabolomics: Peak tables, identified metabolites
**Quality control per omics**:
```python
# QC checklist per omics type (written as comments so the block stays valid Python)
# RNA-seq QC
#   - Filter low-count genes (mean counts < threshold)
#   - Normalize (TPM, FPKM, or DESeq2)
#   - Log-transform for correlation
# Proteomics QC
#   - Filter proteins with high missing values
#   - Impute missing values (minimum, KNN)
#   - Normalize (median, quantile)
# Methylation QC
#   - Remove failed probes
#   - Correct for batch effects (ComBat)
#   - Filter cross-reactive probes
# Variants QC
#   - Use variant-analysis skill for VCF QC
#   - CNV segmentation validation
```
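The RNA-seq steps in the checklist can be sketched as follows. This is a minimal illustration, not the skill's own implementation: it uses counts-per-million scaling in place of TPM, FPKM, or DESeq2 normalization, and the function name `rnaseq_qc` and the threshold of 10 are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

def rnaseq_qc(counts: pd.DataFrame, min_mean_count: float = 10.0) -> pd.DataFrame:
    """Filter low-count genes, scale to counts per million, and log-transform.

    counts: raw counts, genes x samples.
    min_mean_count: illustrative threshold (an assumption, not a skill default).
    """
    # Keep genes whose mean raw count meets the threshold
    kept = counts.loc[counts.mean(axis=1) >= min_mean_count]
    # Library-size normalization: counts per million, using each sample's full library
    cpm = kept / counts.sum(axis=0) * 1e6
    # log2(x + 1) compresses the dynamic range before correlation analysis
    return np.log2(cpm + 1)

# Toy input: GENE_B has a mean count of 1 and is filtered out
counts = pd.DataFrame(
    {"S1": [100, 0, 50], "S2": [200, 1, 150], "S3": [300, 2, 100]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)
log_cpm = rnaseq_qc(counts)
```

For differential expression, a dedicated method such as DESeq2 remains the better choice; a simple log-CPM matrix like this one is mainly useful as input to the correlation steps in Phase 4.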
### Phase 2: Sample Matching
**Objective**: Identify common samples across omics datasets.
**Sample ID harmonization**:
```python
def match_samples_across_omics(omics_data_dict):
    """
    Match samples across multiple omics datasets.
    Parameters:
        omics_data_dict: {
            'rnaseq': DataFrame (genes x samples),
            'proteomics': DataFrame (proteins x samples),
            'methylation': DataFrame (CpGs x samples),
            'cnv': DataFrame (genes x samples)
        }
    Returns:
        - common_samples: List of sample IDs present in all omics
        - matched_data: Dict of DataFrames with common samples only
    """
    # Extract sample IDs from each omics
    sample_ids = {
        omics_type: set(df.columns)
        for omics_type, df in omics_data_dict.items()
    }
    # Find common samples (intersection)
    common_samples = set.intersection(*sample_ids.values())
    # Subset each omics to common samples
    matched_data = {
        omics_type: df[sorted(common_samples)]
        for omics_type, df in omics_data_dict.items()
    }
    return sorted(common_samples), matched_data
```
**Handling missing omics**:
- Pairwise integration if not all samples have all omics
- Document sample availability matrix
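A sample availability matrix and a pairwise lookup might be built along these lines. This is a sketch under the same assumption as the function above (each DataFrame has samples as columns); the helper names `sample_availability` and `pairwise_common_samples` are hypothetical.

```python
import pandas as pd

def sample_availability(omics_data_dict):
    """Boolean samples x omics table: True where a sample has that omics layer."""
    all_samples = sorted(set().union(*(df.columns for df in omics_data_dict.values())))
    index = pd.Index(all_samples, name="sample")
    return pd.DataFrame(
        {name: index.isin(df.columns) for name, df in omics_data_dict.items()},
        index=index,
    )

def pairwise_common_samples(availability, omics_a, omics_b):
    """Samples usable for a pairwise analysis between two omics layers."""
    mask = availability[omics_a] & availability[omics_b]
    return availability.index[mask].tolist()

# Toy input: P1 lacks proteomics, P4 lacks RNA-seq
omics = {
    "rnaseq": pd.DataFrame(0.0, index=["g1"], columns=["P1", "P2", "P3"]),
    "proteomics": pd.DataFrame(0.0, index=["p1"], columns=["P2", "P3", "P4"]),
}
availability = sample_availability(omics)
shared = pairwise_common_samples(availability, "rnaseq", "proteomics")
```

Keeping the full table, rather than only the intersection, makes it easy to report how many samples each pairwise comparison actually used.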
### Phase 3: Feature Mapping
**Objective**: Map features from different omics to common gene-level identifiers.
**Gene-centric integration**:
```python
# Map all features to genes
feature_mapping = {
    'rnaseq': 'gene_symbol',        # Already gene-level
    'proteomics': 'gene_symbol',    # Map protein to gene
    'methylation': 'gene_symbol',   # Map CpG to gene (promoter)
    'cnv': 'gene_symbol',           # CNV regions to overlapping genes
    'metabolomics': 'enzyme_gene'   # Metabolite to enzyme gene
}
```
**CpG to gene mapping**:
- **Promoter methylation**: CpGs within TSS ± 2kb
- **Gene body methylation**: CpGs within gene boundaries
- Average methylation per gene (weighted by probe coverage)
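The promoter rule above could be applied as in the following sketch. It assumes a probe annotation table with hypothetical columns `gene` and `distance_to_tss` (signed, in base pairs), and it takes a plain mean per gene rather than the coverage-weighted average mentioned above.

```python
import pandas as pd

def promoter_methylation_per_gene(beta, probe_annotation, window=2000):
    """Average promoter CpG beta values per gene.

    beta: CpGs x samples beta values.
    probe_annotation: indexed by CpG ID, with assumed columns
        'gene' and 'distance_to_tss' (signed distance in bp).
    window: promoter half-width around the TSS (2 kb, as in the rule above).
    """
    # Keep probes inside TSS +/- window that are also present in the beta matrix
    promoter = probe_annotation[probe_annotation["distance_to_tss"].abs() <= window]
    promoter = promoter.loc[promoter.index.intersection(beta.index)]
    # Unweighted mean of promoter probes per gene (genes x samples)
    return beta.loc[promoter.index].groupby(promoter["gene"]).mean()

# Toy input: cg4 lies 5 kb from the TSS and is excluded
beta = pd.DataFrame(
    {"S1": [0.8, 0.6, 0.1, 0.5], "S2": [0.6, 0.4, 0.2, 0.5]},
    index=["cg1", "cg2", "cg3", "cg4"],
)
annotation = pd.DataFrame(
    {"gene": ["GENE_A", "GENE_A", "GENE_B", "GENE_B"],
     "distance_to_tss": [-500, 1500, 100, 5000]},
    index=["cg1", "cg2", "cg3", "cg4"],
)
gene_meth = promoter_methylation_per_gene(beta, annotation)
```

The resulting genes x samples matrix has the shape that the methylation-expression correlation in Phase 4 expects.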
**CNV to gene mapping**:
- Use variant-analysis skill to identify genes in CNV regions
- Calculate copy number per gene (log2 ratio)
### Phase 4: Cross-Omics Correlation
**Objective**: Correlate features across molecular layers to understand regulation.
**Example analyses**:
#### 4.1: Expression vs Protein (Translation Efficiency)
```python
from scipy.stats import spearmanr

def correlate_rna_protein(rnaseq_data, proteomics_data):
    """
    Correlate mRNA and protein levels for each gene.
    Expected: Positive correlation (r ~ 0.4-0.6 typical)
    Discordance indicates post-transcriptional regulation
    Assumes both matrices share the same sample columns in the same order (Phase 2).
    """
    # Find common genes
    common_genes = set(rnaseq_data.index) & set(proteomics_data.index)
    correlations = {}
    for gene in sorted(common_genes):
        rna = rnaseq_data.loc[gene]
        protein = proteomics_data.loc[gene]
        # Spearman correlation (robust to outliers)
        r, p = spearmanr(rna, protein)
        correlations[gene] = {'r': r, 'p': p}
    # Identify discordant genes (low RNA-protein correlation)
    discordant = {g: v for g, v in correlations.items() if abs(v['r']) < 0.2}
    return correlations, discordant
```
#### 4.2: Methylation vs Expression (Epigenetic Regulation)
```python
from scipy.stats import spearmanr

def correlate_methylation_expression(methylation_data, rnaseq_data):
    """
    Correlate promoter methylation with gene expression.
    Expected: Negative correlation (increased methylation → decreased expression)
    Assumes gene-level methylation and matched sample columns (Phases 2-3).
    """
    # For each gene with promoter methylation
    results = {}
    for gene in methylation_data.index:
        if gene in rnaseq_data.index:
            meth = methylation_data.loc[gene]  # Average promoter beta
            expr = rnaseq_data.loc[gene]
            r, p = spearmanr(meth, expr)
            results[gene] = {'r': r, 'p': p, 'direction': 'repressive' if r < 0 else 'activating'}
    # Identify genes with strong methylation-expression anticorrelation
    regulated = {g: v for g, v in results.items() if v['r'] < -0.5 and v['p'] < 0.01}
    return results, regulated
```
#### 4.3: CNV vs Expression (Dosage Effect)
```python
from scipy.stats import pearsonr

def correlate_cnv_expression(cnv_data, rnaseq_data):
    """
    Correlate copy number with gene expression.
    Expected: Positive correlation (gene dosage effect)
    Assumes gene-level copy number and matched sample columns (Phases 2-3).
    """
    results = {}
    for gene in cnv_data.index:
        if gene in rnaseq_data.index:
            cnv = cnv_data.loc[gene]  # log2 ratio
            expr = rnaseq_data.loc[gene]
            r, p = pearsonr(cnv, expr)
            results[gene] = {'r': r, 'p': p}
    # Genes with dosage effect (CNV drives expression)
    dosage_genes = {g: v for g, v in results.items() if v['r'] > 0.5 and v['p'] < 0.01}
    return results, dosage_genes
```
### Phase 5: Multi-Omics Clustering
**Objective**: Identify patient subtypes using integrated omics data.
**Method 1: MOFA+ (Multi-Omics Factor Analysis)**
MOFA+ identifies latent factors that explain variation across omics.
```python
# Conceptual workflow (uses R's MOFA2 package or Python implementation)
# 1. Prepare multi-omics data as list of matrices
# 2. Run MOFA+ to identify factors
# 3. Inspect factor variance explained per omics
# 4. Cluster samples based on factor scores
# Example interpretation:
# Factor 1: Explains 40% variance in RNA-seq, 30% in proteomics → Cell proliferation
# Factor 2: Explains 50% variance in methylation → Epigenetic subtype
# Factor 3: Explains 20% variance in CNV → Genomic instability
```
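Step 4 of the conceptual workflow, clustering samples on factor scores, can be sketched independently of how the factors were obtained. The example below does not call MOFA+; it assumes a samples x factors matrix has already been exported from a fitted model and uses synthetic scores in its place. The function name `cluster_on_factors` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_on_factors(factor_scores, n_clusters=3, random_state=0):
    """Assign samples to clusters from a samples x factors score matrix."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return model.fit_predict(factor_scores)

# Synthetic stand-in for exported factor scores: 3 groups of 10 samples, 2 factors
rng = np.random.default_rng(0)
centers = np.array([[-6.0, 0.0], [0.0, 6.0], [6.0, 0.0]])
factor_scores = np.vstack([c + rng.normal(scale=0.5, size=(10, 2)) for c in centers])
labels = cluster_on_factors(factor_scores)
```

In practice it is worth restricting the input to factors that explain meaningful variance, since weak factors mostly add noise to the distance computation.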
**Method 2: Joint NMF (Non-negative Matrix Factorization)**
Decompose multi-omics matrices into shared latent components.
```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF

def joint_nmf_clustering(omics_data_dict, n_clusters=3):
    """
    Perform joint NMF across omics for clustering.
    Returns patient cluster assignments based on shared factors.
    Inputs are features x samples with matched sample columns. NMF requires
    non-negative values, so shift or rescale any signed data first.
    """
    # Stack omics matrices along the feature axis (after normalization)
    combined_matrix = np.vstack([
        omics_data_dict['rnaseq'].values,
        omics_data_dict['proteomics'].values,
        omics_data_dict['methylation'].values
    ])
    # Run NMF
    model = NMF(n_components=n_clusters, init='nndsvd', random_state=42)
    W = model.fit_transform(combined_matrix)  # Feature loadings (features x components)
    H = model.components_  # Sample coefficients (components x samples)
    # Cluster samples based on H (components)
    clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=42).fit_predict(H.T)
    return clusters, W, H
```
**Method 3: Similarity Network Fusion (SNF)**
Integrate omics through patient similarity networks.
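The idea can be illustrated with a simplified sketch of the SNF iteration: each omics layer gets a full similarity kernel and a sparse nearest-neighbour kernel, and each layer's kernel is repeatedly diffused through the average of the others. This is a reduced NumPy version written for illustration (it omits the per-iteration renormalization used in reference implementations); the function names and the defaults `k=5`, `mu=0.5`, `n_iter=10` are assumptions.

```python
import numpy as np

def _scaled_affinity(X, k, mu=0.5):
    """Scaled exponential similarity between samples (X is samples x features)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    d = np.sqrt(d2)
    # Mean distance to the k nearest neighbours (column 0 of each sorted row is self)
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)
    eps = (knn_mean[:, None] + knn_mean[None, :] + d) / 3.0
    return np.exp(-d2 / (mu * eps))

def _full_kernel(W):
    """Row-normalized similarity with 1/2 on the diagonal."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    P = W / (2.0 * W.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P

def _local_kernel(W, k):
    """Keep only each sample's k strongest neighbours, row-normalized."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    idx = np.argsort(-W, axis=1)[:, :k]
    rows = np.arange(W.shape[0])[:, None]
    S = np.zeros_like(W)
    S[rows, idx] = W[rows, idx]
    return S / S.sum(axis=1, keepdims=True)

def snf(views, k=5, n_iter=10):
    """Fuse two or more samples x features matrices into one similarity matrix."""
    W = [_scaled_affinity(X, k) for X in views]
    P = [_full_kernel(w) for w in W]
    S = [_local_kernel(w, k) for w in W]
    for _ in range(n_iter):
        updated = []
        for v in range(len(views)):
            others = [P[u] for u in range(len(views)) if u != v]
            # Diffuse the other layers' average similarity through this layer's local graph
            P_v = S[v] @ np.mean(others, axis=0) @ S[v].T
            updated.append((P_v + P_v.T) / 2.0)
        P = updated
    return np.mean(P, axis=0)

# Synthetic example: two omics layers, two groups of 10 samples each
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 10)
view1 = groups[:, None] * 5.0 + rng.normal(scale=0.5, size=(20, 5))
view2 = groups[:, None] * 5.0 + rng.normal(scale=0.5, size=(20, 8))
fused = snf([view1, view2])
same_group = groups[:, None] == groups[None, :]
off_diagonal = ~np.eye(20, dtype=bool)
within = fused[same_group & off_diagonal].mean()
between = fused[~same_group].mean()
```

The fused matrix is typically passed to spectral clustering to obtain patient subtypes. Note that the inputs here are samples x features, so features x samples matrices from earlier phases need transposing first.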
### Phase 6: Pathway-Level Integration
**Objective**: Aggregate multi-omics evidence at the pathway level.
**Approach**: Score pathway dysregulation using combined evidence from multiple omics.
```python
import numpy as np

def integrate_pathway_evidence(omics_results, pathway_genes):
    """
    Score pathway dysregulation across omics.
    omics_results: {
        'rnaseq': {'gene': fold_change},
        'proteomics': {'gene': fold_change},
        'methylation': {'gene': methylation_diff},
        'cnv': {'gene': copy_number}
    }
    pathway_genes: List of genes in pathway
    """
    pathway_scores = []
    omics_with_evidence = set()
    # For each gene in pathway
    for gene in pathway_genes:
        # Absolute effect from every omics layer that measured this gene
        # (methylation counts by magnitude, whatever its direction)
        evidence = {
            omics_type: abs(values[gene])
            for omics_type, values in omics_results.items()
            if gene in values
        }
        if evidence:
            pathway_scores.append(sum(evidence.values()) / len(evidence))
            omics_with_evidence.update(evidence)
    # Aggregate pathway score (mean of gene scores)
    pathway_score = float(np.mean(pathway_scores)) if pathway_scores else 0.0
    return {
        'pathway_score': pathway_score,
        'n_genes_with_evidence': len(pathway_scores),
        # Distinct omics layers with evidence anywhere in the pathway
        # (the earlier version reported only the last gene's count)
        'n_omics_types': len(omics_with_evidence)
    }
```
**Use ToolUniverse enrichment tools**:
```python
# Get pathways for gene set
from tooluniverse import ToolUniverse
tu = ToolUniverse()
# Enrichment for genes dysregulated in ANY omics
all_dysregulated_genes = set()
all_dysregulated_genes.update(rnaseq_degs)
all_dysregulated_genes.update(diff_proteins)
all_dysregulated_genes.update(methylation_dmgs)
# Run enrichment
enrichment = tu.run_one_function({
    "name": "enrichr_enrich",
    "arguments": {
        "gene_list": ",".join(all_dysregulated_genes),
        "library": "KEGG_2021_Human"
    }
})
# Score each pathway with multi-omics evidence
for pathway in enrichment['data']['results']:
    pathway_genes = pathway['genes']
    pathway['multi_omics_score'] = integrate_pathway_evidence(
        omics_results, pathway_genes
    )
```
### Phase 7: Biomarker Discovery
**Objective**: Identify multi-omics signatures for disease classification.
**Feature selection across omics**:
```python
def select_multiomics_features(X_dict, y, n_features=50):
    """
    Select top features across omics for classification.
    X_dict: {
        'rnaseq': DataFrame (samples x genes),
        'proteomics': DataFrame (samples x proteins),
        'methylation': DataFrame (samples x CpGs)
    }
    y: Target labels (disease vs control)
    Returns: Selected features per omics
    """
    from sklearn.feature_selection import SelectKBest, f_classif
    selected_features = {}
    for omics_type, X in X_dict.items():
        selector = SelectKBest(f_classif, k=min(n_features, X.shape[1]))
        selector.fit(X, y)
        # Get selected feature names
        selected_idx = selector.get_support()
        selected_features[omics_type] = X.columns[selected_idx].tolist()
    return selected_features
```
**Multi-omics classification**:
```python
import pandas as pd

def multiomics_classification(X_dict, y, selected_features):
    """
    Train classifier using multi-omics features.
    Note: if selected_features was chosen using all samples, the
    cross-validated AUC below is optimistically biased.
    """
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    # Concatenate selected features from each omics
    X_combined = []
    for omics_type, features in selected_features.items():
        X_combined.append(X_dict[omics_type][features])
    X_combined = pd.concat(X_combined, axis=1)
    # Train classifier
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    scores = cross_val_score(clf, X_combined, y, cv=5, scoring='roc_auc')
    return {
        'mean_auc': scores.mean(),
        'std_auc': scores.std(),
        'n_features': X_combined.shape[1],
        'features_per_omics': {k: len(v) for k, v in selected_features.items()}
    }
```
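When feature selection and classification are evaluated together, one way to keep the cross-validated AUC honest is to run the per-omics selection inside each training fold. The sketch below does this with a scikit-learn `Pipeline` and `ColumnTransformer`; it is an alternative arrangement for illustration, not the skill's prescribed method, and the function name `leakage_free_auc` and the synthetic data are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

def leakage_free_auc(X_dict, y, n_features=50, n_splits=5):
    """Cross-validated AUC with per-omics feature selection refit in every fold."""
    # Prefix column names so features from different omics cannot collide
    blocks = {name: df.add_prefix(f"{name}:") for name, df in X_dict.items()}
    X = pd.concat(list(blocks.values()), axis=1)
    # One SelectKBest per omics block, applied only to that block's columns
    selector = ColumnTransformer([
        (name, SelectKBest(f_classif, k=min(n_features, df.shape[1])), list(df.columns))
        for name, df in blocks.items()
    ])
    pipeline = Pipeline([
        ("select", selector),
        ("clf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ])
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
    return scores.mean(), scores.std()

# Synthetic data: 40 samples, one informative feature per omics layer
rng = np.random.default_rng(0)
y = np.array([0, 1] * 20)
rna = pd.DataFrame(rng.normal(size=(40, 20)), columns=[f"g{i}" for i in range(20)])
prot = pd.DataFrame(rng.normal(size=(40, 10)), columns=[f"p{i}" for i in range(10)])
rna["g0"] += 3 * y
prot["p0"] += 3 * y
mean_auc, std_auc = leakage_free_auc({"rnaseq": rna, "proteomics": prot}, y, n_features=5)
```

Because the selector is refit on each training split, the held-out fold never influences which features are chosen.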
### Phase 8: Integrated Reporting
**Generate comprehensive multi-omics report**:
```markdown
# Multi-Omics Integration Report
## Dataset Summary
- **Omics Types**: RNA-seq, Proteomics, Methylation, CNV
- **Common Samples**: 45 patients (30 disease, 15 control)
- **Features**: 15,000 genes, 5,000 proteins, 450K CpGs, 20K CNV regions
## Cross-Omics Correlation
### RNA-Protein Correlation
- **Overall correlation**: r = 0.52 (expected: 0.4-0.6)
- **Highly correlated**: 3,245 genes (45%)
- **Discordant genes**: 890 genes (post-transcriptional regulation)
### Methylation-Expression
- **Promoter methylation**: Anticorrelation r = -0.41
- **Epigenetically regulated genes**: 1,256 genes (p < 0.01)
- **Example**: BRCA1 promoter hypermethylation → 3-fold reduced expression
### CNV-Expression Dosage Effect
- **Genes with dosage effect**: 445 genes (r > 0.5, p < 0.01)
- **Example**: MYC amplification (3 copies) → 2.8-fold increased expression
## Multi-Omics Clustering
### MOFA+ Analysis
- **Factor 1** (25% variance): Cell cycle genes (RNA + protein)
- **Factor 2** (18% variance): Immune signature (RNA + methylation)
- **Factor 3** (15% variance): Metabolic reprogramming (RNA + metabolites)
### Patient Subtypes
- **Subtype 1** (n=18): High proliferation, MYC amplification
- **Subtype 2** (n=15): Immune-enriched, hypomethylation
- **Subtype 3** (n=12): Metabolic dysregulation, mitochondrial dysfunction
## Pathway Integration
### Top Dysregulated Pathways (Multi-Omics Score)
1. **Cell Cycle** (score: 8.5) - RNA (↑), Protein (↑), CNV (amplification)
2. **Immune Response** (score: 7.2) - RNA (↑), Methylation (hypo)
3. **Glycolysis** (score: 6.8) - RNA (↑), Metabolites (↑)
## Multi-Omics Biomarkers
### Classification Performance
- **AUC**: 0.92 ± 0.04 (5-fold CV)
- **Features**: 50 total (20 RNA, 15 protein, 10 methylation, 5 CNV)
- **Top biomarkers**:
  - MYC expression (RNA)
  - CDK1 protein abundance
  - BRCA1 promoter methylation
  - TP53 CNV status
## Biological Interpretation
The multi-omics analysis reveals three distinct disease subtypes driven by different molecular mechanisms:
1. **Proliferative subtype**: Characterized by MYC amplification driving coordinated upregulation of cell cycle genes at both RNA and protein levels.
2. **Immune subtype**: Hypomethylation of immune genes leading to increased expression and T-cell infiltration.
3. **Metabolic subtype**: Shift from oxidative phosphorylation to glycolysis, with concordant changes in enzyme expression and metabolite levels.
These subtypes may respond differently to targeted therapies.
```
---
## ToolUniverse Skills Coordination
This skill orchestrates multiple specialized skills:
| Skill | Used For | Phase |
|-------|----------|-------|
| `tooluniverse-rnaseq-deseq2` | Load and analyze RNA-seq data | Phase 1, 4 |
| `tooluniverse-epigenomics` | Methylation analysis, ChIP-seq peaks | Phase 1, 4 |
| `tooluniverse-variant-analysis` | CNV and SNV processing | Phase 1, 3, 4 |
| `tooluniverse-protein-interactions` | Protein network context | Phase 6 |
| `tooluniverse-gene-enrichment` | Pathway enrichment | Phase 6 |
| `tooluniverse-expression-data-retrieval` | Public omics data retrieval | Phase 1 |
| `tooluniverse-target-research` | Gene/protein annotation | Phase 3, 8 |
---
## Example Use Cases
### Use Case 1: Cancer Multi-Omics
**Question**: "Integrate TCGA breast cancer RNA-seq, proteomics, methylation, and CNV data"
**Workflow**:
1. Load 4 omics types for 500 patients
2. Match samples (450 common across all omics)
3. Correlate RNA-protein (identify translation-regulated genes)
4. Correlate methylation-expression (find epigenetically silenced genes)
5. Correlate CNV-expression (identify dosage-sensitive genes)
6. Run MOFA+ to find latent factors
7. Identify 4 subtypes with distinct multi-omics profiles
8. Perform pathway enrichment per subtype
9. Select multi-omics biomarkers (AUC=0.94)
### Use Case 2: eQTL + Expression
**Question**: "How do GWAS variants affect gene expression through methylation?"
**Workflow**:
1. Load genotype data (SNPs from GWAS)
2. Load expression data (RNA-seq)
3. Load methylation data (450K array)
4. For each GWAS SNP:
   - Test association with nearby gene expression (eQTL)
   - Test association with nearby CpG methylation (meQTL)
   - Test CpG-gene correlation
5. Identify SNP → methylation → expression regulatory chains
6. Annotate with ToolUniverse (GWAS traits, gene function)
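The three tests in step 4 can be sketched for a single SNP, CpG, and gene as below. This is an association screen under simple assumptions (additive genotype coding 0/1/2, linear regression for the QTL tests), not a causal mediation analysis; the function name `evaluate_regulatory_chain`, the `alpha` threshold, and the synthetic data are illustrative.

```python
import numpy as np
from scipy.stats import linregress, spearmanr

def evaluate_regulatory_chain(genotype, methylation, expression, alpha=0.05):
    """Test SNP-expression, SNP-methylation, and methylation-expression association."""
    eqtl = linregress(genotype, expression)     # SNP dosage vs expression
    meqtl = linregress(genotype, methylation)   # SNP dosage vs CpG methylation
    r, p = spearmanr(methylation, expression)   # CpG vs gene correlation
    return {
        "eqtl_slope": eqtl.slope, "eqtl_p": eqtl.pvalue,
        "meqtl_slope": meqtl.slope, "meqtl_p": meqtl.pvalue,
        "cpg_gene_r": r, "cpg_gene_p": p,
        # All three links significant: consistent with a SNP -> methylation -> expression chain
        "chain_supported": bool(max(eqtl.pvalue, meqtl.pvalue, p) < alpha),
    }

# Synthetic data: each allele raises methylation, and methylation represses expression
rng = np.random.default_rng(0)
genotype = np.tile([0, 1, 2], 20)
methylation = 0.2 + 0.2 * genotype + rng.normal(scale=0.03, size=60)
expression = 10.0 - 8.0 * methylation + rng.normal(scale=0.3, size=60)
chain = evaluate_regulatory_chain(genotype, methylation, expression)
```

Run across many SNPs, the p-values would need multiple-testing correction before any chain is reported.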
### Use Case 3: Drug Response Multi-Omics
**Question**: "Predict drug response using multi-omics profiles"
**Workflow**:
1. Load baseline multi-omics (pre-treatment)
2. Load drug response data (IC50 or clinical response)
3. Correlate each omics with response
4. Select multi-omics features predictive of response
5. Train multi-omics classifier
6. Identify pathways associated with resistance/sensitivity
7. Use ToolUniverse drug-repurposing skill for alternative options
---
## Advanced Analysis Patterns
### Pattern 1: Omics-Driven Patient Stratification
For precision medicine applications where patient stratification is the goal.
### Pattern 2: Multi-Omics Network Analysis
Build integrated networks combining PPI, co-expression, regulatory interactions.
### Pattern 3: Temporal Multi-Omics
Longitudinal multi-omics data (time-series or treatment response).
### Pattern 4: Spatial Multi-Omics
Spatial transcriptomics + proteomics for tissue architecture.
---
## Quantified Minimums
| Component | Requirement |
|-----------|-------------|
| Omics types | At least 2 omics datasets |
| Common samples | At least 10 samples across omics |
| Cross-correlation | Pearson/Spearman correlation computed |
| Clustering | At least one method (MOFA+, NMF, or SNF) |
| Pathway integration | Enrichment with multi-omics evidence scores |
| Report | Summary, correlations, clusters, pathways, biomarkers |
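The first two rows of the table lend themselves to an automated pre-flight check. The sketch below is one possible form; the function name `check_minimums` is hypothetical, and each DataFrame is assumed to be features x samples as in Phase 2.

```python
import pandas as pd

def check_minimums(omics_data_dict, min_omics=2, min_common_samples=10):
    """Check the omics-count and common-sample minimums before integration."""
    n_omics = len(omics_data_dict)
    if n_omics:
        common = set.intersection(*(set(df.columns) for df in omics_data_dict.values()))
    else:
        common = set()
    return {
        "n_omics": n_omics,
        "n_common_samples": len(common),
        "enough_omics": n_omics >= min_omics,
        "enough_common_samples": len(common) >= min_common_samples,
    }

# Toy input: 12 samples per layer, 11 of them shared
rna = pd.DataFrame(0.0, index=["g1"], columns=[f"S{i}" for i in range(12)])
prot = pd.DataFrame(0.0, index=["p1"], columns=[f"S{i}" for i in range(1, 13)])
status = check_minimums({"rnaseq": rna, "proteomics": prot})
```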
---
## Limitations
- **Sample size**: Multi-omics integration requires sufficient samples (n≥20 recommended)
- **Missing data**: Some patients may not have all omics types
- **Batch effects**: Different omics platforms/batches require careful normalization
- **Computational**: Large multi-omics datasets may require significant memory/compute
- **Interpretation**: Multi-omics results require domain expertise for biological validation
---
## References
**Methods**:
- MOFA+: https://doi.org/10.1186/s13059-020-02015-1
- Similarity Network Fusion: https://doi.org/10.1038/nmeth.2810
- Multi-omics review: https://doi.org/10.1038/s41576-019-0093-7
**ToolUniverse Skills**:
- See individual skill documentation for omics-specific methods

Related Skills
tooluniverse
Router skill for ToolUniverse tasks. First checks if specialized tooluniverse skills (34+ skills covering disease/drug/target research, clinical decision support, genomics, epigenomics, chemical safety, systems biology, and more) can solve the problem, then falls back to general strategies for using 1400+ scientific tools. Covers tool discovery, multi-hop queries, comprehensive research workflows, disambiguation, evidence grading, and report generation. Use when users need to research any scientific topic, find biological data, or explore drug/target/disease relationships.
tooluniverse-variant-interpretation
Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.
tooluniverse-variant-analysis
Production-ready VCF processing, variant annotation, mutation analysis, and structural variant (SV/CNV) interpretation for bioinformatics questions. Parses VCF files (streaming, large files), classifies mutation types (missense, nonsense, synonymous, frameshift, splice, intronic, intergenic) and structural variants (deletions, duplications, inversions, translocations), applies VAF/depth/quality/consequence filters, annotates with ClinVar/dbSNP/gnomAD/CADD via ToolUniverse, interprets SV/CNV clinical significance using ClinGen dosage sensitivity scores, computes variant statistics, and generates reports. Solves questions like "What fraction of variants with VAF < 0.3 are missense?", "How many non-reference variants remain after filtering intronic/intergenic?", "What is the pathogenicity of this deletion affecting BRCA1?", or "Which dosage-sensitive genes overlap this CNV?". Use when processing VCF files, annotating variants, filtering by VAF/depth/consequence, classifying mutations, interpreting structural variants, assessing CNV pathogenicity, comparing cohorts, or answering variant analysis questions.
tooluniverse-target-research
Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.
tooluniverse-systems-biology
Comprehensive systems biology and pathway analysis using multiple pathway databases (Reactome, KEGG, WikiPathways, Pathway Commons, BioModels). Performs pathway enrichment, protein-pathway mapping, keyword searches, and systems-level analysis. Use when analyzing gene sets, exploring biological pathways, or investigating systems-level biology.
tooluniverse-structural-variant-analysis
Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.
tooluniverse-statistical-modeling
Perform statistical modeling and regression analysis on biomedical datasets. Supports linear regression, logistic regression (binary/ordinal/multinomial), mixed-effects models, Cox proportional hazards survival analysis, Kaplan-Meier estimation, and comprehensive model diagnostics. Extracts odds ratios, hazard ratios, confidence intervals, p-values, and effect sizes. Designed to solve BixBench statistical reasoning questions involving clinical/experimental data. Use when asked to fit regression models, compute odds ratios, perform survival analysis, run statistical tests, or interpret model coefficients from provided data.
tooluniverse-spatial-transcriptomics
Analyze spatial transcriptomics data to map gene expression in tissue architecture. Supports 10x Visium, MERFISH, seqFISH, Slide-seq, and imaging-based platforms. Performs spatial clustering, domain identification, cell-cell proximity analysis, spatial gene expression patterns, tissue architecture mapping, and integration with single-cell data. Use when analyzing spatial transcriptomics datasets, studying tissue organization, identifying spatial expression patterns, mapping cell-cell interactions in tissue context, characterizing tumor microenvironment spatial structure, or integrating spatial and single-cell RNA-seq data for comprehensive tissue analysis.
tooluniverse-spatial-omics-analysis
Computational analysis framework for spatial multi-omics data integration. Given spatially variable genes (SVGs), spatial domain annotations, tissue type, and disease context from spatial transcriptomics/proteomics experiments (10x Visium, MERFISH, DBiTplus, SLIDE-seq, etc.), performs comprehensive biological interpretation including pathway enrichment, cell-cell interaction inference, druggable target identification, immune microenvironment characterization, and multi-modal integration. Produces a detailed markdown report with Spatial Omics Integration Score (0-100), domain-by-domain characterization, and validation recommendations. Uses 70+ ToolUniverse tools across 9 analysis phases. Use when users ask about spatial transcriptomics analysis, spatial omics interpretation, tissue heterogeneity, spatial gene expression patterns, tumor microenvironment mapping, tissue zonation, or cell-cell communication from spatial data.
tooluniverse-single-cell
Production-ready single-cell and expression matrix analysis using scanpy, anndata, and scipy. Performs scRNA-seq QC, normalization, PCA, UMAP, Leiden/Louvain clustering, differential expression (Wilcoxon, t-test, DESeq2), cell type annotation, per-cell-type statistical analysis, gene-expression correlation, batch correction (Harmony), trajectory inference, and cell-cell communication analysis. NEW: Analyzes ligand-receptor interactions between cell types using OmniPath (CellPhoneDB, CellChatDB), scores communication strength, identifies signaling cascades, and handles multi-subunit receptor complexes. Integrates with ToolUniverse gene annotation tools (HPA, Ensembl, MyGene, UniProt) and enrichment tools (gseapy, PANTHER, STRING). Supports h5ad, 10X, CSV/TSV count matrices, and pre-annotated datasets. Use when analyzing single-cell RNA-seq data, studying cell-cell interactions, performing cell type differential expression, computing gene-expression correlations by cell type, analyzing tumor-immune communication, or answering questions about scRNA-seq datasets.
tooluniverse-sequence-retrieval
Retrieves biological sequences (DNA, RNA, protein) from NCBI and ENA with gene disambiguation, accession type handling, and comprehensive sequence profiles. Creates detailed reports with sequence metadata, cross-database references, and download options. Use when users need nucleotide sequences, protein sequences, genome data, or mention GenBank, RefSeq, EMBL accessions.
tooluniverse-sdk
Build AI scientist systems using ToolUniverse Python SDK for scientific research. Use when users need to access 1000+ scientific tools through Python code, create scientific workflows, perform drug discovery, protein analysis, genomics analysis, literature research, or any computational biology task. Triggers include requests to use scientific tools programmatically, build research pipelines, analyze biological data, search literature, predict drug properties, or create AI-powered scientific workflows.