bio-causal-genomics-pleiotropy-detection

Detect and correct for horizontal pleiotropy in Mendelian randomization analyses using MR-PRESSO for outlier removal, MR-Egger regression for directional pleiotropy, and Steiger filtering for variant directionality. Use when validating MR results, detecting pleiotropic instruments, or running sensitivity analyses for causal inference.

1,802 stars

byFreedomIntelligence

View on GitHub Installation ↓

Best use case

bio-causal-genomics-pleiotropy-detection is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-causal-genomics-pleiotropy-detection should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/bio-causal-genomics-pleiotropy-detection/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-causal-genomics-pleiotropy-detection/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bio-causal-genomics-pleiotropy-detection/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How bio-causal-genomics-pleiotropy-detection Compares

Feature / Agent	bio-causal-genomics-pleiotropy-detection	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

## Version Compatibility

Reference examples tested with: MR-PRESSO 1.0+, TwoSampleMR 0.5+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion("<pkg>")` then `?function_name` to verify parameters

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Pleiotropy Detection

**"Check my MR results for pleiotropic bias"** → Detect and correct for horizontal pleiotropy using outlier removal (MR-PRESSO), directional pleiotropy testing (MR-Egger intercept), and variant directionality filtering (Steiger) to validate causal inference results.
- R: `MRPRESSO::mr_presso()` for global and distortion tests
- R: `TwoSampleMR::mr_egger_regression()` for Egger intercept test

## Overview

Horizontal pleiotropy violates the exclusion restriction assumption of MR: instruments
affect the outcome through pathways other than the exposure. Detecting and correcting
for pleiotropy is essential for valid causal inference.

Types of pleiotropy:
- **Vertical** (mediated): Instrument -> exposure -> outcome (valid, not a problem)
- **Horizontal** (direct): Instrument -> outcome bypassing exposure (violates MR assumptions)
- **Balanced**: Pleiotropic effects cancel out (IVW still valid, Egger intercept ~0)
- **Directional**: Pleiotropic effects are systematic (biases IVW, Egger detects this)

## MR-PRESSO

**Goal:** Detect and remove pleiotropic outlier instruments from an MR analysis.

**Approach:** Run MR-PRESSO to test for global pleiotropy, identify individual outlier SNPs, test whether their removal changes the causal estimate (distortion test), and obtain a corrected estimate.

```r
# install.packages('remotes')
# remotes::install_github('rondolab/MR-PRESSO')
library(MRPRESSO)

# Input: harmonized data from TwoSampleMR
# Columns needed: beta.exposure, beta.outcome, se.exposure, se.outcome
presso_input <- data.frame(
  bx = dat$beta.exposure,
  by = dat$beta.outcome,
  bxse = dat$se.exposure,
  byse = dat$se.outcome
)

# --- Run MR-PRESSO ---
# NbDistribution: Number of simulations for null distribution (minimum 1000)
# SignifThreshold: P-value threshold for outlier detection (0.05 standard)
presso_result <- mr_presso(
  BetaOutcome = 'by', BetaExposure = 'bx',
  SdOutcome = 'byse', SdExposure = 'bxse',
  OUTLIERtest = TRUE, DISTORTIONtest = TRUE,
  data = presso_input,
  NbDistribution = 5000,
  SignifThreshold = 0.05
)

# --- Global test ---
# Tests whether there is any pleiotropy among instruments
# Significant p-value: Evidence of horizontal pleiotropy
global_p <- presso_result$`MR-PRESSO results`$`Global Test`$Pvalue
cat('Global test p-value:', global_p, '\n')

# --- Outlier test ---
# Identifies individual pleiotropic SNPs
outliers <- presso_result$`MR-PRESSO results`$`Outlier Test`
cat('\nOutlier test results:\n')
print(outliers)

# Outlier SNPs (p < 0.05)
outlier_indices <- which(outliers$Pvalue < 0.05)
cat('Outlier SNPs:', length(outlier_indices), '\n')

# --- Distortion test ---
# Tests whether removing outliers significantly changes the causal estimate
# Significant: Outliers were meaningfully biasing the estimate
distortion_p <- presso_result$`MR-PRESSO results`$`Distortion Test`$Pvalue
cat('Distortion test p-value:', distortion_p, '\n')

# --- Corrected estimate ---
# MR estimate after removing outlier SNPs
main_results <- presso_result$`Main MR results`
cat('\nRaw IVW estimate:', main_results$`Causal Estimate`[1], '\n')
cat('Corrected IVW estimate:', main_results$`Causal Estimate`[2], '\n')
```

## MR-Egger Diagnostics

**Goal:** Test for directional pleiotropy and obtain a pleiotropy-adjusted causal estimate.

**Approach:** Fit MR-Egger regression where the intercept estimates average pleiotropic bias, and check I-squared for instrument strength under the NOME assumption.

```r
library(TwoSampleMR)

# MR-Egger regression allows for a non-zero intercept
# The intercept estimates the average pleiotropic effect
egger <- mr_egger_regression(dat$beta.exposure, dat$beta.outcome,
                              dat$se.exposure, dat$se.outcome)

# --- Egger intercept ---
# Significant intercept (p < 0.05): Directional pleiotropy present
# Non-significant: No evidence (but low power with < 10 SNPs)
cat('Egger intercept:', round(egger$b_i, 5), '\n')
cat('Intercept SE:', round(egger$se_i, 5), '\n')
cat('Intercept p-value:', format.pval(egger$pval_i), '\n')

# --- Egger slope ---
# Valid causal estimate EVEN with directional pleiotropy (InSIDE assumption)
cat('\nEgger causal estimate:', round(egger$b, 4), '\n')
cat('Egger SE:', round(egger$se, 4), '\n')
cat('Egger p-value:', format.pval(egger$pval), '\n')

# --- I-squared for Egger ---
# I^2 measures instrument strength for MR-Egger specifically
# I^2 > 0.9: Egger estimate reliable
# I^2 < 0.6: Egger has low power, interpret with caution (NOME violation)
isq <- Isq(dat$beta.exposure, dat$se.exposure)
cat('\nI-squared:', round(isq, 3), '\n')
if (isq < 0.9) cat('Warning: I-squared < 0.9; Egger estimate may be unreliable (NOME violation)\n')
```

## Steiger Filtering

**Goal:** Verify that instruments act in the correct causal direction (exposure -> outcome, not reverse).

**Approach:** Apply the Steiger test to each instrument, remove those explaining more outcome variance than exposure variance, and re-run MR on filtered instruments.

```r
library(TwoSampleMR)

# Steiger test: Verify each instrument explains more variance in
# the exposure than the outcome. Instruments failing this test
# may act through a reverse causal pathway.

steiger <- steiger_filtering(dat)

# Keep only correctly oriented instruments
dat_steiger <- steiger[steiger$steiger_dir == TRUE, ]
cat('Instruments passing Steiger filter:', nrow(dat_steiger), 'of', nrow(steiger), '\n')

# Re-run MR with filtered instruments
results_steiger <- mr(dat_steiger)
print(results_steiger[, c('method', 'nsnp', 'b', 'se', 'pval')])

# Directionality test (aggregate)
direction <- directionality_test(dat)
cat('\nCorrect causal direction:', direction$correct_causal_direction, '\n')
cat('Steiger p-value:', format.pval(direction$steiger_pval), '\n')
```

## Additional Sensitivity Methods

```r
library(TwoSampleMR)

# --- Contamination mixture ---
# Assumes some instruments are valid, others are not
# Does not require majority valid assumption
mr_conmix <- mr(dat, method_list = 'mr_raps')

# --- MR-RAPS ---
# NOTE: MRAPS CRAN package was archived March 2025.
# Use the MendelianRandomization package instead, or install from GitHub:
# remotes::install_github('qingyuanzhao/mr.raps')
library(MendelianRandomization)

mr_input <- mr_input(
  bx = dat$beta.exposure, bxse = dat$se.exposure,
  by = dat$beta.outcome, byse = dat$se.outcome
)

raps_result <- mr_raps(mr_input)
cat('MR-RAPS estimate:', raps_result$Estimate, '\n')
cat('MR-RAPS p-value:', raps_result$Pvalue, '\n')

# --- Multivariable MR ---
# Controls for pleiotropy by including multiple exposures simultaneously
# e.g., adjust for BMI when estimating effect of lipids on CHD
exposure1 <- extract_instruments('ieu-a-300')  # LDL
exposure2 <- extract_instruments('ieu-a-302')  # HDL

# Combine and perform multivariable MR
# (See TwoSampleMR vignette for full multivariable workflow)
```

## Comprehensive Sensitivity Framework

**Goal:** Run a complete battery of MR sensitivity analyses to validate causal findings.

**Approach:** Apply IVW, MR-Egger, weighted median, weighted mode, heterogeneity, Egger intercept, leave-one-out, and MR-PRESSO in a single function and summarize results.

```r
library(TwoSampleMR)
library(MRPRESSO)

run_sensitivity <- function(dat) {
  results <- list()

  # 1. IVW (primary)
  results$ivw <- mr(dat, method_list = 'mr_ivw')

  # 2. MR-Egger
  results$egger <- mr(dat, method_list = 'mr_egger_regression')

  # 3. Weighted median (robust to 50% invalid instruments)
  results$median <- mr(dat, method_list = 'mr_weighted_median')

  # 4. Weighted mode
  results$mode <- mr(dat, method_list = 'mr_weighted_mode')

  # 5. Heterogeneity
  results$het <- mr_heterogeneity(dat)

  # 6. Egger intercept
  results$pleio <- mr_pleiotropy_test(dat)

  # 7. Leave-one-out
  results$loo <- mr_leaveoneout(dat)

  # 8. MR-PRESSO
  presso_input <- data.frame(
    bx = dat$beta.exposure, by = dat$beta.outcome,
    bxse = dat$se.exposure, byse = dat$se.outcome
  )
  results$presso <- mr_presso(
    BetaOutcome = 'by', BetaExposure = 'bx',
    SdOutcome = 'byse', SdExposure = 'bxse',
    OUTLIERtest = TRUE, DISTORTIONtest = TRUE,
    data = presso_input, NbDistribution = 5000, SignifThreshold = 0.05
  )

  results
}

summarize_sensitivity <- function(sens) {
  cat('=== MR Sensitivity Analysis Summary ===\n\n')

  # Method comparison
  all_mr <- rbind(sens$ivw, sens$egger, sens$median, sens$mode)
  cat('Method comparison:\n')
  print(all_mr[, c('method', 'b', 'se', 'pval')])

  # Heterogeneity
  cat('\nHeterogeneity (Cochran Q):\n')
  cat('  Q p-value (IVW):', sens$het$Q_pval[sens$het$method == 'Inverse variance weighted'], '\n')

  # Egger intercept
  cat('\nEgger intercept:\n')
  cat('  Intercept:', sens$pleio$egger_intercept, '\n')
  cat('  P-value:', sens$pleio$pval, '\n')

  # MR-PRESSO global test
  cat('\nMR-PRESSO global test p-value:',
      sens$presso$`MR-PRESSO results`$`Global Test`$Pvalue, '\n')

  cat('\n--- Interpretation ---\n')
  cat('Consistent estimates across methods: Evidence strengthened\n')
  cat('Significant Egger intercept: Directional pleiotropy present\n')
  cat('Significant MR-PRESSO global: Horizontal pleiotropy detected\n')
  cat('Significant heterogeneity: Instruments may be invalid\n')
}
```

## STROBE-MR Reporting

When reporting MR analyses, follow STROBE-MR guidelines:

1. Report all MR methods tested (not just the most significant)
2. Report heterogeneity Q-statistic and p-value
3. Report Egger intercept with p-value
4. Report MR-PRESSO global test and number of outliers removed
5. Report F-statistics for instrument strength
6. Report Steiger directionality test
7. State whether results are consistent across sensitivity analyses
8. Acknowledge limitations of the MR assumptions

## Related Skills

- mendelian-randomization - Primary MR analysis that pleiotropy tests validate
- fine-mapping - Identify causal variants at instrument loci
- population-genetics/association-testing - GWAS data for MR instruments

Related Skills

tooluniverse-epigenomics

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Production-ready genomics and epigenomics data processing for BixBench questions. Handles methylation array analysis (CpG filtering, differential methylation, age-related CpG detection, chromosome-level density), ChIP-seq peak analysis (peak calling, motif enrichment, coverage stats), ATAC-seq chromatin accessibility, multi-omics integration (expression + methylation correlation), and genome-wide statistics. Pure Python computation (pandas, scipy, numpy, pysam, statsmodels) plus ToolUniverse annotation tools (Ensembl, ENCODE, SCREEN, JASPAR, ReMap, RegulomeDB, ChIPAtlas). Supports BED, BigWig, methylation beta-value matrices, Illumina manifest files, and multi-sample clinical data. Use when processing methylation data, ChIP-seq peaks, ATAC-seq signals, or answering questions about CpG sites, differential methylation, chromatin accessibility, histone marks, or epigenomic statistics.

tooluniverse-adverse-event-detection

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Detect and analyze adverse drug event signals using FDA FAERS data, drug labels, disproportionality analysis (PRR, ROR, IC), and biomedical evidence. Generates quantitative safety signal scores (0-100) with evidence grading. Use for post-market surveillance, pharmacovigilance, drug safety assessment, adverse event investigation, and regulatory decision support.

crisis-detection-intervention-ai

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Detect crisis signals in user content using NLP, mental health sentiment analysis, and safe intervention protocols. Implements suicide ideation detection, automated escalation, and crisis resource integration. Use for mental health apps, recovery platforms, support communities. Activate on "crisis detection", "suicide prevention", "mental health NLP", "intervention protocol". NOT for general sentiment analysis, medical diagnosis, or replacing professional help.

claw-metagenomics

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Shotgun metagenomics profiling — taxonomy, resistome, and functional pathways

bio-single-cell-doublet-detection

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Detect and remove doublets (multiple cells captured in one droplet) from single-cell RNA-seq data. Uses Scrublet (Python), DoubletFinder (R), and scDblFinder (R). Essential QC step before clustering to avoid artificial cell populations. Use when identifying and removing doublets from scRNA-seq data.

bio-ribo-seq-orf-detection

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Detect and quantify translated ORFs from Ribo-seq data including uORFs and novel ORFs using RiboCode and ORFquant. Use when identifying translated regions beyond annotated coding sequences or quantifying ORF-level translation.

bio-methylation-dmr-detection

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Differentially methylated region (DMR) detection using methylKit tiles, bsseq BSmooth, and DMRcate. Use when identifying contiguous genomic regions with methylation differences between experimental conditions or cell types.

bio-methylation-based-detection

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Analyzes cfDNA methylation patterns for cancer detection using cfMeDIP-seq or bisulfite sequencing with MethylDackel. Identifies cancer-specific methylation signatures and performs tissue-of-origin deconvolution. Use when using methylation biomarkers for early cancer detection or minimal residual disease.

bio-metagenomics-visualization

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Visualize metagenomic profiles using R (phyloseq, microbiome) and Python (matplotlib, seaborn). Create stacked bar plots, heatmaps, PCA plots, and diversity analyses. Use when creating publication-quality figures from MetaPhlAn, Bracken, or other taxonomic profiling output.

bio-metagenomics-strain-tracking

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Track bacterial strains using MASH, sourmash, fastANI, and inStrain. Compare genomes, detect contamination, and monitor strain-level variation. Use when needing sub-species resolution for outbreak tracking, transmission analysis, or within-host strain dynamics.

bio-metagenomics-metaphlan

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Marker gene-based taxonomic profiling using MetaPhlAn 4. Provides accurate species-level relative abundances using clade-specific markers. Use when accurate taxonomic profiling is needed and computational resources are limited, or for comparison with HMP/other MetaPhlAn studies.

bio-metagenomics-kraken

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Taxonomic classification of metagenomic reads using Kraken2. Fast k-mer based classification against RefSeq database. Use when performing initial taxonomic classification of shotgun metagenomic reads before abundance estimation with Bracken.