tooluniverse-gwas-study-explorer

Compare GWAS studies, perform meta-analyses, and assess replication across cohorts. Integrates the NHGRI-EBI GWAS Catalog and Open Targets Genetics to compare study designs, effect sizes, ancestry diversity, and heterogeneity statistics. Use when comparing GWAS studies for a trait, performing meta-analysis of genetic loci, assessing replication across cohorts, or exploring the genetic architecture of complex diseases.

by FreedomIntelligence

Best use case

tooluniverse-gwas-study-explorer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using tooluniverse-gwas-study-explorer should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/tooluniverse-gwas-study-explorer/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/tooluniverse-gwas-study-explorer/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/tooluniverse-gwas-study-explorer/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

Frequently Asked Questions

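What does this skill do?

It compares GWAS studies for a trait, meta-analyzes effect sizes at shared loci, and assesses replication across cohorts, drawing on the NHGRI-EBI GWAS Catalog and Open Targets Genetics.

Where can I find the source code?

In the FreedomIntelligence/OpenClaw-Medical-Skills repository on GitHub, under skills/tooluniverse-gwas-study-explorer/.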

SKILL.md Source

# GWAS Study Deep Dive & Meta-Analysis

**Compare GWAS studies, perform meta-analyses, and assess replication across cohorts**

---

## Overview

The GWAS Study Deep Dive & Meta-Analysis skill enables comprehensive comparison of genome-wide association studies (GWAS) for the same trait, meta-analysis of genetic loci across studies, and systematic assessment of replication and study quality. It integrates data from the NHGRI-EBI GWAS Catalog and Open Targets Genetics to provide a complete picture of the genetic architecture of complex traits.

### Key Capabilities

1. **Study Comparison**: Compare all GWAS studies for a trait, assessing sample sizes, ancestries, and platforms
2. **Meta-Analysis**: Aggregate effect sizes across studies and calculate heterogeneity statistics
3. **Replication Assessment**: Identify replicated vs novel findings across discovery and replication cohorts
4. **Quality Evaluation**: Assess statistical power, ancestry diversity, and data availability

---

## Use Cases

### 1. Comprehensive Trait Analysis
**Scenario**: "I want to understand all available GWAS data for type 2 diabetes"

**Workflow**:
- Search for all T2D studies in the GWAS Catalog
- Filter by sample size and ancestry
- Extract top associations from each study
- Identify consistently replicated loci
- Assess ancestry-specific effects

**Outcome**: Complete landscape of T2D genetics with replicated findings and population-specific signals
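The filtering and locus-counting steps of this workflow can be sketched in Python. This is a minimal illustration, not the skill's implementation: the `Study` and `Association` records, their field names, the 10,000-sample floor, and the placeholder IDs `rsA`/`rsB` are hypothetical stand-ins for whatever the GWAS Catalog tools return. "Consistently replicated" is simplified here to "reported at p < 5×10⁻⁸ by at least two retained studies".

```python
from collections import defaultdict
from dataclasses import dataclass

GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold

# Hypothetical record types; real GWAS Catalog responses use their own schema.
@dataclass
class Study:
    study_id: str
    sample_size: int
    ancestry: str

@dataclass
class Association:
    study_id: str
    rsid: str
    p_value: float

def filter_studies(studies, min_n=10_000, ancestries=None):
    """Keep studies at or above a sample-size floor, optionally limited to some ancestries."""
    return [s for s in studies
            if s.sample_size >= min_n and (ancestries is None or s.ancestry in ancestries)]

def replicated_loci(associations, kept_ids, min_studies=2):
    """Map rsID -> number of retained studies reporting it at genome-wide significance."""
    studies_per_snp = defaultdict(set)
    for a in associations:
        if a.study_id in kept_ids and a.p_value < GENOME_WIDE:
            studies_per_snp[a.rsid].add(a.study_id)
    return {rsid: len(ids) for rsid, ids in studies_per_snp.items()
            if len(ids) >= min_studies}

# Toy usage with made-up values (not real study results).
studies = [Study("S1", 70_000, "European"), Study("S2", 25_000, "East Asian"),
           Study("S3", 3_000, "European")]
assocs = [Association("S1", "rsA", 1e-20), Association("S2", "rsA", 1e-9),
          Association("S3", "rsA", 1e-9), Association("S1", "rsB", 1e-8)]
kept = {s.study_id for s in filter_studies(studies)}
print(replicated_loci(assocs, kept))  # {'rsA': 2}
```

The small study S3 is dropped by the sample-size filter, so `rsA` counts two supporting studies and `rsB`, seen in only one, is left out.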

### 2. Locus-Specific Meta-Analysis
**Scenario**: "Is the TCF7L2 association with T2D consistent across all studies?"

**Workflow**:
- Retrieve all TCF7L2 (rs7903146) associations for T2D
- Calculate combined effect size and p-value
- Assess heterogeneity (I² statistic)
- Generate forest plot data
- Interpret heterogeneity level

**Outcome**: Quantitative assessment of effect size consistency with heterogeneity interpretation
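GWAS Catalog associations for a binary trait are often reported as an odds ratio with a 95% confidence interval, while meta-analysis works on the log scale. Below is a minimal sketch of turning such per-study summaries into forest-plot rows. It assumes the interval is symmetric on the log-odds scale, as for a Wald interval. The function names and tuple layout are hypothetical.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval

def or_ci_to_beta_se(odds_ratio, ci_lower, ci_upper):
    """Convert an odds ratio and its 95% CI to a log-odds beta and its standard error."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z95)
    return beta, se

def forest_rows(studies):
    """studies: iterable of (label, OR, CI lower, CI upper).

    Returns (label, beta, se, inverse-variance weight) rows, ready for a
    forest plot or for pooling across studies."""
    rows = []
    for label, odds_ratio, lower, upper in studies:
        beta, se = or_ci_to_beta_se(odds_ratio, lower, upper)
        rows.append((label, beta, se, 1.0 / se**2))
    return rows
```

If a study reports an asymmetric interval on the log scale (for example, a profile-likelihood CI), this conversion only approximates its standard error.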

### 3. Replication Analysis
**Scenario**: "Which findings from the discovery cohort replicated in the independent sample?"

**Workflow**:
- Get top hits from discovery study
- Check for presence and significance in replication study
- Assess direction consistency
- Calculate replication rate
- Classify each discovery hit as replicated, not replicated, or not tested (absent from the replication study)

**Outcome**: Systematic replication report with success rates and failed findings
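Below is a minimal sketch of these replication steps. It assumes betas in both studies are already aligned to the same effect allele, and it applies the p < 0.05 and same-direction criteria from Interpreting Results. The optional Bonferroni correction across the number of hits taken forward, the record layout, and the three outcome labels are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    rsid: str
    beta: float  # effect of the harmonized effect allele in the discovery study

def assess_replication(discovery, replication, alpha=0.05, bonferroni=False):
    """Classify discovery hits and return (categories, replication rate).

    discovery:   list of Hit from the discovery study
    replication: dict rsid -> (beta, p_value) from the replication study,
                 with beta on the same effect allele as the discovery study
    """
    n_hits = len(discovery)
    threshold = alpha / n_hits if bonferroni and n_hits else alpha
    result = {"replicated": [], "not_replicated": [], "not_tested": []}
    for hit in discovery:
        if hit.rsid not in replication:
            result["not_tested"].append(hit.rsid)  # SNP absent from replication data
            continue
        beta_rep, p_rep = replication[hit.rsid]
        same_direction = (beta_rep > 0) == (hit.beta > 0)
        key = "replicated" if p_rep < threshold and same_direction else "not_replicated"
        result[key].append(hit.rsid)
    n_tested = len(result["replicated"]) + len(result["not_replicated"])
    rate = len(result["replicated"]) / n_tested if n_tested else float("nan")
    return result, rate
```

The replication rate is computed over tested SNPs only. This keeps loci missing from the replication array from counting as failures, in line with the caution against treating every replication failure as a false positive.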

### 4. Multi-Ancestry Comparison
**Scenario**: "Are T2D loci consistent across European and East Asian populations?"

**Workflow**:
- Filter studies by ancestry
- Compare top associations between populations
- Identify shared vs population-specific loci
- Assess allele frequency differences
- Evaluate transferability of genetic risk scores

**Outcome**: Ancestry-specific genetic architecture with transferability assessment
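The shared-versus-specific comparison can be sketched as set operations over per-ancestry results. This sketch assumes betas are harmonized to one effect allele, uses p < 5×10⁻⁸ as the only significance criterion, and relies on a hypothetical input layout. Here, "ancestry-specific" only means significant in one ancestry. That pattern can reflect differences in power or allele frequency rather than a truly absent effect.

```python
GENOME_WIDE = 5e-8

def compare_ancestries(results):
    """results: {ancestry: {rsid: (beta, p_value)}}, betas on a shared effect allele.

    Returns (shared, specific, discordant):
      shared     - SNPs genome-wide significant in every ancestry
      specific   - {ancestry: SNPs significant only in that ancestry}
      discordant - shared SNPs whose effect direction differs between ancestries
    """
    sig = {anc: {rs for rs, (_, p) in res.items() if p < GENOME_WIDE}
           for anc, res in results.items()}
    shared = set.intersection(*sig.values()) if sig else set()
    specific = {}
    for anc, snps in sig.items():
        others = set().union(*(s for a, s in sig.items() if a != anc))
        specific[anc] = snps - others
    discordant = {rs for rs in shared
                  if len({results[anc][rs][0] > 0 for anc in results}) > 1}
    return shared, specific, discordant
```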

---

## Statistical Methods

### Meta-Analysis Approach

This skill implements standard GWAS meta-analysis methods:

**Fixed-Effects Model**:
- Used when heterogeneity is low (I² < 25%)
- Weights studies by inverse variance
- Assumes true effect size is the same across studies

**Random-Effects Model** (recommended when I² > 50%):
- Accounts for between-study variation
- More conservative than fixed-effects
- Better for diverse ancestries or methodologies
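The two models can be sketched in a few lines of Python. This is an illustration, not necessarily the estimator the skill uses. It assumes per-study betas and standard errors on the same harmonized effect allele. For the random-effects model it uses the DerSimonian-Laird estimate of the between-study variance τ², which is one common choice among several.

```python
import math

def fixed_effects(betas, ses):
    """Inverse-variance weighted fixed-effects estimate, its SE, and Cochran's Q."""
    weights = [1.0 / se**2 for se in ses]
    sum_w = sum(weights)
    beta_fe = sum(w * b for w, b in zip(weights, betas)) / sum_w
    q = sum(w * (b - beta_fe) ** 2 for w, b in zip(weights, betas))
    return beta_fe, math.sqrt(1.0 / sum_w), q

def random_effects_dl(betas, ses):
    """DerSimonian-Laird random-effects estimate, its SE, and tau^2."""
    weights = [1.0 / se**2 for se in ses]
    _, _, q = fixed_effects(betas, ses)
    df = len(betas) - 1
    c = sum(weights) - sum(w**2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # truncated at zero
    re_weights = [1.0 / (se**2 + tau2) for se in ses]  # tau^2 added to each variance
    beta_re = sum(w * b for w, b in zip(re_weights, betas)) / sum(re_weights)
    return beta_re, math.sqrt(1.0 / sum(re_weights)), tau2

def two_sided_p(beta, se):
    """Two-sided normal p-value; erfc stays accurate for GWAS-scale tiny p-values."""
    return math.erfc(abs(beta / se) / math.sqrt(2))
```

When τ² is zero, the random-effects result equals the fixed-effects one. When studies disagree, the added τ² widens the standard error, which is what makes the random-effects model the more conservative of the two.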

**Heterogeneity Assessment**:

The **I² statistic** estimates the percentage of total variation in effect estimates across studies that is due to between-study heterogeneity rather than sampling error:

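```
I² = max(0, (Q - df) / Q) × 100%

where Q    = Cochran's Q statistic = Σᵢ wᵢ (βᵢ - β_FE)², with wᵢ = 1 / SEᵢ²
      β_FE = inverse-variance weighted mean of the study betas βᵢ
      df   = degrees of freedom (n_studies - 1)
```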

**Interpretation Guidelines**:
- **I² < 25%**: Low heterogeneity → fixed-effects appropriate
- **I² = 25-50%**: Moderate heterogeneity → investigate sources
- **I² = 50-75%**: Substantial heterogeneity → random-effects preferred
- **I² > 75%**: Considerable heterogeneity → meta-analysis may not be appropriate
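Given Cochran's Q and the number of studies, I² and the bands above translate directly into code. This is a minimal sketch. How it assigns values falling exactly at 25%, 50%, and 75% is an assumption, since the guideline ranges share their endpoints.

```python
def i_squared(q, n_studies):
    """I^2 as a percentage, truncated at 0 when Q < df (Higgins et al. 2003)."""
    df = n_studies - 1
    if q <= 0 or df < 1:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

def interpret_i_squared(i2):
    """Map I^2 (percent) to the interpretation bands listed above."""
    if i2 < 25:
        return "low: fixed-effects appropriate"
    if i2 < 50:
        return "moderate: investigate sources"
    if i2 <= 75:
        return "substantial: random-effects preferred"
    return "considerable: meta-analysis may not be appropriate"
```

For example, two studies with Q = 12.5 (df = 1) give I² = 92%, which falls in the "considerable" band.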

### Sources of Heterogeneity

Common reasons for high I²:

1. **Ancestry differences**: Different allele frequencies and LD structure
2. **Phenotype heterogeneity**: Trait definition varies across studies
3. **Platform differences**: Imputation quality and coverage
4. **Winner's curse**: Discovery studies overestimate effect sizes
5. **Cohort characteristics**: Age, sex, environmental factors

**Recommendations**:
- Perform subgroup analysis by ancestry
- Use meta-regression to investigate sources
- Consider excluding outlier studies
- Apply genomic control correction
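Genomic control rescales association statistics by the inflation factor λ_GC. This factor is the median observed 1-df chi-square divided by its expected median under the null, which is about 0.4549. Below is a minimal sketch that assumes genome-wide per-SNP z-scores from a single study, applied before that study enters the meta-analysis. λ should be estimated from all tested SNPs, not just top hits. Flooring λ at 1, so that statistics are never inflated, is a common convention rather than a fixed rule.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724

def genomic_control(z_scores):
    """Estimate lambda_GC from per-SNP z-scores; return (lambda, corrected chi-squares).

    Statistics are only deflated, never inflated (the applied lambda is floored at 1)."""
    chi2 = [z * z for z in z_scores]
    lam = median(chi2) / CHI2_1DF_MEDIAN
    lam_applied = max(lam, 1.0)
    return lam, [c / lam_applied for c in chi2]

def chi2_1df_to_p(chi2):
    """Two-sided p-value for a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(chi2 / 2.0))
```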

---

## Study Quality Assessment

### Quality Metrics

The skill evaluates studies based on:

**1. Sample Size**:
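- Power to detect associations (80% power at p < 5×10⁻⁸ for OR = 1.2 requires roughly n > 10,000 when the risk allele is common, e.g. ~30% frequency in a balanced case-control design; rarer alleles require substantially more)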
- Precision of effect size estimates
- Ability to detect modest effects

**2. Ancestry Diversity**:
- Single-ancestry vs multi-ancestry
- Population stratification control
- Transferability of findings

**3. Data Availability**:
- Summary statistics available for meta-analysis
- Individual-level data vs summary-level
- Imputation quality scores

**4. Genotyping Quality**:
- Platform density and coverage
- Imputation reference panel
- Quality control measures

**5. Statistical Rigor**:
- Genome-wide significance threshold (p < 5×10⁻⁸)
- Multiple testing correction
- Replication in independent cohort
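The sample-size criterion above can be made concrete with a rough power calculation. This sketch assumes an allelic (per-allele) test in a case-control design, a Wald approximation for the log odds ratio of allele counts, and no covariates or imputation uncertainty. It is meant for ballpark comparisons between studies, not as a substitute for a dedicated power tool.

```python
import math
from statistics import NormalDist

def case_control_power(n_cases, n_controls, control_freq, odds_ratio, alpha=5e-8):
    """Approximate power of a per-allele test in a case-control GWAS.

    The case allele frequency is the one implied by the control frequency and
    the OR; Var(log OR) uses the 2x2 allele-count formula 1/a + 1/b + 1/c + 1/d."""
    p0 = control_freq
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)  # risk-allele frequency in cases
    var = 1 / (2 * n_cases * p1 * (1 - p1)) + 1 / (2 * n_controls * p0 * (1 - p0))
    ncp = abs(math.log(odds_ratio)) / math.sqrt(var)  # expected |z| under the alternative
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(ncp - z_crit) + nd.cdf(-ncp - z_crit)
```

Under these assumptions, 5,000 cases and 5,000 controls give roughly 71% power for OR = 1.2 at a 30% risk-allele frequency. With 6,000 of each, power rises to roughly 87%.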

### Quality Tiers

**Tier 1 (High Quality)**:
- n ≥ 50,000
- Summary statistics available
- Multi-ancestry or large single-ancestry
- Imputed to high-quality reference
- Independent replication

**Tier 2 (Moderate Quality)**:
- n ≥ 10,000
- Standard GWAS platform
- Adequate power for common variants
- Some data availability

**Tier 3 (Limited)**:
- n < 10,000
- Limited power
- May miss modest effects
- Use with caution
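The tiers can be expressed as a function under a simplified reading of the criteria. The boolean fields are hypothetical flags the caller must fill in. A study with n ≥ 50,000 is taken to satisfy "multi-ancestry or large single-ancestry". Tier 2 is decided on sample size alone, since "standard platform" and "some data availability" are not defined precisely enough to encode.

```python
from dataclasses import dataclass

@dataclass
class StudyQuality:
    n: int
    summary_stats_available: bool  # hypothetical flag set by the caller
    high_quality_imputation: bool  # hypothetical flag set by the caller
    independent_replication: bool  # hypothetical flag set by the caller

def quality_tier(s: StudyQuality) -> int:
    """Assign Tier 1, 2, or 3 using a simplified reading of the criteria above."""
    if (s.n >= 50_000 and s.summary_stats_available
            and s.high_quality_imputation and s.independent_replication):
        return 1
    if s.n >= 10_000:
        return 2
    return 3
```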

---

## Best Practices

### Before Meta-Analysis

1. **Check phenotype consistency**: Ensure studies measure the same trait
2. **Verify ancestry overlap**: High heterogeneity expected if ancestries differ
3. **Harmonize alleles**: Align effect alleles across studies
4. **Quality control**: Exclude low-quality studies or associations
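Allele harmonization expresses every study's beta relative to a reference study's effect allele. The sign is flipped when the alleles are swapped, and the opposite strand is tried when they do not match directly. This minimal sketch assumes single-base alleles and simply drops strand-ambiguous (A/T, C/G) SNPs. Pipelines that resolve such SNPs by allele frequency would keep some of them. When the sign is flipped, odds ratios become their reciprocals and effect-allele frequencies become 1 - f.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def harmonize(ref_effect, ref_other, effect, other, beta):
    """Express `beta` relative to the reference study's effect allele.

    Returns the aligned beta, or None if the alleles cannot be matched or the
    SNP is strand-ambiguous (A/T or C/G)."""
    ref = (ref_effect.upper(), ref_other.upper())
    obs = (effect.upper(), other.upper())
    if set(ref) in ({"A", "T"}, {"C", "G"}):
        return None  # palindromic: strand cannot be resolved from alleles alone
    flipped = tuple(COMPLEMENT.get(a, "?") for a in obs)  # same alleles, other strand
    for cand in (obs, flipped):
        if cand == ref:
            return beta  # same effect allele
        if cand == ref[::-1]:
            return -beta  # effect and other alleles swapped
    return None
```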

### Interpreting Results

1. **Genome-wide significance**: p < 5×10⁻⁸ (Bonferroni for ~1M independent tests)
2. **Replication threshold**: p < 0.05 in an independent cohort (Bonferroni-corrected for the number of SNPs taken forward when many are tested)
3. **Direction consistency**: The effect should be in the same direction across studies
4. **Heterogeneity**: I² > 50% suggests caution in interpretation

### Common Pitfalls

❌ **Don't**:
- Meta-analyze without checking heterogeneity
- Ignore ancestry differences
- Over-interpret nominal p-values
- Assume replication failure means false positive

✅ **Do**:
- Always report I² statistic
- Perform sensitivity analyses
- Consider ancestry-stratified analysis
- Account for winner's curse in discovery studies

---

## Limitations & Caveats

### Data Limitations

1. **Incomplete Overlap**: Studies may analyze different SNPs
2. **Cohort Overlap**: Some cohorts participate in multiple studies (inflates significance)
3. **Publication Bias**: Significant findings more likely to be published
4. **Winner's Curse**: Discovery studies overestimate effect sizes
5. **Imputation Quality**: Varies across studies and populations

### Statistical Limitations

1. **Heterogeneity**: High I² may preclude meaningful meta-analysis
2. **Sample Size Differences**: Large studies dominate fixed-effects models
3. **Allele Frequency Differences**: The same variant can differ in frequency, in LD with the causal variant, and in apparent effect size across ancestries
4. **Linkage Disequilibrium**: Fine-mapping needed to identify causal variants
5. **Gene-Environment Interactions**: Not captured in standard meta-analysis

### Interpretation Guidelines

**When I² > 75%**:
- Meta-analysis results should be interpreted with extreme caution
- Investigate sources of heterogeneity systematically
- Consider ancestry-specific or subgroup analyses
- Descriptive comparison may be more appropriate than meta-analysis

**When Studies Conflict**:
- Check for methodological differences
- Verify phenotype definitions match
- Investigate population stratification
- Consider conditional analysis

---

## Scientific References

### Key Publications

1. **GWAS Best Practices**:
- Visscher et al. (2017). "10 Years of GWAS Discovery: Biology, Function, and Translation" *American Journal of Human Genetics* 101(1): 5-22
- PMID: 28686856
- DOI: 10.1016/j.ajhg.2017.06.005

2. **Meta-Analysis Methods**:
- Evangelou & Ioannidis (2013). "Meta-analysis methods for genome-wide association studies and beyond" *Nature Reviews Genetics* 14: 379-389
- PMID: 23657481

3. **Heterogeneity Interpretation**:
- Higgins et al. (2003). "Measuring inconsistency in meta-analyses" *BMJ* 327: 557-560
- PMID: 12958120

4. **Multi-Ancestry GWAS**:
- Peterson et al. (2019). "Genome-wide Association Studies in Ancestrally Diverse Populations: Opportunities, Methods, Pitfalls, and Recommendations" *Cell* 179(3): 589-603
- PMID: 31607513

5. **Replication Standards**:
- Chanock et al. (2007). "Replicating genotype-phenotype associations" *Nature* 447: 655-660
- PMID: 17554299

---

## Tools Used

### GWAS Catalog API
- `gwas_search_studies`: Find studies by trait
- `gwas_get_study_by_id`: Get detailed study metadata
- `gwas_get_associations_for_study`: Retrieve study associations
- `gwas_get_associations_for_snp`: Get SNP associations across studies
- `gwas_search_associations`: Search associations by trait

### Open Targets Genetics GraphQL API
- `OpenTargets_search_gwas_studies_by_disease`: Disease-based study search
- `OpenTargets_get_gwas_study`: Detailed study information with LD populations
- `OpenTargets_get_variant_credible_sets`: Fine-mapped loci for variant
- `OpenTargets_get_study_credible_sets`: All credible sets for study
- `OpenTargets_get_variant_info`: Variant annotation and allele frequencies

---

## Glossary

**Association**: Statistical relationship between a genetic variant and a trait

**Credible Set**: Set of variants likely to contain the causal variant (from fine-mapping)

**Effect Size**: Magnitude of genetic association (beta coefficient or odds ratio)

**Fine-Mapping**: Statistical method to identify causal variants within a locus

**Genome-Wide Significance**: p < 5×10⁻⁸, accounting for ~1M independent tests

**Heterogeneity (I²)**: Percentage of variance due to between-study differences

**L2G (Locus-to-Gene)**: Score predicting which gene is affected by a GWAS locus

**LD (Linkage Disequilibrium)**: Non-random association of alleles at different loci

**Meta-Analysis**: Statistical combination of results from multiple studies

**Replication**: Independent confirmation of an association in a new cohort

**Summary Statistics**: Per-SNP statistics (p-value, beta, SE) from GWAS

**Winner's Curse**: Overestimation of effect size in discovery studies

---

## Next Steps

After running this skill, consider:

1. **Fine-Mapping**: Use credible sets from Open Targets to identify causal variants
2. **Functional Follow-Up**: Investigate biological mechanisms of replicated loci
3. **Genetic Risk Scores**: Calculate polygenic risk scores using validated loci
4. **Drug Target Identification**: Use L2G scores to prioritize therapeutic targets
5. **Cross-Trait Analysis**: Look for pleiotropy with related traits
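At its simplest, a polygenic score is a weighted sum of effect-allele dosages. This minimal sketch assumes the validated loci are approximately independent (for example, LD-clumped) and that the weights come from a GWAS that does not include the target individuals. Variants missing for an individual are skipped here, though mean-dosage imputation is a common alternative.

```python
def polygenic_score(weights, dosages):
    """Weighted sum of effect-allele dosages for one individual.

    weights: {rsid: beta for the effect allele}
    dosages: {rsid: effect-allele dosage in [0, 2] for this individual}
    Variants absent from `dosages` are skipped."""
    return sum(beta * dosages[rsid] for rsid, beta in weights.items() if rsid in dosages)
```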

---

## Version History

- **v1.0** (2026-02-13): Initial release with study comparison, meta-analysis, and replication assessment

---

**Created by**: ToolUniverse GWAS Analysis Team
**Last Updated**: 2026-02-13
**License**: Open source (MIT)

Related Skills

tooluniverse-variant-interpretation

from FreedomIntelligence/OpenClaw-Medical-Skills

Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.

tooluniverse-variant-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Production-ready VCF processing, variant annotation, mutation analysis, and structural variant (SV/CNV) interpretation for bioinformatics questions. Parses VCF files (streaming, large files), classifies mutation types (missense, nonsense, synonymous, frameshift, splice, intronic, intergenic) and structural variants (deletions, duplications, inversions, translocations), applies VAF/depth/quality/consequence filters, annotates with ClinVar/dbSNP/gnomAD/CADD via ToolUniverse, interprets SV/CNV clinical significance using ClinGen dosage sensitivity scores, computes variant statistics, and generates reports. Solves questions like "What fraction of variants with VAF < 0.3 are missense?", "How many non-reference variants remain after filtering intronic/intergenic?", "What is the pathogenicity of this deletion affecting BRCA1?", or "Which dosage-sensitive genes overlap this CNV?". Use when processing VCF files, annotating variants, filtering by VAF/depth/consequence, classifying mutations, interpreting structural variants, assessing CNV pathogenicity, comparing cohorts, or answering variant analysis questions.

tooluniverse-target-research

from FreedomIntelligence/OpenClaw-Medical-Skills

Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.

tooluniverse-systems-biology

from FreedomIntelligence/OpenClaw-Medical-Skills

Comprehensive systems biology and pathway analysis using multiple pathway databases (Reactome, KEGG, WikiPathways, Pathway Commons, BioModels). Performs pathway enrichment, protein-pathway mapping, keyword searches, and systems-level analysis. Use when analyzing gene sets, exploring biological pathways, or investigating systems-level biology.

tooluniverse-structural-variant-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.

tooluniverse-statistical-modeling

from FreedomIntelligence/OpenClaw-Medical-Skills

Perform statistical modeling and regression analysis on biomedical datasets. Supports linear regression, logistic regression (binary/ordinal/multinomial), mixed-effects models, Cox proportional hazards survival analysis, Kaplan-Meier estimation, and comprehensive model diagnostics. Extracts odds ratios, hazard ratios, confidence intervals, p-values, and effect sizes. Designed to solve BixBench statistical reasoning questions involving clinical/experimental data. Use when asked to fit regression models, compute odds ratios, perform survival analysis, run statistical tests, or interpret model coefficients from provided data.

tooluniverse-spatial-transcriptomics

from FreedomIntelligence/OpenClaw-Medical-Skills

Analyze spatial transcriptomics data to map gene expression in tissue architecture. Supports 10x Visium, MERFISH, seqFISH, Slide-seq, and imaging-based platforms. Performs spatial clustering, domain identification, cell-cell proximity analysis, spatial gene expression patterns, tissue architecture mapping, and integration with single-cell data. Use when analyzing spatial transcriptomics datasets, studying tissue organization, identifying spatial expression patterns, mapping cell-cell interactions in tissue context, characterizing tumor microenvironment spatial structure, or integrating spatial and single-cell RNA-seq data for comprehensive tissue analysis.

tooluniverse-spatial-omics-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Computational analysis framework for spatial multi-omics data integration. Given spatially variable genes (SVGs), spatial domain annotations, tissue type, and disease context from spatial transcriptomics/proteomics experiments (10x Visium, MERFISH, DBiTplus, SLIDE-seq, etc.), performs comprehensive biological interpretation including pathway enrichment, cell-cell interaction inference, druggable target identification, immune microenvironment characterization, and multi-modal integration. Produces a detailed markdown report with Spatial Omics Integration Score (0-100), domain-by-domain characterization, and validation recommendations. Uses 70+ ToolUniverse tools across 9 analysis phases. Use when users ask about spatial transcriptomics analysis, spatial omics interpretation, tissue heterogeneity, spatial gene expression patterns, tumor microenvironment mapping, tissue zonation, or cell-cell communication from spatial data.

tooluniverse-single-cell

from FreedomIntelligence/OpenClaw-Medical-Skills

Production-ready single-cell and expression matrix analysis using scanpy, anndata, and scipy. Performs scRNA-seq QC, normalization, PCA, UMAP, Leiden/Louvain clustering, differential expression (Wilcoxon, t-test, DESeq2), cell type annotation, per-cell-type statistical analysis, gene-expression correlation, batch correction (Harmony), trajectory inference, and cell-cell communication analysis. NEW: Analyzes ligand-receptor interactions between cell types using OmniPath (CellPhoneDB, CellChatDB), scores communication strength, identifies signaling cascades, and handles multi-subunit receptor complexes. Integrates with ToolUniverse gene annotation tools (HPA, Ensembl, MyGene, UniProt) and enrichment tools (gseapy, PANTHER, STRING). Supports h5ad, 10X, CSV/TSV count matrices, and pre-annotated datasets. Use when analyzing single-cell RNA-seq data, studying cell-cell interactions, performing cell type differential expression, computing gene-expression correlations by cell type, analyzing tumor-immune communication, or answering questions about scRNA-seq datasets.

tooluniverse-sequence-retrieval

from FreedomIntelligence/OpenClaw-Medical-Skills

Retrieves biological sequences (DNA, RNA, protein) from NCBI and ENA with gene disambiguation, accession type handling, and comprehensive sequence profiles. Creates detailed reports with sequence metadata, cross-database references, and download options. Use when users need nucleotide sequences, protein sequences, genome data, or mention GenBank, RefSeq, EMBL accessions.

tooluniverse-rnaseq-deseq2

from FreedomIntelligence/OpenClaw-Medical-Skills

Production-ready RNA-seq differential expression analysis using PyDESeq2. Performs DESeq2 normalization, dispersion estimation, Wald testing, LFC shrinkage, and result filtering. Handles multi-factor designs, multiple contrasts, batch effects, and integrates with gene enrichment (gseapy) and ToolUniverse annotation tools (UniProt, Ensembl, OpenTargets). Supports CSV/TSV/H5AD input formats and any organism. Use when analyzing RNA-seq count matrices, identifying DEGs, performing differential expression with statistical rigor, or answering questions about gene expression changes.

tooluniverse-rare-disease-diagnosis

from FreedomIntelligence/OpenClaw-Medical-Skills

Provide differential diagnosis for patients with suspected rare diseases based on phenotype and genetic data. Matches symptoms to HPO terms, identifies candidate diseases from Orphanet/OMIM, prioritizes genes for testing, interprets variants of uncertain significance. Use when clinician asks about rare disease diagnosis, unexplained phenotypes, or genetic testing interpretation.