tooluniverse-gwas-finemapping
Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions. Computes posterior probabilities for causal variants, links variants to genes via L2G predictions, annotates functional consequences, and suggests validation strategies. Use when asked to fine-map GWAS loci, prioritize causal variants, identify credible sets, or link GWAS signals to causal genes.
Best use case
tooluniverse-gwas-finemapping is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions. Computes posterior probabilities for causal variants, links variants to genes via L2G predictions, annotates functional consequences, and suggests validation strategies. Use when asked to fine-map GWAS loci, prioritize causal variants, identify credible sets, or link GWAS signals to causal genes.
Teams using tooluniverse-gwas-finemapping should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/tooluniverse-gwas-finemapping/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How tooluniverse-gwas-finemapping Compares
| Feature / Agent | tooluniverse-gwas-finemapping | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions. Computes posterior probabilities for causal variants, links variants to genes via L2G predictions, annotates functional consequences, and suggests validation strategies. Use when asked to fine-map GWAS loci, prioritize causal variants, identify credible sets, or link GWAS signals to causal genes.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# GWAS Fine-Mapping & Causal Variant Prioritization
Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions.
## Overview
Genome-wide association studies (GWAS) identify genomic regions associated with traits, but linkage disequilibrium (LD) makes it difficult to pinpoint the causal variant. **Fine-mapping** uses Bayesian statistical methods to compute the posterior probability that each variant is causal, given the GWAS summary statistics.
This skill provides tools to:
- **Prioritize causal variants** using fine-mapping posterior probabilities
- **Link variants to genes** using locus-to-gene (L2G) predictions
- **Annotate variants** with functional consequences
- **Suggest validation strategies** based on fine-mapping results
## Key Concepts
### Credible Sets
A **credible set** is a minimal set of variants that contains the causal variant with high confidence (typically 95% or 99%). Each variant in the set has a **posterior probability** of being causal, computed using methods like:
- **SuSiE** (Sum of Single Effects)
- **FINEMAP** (Bayesian fine-mapping)
- **PAINTOR** (Probabilistic Annotation INtegraTOR)
### Posterior Probability
The probability that a specific variant is causal, given the GWAS data and LD structure. Higher posterior probability = more likely to be causal.
### Locus-to-Gene (L2G) Predictions
L2G scores integrate multiple data types to predict which gene is affected by a variant:
- Distance to gene (closer = higher score)
- eQTL evidence (expression changes)
- Chromatin interactions (Hi-C, promoter capture)
- Functional annotations (coding variants, regulatory regions)
L2G scores range from 0 to 1, with higher scores indicating stronger gene-variant links.
## Use Cases
### 1. Prioritize Variants at a Known Locus
**Question**: "Which variant at the TCF7L2 locus is likely causal for type 2 diabetes?"
```python
from python_implementation import prioritize_causal_variants
# Prioritize variants in TCF7L2 for diabetes
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")
print(result.get_summary())
# Output shows:
# - Credible sets containing TCF7L2 variants
# - Posterior probabilities (via fine-mapping methods)
# - Top L2G genes (which genes are likely affected)
# - Associated traits
```
### 2. Fine-Map a Specific Variant
**Question**: "What do we know about rs429358 (APOE4) from fine-mapping?"
```python
# Fine-map a specific variant
result = prioritize_causal_variants("rs429358")
# Check which credible sets contain this variant
for cs in result.credible_sets:
print(f"Trait: {cs.trait}")
print(f"Fine-mapping method: {cs.finemapping_method}")
print(f"Top gene: {cs.l2g_genes[0] if cs.l2g_genes else 'N/A'}")
print(f"Confidence: {cs.confidence}")
```
### 3. Explore All Loci from a GWAS Study
**Question**: "What are all the causal loci from the recent T2D meta-analysis?"
```python
from python_implementation import get_credible_sets_for_study
# Get all fine-mapped loci from a study
credible_sets = get_credible_sets_for_study("GCST90029024") # T2D GWAS
print(f"Found {len(credible_sets)} independent loci")
# Examine each locus
for cs in credible_sets:
print(f"\nRegion: {cs.region}")
print(f"Lead variant: {cs.lead_variant.rs_ids[0] if cs.lead_variant else 'N/A'}")
if cs.l2g_genes:
top_gene = cs.l2g_genes[0]
print(f"Most likely causal gene: {top_gene.gene_symbol} (L2G: {top_gene.l2g_score:.3f})")
```
### 4. Find GWAS Studies for a Disease
**Question**: "What GWAS studies exist for Alzheimer's disease?"
```python
from python_implementation import search_gwas_studies_for_disease
# Search by disease name
studies = search_gwas_studies_for_disease("Alzheimer's disease")
for study in studies[:5]:
print(f"{study['id']}: {study.get('nSamples', 'N/A')} samples")
print(f" Author: {study.get('publicationFirstAuthor', 'N/A')}")
print(f" Has summary stats: {study.get('hasSumstats', False)}")
# Or use precise disease ontology IDs
studies = search_gwas_studies_for_disease(
"Alzheimer's disease",
disease_id="EFO_0000249" # EFO ID for Alzheimer's
)
```
### 5. Get Validation Suggestions
**Question**: "How should we validate the top causal variant?"
```python
result = prioritize_causal_variants("APOE", "alzheimer")
# Get experimental validation suggestions
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
print(suggestion)
# Output includes:
# - CRISPR knock-in experiments
# - Reporter assays
# - eQTL analysis
# - Colocalization studies
```
## Workflow Example: Complete Fine-Mapping Analysis
```python
from python_implementation import (
prioritize_causal_variants,
search_gwas_studies_for_disease,
get_credible_sets_for_study
)
# Step 1: Find relevant GWAS studies
print("Step 1: Finding T2D GWAS studies...")
studies = search_gwas_studies_for_disease("type 2 diabetes", "MONDO_0005148")
largest_study = max(studies, key=lambda s: s.get('nSamples', 0) or 0)
print(f"Largest study: {largest_study['id']} ({largest_study.get('nSamples', 'N/A')} samples)")
# Step 2: Get all fine-mapped loci from the study
print("\nStep 2: Getting fine-mapped loci...")
credible_sets = get_credible_sets_for_study(largest_study['id'], max_sets=100)
print(f"Found {len(credible_sets)} credible sets")
# Step 3: Find loci near genes of interest
print("\nStep 3: Finding TCF7L2 loci...")
tcf7l2_loci = [
cs for cs in credible_sets
if any(gene.gene_symbol == "TCF7L2" for gene in cs.l2g_genes)
]
print(f"TCF7L2 appears in {len(tcf7l2_loci)} loci")
# Step 4: Prioritize variants at TCF7L2
print("\nStep 4: Prioritizing TCF7L2 variants...")
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")
# Step 5: Print summary and validation plan
print("\n" + "="*60)
print("FINE-MAPPING SUMMARY")
print("="*60)
print(result.get_summary())
print("\n" + "="*60)
print("VALIDATION STRATEGY")
print("="*60)
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
print(suggestion)
```
## Data Classes
### `FineMappingResult`
Main result object containing:
- `query_variant`: Variant annotation
- `query_gene`: Gene symbol (if queried by gene)
- `credible_sets`: List of fine-mapped loci
- `associated_traits`: All associated traits
- `top_causal_genes`: L2G genes ranked by score
Methods:
- `get_summary()`: Human-readable summary
- `get_validation_suggestions()`: Experimental validation strategies
### `CredibleSet`
Represents a fine-mapped locus:
- `study_locus_id`: Unique identifier
- `region`: Genomic region (e.g., "10:112861809-113404438")
- `lead_variant`: Top variant by posterior probability
- `finemapping_method`: Statistical method used (SuSiE, FINEMAP, etc.)
- `l2g_genes`: Locus-to-gene predictions
- `confidence`: Credible set confidence (95%, 99%)
### `L2GGene`
Locus-to-gene prediction:
- `gene_symbol`: Gene name (e.g., "TCF7L2")
- `gene_id`: Ensembl gene ID
- `l2g_score`: Probability score (0-1)
### `VariantAnnotation`
Functional annotation for a variant:
- `variant_id`: Open Targets format (chr_pos_ref_alt)
- `rs_ids`: dbSNP identifiers
- `chromosome`, `position`: Genomic coordinates
- `most_severe_consequence`: Functional impact
- `allele_frequencies`: Population-specific MAFs
## Tools Used
### Open Targets Genetics (GraphQL)
- `OpenTargets_get_variant_info`: Variant details and allele frequencies
- `OpenTargets_get_variant_credible_sets`: Credible sets containing a variant
- `OpenTargets_get_credible_set_detail`: Detailed credible set information
- `OpenTargets_get_study_credible_sets`: All loci from a GWAS study
- `OpenTargets_search_gwas_studies_by_disease`: Find studies by disease
### GWAS Catalog (REST API)
- `gwas_search_snps`: Find SNPs by gene or rsID
- `gwas_get_snp_by_id`: Detailed SNP information
- `gwas_get_associations_for_snp`: All trait associations for a variant
- `gwas_search_studies`: Find studies by disease/trait
## Understanding Fine-Mapping Output
### Interpreting Posterior Probabilities
- **> 0.5**: Very likely causal (strong candidate)
- **0.1 - 0.5**: Plausible causal variant
- **0.01 - 0.1**: Possible but uncertain
- **< 0.01**: Unlikely to be causal
### Interpreting L2G Scores
- **> 0.7**: High confidence gene-variant link
- **0.5 - 0.7**: Moderate confidence
- **0.3 - 0.5**: Weak but possible link
- **< 0.3**: Low confidence
### Fine-Mapping Methods Compared
| Method | Approach | Strengths | Use Case |
|--------|----------|-----------|----------|
| **SuSiE** | Sum of Single Effects | Handles multiple causal variants | Multi-signal loci |
| **FINEMAP** | Bayesian shotgun stochastic search | Fast, scalable | Large studies |
| **PAINTOR** | Functional annotations | Integrates epigenomics | Regulatory variants |
| **CAVIAR** | Colocalization | Finds shared causal variants | eQTL overlap |
## Common Questions
**Q: Why don't all variants have credible sets?**
A: Fine-mapping requires:
1. GWAS summary statistics (not just top hits)
2. LD reference panel
3. Sufficient signal strength (p < 5e-8)
4. Computational resources
**Q: Can a variant be in multiple credible sets?**
A: Yes! A variant can be causal for multiple traits (pleiotropy) or appear in different studies for the same trait.
**Q: What if the top L2G gene is far from the variant?**
A: This suggests regulatory effects (enhancers, promoters). Check:
- eQTL evidence in relevant tissues
- Chromatin interaction data (Hi-C)
- Regulatory element annotations (Roadmap, ENCODE)
**Q: How do I choose between variants in a credible set?**
A: Prioritize by:
1. Posterior probability (higher = better)
2. Functional consequence (coding > regulatory > intergenic)
3. eQTL evidence
4. Evolutionary conservation
5. Experimental feasibility
## Limitations
1. **LD-dependent**: Fine-mapping accuracy depends on LD structure matching the study population
2. **Requires summary stats**: Not all studies provide full summary statistics
3. **Computational intensive**: Fine-mapping large studies takes significant resources
4. **Prior assumptions**: Bayesian methods depend on priors (number of causal variants, effect sizes)
5. **Missing data**: Not all GWAS loci have been fine-mapped in Open Targets
## Best Practices
1. **Start with study-level queries** when exploring a new disease
2. **Check multiple studies** for replication of signals
3. **Combine with functional data** (eQTLs, chromatin, CRISPR screens)
4. **Consider ancestry** - LD differs across populations
5. **Validate experimentally** - fine-mapping provides candidates, not proof
## References
1. Wang et al. (2020) "A simple new approach to variable selection in regression, with application to genetic fine mapping." *JRSS-B* (SuSiE)
2. Benner et al. (2016) "FINEMAP: efficient variable selection using summary data from genome-wide association studies." *Bioinformatics*
3. Ghoussaini et al. (2021) "Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics." *NAR*
4. Mountjoy et al. (2021) "An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci." *Nat Genet*
## Related Skills
- **tooluniverse-gwas-explorer**: Broader GWAS analysis
- **tooluniverse-eqtl-colocalization**: Link variants to gene expression
- **tooluniverse-gene-prioritization**: Systematic gene rankingRelated Skills
tooluniverse-variant-interpretation
Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.
tooluniverse-variant-analysis
Production-ready VCF processing, variant annotation, mutation analysis, and structural variant (SV/CNV) interpretation for bioinformatics questions. Parses VCF files (streaming, large files), classifies mutation types (missense, nonsense, synonymous, frameshift, splice, intronic, intergenic) and structural variants (deletions, duplications, inversions, translocations), applies VAF/depth/quality/consequence filters, annotates with ClinVar/dbSNP/gnomAD/CADD via ToolUniverse, interprets SV/CNV clinical significance using ClinGen dosage sensitivity scores, computes variant statistics, and generates reports. Solves questions like "What fraction of variants with VAF < 0.3 are missense?", "How many non-reference variants remain after filtering intronic/intergenic?", "What is the pathogenicity of this deletion affecting BRCA1?", or "Which dosage-sensitive genes overlap this CNV?". Use when processing VCF files, annotating variants, filtering by VAF/depth/consequence, classifying mutations, interpreting structural variants, assessing CNV pathogenicity, comparing cohorts, or answering variant analysis questions.
tooluniverse-target-research
Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.
tooluniverse-systems-biology
Comprehensive systems biology and pathway analysis using multiple pathway databases (Reactome, KEGG, WikiPathways, Pathway Commons, BioModels). Performs pathway enrichment, protein-pathway mapping, keyword searches, and systems-level analysis. Use when analyzing gene sets, exploring biological pathways, or investigating systems-level biology.
tooluniverse-structural-variant-analysis
Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.
tooluniverse-statistical-modeling
Perform statistical modeling and regression analysis on biomedical datasets. Supports linear regression, logistic regression (binary/ordinal/multinomial), mixed-effects models, Cox proportional hazards survival analysis, Kaplan-Meier estimation, and comprehensive model diagnostics. Extracts odds ratios, hazard ratios, confidence intervals, p-values, and effect sizes. Designed to solve BixBench statistical reasoning questions involving clinical/experimental data. Use when asked to fit regression models, compute odds ratios, perform survival analysis, run statistical tests, or interpret model coefficients from provided data.
tooluniverse-spatial-transcriptomics
Analyze spatial transcriptomics data to map gene expression in tissue architecture. Supports 10x Visium, MERFISH, seqFISH, Slide-seq, and imaging-based platforms. Performs spatial clustering, domain identification, cell-cell proximity analysis, spatial gene expression patterns, tissue architecture mapping, and integration with single-cell data. Use when analyzing spatial transcriptomics datasets, studying tissue organization, identifying spatial expression patterns, mapping cell-cell interactions in tissue context, characterizing tumor microenvironment spatial structure, or integrating spatial and single-cell RNA-seq data for comprehensive tissue analysis.
tooluniverse-spatial-omics-analysis
Computational analysis framework for spatial multi-omics data integration. Given spatially variable genes (SVGs), spatial domain annotations, tissue type, and disease context from spatial transcriptomics/proteomics experiments (10x Visium, MERFISH, DBiTplus, SLIDE-seq, etc.), performs comprehensive biological interpretation including pathway enrichment, cell-cell interaction inference, druggable target identification, immune microenvironment characterization, and multi-modal integration. Produces a detailed markdown report with Spatial Omics Integration Score (0-100), domain-by-domain characterization, and validation recommendations. Uses 70+ ToolUniverse tools across 9 analysis phases. Use when users ask about spatial transcriptomics analysis, spatial omics interpretation, tissue heterogeneity, spatial gene expression patterns, tumor microenvironment mapping, tissue zonation, or cell-cell communication from spatial data.
tooluniverse-single-cell
Production-ready single-cell and expression matrix analysis using scanpy, anndata, and scipy. Performs scRNA-seq QC, normalization, PCA, UMAP, Leiden/Louvain clustering, differential expression (Wilcoxon, t-test, DESeq2), cell type annotation, per-cell-type statistical analysis, gene-expression correlation, batch correction (Harmony), trajectory inference, and cell-cell communication analysis. NEW: Analyzes ligand-receptor interactions between cell types using OmniPath (CellPhoneDB, CellChatDB), scores communication strength, identifies signaling cascades, and handles multi-subunit receptor complexes. Integrates with ToolUniverse gene annotation tools (HPA, Ensembl, MyGene, UniProt) and enrichment tools (gseapy, PANTHER, STRING). Supports h5ad, 10X, CSV/TSV count matrices, and pre-annotated datasets. Use when analyzing single-cell RNA-seq data, studying cell-cell interactions, performing cell type differential expression, computing gene-expression correlations by cell type, analyzing tumor-immune communication, or answering questions about scRNA-seq datasets.
tooluniverse-sequence-retrieval
Retrieves biological sequences (DNA, RNA, protein) from NCBI and ENA with gene disambiguation, accession type handling, and comprehensive sequence profiles. Creates detailed reports with sequence metadata, cross-database references, and download options. Use when users need nucleotide sequences, protein sequences, genome data, or mention GenBank, RefSeq, EMBL accessions.
tooluniverse-rnaseq-deseq2
Production-ready RNA-seq differential expression analysis using PyDESeq2. Performs DESeq2 normalization, dispersion estimation, Wald testing, LFC shrinkage, and result filtering. Handles multi-factor designs, multiple contrasts, batch effects, and integrates with gene enrichment (gseapy) and ToolUniverse annotation tools (UniProt, Ensembl, OpenTargets). Supports CSV/TSV/H5AD input formats and any organism. Use when analyzing RNA-seq count matrices, identifying DEGs, performing differential expression with statistical rigor, or answering questions about gene expression changes.
tooluniverse-rare-disease-diagnosis
Provide differential diagnosis for patients with suspected rare diseases based on phenotype and genetic data. Matches symptoms to HPO terms, identifies candidate diseases from Orphanet/OMIM, prioritizes genes for testing, interprets variants of uncertain significance. Use when clinician asks about rare disease diagnosis, unexplained phenotypes, or genetic testing interpretation.