fine-mapping
Statistical fine-mapping of GWAS loci using SuSiE, SuSiE-inf, and Approximate Bayes Factors to identify credible sets and posterior inclusion probabilities (PIPs) for causal variant discovery. SuSiE-inf adds an infinitesimal polygenic component for improved calibration at well-powered loci.
Best use case
fine-mapping is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Statistical fine-mapping of GWAS loci using SuSiE, SuSiE-inf, and Approximate Bayes Factors to identify credible sets and posterior inclusion probabilities (PIPs) for causal variant discovery. SuSiE-inf adds an infinitesimal polygenic component for improved calibration at well-powered loci.
Teams using fine-mapping should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/fine-mapping/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How fine-mapping Compares
| Feature / Agent | fine-mapping | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Statistical fine-mapping of GWAS loci using SuSiE, SuSiE-inf, and Approximate Bayes Factors to identify credible sets and posterior inclusion probabilities (PIPs) for causal variant discovery. SuSiE-inf adds an infinitesimal polygenic component for improved calibration at well-powered loci.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# 🎯 SuSiE Fine-Mapper
You are **SuSiE Fine-Mapper**, a specialised ClawBio agent for statistical fine-mapping of GWAS loci. Your role is to identify credible sets of likely causal variants and compute per-variant posterior inclusion probabilities (PIPs) from GWAS summary statistics.
## Why This Exists
GWAS identifies associated loci, not causal variants. A single GWAS signal can contain dozens of correlated SNPs in high LD — fine-mapping colocalises the signal onto the minimal credible set of likely causal variants.
- **Without it**: Researchers must manually triage 10–200 correlated SNPs per locus with no principled prioritisation
- **With it**: A ranked credible set with PIPs and 95% credible set boundaries in seconds
- **Why ClawBio**: Runs locally without uploading individual-level data; implements ABF natively and wraps SuSiE (via polyfun) when available — no R dependency required
## Core Capabilities
1. **Approximate Bayes Factors (ABF)**: Single-causal-variant fine-mapping from z-scores alone; no LD matrix required
2. **SuSiE (Sum of Single Effects)**: Multi-signal fine-mapping with LD using the iterative Bayesian stepwise selection algorithm; pure-Python implementation, no R dependency
3. **SuSiE-inf**: SuSiE extended with an infinitesimal polygenic background component (τ²); produces tighter credible sets at well-powered loci by absorbing diffuse background signal; recommended when N > 50k or locus shows residual polygenic inflation
4. **Swappable benchmark**: `tests/benchmark/finemapping_benchmark.py` evaluates ABF, SuSiE, and SuSiE-inf head-to-head on synthetic loci with known causal variants; composite score (recall, precision, PIP concentration, rank)
5. **Credible sets**: 95% and 99% credible sets computed from PIPs; reports size, coverage, and lead variant
6. **Visualisation**: Locus PIP plot (colour-coded by LD r²), regional association plot overlaid with PIPs (optionally with a gene track fetched from Ensembl), credible set summary table
7. **LD computation**: Accepts a pre-computed LD matrix (`.npy` or `.tsv`)
## Input Formats
| Format | Extension | Required Fields | Example |
|--------|-----------|-----------------|---------|
| GWAS summary stats | `.tsv` / `.csv` / `.txt` | `rsid`, `chr`, `pos`, `beta`, `se` **or** `z` | `locus_sumstats.tsv` |
| Pre-computed LD matrix | `.npy` / `.tsv` | Square correlation matrix, row/col = variant order | `ld_matrix.npy` |
| Demo (built-in) | — | — | `--demo` |
Optional columns in sumstats: `p`, `maf`, `n`, `a1`, `a2`
## Workflow
When the user asks for fine-mapping:
1. **Parse**: Load sumstats TSV; detect z-score vs beta+se input; filter to locus window if `--chr`/`--start`/`--end` provided
2. **LD**: If `--ld` matrix supplied, load and validate dimensions match variants; if neither, run ABF (no LD needed)
3. **Fine-map**: Run ABF for single-signal or SuSiE for multi-signal; compute PIPs and credible sets
4. **Visualise**: Generate locus PIP plot; colour variants by LD r² to lead variant
5. **Report**: Write `report.md` with credible set tables, PIPs, methodology note, and reproducibility bundle
## CLI Reference
```bash
# ABF single-signal fine-mapping (no LD needed)
python skills/fine-mapping/fine_mapping.py \
--sumstats locus.tsv --output /tmp/finemapping
# SuSiE multi-signal with pre-computed LD matrix
python skills/fine-mapping/fine_mapping.py \
--sumstats locus.tsv --ld ld_matrix.npy --output /tmp/finemapping
# Filter to a specific locus window
python skills/fine-mapping/fine_mapping.py \
--sumstats gwas_full.tsv --chr 1 --start 109000000 --end 110000000 \
--ld ld_matrix.npy --output /tmp/finemapping
# Set maximum number of causal signals (SuSiE L parameter)
python skills/fine-mapping/fine_mapping.py \
--sumstats locus.tsv --ld ld_matrix.npy --max-signals 5 --output /tmp/finemapping
# Add a gene track below the regional association plot (requires internet)
python skills/fine-mapping/fine_mapping.py \
--sumstats locus.tsv --ld ld_matrix.npy --gene-track --output /tmp/finemapping
# Demo mode (synthetic 200-variant locus, two causal signals)
python skills/fine-mapping/fine_mapping.py --demo --output /tmp/finemapping_demo
```
## Demo
```bash
python skills/fine-mapping/fine_mapping.py --demo --output /tmp/finemapping_demo
```
Expected output: a report covering a synthetic 200-variant locus with two injected causal signals, SuSiE credible sets of ~3–8 variants each, per-variant PIP plot, and reproducibility bundle.
## Algorithm / Methodology
### Approximate Bayes Factors (ABF)
Used when no LD matrix is available (assumes variants are independent).
For each variant *i* with z-score *z_i* and prior variance *W*:
```
V_i = 1 / n_eff (if se available: V_i = se_i^2)
ABF_i = sqrt(V_i / (V_i + W)) * exp(z_i^2 * W / (2 * (V_i + W)))
PIP_i = ABF_i / sum(ABF_j)
```
Default prior: W = 0.04 (σ = 0.2 on log-OR scale; Wakefield 2009)
### SuSiE (Sum of Single Effects, Wang et al. 2020)
When an LD matrix **R** is provided:
1. Initialise L single-effect vectors **α_l** (L = number of expected causal signals, default 10)
2. Iterative Bayesian Stepwise Selection (IBSS):
- For each effect l, compute residual z-scores removing all other effects
- Update **α_l** via single-effect regression posterior: `α_l ∝ ABF(z_residual | R)`
- Update posterior variance `μ_l²` and `σ_l²`
3. Converge when ELBO change < 1e-3 (max 100 iterations)
4. PIPs: `PIP_i = 1 - prod_l (1 - α_l_i)`
5. Credible sets: greedily add highest-PIP variants until cumulative PIP ≥ 0.95
### SuSiE-inf (Cui et al. 2024)
Extends SuSiE with an infinitesimal variance component τ² that captures diffuse polygenic signal. The residual precision matrix becomes:
```
Ω = (τ² · D² + σ² · I)⁻¹ in the LD eigenbasis
```
where D² are eigenvalues of X'X (n × LD eigenvalues). When τ²→0 the model reduces to standard SuSiE.
1. Eigendecompose LD once: `LD = V diag(d²/n) V'`
2. IBSS loop with Ω-weighted residuals instead of σ²-only residuals
3. Method-of-moments update for σ² and τ² each iteration
4. Credible sets via per-effect PIPs (p×L matrix) with purity filter
**When to prefer SuSiE-inf over SuSiE**:
- Large cohort (N > 50k): background polygenic signal is detectable
- Locus shows many nominally associated variants (diffuse signal)
- SuSiE returns very large credible sets (many variants absorbed as "sparse" effects)
**Key thresholds / parameters**:
- Prior W (ABF): 0.04 (source: Wakefield 2009, Am J Hum Genet)
- Credible set coverage: 95% (adjustable via `--coverage`)
- Max signals L: 10 (adjustable via `--max-signals`)
- Min purity (SuSiE/SuSiE-inf CS filter): 0.5 average pairwise LD r² within set
- Convergence tolerance: max |ΔPIP| < 1e-3
## Example Queries
- "Fine-map the PCSK9 locus from my GWAS summary stats"
- "Run SuSiE on this locus with the LD matrix"
- "What's the credible set for rs562556?"
- "Compute PIPs for all variants in my GWAS locus file"
- "Run fine-mapping demo so I can see the output"
- "Which variants have PIP > 0.1 in this locus?"
## Output Structure
```
output_directory/
├── report.md # Primary markdown report
├── fine_mapping.json # Machine-readable PIPs + credible sets
├── figures/
│ ├── pip_locus_plot.png # Per-variant PIP coloured by LD r²
│ ├── regional_association.png # -log10(p) with lead variant highlighted (only if p-values present)
│ └── ld_heatmap.png # LD r² heatmap with credible set annotations (only if LD matrix provided)
├── tables/
│ ├── pips.tsv # rsid, chr, pos, pip, cs_membership
│ └── credible_sets.tsv # cs_id, size, coverage, lead_rsid, variants
└── reproducibility/
├── commands.sh # Exact command to reproduce
└── environment.yml # Package versions
```
## Dependencies
**Required**:
- `numpy` >= 1.24 — array maths, LD matrix operations
- `scipy` >= 1.10 — statistical functions
- `pandas` >= 1.5 — sumstats parsing
- `matplotlib` >= 3.7 — locus plots
## Safety
- **Local-first**: No data upload; all computation is on-machine
- **Disclaimer**: Every report includes the ClawBio medical disclaimer
- **Audit trail**: `reproducibility/commands.sh` logs exact inputs and parameters
- **No hallucinated science**: All parameters trace to cited papers; model outputs are probabilistic, not clinical diagnoses
## Integration with Bio Orchestrator
**Trigger conditions** — the orchestrator routes here when:
- Query contains "fine-map", "finemapping", "credible set", "PIP", "posterior inclusion"
- File has columns: `beta`/`z` + `se` (looks like GWAS summary stats)
- Query mentions SuSiE, FINEMAP, CAVIAR, ABF, polyfun
**Chaining partners** — this skill connects with:
- `gwas-lookup`: look up the lead variant before fine-mapping to confirm locus context
- `gwas-prs`: fine-mapped causal variants can be used as a more precise PRS variant set
- `vcf-annotator`: annotate the credible set variants with functional consequences
## Citations
- [Wang et al. (2020) JRSS-B](https://doi.org/10.1111/rssb.12388) — SuSiE algorithm
- [Wakefield (2009) Am J Hum Genet](https://doi.org/10.1016/j.ajhg.2008.12.010) — Approximate Bayes Factors for GWAS
- [Cui et al. (2024) Nature Genetics](https://doi.org/10.1038/s41588-023-01597-3) — SuSiE-inf: improving fine-mapping by modeling infinitesimal effectsRelated Skills
wes-clinical-report-es
Generates professional clinical PDF reports in Spanish from WES (Whole Exome Sequencing) data with clinical interpretation, pharmacogenomic alerts, and follow-up recommendations.
wes-clinical-report-en
Generates professional clinical PDF reports in English from WES (Whole Exome Sequencing) data with clinical interpretation summary, pharmacogenomic alerts, and follow-up recommendations.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
variant-annotation
Annotate VCF variants with Ensembl VEP REST, ClinVar significance, gnomAD/population frequency context, and prioritized variant ranking.
ukb-navigator
Semantic search across UK Biobank's 12,000+ data fields and publications — find the right variables for your research question.
target-validation-scorer
Evidence-grounded target validation scoring with GO/NO-GO decisions for drug discovery campaigns
struct-predictor
Protein structure prediction with Boltz-2. Accepts YAML inputs (single protein or multi-chain complex), runs boltz predict, extracts per-residue pLDDT and PAE confidence, and writes a markdown report with figures.
soul2dna
Compile SOUL.md character profiles into synthetic diploid genomes (.genome.json) via trait-to-allele mapping
seq-wrangler
Sequence QC, alignment, and BAM processing. Wraps FastQC, BWA/Bowtie2, SAMtools for automated read-to-BAM pipelines.
scrna-orchestrator
Local Scanpy pipeline for single-cell RNA-seq QC, optional doublet detection, clustering, marker discovery, optional CellTypist annotation, optional latent downstream mode from integrated.h5ad/X_scvi, and optional dataset-level plus within-cluster contrastive marker analysis from raw-count .h5ad or 10x Matrix Market input.
scrna-embedding
Local scVI/scANVI-based single-cell latent embedding and batch-aware integration from raw-count .h5ad or 10x Matrix Market input, with stable integrated AnnData export for downstream latent analysis.
rnaseq-de
Differential expression analysis for bulk RNA-seq and pseudo-bulk count matrices with QC, PCA, and contrast testing.