tooluniverse-gwas-trait-to-gene
Discover genes associated with diseases and traits using GWAS data from the GWAS Catalog (500,000+ associations) and Open Targets Genetics (L2G predictions). Identifies genetic risk factors, prioritizes causal genes via locus-to-gene scoring, and assesses druggability. Use when asked to find genes associated with a disease or trait, discover genetic risk factors, translate GWAS signals to gene targets, or answer questions like "What genes are associated with type 2 diabetes?"
Best use case
tooluniverse-gwas-trait-to-gene is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Discover genes associated with diseases and traits using GWAS data from the GWAS Catalog (500,000+ associations) and Open Targets Genetics (L2G predictions). Identifies genetic risk factors, prioritizes causal genes via locus-to-gene scoring, and assesses druggability. Use when asked to find genes associated with a disease or trait, discover genetic risk factors, translate GWAS signals to gene targets, or answer questions like "What genes are associated with type 2 diabetes?"
Teams using tooluniverse-gwas-trait-to-gene should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/tooluniverse-gwas-trait-to-gene/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How tooluniverse-gwas-trait-to-gene Compares
| Feature / Agent | tooluniverse-gwas-trait-to-gene | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Discover genes associated with diseases and traits using GWAS data from the GWAS Catalog (500,000+ associations) and Open Targets Genetics (L2G predictions). Identifies genetic risk factors, prioritizes causal genes via locus-to-gene scoring, and assesses druggability. Use when asked to find genes associated with a disease or trait, discover genetic risk factors, translate GWAS signals to gene targets, or answer questions like "What genes are associated with type 2 diabetes?"
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# GWAS Trait-to-Gene Discovery
**Discover genes associated with diseases and traits using genome-wide association studies (GWAS)**
## Overview
This skill enables systematic discovery of genes linked to diseases/traits by analyzing GWAS data from two major resources:
- **GWAS Catalog** (EBI/NHGRI): Curated catalog of published GWAS with >500,000 associations
- **Open Targets Genetics**: Fine-mapped GWAS signals with locus-to-gene (L2G) predictions
## Use Cases
**Clinical Research**
- "What genes are associated with type 2 diabetes?"
- "Find genetic risk factors for coronary artery disease"
- "Which genes contribute to Alzheimer's disease susceptibility?"
**Drug Target Discovery**
- Identify genes with strong genetic evidence for disease causation
- Prioritize targets based on L2G scores and replication across studies
- Find genes with genome-wide significant associations (p < 5e-8)
**Functional Genomics**
- Map disease-associated variants to candidate genes
- Analyze genetic architecture of complex traits
- Understand polygenic disease mechanisms
## Workflow
```
1. Trait Search → Search GWAS Catalog by disease/trait name
↓
2. SNP Aggregation → Collect genome-wide significant SNPs (p < 5e-8)
↓
3. Gene Mapping → Extract mapped genes from associations
↓
4. Evidence Ranking → Score by p-value, replication, fine-mapping
↓
5. Annotation (Optional) → Add L2G predictions from Open Targets
```
## Key Concepts
**Genome-wide Significance**
- Standard threshold: p < 5×10⁻⁸
- Accounts for multiple testing burden across ~1M common variants
- Higher confidence: p < 5×10⁻¹⁰ or replicated across studies
**Gene Mapping Methods**
- **Positional**: Nearest gene to lead SNP
- **Fine-mapping**: Statistical refinement to credible variants
- **Locus-to-Gene (L2G)**: Integrative score combining multiple evidence types
**Evidence Confidence Levels**
- **High**: L2G score > 0.5 OR multiple studies with p < 5e-10
- **Medium**: 2+ studies with p < 5e-8
- **Low**: Single study or marginal significance
## Required ToolUniverse Tools
### GWAS Catalog (11 tools)
- `gwas_get_associations_for_trait` - Get all associations for a trait (sorted by p-value)
- `gwas_search_snps` - Search SNPs by gene mapping
- `gwas_get_snp_by_id` - Get SNP details (MAF, consequence, location)
- `gwas_get_study_by_id` - Get study metadata
- `gwas_search_associations` - Search associations with filters
- `gwas_search_studies` - Search studies by trait/cohort
- `gwas_get_associations_for_snp` - Get all associations for a SNP
- `gwas_get_variants_for_trait` - Get variants for a trait
- `gwas_get_studies_for_trait` - Get studies for a trait
- `gwas_get_snps_for_gene` - Get SNPs mapped to a gene
- `gwas_get_associations_for_study` - Get associations from a study
### Open Targets Genetics (6 tools)
- `OpenTargets_search_gwas_studies_by_disease` - Search studies by disease ontology
- `OpenTargets_get_study_credible_sets` - Get fine-mapped loci for a study
- `OpenTargets_get_variant_credible_sets` - Get credible sets for a variant
- `OpenTargets_get_variant_info` - Get variant annotation (frequencies, consequences)
- `OpenTargets_get_gwas_study` - Get study metadata
- `OpenTargets_get_credible_set_detail` - Get detailed credible set information
## Parameters
**Required**
- `trait` - Disease/trait name (e.g., "type 2 diabetes", "coronary artery disease")
**Optional**
- `p_value_threshold` - Significance threshold (default: 5e-8)
- `min_evidence_count` - Minimum number of studies (default: 1)
- `max_results` - Maximum genes to return (default: 100)
- `use_fine_mapping` - Include L2G predictions (default: true)
- `disease_ontology_id` - Disease ontology ID for Open Targets (e.g., "MONDO_0005148")
## Output Schema
```python
{
"genes": [
{
"symbol": str, # Gene symbol (e.g., "TCF7L2")
"min_p_value": float, # Most significant p-value
"evidence_count": int, # Number of independent studies
"snps": [str], # Associated SNP rs IDs
"studies": [str], # GWAS study accessions
"l2g_score": float | null, # Locus-to-gene score (0-1)
"credible_sets": int, # Number of credible sets
"confidence_level": str # "High", "Medium", or "Low"
}
],
"summary": {
"trait": str,
"total_associations": int,
"significant_genes": int,
"data_sources": ["GWAS Catalog", "Open Targets"]
}
}
```
## Example Results
**Type 2 Diabetes**
```
TCF7L2: p=1.2e-98, 15 studies, L2G=0.82 → High confidence
KCNJ11: p=3.4e-67, 12 studies, L2G=0.76 → High confidence
PPARG: p=2.1e-45, 8 studies, L2G=0.71 → High confidence
FTO: p=5.6e-42, 10 studies, L2G=0.68 → High confidence
IRS1: p=8.9e-38, 6 studies, L2G=0.54 → High confidence
```
**Alzheimer's Disease**
```
APOE: p=1.0e-450, 25 studies, L2G=0.95 → High confidence
BIN1: p=2.3e-89, 18 studies, L2G=0.88 → High confidence
CLU: p=4.5e-67, 16 studies, L2G=0.82 → High confidence
ABCA7: p=6.7e-54, 14 studies, L2G=0.79 → High confidence
CR1: p=8.9e-52, 13 studies, L2G=0.75 → High confidence
```
## Best Practices
**1. Use Disease Ontology IDs for Precision**
```
# Instead of:
discover_gwas_genes("diabetes") # Ambiguous
# Use:
discover_gwas_genes(
"type 2 diabetes",
disease_ontology_id="MONDO_0005148" # Specific
)
```
**2. Filter by Evidence Strength**
```
# For drug targets, require strong evidence:
discover_gwas_genes(
"coronary artery disease",
p_value_threshold=5e-10, # Stricter than GWAS threshold
min_evidence_count=3, # Multiple independent studies
use_fine_mapping=True # Include L2G predictions
)
```
**3. Interpret Results Carefully**
- **Association ≠ Causation**: GWAS identifies correlated variants, not necessarily causal genes
- **Linkage Disequilibrium**: Lead SNP may tag the true causal variant in a nearby gene
- **Fine-mapping**: L2G scores provide better causal gene evidence than positional mapping
- **Functional Evidence**: Validate with orthogonal data (eQTLs, knockout models, etc.)
## Limitations
1. **Gene Mapping Uncertainty**
- Positional mapping assigns SNPs to nearest gene (may be incorrect)
- Fine-mapping available for only a subset of studies
- Intergenic variants difficult to map
2. **Population Bias**
- Most GWAS in European populations
- Effect sizes may differ across ancestries
- Rare variants often under-represented
3. **Sample Size Dependence**
- Larger studies detect more associations
- Older small studies may have false negatives
- p-values alone don't indicate effect size
4. **Validation Bug**
- Some ToolUniverse tools have oneOf validation issues
- Use `validate=False` parameter if needed
- This is automatically handled in the Python implementation
## Related Skills
- **Variant-to-Disease Association**: Look up specific SNPs (e.g., rs7903146 → T2D)
- **Gene-to-Disease Links**: Find diseases associated with known genes
- **Drug Target Prioritization**: Rank targets by genetic evidence
- **Population Genetics Analysis**: Compare allele frequencies across populations
## Data Sources
**GWAS Catalog**
- Curator: EBI and NHGRI
- URL: https://www.ebi.ac.uk/gwas/
- Coverage: 100,000+ publications, 500,000+ associations
- Update Frequency: Weekly
**Open Targets Genetics**
- Curator: Open Targets consortium
- URL: https://genetics.opentargets.org/
- Coverage: Fine-mapped GWAS, L2G predictions, QTL colocalization
- Update Frequency: Quarterly
## Citation
If you use this skill in research, please cite:
```
Buniello A, et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide
association studies. Nucleic Acids Research, 47(D1):D1005-D1012.
Mountjoy E, et al. (2021) An open approach to systematically prioritize causal
variants and genes at all published human GWAS trait-associated loci.
Nature Genetics, 53:1527-1533.
```
## Support
For issues with:
- **Skill functionality**: Open issue at tooluniverse/skills
- **GWAS data**: Contact GWAS Catalog or Open Targets support
- **Tool errors**: Check ToolUniverse tool statusRelated Skills
tooluniverse-variant-interpretation
Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.
tooluniverse-variant-analysis
Production-ready VCF processing, variant annotation, mutation analysis, and structural variant (SV/CNV) interpretation for bioinformatics questions. Parses VCF files (streaming, large files), classifies mutation types (missense, nonsense, synonymous, frameshift, splice, intronic, intergenic) and structural variants (deletions, duplications, inversions, translocations), applies VAF/depth/quality/consequence filters, annotates with ClinVar/dbSNP/gnomAD/CADD via ToolUniverse, interprets SV/CNV clinical significance using ClinGen dosage sensitivity scores, computes variant statistics, and generates reports. Solves questions like "What fraction of variants with VAF < 0.3 are missense?", "How many non-reference variants remain after filtering intronic/intergenic?", "What is the pathogenicity of this deletion affecting BRCA1?", or "Which dosage-sensitive genes overlap this CNV?". Use when processing VCF files, annotating variants, filtering by VAF/depth/consequence, classifying mutations, interpreting structural variants, assessing CNV pathogenicity, comparing cohorts, or answering variant analysis questions.
tooluniverse-target-research
Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.
tooluniverse-systems-biology
Comprehensive systems biology and pathway analysis using multiple pathway databases (Reactome, KEGG, WikiPathways, Pathway Commons, BioModels). Performs pathway enrichment, protein-pathway mapping, keyword searches, and systems-level analysis. Use when analyzing gene sets, exploring biological pathways, or investigating systems-level biology.
tooluniverse-structural-variant-analysis
Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.
tooluniverse-statistical-modeling
Perform statistical modeling and regression analysis on biomedical datasets. Supports linear regression, logistic regression (binary/ordinal/multinomial), mixed-effects models, Cox proportional hazards survival analysis, Kaplan-Meier estimation, and comprehensive model diagnostics. Extracts odds ratios, hazard ratios, confidence intervals, p-values, and effect sizes. Designed to solve BixBench statistical reasoning questions involving clinical/experimental data. Use when asked to fit regression models, compute odds ratios, perform survival analysis, run statistical tests, or interpret model coefficients from provided data.
tooluniverse-spatial-transcriptomics
Analyze spatial transcriptomics data to map gene expression in tissue architecture. Supports 10x Visium, MERFISH, seqFISH, Slide-seq, and imaging-based platforms. Performs spatial clustering, domain identification, cell-cell proximity analysis, spatial gene expression patterns, tissue architecture mapping, and integration with single-cell data. Use when analyzing spatial transcriptomics datasets, studying tissue organization, identifying spatial expression patterns, mapping cell-cell interactions in tissue context, characterizing tumor microenvironment spatial structure, or integrating spatial and single-cell RNA-seq data for comprehensive tissue analysis.
tooluniverse-spatial-omics-analysis
Computational analysis framework for spatial multi-omics data integration. Given spatially variable genes (SVGs), spatial domain annotations, tissue type, and disease context from spatial transcriptomics/proteomics experiments (10x Visium, MERFISH, DBiTplus, SLIDE-seq, etc.), performs comprehensive biological interpretation including pathway enrichment, cell-cell interaction inference, druggable target identification, immune microenvironment characterization, and multi-modal integration. Produces a detailed markdown report with Spatial Omics Integration Score (0-100), domain-by-domain characterization, and validation recommendations. Uses 70+ ToolUniverse tools across 9 analysis phases. Use when users ask about spatial transcriptomics analysis, spatial omics interpretation, tissue heterogeneity, spatial gene expression patterns, tumor microenvironment mapping, tissue zonation, or cell-cell communication from spatial data.
tooluniverse-single-cell
Production-ready single-cell and expression matrix analysis using scanpy, anndata, and scipy. Performs scRNA-seq QC, normalization, PCA, UMAP, Leiden/Louvain clustering, differential expression (Wilcoxon, t-test, DESeq2), cell type annotation, per-cell-type statistical analysis, gene-expression correlation, batch correction (Harmony), trajectory inference, and cell-cell communication analysis. NEW: Analyzes ligand-receptor interactions between cell types using OmniPath (CellPhoneDB, CellChatDB), scores communication strength, identifies signaling cascades, and handles multi-subunit receptor complexes. Integrates with ToolUniverse gene annotation tools (HPA, Ensembl, MyGene, UniProt) and enrichment tools (gseapy, PANTHER, STRING). Supports h5ad, 10X, CSV/TSV count matrices, and pre-annotated datasets. Use when analyzing single-cell RNA-seq data, studying cell-cell interactions, performing cell type differential expression, computing gene-expression correlations by cell type, analyzing tumor-immune communication, or answering questions about scRNA-seq datasets.
tooluniverse-sequence-retrieval
Retrieves biological sequences (DNA, RNA, protein) from NCBI and ENA with gene disambiguation, accession type handling, and comprehensive sequence profiles. Creates detailed reports with sequence metadata, cross-database references, and download options. Use when users need nucleotide sequences, protein sequences, genome data, or mention GenBank, RefSeq, EMBL accessions.
tooluniverse-rnaseq-deseq2
Production-ready RNA-seq differential expression analysis using PyDESeq2. Performs DESeq2 normalization, dispersion estimation, Wald testing, LFC shrinkage, and result filtering. Handles multi-factor designs, multiple contrasts, batch effects, and integrates with gene enrichment (gseapy) and ToolUniverse annotation tools (UniProt, Ensembl, OpenTargets). Supports CSV/TSV/H5AD input formats and any organism. Use when analyzing RNA-seq count matrices, identifying DEGs, performing differential expression with statistical rigor, or answering questions about gene expression changes.
tooluniverse-rare-disease-diagnosis
Provide differential diagnosis for patients with suspected rare diseases based on phenotype and genetic data. Matches symptoms to HPO terms, identifies candidate diseases from Orphanet/OMIM, prioritizes genes for testing, interprets variants of uncertain significance. Use when clinician asks about rare disease diagnosis, unexplained phenotypes, or genetic testing interpretation.