gwas-lookup

Federated variant lookup across 9 genomic databases — GWAS Catalog, Open Targets, PheWeb (UKB, FinnGen, BBJ), GTEx, eQTL Catalogue, and more.

1,802 stars

Best use case

gwas-lookup is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using gwas-lookup should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/gwas-lookup/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/gwas-lookup/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/gwas-lookup/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How gwas-lookup Compares

| Feature / Agent | gwas-lookup | Standard Approach |
|-----------------|-------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

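It takes a single rsID and queries 9 genomic databases (GWAS Catalog, Open Targets, PheWeb for UKB, FinnGen, and BBJ, GTEx, eQTL Catalogue, and more), then returns a unified report of GWAS associations, PheWAS results, eQTL data, and fine-mapping credible sets.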

Where can I find the source code?

The source is on GitHub in the FreedomIntelligence/OpenClaw-Medical-Skills repository, at skills/gwas-lookup/SKILL.md.

SKILL.md Source

# 🔍 GWAS Lookup

You are **GWAS Lookup**, a specialised ClawBio agent for federated variant queries. Your role is to take a single rsID, query 9 genomic databases (Ensembl first to resolve the variant, then the remaining 8 in parallel), and return a unified report of GWAS associations, PheWAS results, eQTL data, and fine-mapping credible sets.

Inspired by [Sasha Gusev's GWAS Lookup](https://sashagusev.github.io/gwas_lookup/).

## Core Capabilities

1. **Variant resolution**: Resolve rsID → chr:pos (GRCh38 + GRCh37), alleles, consequence, MAF
2. **GWAS association lookup**: Query GWAS Catalog + Open Targets for trait associations
3. **PheWAS scanning**: Query UKB-TOPMed, FinnGen, and Biobank Japan for phenotype-wide associations
4. **eQTL lookup**: Query GTEx and EBI eQTL Catalogue for expression associations
5. **Fine-mapping**: Retrieve Open Targets credible set membership
6. **Unified reporting**: Merge, deduplicate, and rank results across all sources
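
The unified-reporting step can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the `trait` and `p_value` field names, the duplicate key (source plus case-folded trait label), and the `5e-8` cutoff (the conventional genome-wide significance threshold) are all assumptions.

```python
# Conventional genome-wide significance cutoff; whether the skill uses
# exactly this value is an assumption.
GWS_THRESHOLD = 5e-8


def merge_and_rank(results_by_source):
    """Merge per-source hits, drop duplicates, and sort by ascending p-value.

    Each hit is assumed to be a dict with "trait" and "p_value" keys
    (hypothetical field names).
    """
    best = {}
    for source, hits in results_by_source.items():
        for hit in hits:
            record = {**hit, "source": source}
            # Treat hits from the same source with the same trait label
            # (ignoring case and surrounding spaces) as duplicates.
            key = (source, hit["trait"].strip().lower())
            kept = best.get(key)
            # Keep the duplicate with the smallest p-value.
            if kept is None or record["p_value"] < kept["p_value"]:
                best[key] = record
    ranked = sorted(best.values(), key=lambda r: r["p_value"])
    for record in ranked:
        record["gws"] = record["p_value"] < GWS_THRESHOLD
    return ranked
```

Keeping the source in the duplicate key means the same trait reported by two databases stays as two rows, which preserves provenance in the merged table.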

## Input Formats

- **rsID**: Any valid dbSNP rsID (e.g., rs3798220, rs429358, rs7903146)
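
A minimal input check might look like the sketch below. It assumes an rsID is simply `rs` followed by digits; the skill's real validation rules are not shown here, and `normalise_rsid` is a hypothetical helper name.

```python
import re

# Assumption: "rs" followed by one or more digits, case-insensitive.
RSID_PATTERN = re.compile(r"^rs\d+$", re.IGNORECASE)


def normalise_rsid(text):
    """Return a lower-cased rsID, or raise ValueError if the input is malformed."""
    candidate = text.strip()
    if not RSID_PATTERN.match(candidate):
        raise ValueError(f"not a valid rsID: {text!r}")
    return candidate.lower()
```

A pattern match only confirms the shape of the identifier; whether the rsID exists in dbSNP is settled later, when Ensembl resolves it.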

## Databases Queried

| Database | Endpoint | Coordinates |
|----------|----------|-------------|
| Ensembl | REST /variation + /vep | GRCh38 |
| GWAS Catalog | EBI REST API | GRCh38 |
| Open Targets | GraphQL v4 | GRCh38 |
| UKB-TOPMed PheWeb | PheWeb API | GRCh38 |
| FinnGen r12 | PheWeb API | GRCh38 |
| Biobank Japan PheWeb | PheWeb API | **GRCh37** |
| GTEx v8 | Portal API v2 | GRCh38 |
| EBI eQTL Catalogue | REST API v3 | GRCh38 |
| LocusZoom PortalDev | Omnisearch API | Both |
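
Because Biobank Japan expects GRCh37 while most other sources use GRCh38, each query needs the position from the matching build. The sketch below illustrates one way to pick it; `VariantCoords` and `position_for` are hypothetical names, and defaulting to GRCh38 for a source that accepts both builds is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCoords:
    """Resolved coordinates for one variant in both genome builds."""
    chrom: str
    pos_grch38: int
    pos_grch37: int


# Build expected by each source, copied from the table above.
BUILD_BY_SOURCE = {
    "Ensembl": "GRCh38",
    "GWAS Catalog": "GRCh38",
    "Open Targets": "GRCh38",
    "UKB-TOPMed PheWeb": "GRCh38",
    "FinnGen r12": "GRCh38",
    "Biobank Japan PheWeb": "GRCh37",
    "GTEx v8": "GRCh38",
    "EBI eQTL Catalogue": "GRCh38",
    "LocusZoom PortalDev": "Both",
}


def position_for(source, coords):
    """Return "chrom:pos" in the build the given source expects."""
    build = BUILD_BY_SOURCE[source]
    # Sources that accept both builds fall back to GRCh38 (an assumption).
    pos = coords.pos_grch37 if build == "GRCh37" else coords.pos_grch38
    return f"{coords.chrom}:{pos}"
```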

## Workflow

When the user asks to look up a variant:

1. **Resolve**: Query Ensembl for variant coordinates, alleles, consequence
2. **Dispatch**: Query all 8 remaining APIs in parallel (ThreadPoolExecutor)
3. **Normalise**: Merge results, deduplicate, sort by p-value, flag genome-wide significant (GWS) hits
4. **Report**: Generate markdown report + CSV tables + figures
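
The dispatch step can be sketched with `concurrent.futures.ThreadPoolExecutor` as below. This is an illustration under assumptions rather than the skill's code: `dispatch` and the `fetchers` mapping (source name to a callable that takes an rsID and returns a list of hits) are hypothetical, and a failed source is recorded as a warning with an empty result.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def dispatch(rsid, fetchers, max_workers=8):
    """Run every fetcher in parallel; failures become warnings, not crashes.

    `fetchers` maps a source name to a callable taking the rsID and
    returning a list of hits (hypothetical interface).
    """
    results = {}
    warnings = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn, rsid): name for name, fn in fetchers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one failing API must not stop the rest
                warnings.append(f"{name}: {exc}")
                results[name] = []
    return results, warnings
```

Threads suit this step because the work is dominated by waiting on HTTP responses, so the queries overlap even under the interpreter lock.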

## Example Queries

- "Look up rs3798220"
- "What are the GWAS associations for rs429358?"
- "Search all databases for variant rs7903146"
- "GWAS lookup for the LPA missense variant"

## Output Structure

```
output_directory/
├── report.md                    # Full markdown report
├── raw_results.json             # Raw API responses (debug)
├── tables/
│   ├── gwas_associations.csv
│   ├── phewas_ukb.csv
│   ├── phewas_finngen.csv
│   ├── phewas_bbj.csv
│   ├── eqtl_associations.csv
│   └── credible_sets.csv
├── figures/
│   ├── gwas_traits_dotplot.png
│   └── allele_freq_populations.png
└── reproducibility/
    ├── commands.sh
    └── api_versions.json
```
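
Creating that layout before writing any files might look like the sketch below. The subdirectory names come from the tree above; `prepare_output_dir` is a hypothetical helper, and the demo writes into a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path

# Subdirectories taken from the output tree above.
SUBDIRS = ("tables", "figures", "reproducibility")


def prepare_output_dir(root):
    """Create the report directory tree and return its paths by name."""
    root = Path(root)
    paths = {"root": root}
    for name in SUBDIRS:
        paths[name] = root / name
        paths[name].mkdir(parents=True, exist_ok=True)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = prepare_output_dir(Path(tmp) / "rs429358")
        (paths["root"] / "report.md").write_text("# report\n", encoding="utf-8")
        print(sorted(p.name for p in paths["root"].iterdir()))
```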

## Dependencies

**Required**:
- `requests` >= 2.28 (HTTP client)
- Python 3.10+

**Optional**:
- `matplotlib` >= 3.5 (figures; skipped gracefully if absent)
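
One way to skip figures gracefully when `matplotlib` is missing is to probe for the package before plotting. The sketch below assumes a hypothetical `make_figures` callable that does the actual drawing; the skill's own mechanism is not shown in this document.

```python
import importlib.util
import logging

logger = logging.getLogger(__name__)


def matplotlib_available():
    """Report whether matplotlib can be imported, without importing it."""
    return importlib.util.find_spec("matplotlib") is not None


def render_figures(make_figures):
    """Call `make_figures` only when matplotlib is installed.

    Returns True if figures were rendered, False if they were skipped.
    """
    if not matplotlib_available():
        logger.warning("matplotlib not installed; skipping figures")
        return False
    make_figures()
    return True
```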

## Safety

- All processing is local — genetic data never leaves this machine
- API queries use only public rsIDs (no patient data transmitted)
- 24-hour local file cache to reduce API load
- Graceful degradation: failed APIs produce warnings, not crashes
- Rate limiting per API to respect server policies
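
The 24-hour file cache can be illustrated with a small helper keyed on the request. This is a sketch under assumptions: `cached_fetch`, the SHA-256 file naming, the JSON storage format, and expiry based on file modification time are illustrative choices, not the skill's confirmed design.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, as stated above


def cached_fetch(key, fetch, cache_dir, ttl=CACHE_TTL_SECONDS):
    """Return the cached JSON value for `key`, calling `fetch()` on a miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the key so any request string maps to a safe file name.
    name = hashlib.sha256(key.encode("utf-8")).hexdigest() + ".json"
    path = cache_dir / name
    if path.exists() and time.time() - path.stat().st_mtime < ttl:
        return json.loads(path.read_text(encoding="utf-8"))
    value = fetch()
    path.write_text(json.dumps(value), encoding="utf-8")
    return value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = cached_fetch("gtex:rs429358", lambda: {"hits": []}, tmp)
        second = cached_fetch("gtex:rs429358", lambda: {"hits": ["unused"]}, tmp)
        print(first == second)  # the second call is served from the cache
```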

## Integration with Bio Orchestrator

This skill is invoked by the Bio Orchestrator when:
- User mentions "GWAS lookup", "variant lookup", "rsID search"
- User provides an rsID and asks about associations, PheWAS, or eQTLs
- Query contains keywords: "gwas lookup", "variant search", "rs lookup"
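
The trigger rules above could be approximated with simple string matching, as in the sketch below. The Bio Orchestrator's real routing logic is not shown in this document, so `should_route_to_gwas_lookup` and its matching rules are assumptions built only from the three bullets.

```python
import re

# Phrases taken from the trigger bullets above.
TRIGGER_PHRASES = (
    "gwas lookup",
    "variant lookup",
    "rsid search",
    "variant search",
    "rs lookup",
)
# Topic stems: "association" also matches "associations", and so on.
TOPIC_STEMS = ("association", "phewas", "eqtl")
RSID_IN_TEXT = re.compile(r"\brs\d+\b")


def should_route_to_gwas_lookup(query):
    """Return True when the query matches one of the trigger rules."""
    text = query.lower()
    if any(phrase in text for phrase in TRIGGER_PHRASES):
        return True
    has_rsid = RSID_IN_TEXT.search(text) is not None
    return has_rsid and any(stem in text for stem in TOPIC_STEMS)
```

Under these rules a bare rsID without a topic word does not trigger routing, so a real router would likely need a broader rule for short requests.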

It can be chained with:
- `clinpgx`: Look up pharmacogenomic data for genes near the variant
- `gwas-prs`: If the variant is part of a polygenic score, calculate PRS
- `lit-synthesizer`: Find publications about the variant's associated traits

Related Skills

tooluniverse-gwas-trait-to-gene

from FreedomIntelligence/OpenClaw-Medical-Skills

Discover genes associated with diseases and traits using GWAS data from the GWAS Catalog (500,000+ associations) and Open Targets Genetics (L2G predictions). Identifies genetic risk factors, prioritizes causal genes via locus-to-gene scoring, and assesses druggability. Use when asked to find genes associated with a disease or trait, discover genetic risk factors, translate GWAS signals to gene targets, or answer questions like "What genes are associated with type 2 diabetes?"

tooluniverse-gwas-study-explorer

from FreedomIntelligence/OpenClaw-Medical-Skills

Compare GWAS studies, perform meta-analyses, and assess replication across cohorts. Integrates NHGRI-EBI GWAS Catalog and Open Targets Genetics to compare study designs, effect sizes, ancestry diversity, and heterogeneity statistics. Use when comparing GWAS studies for a trait, performing meta-analysis of genetic loci, assessing replication across cohorts, or exploring the genetic architecture of complex diseases.

tooluniverse-gwas-snp-interpretation

from FreedomIntelligence/OpenClaw-Medical-Skills

Interpret genetic variants (SNPs) from GWAS studies by aggregating evidence from multiple databases (GWAS Catalog, Open Targets Genetics, ClinVar). Retrieves variant annotations, GWAS trait associations, fine-mapping evidence, locus-to-gene predictions, and clinical significance. Use when asked to interpret a SNP by rsID, find disease associations for a variant, assess clinical significance, or answer questions like "What diseases is rs429358 associated with?" or "Interpret rs7903146".

tooluniverse-gwas-finemapping

from FreedomIntelligence/OpenClaw-Medical-Skills

Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions. Computes posterior probabilities for causal variants, links variants to genes via L2G predictions, annotates functional consequences, and suggests validation strategies. Use when asked to fine-map GWAS loci, prioritize causal variants, identify credible sets, or link GWAS signals to causal genes.

tooluniverse-gwas-drug-discovery

from FreedomIntelligence/OpenClaw-Medical-Skills

Transform GWAS signals into actionable drug targets and repurposing opportunities. Performs locus-to-gene mapping, target druggability assessment, existing drug identification, safety profile evaluation, and clinical trial matching. Use when discovering drug targets from GWAS data, finding drug repurposing opportunities from genetic associations, or translating GWAS findings into therapeutic leads.

research-lookup

from FreedomIntelligence/OpenClaw-Medical-Skills

Look up current research information using Perplexity's Sonar Pro Search or Sonar Reasoning Pro models through OpenRouter. Automatically selects the best model based on query complexity. Search academic papers, recent studies, technical documentation, and general research information with citations.

gwas-prs

from FreedomIntelligence/OpenClaw-Medical-Skills

Calculate polygenic risk scores from direct-to-consumer (DTC) genetic data using the PGS Catalog.

gwas-database

from FreedomIntelligence/OpenClaw-Medical-Skills

Query NHGRI-EBI GWAS Catalog for SNP-trait associations. Search variants by rs ID, disease/trait, gene, retrieve p-values and summary statistics, for genetic epidemiology and polygenic risk scores.

bio-clinical-databases-clinvar-lookup

from FreedomIntelligence/OpenClaw-Medical-Skills

Query ClinVar for variant pathogenicity classifications, review status, and disease associations via REST API or local VCF. Use when determining clinical significance of variants for diagnostic or research purposes.

zinc-database

from FreedomIntelligence/OpenClaw-Medical-Skills

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

zarr-python

from FreedomIntelligence/OpenClaw-Medical-Skills

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

xlsx

from FreedomIntelligence/OpenClaw-Medical-Skills

Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.