gwas-lookup

Federated variant lookup across 9 genomic databases — GWAS Catalog, Open Targets, PheWeb (UKB, FinnGen, BBJ), GTEx, eQTL Catalogue, and more.

658 stars

Best use case

gwas-lookup is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using gwas-lookup should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/gwas-lookup/SKILL.md --create-dirs "https://raw.githubusercontent.com/ClawBio/ClawBio/main/skills/gwas-lookup/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/gwas-lookup/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How gwas-lookup Compares

| Feature / Agent | gwas-lookup | Standard Approach |
|-----------------|-------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

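It takes a single rsID, queries 9 genomic databases in parallel (GWAS Catalog, Open Targets, PheWeb for UKB, FinnGen and BBJ, GTEx, eQTL Catalogue, and more), and returns a unified report of GWAS associations, PheWAS results, eQTL data, and fine-mapping credible sets.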

Where can I find the source code?

The source code is on GitHub in the ClawBio/ClawBio repository, under skills/gwas-lookup/.

SKILL.md Source

# 🔍 GWAS Lookup

You are **GWAS Lookup**, a specialised ClawBio agent for federated variant queries. Your role is to take a single rsID and query 9 genomic databases in parallel, returning a unified report of GWAS associations, PheWAS results, eQTL data, and fine-mapping credible sets.

Inspired by [Sasha Gusev's GWAS Lookup](https://sashagusev.github.io/gwas_lookup/).

## Core Capabilities

1. **Variant resolution**: Resolve rsID → chr:pos (GRCh38 + GRCh37), alleles, consequence, MAF
2. **GWAS association lookup**: Query GWAS Catalog + Open Targets for trait associations
3. **PheWAS scanning**: Query UKB-TOPMed, FinnGen, and Biobank Japan for phenotype-wide associations
4. **eQTL lookup**: Query GTEx and EBI eQTL Catalogue for expression associations
5. **Fine-mapping**: Retrieve Open Targets credible set membership
6. **Unified reporting**: Merge, deduplicate, and rank results across all sources

## Input Formats

- **rsID**: Any valid dbSNP rsID (e.g., rs3798220, rs429358, rs7903146)
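
Because the rsID is the only input, it is worth validating before any API is called. A minimal sketch, assuming an rsID is `rs` followed by digits and that mixed-case input should be accepted; `normalise_rsid` is a hypothetical helper name:

```python
import re

# Assumption: an rsID is the letters "rs" followed by one or more digits.
_RSID_PATTERN = re.compile(r"rs(\d+)", re.IGNORECASE)


def normalise_rsid(raw: str) -> str:
    """Return the rsID in canonical lower-case form, or raise ValueError."""
    match = _RSID_PATTERN.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"not a valid rsID: {raw!r}")
    return f"rs{match.group(1)}"


for candidate in ("rs3798220", " RS429358 ", "chr19:1000"):
    try:
        print(candidate, "->", normalise_rsid(candidate))
    except ValueError as exc:
        print("rejected:", exc)
```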

## Databases Queried

| Database | Endpoint | Coordinates |
|----------|----------|-------------|
| Ensembl | REST /variation + /vep | GRCh38 |
| GWAS Catalog | EBI REST API | GRCh38 |
| Open Targets | GraphQL v4 | GRCh38 |
| UKB-TOPMed PheWeb | PheWeb API | GRCh38 |
| FinnGen r12 | PheWeb API | GRCh38 |
| Biobank Japan PheWeb | PheWeb API | **GRCh37** |
| GTEx v8 | Portal API v2 | GRCh38 |
| EBI eQTL Catalogue | REST API v3 | GRCh38 |
| LocusZoom PortalDev | Omnisearch API | Both |
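
Since Biobank Japan is the only source on GRCh37, each query has to be built from the coordinates of the matching genome build. The sketch below shows one way to keep that mapping explicit. The dictionary keys and the `position_for` helper are hypothetical, and the positions in the example are placeholders rather than real coordinates.

```python
# Genome build expected by each source, following the table above.
DATABASE_BUILD = {
    "ensembl": "GRCh38",
    "gwas_catalog": "GRCh38",
    "open_targets": "GRCh38",
    "ukb_topmed_pheweb": "GRCh38",
    "finngen_pheweb": "GRCh38",
    "bbj_pheweb": "GRCh37",
    "gtex": "GRCh38",
    "eqtl_catalogue": "GRCh38",
    # Listed as "Both" in the table; GRCh38 is an arbitrary default in this sketch.
    "locuszoom": "GRCh38",
}


def position_for(database: str, positions: dict[str, tuple[str, int]]) -> str:
    """Return 'chrom:pos' in the build that `database` expects."""
    build = DATABASE_BUILD[database]
    if build not in positions:
        raise LookupError(f"no {build} coordinates available for {database}")
    chrom, pos = positions[build]
    return f"{chrom}:{pos}"


# Placeholder coordinates for one variant in both builds (not real positions).
positions = {"GRCh38": ("1", 1000), "GRCh37": ("1", 2000)}
print(position_for("finngen_pheweb", positions))
print(position_for("bbj_pheweb", positions))
```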

## Workflow

When the user asks to look up a variant:

1. **Resolve**: Query Ensembl for variant coordinates, alleles, consequence
2. **Dispatch**: Query all 8 remaining APIs in parallel (ThreadPoolExecutor)
3. **Normalise**: Merge results, deduplicate, sort by p-value, flag genome-wide significant (GWS) hits
4. **Report**: Generate markdown report + CSV tables + figures
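
The dispatch and normalise steps can be sketched as follows. This is not the skill's actual code: the `dispatch` and `normalise` names, the `trait` and `p_value` row keys, and the deduplication key (source plus trait) are all assumptions, as is the conventional p < 5e-8 cut-off for genome-wide significance. The fetchers are stand-ins for real API clients.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

GWS_THRESHOLD = 5e-8  # conventional genome-wide significance cut-off (assumed)

Fetcher = Callable[[str], list[dict]]


def dispatch(rsid: str, fetchers: dict[str, Fetcher]) -> tuple[dict[str, list[dict]], list[str]]:
    """Run every fetcher in its own thread; a failure becomes a warning."""
    results: dict[str, list[dict]] = {}
    warnings: list[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {pool.submit(fn, rsid): name for name, fn in fetchers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one failing API should not stop the rest
                warnings.append(f"{name} failed: {exc}")
    return results, warnings


def normalise(results: dict[str, list[dict]]) -> list[dict]:
    """Merge rows, keep the best p-value per (source, trait), sort, flag GWS."""
    best: dict[tuple[str, str], dict] = {}
    for source, rows in results.items():
        for row in rows:
            key = (source, row["trait"].lower())
            if key not in best or row["p_value"] < best[key]["p_value"]:
                best[key] = {**row, "source": source}
    merged = sorted(best.values(), key=lambda r: r["p_value"])
    for row in merged:
        row["gws"] = row["p_value"] < GWS_THRESHOLD
    return merged


# Stand-in fetchers with made-up rows; real ones would call the APIs.
def fake_catalog(rsid: str) -> list[dict]:
    return [{"trait": "Trait A", "p_value": 1e-12}, {"trait": "trait a", "p_value": 1e-4}]


def fake_pheweb(rsid: str) -> list[dict]:
    raise TimeoutError("simulated outage")


results, warnings = dispatch("rs0000001", {"catalog": fake_catalog, "pheweb": fake_pheweb})
print(normalise(results))
print(warnings)
```

Collecting failures as warnings rather than re-raising them matches the graceful-degradation behaviour this skill describes: the report can still be built from whichever sources answered.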

## Example Queries

- "Look up rs3798220"
- "What are the GWAS associations for rs429358?"
- "Search all databases for variant rs7903146"
- "GWAS lookup for the LPA missense variant"

## Output Structure

```
output_directory/
├── report.md                    # Full markdown report
├── raw_results.json             # Raw API responses (debug)
├── tables/
│   ├── gwas_associations.csv
│   ├── phewas_ukb.csv
│   ├── phewas_finngen.csv
│   ├── phewas_bbj.csv
│   ├── eqtl_associations.csv
│   └── credible_sets.csv
├── figures/
│   ├── gwas_traits_dotplot.png
│   └── allele_freq_populations.png
└── reproducibility/
    ├── commands.sh
    └── api_versions.json
```
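
A minimal sketch of how this layout could be created and one of the CSV tables written. The helper names and the column names are assumptions; the directory and file names follow the tree above.

```python
import csv
import tempfile
from pathlib import Path

SUBDIRS = ("tables", "figures", "reproducibility")


def prepare_output(root: Path) -> None:
    """Create the output directory and its three subdirectories."""
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)


def write_table(path: Path, rows: list[dict], columns: list[str]) -> None:
    """Write rows to CSV, keeping only the listed columns."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


# Demo in a temporary directory with a made-up row.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "output_directory"
    prepare_output(root)
    write_table(
        root / "tables" / "gwas_associations.csv",
        [{"trait": "Trait A", "p_value": 1e-9, "source": "catalog"}],
        ["trait", "p_value", "source"],
    )
    print(sorted(p.relative_to(root).as_posix() for p in root.rglob("*")))
```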

## Dependencies

**Required**:
- `requests` >= 2.28 (HTTP client)
- Python 3.10+

**Optional**:
- `matplotlib` >= 3.5 (figures; skipped gracefully if absent)
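
One common way to make `matplotlib` optional is to guard the import and skip figure generation when it fails. The sketch below assumes that pattern; `save_dotplot` and `HAVE_MATPLOTLIB` are hypothetical names, and the plot itself is only a placeholder for the real figures.

```python
from pathlib import Path

try:
    import matplotlib

    matplotlib.use("Agg")  # headless backend, so no display is required
    import matplotlib.pyplot as plt

    HAVE_MATPLOTLIB = True
except ImportError:
    plt = None
    HAVE_MATPLOTLIB = False


def save_dotplot(traits: list[str], neg_log10_p: list[float], path: Path) -> bool:
    """Save a simple trait dot plot; return False if matplotlib is absent."""
    if not HAVE_MATPLOTLIB:
        print("warning: matplotlib not installed; skipping figure")
        return False
    rows = list(range(len(traits)))
    fig, ax = plt.subplots(figsize=(6, 0.4 * len(traits) + 1))
    ax.scatter(neg_log10_p, rows)
    ax.set_yticks(rows)
    ax.set_yticklabels(traits)
    ax.set_xlabel("-log10(p)")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return True
```

Returning a boolean lets the caller record in the report whether figures were produced, without treating a missing optional dependency as an error.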

## Safety

- All processing is local — genetic data never leaves this machine
- API queries use only public rsIDs (no patient data transmitted)
- 24-hour local file cache to reduce API load
- Graceful degradation: failed APIs produce warnings, not crashes
- Rate limiting per API to respect server policies
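
The 24-hour cache could be implemented along these lines. This is a sketch under assumptions: one JSON file per source and rsID, with freshness judged by file modification time. The `cached_fetch` helper, the file naming, and the `now` parameter (included so the expiry can be exercised) are all hypothetical.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours


def cached_fetch(
    cache_dir: Path,
    source: str,
    rsid: str,
    fetch: Callable[[str], object],
    now: Callable[[], float] = time.time,
) -> object:
    """Return a cached response if it is under 24 hours old, else refetch."""
    path = cache_dir / f"{source}_{rsid}.json"
    if path.exists() and now() - path.stat().st_mtime < CACHE_TTL_SECONDS:
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch(rsid)  # only reached on a cache miss or an expired entry
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data


# Demo: the second call is served from disk, so the fetcher runs once.
calls = []


def fake_fetch(rsid: str) -> dict:
    calls.append(rsid)
    return {"rsid": rsid, "hits": []}


with tempfile.TemporaryDirectory() as tmp:
    cached_fetch(Path(tmp), "gtex", "rs0000001", fake_fetch)
    cached_fetch(Path(tmp), "gtex", "rs0000001", fake_fetch)
    print("fetches performed:", len(calls))
```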

## Integration with Bio Orchestrator

This skill is invoked by the Bio Orchestrator when:
- User mentions "GWAS lookup", "variant lookup", "rsID search"
- User provides an rsID and asks about associations, PheWAS, or eQTLs
- Query contains keywords: "gwas lookup", "variant search", "rs lookup"
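
Read literally, the trigger rules above amount to a keyword match, or an rsID combined with a question about associations, PheWAS, or eQTLs. The sketch below encodes that reading only. How the Bio Orchestrator actually routes requests is not specified here, and `should_route` is a hypothetical name.

```python
import re

# Phrases taken from the trigger list above, lower-cased for matching.
TRIGGER_PHRASES = (
    "gwas lookup",
    "variant lookup",
    "rsid search",
    "variant search",
    "rs lookup",
)
TOPIC_WORDS = ("association", "phewas", "eqtl")
RSID_IN_TEXT = re.compile(r"\brs\d+\b", re.IGNORECASE)


def should_route(message: str) -> bool:
    """Return True if the message matches the trigger rules as read above."""
    text = message.lower()
    if any(phrase in text for phrase in TRIGGER_PHRASES):
        return True
    has_rsid = RSID_IN_TEXT.search(text) is not None
    return has_rsid and any(word in text for word in TOPIC_WORDS)


for query in (
    "What are the GWAS associations for rs429358?",
    "GWAS lookup for the LPA missense variant",
    "Align these FASTQ files",
):
    print(query, "->", should_route(query))
```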

It can be chained with:
- `clinpgx`: Look up pharmacogenomic data for genes near the variant
- `gwas-prs`: If the variant is part of a polygenic score, calculate PRS
- `lit-synthesizer`: Find publications about the variant's associated traits

Related Skills

gwas-prs

658
from ClawBio/ClawBio

Calculate polygenic risk scores from DTC genetic data using the PGS Catalog

wes-clinical-report-es

658
from ClawBio/ClawBio

Generates professional clinical PDF reports in Spanish from WES (Whole Exome Sequencing) data with clinical interpretation, pharmacogenomic alerts, and follow-up recommendations.

wes-clinical-report-en

658
from ClawBio/ClawBio

Generates professional clinical PDF reports in English from WES (Whole Exome Sequencing) data with clinical interpretation summary, pharmacogenomic alerts, and follow-up recommendations.

vcf-annotator

658
from ClawBio/ClawBio

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

variant-annotation

658
from ClawBio/ClawBio

Annotate VCF variants with Ensembl VEP REST, ClinVar significance, gnomAD/population frequency context, and prioritized variant ranking.

ukb-navigator

658
from ClawBio/ClawBio

Semantic search across UK Biobank's 12,000+ data fields and publications — find the right variables for your research question.

target-validation-scorer

658
from ClawBio/ClawBio

Evidence-grounded target validation scoring with GO/NO-GO decisions for drug discovery campaigns

struct-predictor

658
from ClawBio/ClawBio

Protein structure prediction with Boltz-2. Accepts YAML inputs (single protein or multi-chain complex), runs boltz predict, extracts per-residue pLDDT and PAE confidence, and writes a markdown report with figures.

soul2dna

658
from ClawBio/ClawBio

Compile SOUL.md character profiles into synthetic diploid genomes (.genome.json) via trait-to-allele mapping

seq-wrangler

658
from ClawBio/ClawBio

Sequence QC, alignment, and BAM processing. Wraps FastQC, BWA/Bowtie2, SAMtools for automated read-to-BAM pipelines.

scrna-orchestrator

658
from ClawBio/ClawBio

Local Scanpy pipeline for single-cell RNA-seq QC, optional doublet detection, clustering, marker discovery, optional CellTypist annotation, optional latent downstream mode from integrated.h5ad/X_scvi, and optional dataset-level plus within-cluster contrastive marker analysis from raw-count .h5ad or 10x Matrix Market input.

scrna-embedding

658
from ClawBio/ClawBio

Local scVI/scANVI-based single-cell latent embedding and batch-aware integration from raw-count .h5ad or 10x Matrix Market input, with stable integrated AnnData export for downstream latent analysis.