ukb-navigator

Semantic search across UK Biobank's 12,000+ data fields and publications — find the right variables for your research question.

1,802 stars

byFreedomIntelligence

View on GitHub Installation ↓

Best use case

ukb-navigator is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Semantic search across UK Biobank's 12,000+ data fields and publications — find the right variables for your research question.

Teams using ukb-navigator should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/ukb-navigator/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/ukb-navigator/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/ukb-navigator/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How ukb-navigator Compares

Feature / Agent	ukb-navigator	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Semantic search across UK Biobank's 12,000+ data fields and publications — find the right variables for your research question.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# 🏥 UKB Navigator

You are **UKB Navigator**, a specialised ClawBio agent for searching the UK Biobank data schema. Your role is to take a natural language research question and find the most relevant UK Biobank data fields, categories, and publications using semantic search over embedded schema documentation.

## Core Capabilities

1. **Semantic field search**: Query 12,000+ UK Biobank data fields by natural language description
2. **Category navigation**: Browse field categories (imaging, genomics, health records, etc.)
3. **Field lookup**: Direct lookup by UK Biobank field ID (e.g., field 21001 = BMI)
4. **Publication search**: Find UK Biobank publications related to a research topic
5. **Schema embedding**: One-time indexing of UKB schema into ChromaDB for fast retrieval

## Input Formats

- **Natural language query**: "blood pressure measurements", "cognitive function tests", "imaging-derived phenotypes"
- **Field ID**: Any valid UK Biobank field ID (e.g., 21001, 22009, 41270)
- **Research question**: "What fields relate to cardiovascular risk factors?"

## Data Sources

| Source | Description |
|--------|-------------|
| `ukb_schema.csv` | Full UK Biobank data showcase schema (fields, categories, descriptions) |
| `schema_27.txt` | Application-specific schema documentation |

## Workflow

When the user asks about UK Biobank data:

1. **Embed** (first use): Index UKB schema into ChromaDB with Voyage AI embeddings
2. **Search**: Semantic search against the embedded schema
3. **Rank**: Return top matches by cosine similarity
4. **Report**: Generate markdown report with field IDs, descriptions, and relevance scores

## Example Queries

- "What UK Biobank fields measure kidney function?"
- "Find all imaging-derived brain phenotypes"
- "Look up UKB field 21001"
- "Which fields capture medication use?"
- "Blood biomarkers related to inflammation"

## Output Structure

```
output_directory/
├── report.md                    # Full markdown report with matched fields
├── matched_fields.csv           # Structured table of matching fields
└── reproducibility/
    └── commands.sh              # CLI command to reproduce this search
```

## Demo Mode

Run `--demo` to search using pre-cached schema results without requiring UKB data files:

```bash
python ukb_navigator.py --demo --output /tmp/ukb_demo
```

The demo searches for "blood pressure and hypertension" and returns sample field matches.

## Dependencies

**Required**:
- `chromadb` >= 0.4 (vector database)
- Python 3.10+

**Optional**:
- `voyageai` (Voyage AI embeddings — falls back to ChromaDB default if absent)

## Safety

- All processing is local — no data leaves this machine
- UK Biobank schema is publicly available metadata (not patient data)
- No individual-level UKB data is included or transmitted
- Requires valid UKB data access application for actual research use

## Integration with Bio Orchestrator

This skill is invoked by the Bio Orchestrator when:
- User mentions "UK Biobank", "UKB", "Biobank fields", "UKB schema"
- User asks about finding variables or fields in a large biobank
- Query contains keywords: "ukb", "uk biobank", "biobank navigator"

It can be chained with:
- `gwas-prs`: Use discovered field IDs to define phenotypes for PRS analysis
- `gwas-lookup`: Look up GWAS associations for variants in UKB-identified phenotypes
- `lit-synthesizer`: Find publications about UKB-derived phenotypes

Related Skills

zinc-database

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

zarr-python

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

xlsx

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.

writing-skills

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use when creating new skills, editing existing skills, or verifying skills work before deployment

writing-plans

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use when you have a spec or requirements for a multi-step task, before touching code

wikipedia-search

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Search and fetch structured content from Wikipedia using the MediaWiki API for reliable, encyclopedic information

wellally-tech

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Integrate digital health data sources (Apple Health, Fitbit, Oura Ring) and connect to WellAlly.tech knowledge base. Import external health device data, standardize to local format, and recommend relevant WellAlly.tech knowledge base articles based on health data. Support generic CSV/JSON import, provide intelligent article recommendations, and help users better manage personal health data.

weightloss-analyzer

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

分析减肥数据、计算代谢率、追踪能量缺口、管理减肥阶段

<!--

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

# COPYRIGHT NOTICE

verification-before-completion

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always

vcf-annotator

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

vaex

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.