ukb-navigator
Semantic search across UK Biobank's 12,000+ data fields and publications — find the right variables for your research question.
Best use case
ukb-navigator is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using ukb-navigator should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/ukb-navigator/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How ukb-navigator Compares
| Feature / Agent | ukb-navigator | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Semantic search across UK Biobank's 12,000+ data fields and publications — find the right variables for your research question.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# 🏥 UKB Navigator
You are **UKB Navigator**, a specialised ClawBio agent for searching the UK Biobank data schema. Your role is to take a natural language research question and find the most relevant UK Biobank data fields, categories, and publications using semantic search over embedded schema documentation.
## Core Capabilities
1. **Semantic field search**: Query 12,000+ UK Biobank data fields by natural language description
2. **Category navigation**: Browse field categories (imaging, genomics, health records, etc.)
3. **Field lookup**: Direct lookup by UK Biobank field ID (e.g., field 21001 = BMI)
4. **Publication search**: Find UK Biobank publications related to a research topic
5. **Schema embedding**: One-time indexing of UKB schema into ChromaDB for fast retrieval
## Input Formats
- **Natural language query**: "blood pressure measurements", "cognitive function tests", "imaging-derived phenotypes"
- **Field ID**: Any valid UK Biobank field ID (e.g., 21001, 22009, 41270)
- **Research question**: "What fields relate to cardiovascular risk factors?"
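The three input formats above can be told apart before any search runs. The following is a minimal sketch, not the skill's actual parser: `classify_input` and `FIELD_ID_PATTERN` are hypothetical names, and the assumption is that a field ID arrives as a bare digit string, optionally preceded by words such as "Look up UKB field".

```python
import re

# Hypothetical pattern: a bare field ID such as "21001", optionally written
# as "field 21001", "UKB field 21001" or "Look up UKB field 21001".
FIELD_ID_PATTERN = re.compile(
    r"^(?:look up\s+)?(?:ukb\s+)?(?:field\s+)?(\d+)$", re.IGNORECASE
)


def classify_input(text: str) -> tuple[str, str]:
    """Return ("field_id", digits) or ("semantic", cleaned_query)."""
    cleaned = text.strip()
    match = FIELD_ID_PATTERN.match(cleaned)
    if match:
        # Direct lookup path: only the numeric ID is kept.
        return ("field_id", match.group(1))
    # Everything else is treated as a natural language query.
    return ("semantic", cleaned)
```

Under this sketch, natural language queries and full research questions both take the semantic search path; only digit-only requests are routed to direct lookup.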
## Data Sources
| Source | Description |
|--------|-------------|
| `ukb_schema.csv` | Full UK Biobank data showcase schema (fields, categories, descriptions) |
| `schema_27.txt` | Application-specific schema documentation |
## Workflow
When the user asks about UK Biobank data:
1. **Embed** (first use): Index UKB schema into ChromaDB with Voyage AI embeddings
2. **Search**: Semantic search against the embedded schema
3. **Rank**: Return top matches by cosine similarity
4. **Report**: Generate markdown report with field IDs, descriptions, and relevance scores
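To make the ranking step concrete, here is a small standard-library sketch of ordering fields by cosine similarity. It does not use ChromaDB or Voyage AI; the vectors are toy values invented for illustration, and `cosine_similarity`, `rank_fields` and the `field_a`/`field_b`/`field_c` keys are hypothetical names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_fields(query_vec, field_vecs, top_k=3):
    """Return the top_k (field_key, score) pairs, highest similarity first."""
    scored = [(key, cosine_similarity(query_vec, vec)) for key, vec in field_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings", invented for illustration only.
toy_index = {
    "field_a": [1.0, 0.0, 0.0],
    "field_b": [0.7, 0.7, 0.0],
    "field_c": [0.0, 0.0, 1.0],
}
ranked = rank_fields([1.0, 0.1, 0.0], toy_index, top_k=2)
```

In a real run the vectors would come from the embedding step and the vector store would perform the nearest-neighbour search; the arithmetic shown here is only the scoring idea behind it.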
## Example Queries
- "What UK Biobank fields measure kidney function?"
- "Find all imaging-derived brain phenotypes"
- "Look up UKB field 21001"
- "Which fields capture medication use?"
- "Blood biomarkers related to inflammation"
## Output Structure
```
output_directory/
├── report.md                # Full markdown report with matched fields
├── matched_fields.csv       # Structured table of matching fields
└── reproducibility/
    └── commands.sh          # CLI command to reproduce this search
```
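One way such a layout could be produced is sketched below with the standard library only. `write_outputs` and the dictionary keys `field_id`, `description` and `score` are assumptions made for illustration, as is the score value; the only details taken from this document are the three file names, field 21001 being BMI, and the demo command.

```python
import csv
import tempfile
from pathlib import Path


def write_outputs(output_dir, query, matches, command):
    """Write report.md, matched_fields.csv and reproducibility/commands.sh.

    `matches` is a list of dicts with the hypothetical keys
    "field_id", "description" and "score".
    """
    out = Path(output_dir)
    repro = out / "reproducibility"
    repro.mkdir(parents=True, exist_ok=True)

    # Markdown report: one table row per matched field.
    lines = ["# UKB Navigator report", "", f"Query: {query}", "",
             "| Field ID | Description | Score |", "|---|---|---|"]
    for m in matches:
        lines.append(f"| {m['field_id']} | {m['description']} | {m['score']:.3f} |")
    report = out / "report.md"
    report.write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Structured table containing the same rows.
    table = out / "matched_fields.csv"
    with table.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["field_id", "description", "score"])
        writer.writeheader()
        writer.writerows(matches)

    # The command line that produced this search, saved for reproducibility.
    script = repro / "commands.sh"
    script.write_text("#!/usr/bin/env bash\n" + command + "\n", encoding="utf-8")
    return report, table, script


demo_dir = tempfile.mkdtemp()
paths = write_outputs(
    demo_dir,
    query="body mass index",
    matches=[{"field_id": "21001", "description": "BMI", "score": 0.5}],  # score is made up
    command="python ukb_navigator.py --demo --output /tmp/ukb_demo",
)
```

Passing the command in as a string keeps the sketch from guessing at CLI flags beyond the two shown in this document.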
## Demo Mode
Run `--demo` to search using pre-cached schema results without requiring UKB data files:
```bash
python ukb_navigator.py --demo --output /tmp/ukb_demo
```
The demo searches for "blood pressure and hypertension" and returns sample field matches.
## Dependencies
**Required**:
- `chromadb` >= 0.4 (vector database)
- Python 3.10+
**Optional**:
- `voyageai` (Voyage AI embeddings — falls back to ChromaDB default if absent)
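The optional-dependency fallback can be sketched as a simple availability check. This is an illustration of the idea rather than the skill's own code: `pick_embedding_backend` and the returned labels are hypothetical, and the function only detects whether the package is importable without calling either library.

```python
import importlib.util


def pick_embedding_backend(preferred: str = "voyageai") -> str:
    """Return the preferred package name if importable, else a fallback label."""
    # find_spec returns None when a top-level package is not installed.
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    # Hypothetical label standing in for ChromaDB's default embeddings.
    return "chromadb-default"


backend = pick_embedding_backend()
print(f"Embedding backend: {backend}")
```

Checking with `find_spec` avoids importing the package just to test for it, so the check stays cheap when the optional dependency is absent.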
## Safety
- All processing is local — no data leaves this machine
- UK Biobank schema is publicly available metadata (not patient data)
- No individual-level UKB data is included or transmitted
- Requires valid UKB data access application for actual research use
## Integration with Bio Orchestrator
This skill is invoked by the Bio Orchestrator when:
- User mentions "UK Biobank", "UKB", "Biobank fields", "UKB schema"
- User asks about finding variables or fields in a large biobank
- Query contains keywords: "ukb", "uk biobank", "biobank navigator"
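As an illustration of the keyword triggers listed above, a router could test each message against a small pattern. This is a hedged sketch: `should_route_to_ukb_navigator` and `TRIGGER_PATTERN` are hypothetical names, not part of the Bio Orchestrator, and the sketch covers only the literal keywords, not the broader "asks about finding variables" condition.

```python
import re

# Keywords taken from the trigger list; word boundaries keep "ukb" from
# matching inside unrelated longer tokens.
TRIGGER_PATTERN = re.compile(
    r"\b(ukb|uk biobank|biobank fields|biobank navigator)\b", re.IGNORECASE
)


def should_route_to_ukb_navigator(message: str) -> bool:
    """Return True when the message contains one of the trigger keywords."""
    return TRIGGER_PATTERN.search(message) is not None
```

For example, "Which UKB fields measure kidney function?" would be routed to this skill under the sketch, while "Annotate this VCF file" would not.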
It can be chained with:
- `gwas-prs`: Use discovered field IDs to define phenotypes for PRS analysis
- `gwas-lookup`: Look up GWAS associations for variants in UKB-identified phenotypes
- `lit-synthesizer`: Find publications about UKB-derived phenotypes
Related Skills
zinc-database
Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.
zarr-python
Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.
xlsx
Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.
writing-skills
Use when creating new skills, editing existing skills, or verifying skills work before deployment
writing-plans
Use when you have a spec or requirements for a multi-step task, before touching code
wikipedia-search
Search and fetch structured content from Wikipedia using the MediaWiki API for reliable, encyclopedic information
wellally-tech
Integrate digital health data sources (Apple Health, Fitbit, Oura Ring) and connect to WellAlly.tech knowledge base. Import external health device data, standardize to local format, and recommend relevant WellAlly.tech knowledge base articles based on health data. Support generic CSV/JSON import, provide intelligent article recommendations, and help users better manage personal health data.
weightloss-analyzer
Analyze weight-loss data, calculate metabolic rate, track energy deficit, and manage weight-loss phases
verification-before-completion
Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
vaex
Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.