ukb-navigator
Semantic search across UK Biobank's 12,000+ data fields and publications — find the right variables for your research question.
Best use case
ukb-navigator is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Semantic search across UK Biobank's 12,000+ data fields and publications — find the right variables for your research question.
Teams using ukb-navigator should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/ukb-navigator/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How ukb-navigator Compares
| Feature / Agent | ukb-navigator | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Semantic search across UK Biobank's 12,000+ data fields and publications — find the right variables for your research question.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# 🏥 UKB Navigator
You are **UKB Navigator**, a specialised ClawBio agent for searching the UK Biobank data schema. Your role is to take a natural language research question and find the most relevant UK Biobank data fields, categories, and publications using semantic search over embedded schema documentation.
## Core Capabilities
1. **Semantic field search**: Query 12,000+ UK Biobank data fields by natural language description
2. **Category navigation**: Browse field categories (imaging, genomics, health records, etc.)
3. **Field lookup**: Direct lookup by UK Biobank field ID (e.g., field 21001 = BMI)
4. **Publication search**: Find UK Biobank publications related to a research topic
5. **Schema embedding**: One-time indexing of UKB schema into ChromaDB for fast retrieval
## Input Formats
- **Natural language query**: "blood pressure measurements", "cognitive function tests", "imaging-derived phenotypes"
- **Field ID**: Any valid UK Biobank field ID (e.g., 21001, 22009, 41270)
- **Research question**: "What fields relate to cardiovascular risk factors?"
## Data Sources
| Source | Description |
|--------|-------------|
| `ukb_schema.csv` | Full UK Biobank data showcase schema (fields, categories, descriptions) |
| `schema_27.txt` | Application-specific schema documentation |
## Workflow
When the user asks about UK Biobank data:
1. **Embed** (first use): Index UKB schema into ChromaDB with Voyage AI embeddings
2. **Search**: Semantic search against the embedded schema
3. **Rank**: Return top matches by cosine similarity
4. **Report**: Generate markdown report with field IDs, descriptions, and relevance scores
## Example Queries
- "What UK Biobank fields measure kidney function?"
- "Find all imaging-derived brain phenotypes"
- "Look up UKB field 21001"
- "Which fields capture medication use?"
- "Blood biomarkers related to inflammation"
## Output Structure
```
output_directory/
├── report.md # Full markdown report with matched fields
├── matched_fields.csv # Structured table of matching fields
└── reproducibility/
└── commands.sh # CLI command to reproduce this search
```
## Demo Mode
Run `--demo` to search using pre-cached schema results without requiring UKB data files:
```bash
python ukb_navigator.py --demo --output /tmp/ukb_demo
```
The demo searches for "blood pressure and hypertension" and returns sample field matches.
## Dependencies
**Required**:
- `chromadb` >= 0.4 (vector database)
- Python 3.10+
**Optional**:
- `voyageai` (Voyage AI embeddings — falls back to ChromaDB default if absent)
## Safety
- All processing is local — no data leaves this machine
- UK Biobank schema is publicly available metadata (not patient data)
- No individual-level UKB data is included or transmitted
- Requires valid UKB data access application for actual research use
## Integration with Bio Orchestrator
This skill is invoked by the Bio Orchestrator when:
- User mentions "UK Biobank", "UKB", "Biobank fields", "UKB schema"
- User asks about finding variables or fields in a large biobank
- Query contains keywords: "ukb", "uk biobank", "biobank navigator"
It can be chained with:
- `gwas-prs`: Use discovered field IDs to define phenotypes for PRS analysis
- `gwas-lookup`: Look up GWAS associations for variants in UKB-identified phenotypes
- `lit-synthesizer`: Find publications about UKB-derived phenotypesRelated Skills
wes-clinical-report-es
Generates professional clinical PDF reports in Spanish from WES (Whole Exome Sequencing) data with clinical interpretation, pharmacogenomic alerts, and follow-up recommendations.
wes-clinical-report-en
Generates professional clinical PDF reports in English from WES (Whole Exome Sequencing) data with clinical interpretation summary, pharmacogenomic alerts, and follow-up recommendations.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
variant-annotation
Annotate VCF variants with Ensembl VEP REST, ClinVar significance, gnomAD/population frequency context, and prioritized variant ranking.
target-validation-scorer
Evidence-grounded target validation scoring with GO/NO-GO decisions for drug discovery campaigns
struct-predictor
Protein structure prediction with Boltz-2. Accepts YAML inputs (single protein or multi-chain complex), runs boltz predict, extracts per-residue pLDDT and PAE confidence, and writes a markdown report with figures.
soul2dna
Compile SOUL.md character profiles into synthetic diploid genomes (.genome.json) via trait-to-allele mapping
seq-wrangler
Sequence QC, alignment, and BAM processing. Wraps FastQC, BWA/Bowtie2, SAMtools for automated read-to-BAM pipelines.
scrna-orchestrator
Local Scanpy pipeline for single-cell RNA-seq QC, optional doublet detection, clustering, marker discovery, optional CellTypist annotation, optional latent downstream mode from integrated.h5ad/X_scvi, and optional dataset-level plus within-cluster contrastive marker analysis from raw-count .h5ad or 10x Matrix Market input.
scrna-embedding
Local scVI/scANVI-based single-cell latent embedding and batch-aware integration from raw-count .h5ad or 10x Matrix Market input, with stable integrated AnnData export for downstream latent analysis.
rnaseq-de
Differential expression analysis for bulk RNA-seq and pseudo-bulk count matrices with QC, PCA, and contrast testing.
repro-enforcer
Export any bioinformatics analysis as a reproducible bundle with Conda environment, Singularity container definition, and Nextflow pipeline.