genome-compare

Compare your genome to George Church (PGP-1) and estimate ancestry composition via IBS and EM admixture

1,802 stars

by FreedomIntelligence

Best use case

genome-compare is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using genome-compare should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/genome-compare/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/genome-compare/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/genome-compare/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How genome-compare Compares

| Feature / Agent | genome-compare | Standard Approach |
|-----------------|----------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Compare your genome to George Church (PGP-1) and estimate ancestry composition via IBS and EM admixture

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# 🧬 Genome Comparator

You are the **Genome Comparator**, a specialised ClawBio skill for pairwise genome comparison and ancestry estimation.

## Why This Exists

- **Without it**: Comparing two genomes requires PLINK, custom scripts, and ancestry reference panels — hours of bioinformatics setup
- **With it**: Upload a 23andMe file and instantly see IBS similarity to George Church, per-chromosome breakdown, and ancestry composition
- **Why ClawBio**: Uses a bundled PGP-1 reference genome (CC0 public domain) and an EM admixture algorithm calibrated to continental ancestry-informative markers

## Core Capabilities

1. **Identity By State (IBS)**: Compare a user's genome against George Church's public 23andMe data (PGP-1, hu43860C). Report SNP overlap, identity, and relationship context.
2. **Ancestry Composition**: Estimate continental ancestry proportions (African, European, East Asian, South Asian, Americas) from ancestry-informative markers using an EM admixture algorithm.
3. **Chromosome Breakdown**: Show per-chromosome IBS scores and overlap counts.
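
The ancestry step can be illustrated with a small supervised EM loop. This is a minimal sketch, not the skill's actual implementation: it assumes genotypes are coded as counts of a chosen allele (0, 1 or 2) and that per-population allele frequencies for each marker are already known. The function name `em_admixture` and its parameters are hypothetical. The marker panel and frequencies that ClawBio bundles are not shown here.

```python
def em_admixture(genotypes, freqs, n_iter=200, tol=1e-9):
    """Estimate admixture proportions for one individual (illustrative sketch).

    genotypes: list of allele counts per marker (0, 1 or 2).
    freqs:     freqs[k][j] is the frequency of the counted allele at
               marker j in reference population k (assumed known).
    Returns a list of proportions, one per population, summing to 1.
    """
    k = len(freqs)
    q = [1.0 / k] * k  # start from uniform proportions
    for _ in range(n_iter):
        counts = [0.0] * k
        for j, g in enumerate(genotypes):
            # Treat the g counted-allele copies and the 2 - g other copies separately.
            for copies, is_counted in ((g, True), (2 - g, False)):
                if copies == 0:
                    continue
                # E-step: posterior that a copy came from each population.
                like = [
                    q[i] * (freqs[i][j] if is_counted else 1.0 - freqs[i][j])
                    for i in range(k)
                ]
                total = sum(like)
                if total == 0.0:
                    continue  # marker is uninformative under current q
                for i in range(k):
                    counts[i] += copies * like[i] / total
        norm = sum(counts)
        if norm == 0.0:
            return q  # no usable markers
        # M-step: proportions are the normalised expected allele counts.
        new_q = [c / norm for c in counts]
        converged = max(abs(a - b) for a, b in zip(q, new_q)) < tol
        q = new_q
        if converged:
            break
    return q
```

With five reference populations (African, European, East Asian, South Asian, Americas), `freqs` would hold five rows and the returned list would give one proportion per group. Real panels also need care with strand orientation and missing calls, which this sketch ignores.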

## Input Formats

| Format | Extension | Required Fields | Example |
|--------|-----------|-----------------|---------|
| 23andMe raw data | `.txt`, `.txt.gz` | rsid, chromosome, position, genotype | `data/manuel_corpas_23andme.txt.gz` |
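
For orientation, a 23andMe raw export is a tab-separated text file whose header lines start with `#`, followed by one row per SNP with the four fields listed above. The following is a minimal parsing sketch under that assumption. The function name `parse_23andme` is hypothetical and is not taken from the skill's source.

```python
import gzip
from pathlib import Path


def parse_23andme(path):
    """Return {rsid: (chromosome, position, genotype)} from a 23andMe raw file."""
    path = Path(path)
    # Handle both plain .txt and gzip-compressed .txt.gz exports.
    opener = gzip.open if path.suffix == ".gz" else open
    snps = {}
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and '#'-prefixed header comments.
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            if len(fields) != 4:
                continue  # malformed row
            rsid, chrom, pos, genotype = fields
            if not pos.isdigit():
                continue  # e.g. a column-header row without '#'
            snps[rsid] = (chrom, int(pos), genotype)
    return snps
```

Keying by rsid makes the later overlap step a simple set intersection of dictionary keys. Chromosome is kept as a string because values such as `X`, `Y` and `MT` appear alongside numbers.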

## Reference Genome

**George Church** (hu43860C) — the first participant in the [Personal Genome Project](https://pgp.med.harvard.edu/). Professor of Genetics at Harvard Medical School. His 23andMe data (569,226 SNPs, CC0 public domain) is bundled in `data/george_church_23andme.txt.gz`.

## Workflow

1. **Parse**: Read user's 23andMe file and George Church reference (both support `.txt.gz`)
2. **Overlap**: Find shared SNP positions between the two genomes
3. **IBS**: Calculate identity-by-state score across all overlapping loci
4. **Ancestry**: Run EM admixture algorithm on ancestry-informative markers
5. **Visualise**: Generate per-chromosome IBS bar chart, ancestry pie, IBS context gauge, ancestry comparison
6. **Report**: Write `report.md` with summary, IBS analysis, ancestry composition, and methods
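
Steps 2 and 3 can be sketched as follows. The exact IBS formula the skill uses is not stated here, so this example assumes a common definition: the number of alleles shared at each overlapping locus (0, 1 or 2), summed and divided by twice the number of loci. It also assumes each genome is a dictionary mapping rsid to `(chromosome, position, genotype)`. The names `shared_alleles`, `is_callable` and `ibs_summary` are hypothetical.

```python
from collections import Counter, defaultdict


def shared_alleles(g1, g2):
    """Number of alleles (0, 1 or 2) shared by two unphased diploid genotypes."""
    overlap = Counter(g1) & Counter(g2)  # multiset intersection
    return sum(overlap.values())


def is_callable(genotype):
    """True for a two-letter A/C/G/T call; no-calls and haploid calls are skipped."""
    return len(genotype) == 2 and set(genotype) <= set("ACGT")


def ibs_summary(user, reference):
    """Return (overall IBS, number of loci compared, per-chromosome IBS)."""
    per_chrom = defaultdict(lambda: [0, 0])  # chrom -> [shared alleles, loci]
    # Overlap: SNPs present in both genomes, matched by rsid.
    for rsid in user.keys() & reference.keys():
        chrom, _, g1 = user[rsid]
        g2 = reference[rsid][2]
        if not (is_callable(g1) and is_callable(g2)):
            continue
        per_chrom[chrom][0] += shared_alleles(g1, g2)
        per_chrom[chrom][1] += 1
    total_shared = sum(s for s, _ in per_chrom.values())
    total_loci = sum(n for _, n in per_chrom.values())
    overall = total_shared / (2 * total_loci) if total_loci else float("nan")
    by_chrom = {c: s / (2 * n) for c, (s, n) in per_chrom.items()}
    return overall, total_loci, by_chrom
```

The per-chromosome dictionary is the kind of data a bar chart in step 5 would plot. This sketch does not correct for strand differences between files, which a production comparison would need to consider.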

## CLI Reference

```bash
# Demo: Manuel Corpas vs George Church
python skills/genome-compare/genome_compare.py --demo --output results/

# Your own data vs George Church
python skills/genome-compare/genome_compare.py --input your_23andme.txt --output results/

# Via ClawBio runner
python clawbio.py run compare --demo
python clawbio.py run compare --input <file> --output <dir>
```

## Demo

```bash
python clawbio.py run compare --demo
```

Expected output: A report comparing Manuel Corpas (PGP-UK uk6D0CFA) vs George Church (PGP-1 hu43860C). IBS score ~0.74 (consistent with two unrelated Europeans). Ancestry estimates for both individuals. Four figures generated.

## Output Structure

```
output_directory/
├── report.md                     # Full comparison report
├── result.json                   # Machine-readable IBS and ancestry data
├── figures/
│   ├── chromosome_ibs.png        # Per-chromosome IBS bar chart
│   ├── ancestry_pie.png          # Ancestry composition pie chart
│   ├── ibs_context.png           # IBS score on relationship spectrum gauge
│   └── ancestry_comparison.png   # Side-by-side ancestry comparison
└── reproducibility/
    └── commands.sh               # Exact command to reproduce
```
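
After a run, it can be useful to confirm that every file in the layout above was written. The following is a small sketch of such a check. The helper name `missing_outputs` is hypothetical and not part of the skill, and the file list is copied from the tree above.

```python
from pathlib import Path

# Relative paths taken from the documented output structure.
EXPECTED_FILES = [
    "report.md",
    "result.json",
    "figures/chromosome_ibs.png",
    "figures/ancestry_pie.png",
    "figures/ibs_context.png",
    "figures/ancestry_comparison.png",
    "reproducibility/commands.sh",
]


def missing_outputs(output_dir):
    """Return the expected files that are absent from output_dir."""
    root = Path(output_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_outputs("results/")` after the demo should return an empty list if the run completed and the layout matches the one documented here.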

## Dependencies

**Required**:
- Python 3.10+
- `numpy` >= 1.24
- `matplotlib` >= 3.7

## Safety

- All processing is local. Genetic data never leaves the machine.
- Ancestry estimation is approximate. For more rigorous analysis, use a dedicated tool such as ADMIXTURE; for clinical questions, use a professional service.
- ClawBio is a research and educational tool. It is not a medical device.

## Integration with Bio Orchestrator

**Trigger conditions** — the orchestrator routes here when:
- User asks to compare genomes, mentions IBS, George Church, or Corpasome
- User provides a 23andMe file and asks "how similar am I to..."

**Chaining partners**:
- `claw-ancestry-pca`: More detailed ancestry analysis with SGDP reference panel
- `profile-report`: Genome comparison results feed into the unified genomic profile

## Citations

- Church GM. The Personal Genome Project. Mol Syst Biol. 2005;1:2005.0030.
- Corpas M. Crowdsourcing the Corpasome. Source Code Biol Med. 2013;8:13.
