claw-ancestry-pca
Ancestry decomposition PCA against the Simons Genome Diversity Project
Best use case
claw-ancestry-pca is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using claw-ancestry-pca should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/claw-ancestry-pca/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How claw-ancestry-pca Compares
| Feature / Agent | claw-ancestry-pca | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Ancestry decomposition PCA against the Simons Genome Diversity Project
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# 🦖 Ancestry Decomposition PCA
Place your study cohort in global genetic context by computing a joint PCA against the Simons Genome Diversity Project (SGDP) — 345 samples from 164 populations spanning every inhabited continent.
## What it does
1. Takes your VCF + population map as input
2. Finds common variants between your cohort and the SGDP reference panel (bundled)
3. Runs PLINK PCA on the merged dataset
4. Separates your cohort from SGDP reference samples
5. Matches SGDP samples to their population labels (164 populations)
6. Generates a publication-quality multi-panel figure:
   - **Panel A**: PC1 vs PC2, the main population structure of your cohort
   - **Panel B**: PC3 vs PC2 with regional groupings and confidence ellipses
   - **Panel C**: PC3 vs PC1 with language/cultural groupings
   - **Panel D**: Global context, with your samples (circles) vs SGDP (triangles)
7. Produces a markdown report with variance explained, population assignments, and reproducibility bundle
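
Step 2 can be pictured as a set intersection over normalised variant keys. The following is a minimal sketch under stated assumptions, not the skill's actual code: the `Variant` record and the helper names (`normalise_contig`, `is_biallelic_snp`, `shared_biallelic_snps`) are hypothetical and exist only for illustration.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal record holding the VCF columns needed here."""
    chrom: str
    pos: int
    ref: str
    alt: str  # raw ALT column; multi-allelic sites look like "G,T"


def normalise_contig(chrom: str) -> str:
    """Strip a leading 'chr' so that 'chr1' and '1' compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def is_biallelic_snp(v: Variant) -> bool:
    """True when REF and ALT are each exactly one A/C/G/T base."""
    bases = {"A", "C", "G", "T"}
    return v.ref.upper() in bases and v.alt.upper() in bases


def variant_key(v: Variant) -> Tuple[str, int, str, str]:
    """Key that identifies a site independently of contig naming style."""
    return (normalise_contig(v.chrom), v.pos, v.ref.upper(), v.alt.upper())


def shared_biallelic_snps(
    cohort: Iterable[Variant], reference: Iterable[Variant]
) -> Set[Tuple[str, int, str, str]]:
    """Return keys of biallelic SNPs present in both datasets."""
    cohort_keys = {variant_key(v) for v in cohort if is_biallelic_snp(v)}
    reference_keys = {variant_key(v) for v in reference if is_biallelic_snp(v)}
    return cohort_keys & reference_keys


cohort = [
    Variant("chr1", 100, "A", "G"),    # kept: SNP present in both sets
    Variant("chr1", 200, "A", "G,T"),  # dropped: multi-allelic
    Variant("chr2", 50, "AT", "A"),    # dropped: indel
]
reference = [
    Variant("1", 100, "A", "G"),
    Variant("1", 200, "A", "G"),
    Variant("2", 50, "AT", "A"),
]
print(shared_biallelic_snps(cohort, reference))
```

This sketch matches sites only when REF and ALT agree exactly. It does not attempt to reconcile swapped alleles or strand flips, which a real merge would also need to consider.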
## Why this exists
If you ask ChatGPT to "run a PCA against a global reference panel," it will:
- Not know which reference panel to use
- Hallucinate PLINK flags for merging datasets with different variant sets
- Skip IBD removal (related individuals distort PCA)
- Not normalise contig names between your VCF and the reference
- Produce a single scatter plot with no population labels
This skill encodes the correct methodological decisions:
- Uses SGDP (the gold-standard reference for global diversity)
- Handles contig normalisation (chr1 vs 1)
- Filters to common biallelic SNPs shared between datasets
- Removes related individuals via IBD checks
- Produces publication-quality multi-panel figures with confidence ellipses
- Differentiates your samples (circles) from reference (triangles)
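
To illustrate the relatedness filter, here is a hedged sketch of greedy pruning on pairwise IBD estimates. It assumes the pairwise values (for example, PI_HAT proportions) have already been computed elsewhere. The function name `prune_related` and the 0.1875 cut-off are assumptions for illustration; the skill's actual threshold and strategy are not stated here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def prune_related(
    pairs: Iterable[Tuple[str, str, float]],
    pi_hat_threshold: float = 0.1875,  # assumed cut-off, not taken from the skill
) -> Set[str]:
    """Return sample IDs to drop so that no remaining pair exceeds the threshold.

    Greedy heuristic: repeatedly drop the sample that takes part in the
    most related pairs. Ties are broken alphabetically so the result is
    reproducible.
    """
    related: Dict[str, Set[str]] = defaultdict(set)
    for a, b, pi_hat in pairs:
        if a != b and pi_hat > pi_hat_threshold:
            related[a].add(b)
            related[b].add(a)

    removed: Set[str] = set()
    while True:
        # Only samples that still have a related partner are candidates.
        candidates = {s: p for s, p in related.items() if p}
        if not candidates:
            return removed
        # min() keeps the first of equally connected samples in sorted order.
        worst = min(sorted(candidates), key=lambda s: -len(candidates[s]))
        for partner in related.pop(worst):
            related[partner].discard(worst)
        removed.add(worst)


pairs = [
    ("S1", "S2", 0.50),  # first-degree relatives
    ("S2", "S3", 0.25),  # second-degree relatives
    ("S4", "S5", 0.05),  # below threshold, treated as unrelated
]
print(prune_related(pairs))
```

Dropping the most-connected sample first tends to keep more individuals than removing one member of every pair independently, though it is a heuristic and not guaranteed to be optimal.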
## Reference Panel
The skill bundles the SGDP v4 dataset (Mallick et al., 2016, Nature):
- 345 samples from 164 populations
- Whole-genome sequencing at high coverage
- MAF > 0.1% filter applied
- Populations span: Africa, Americas, Central/South Asia, East Asia, Europe, Middle East, Oceania
## Usage
```bash
python ancestry_pca.py \
--vcf your_cohort.vcf.gz \
--pop-map your_populations.tsv \
--output ancestry_report
```
### Demo (works out of the box)
```bash
python ancestry_pca.py --demo --output demo_report
```
The demo uses pre-computed PCA results from the Peruvian Genome Project (736 samples, 28 populations) and generates the full 4-panel figure instantly.
## Example Output
```
Ancestry Decomposition PCA
==========================
Cohort: 736 samples, 28 populations
Reference: SGDP (345 samples, 164 populations)
Common variants: 42,831 biallelic SNPs
Variance explained:
PC1: 51.44% PC2: 21.70% PC3: 6.70%
Panel D — Global Context:
Cohort samples cluster between European and East Asian
reference populations, with Amazonian groups showing
distinct positioning from Highland and Coastal groups.
Figures saved to: ancestry_report/
Figure3_PCA_composite.png (300 dpi)
Figure3_PCA_composite.pdf (vector)
Reproducibility:
commands.sh | environment.yml | checksums.sha256
```
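
Percentages like those in the "Variance explained" line are conventionally each eigenvalue divided by the sum of the eigenvalues. The sketch below shows that arithmetic with made-up eigenvalues. The function name `variance_explained` is hypothetical, and whether the skill normalises by all eigenvalues or only by the leading ones it computed is an assumption to check against its report.

```python
from typing import List, Sequence


def variance_explained(eigenvalues: Sequence[float]) -> List[float]:
    """Convert eigenvalues to percentages of their own sum."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [100.0 * ev / total for ev in eigenvalues]


# Illustrative eigenvalues only, not results from any real dataset.
percentages = variance_explained([5.0, 3.0, 2.0])
print(" ".join(f"PC{i}: {p:.2f}%" for i, p in enumerate(percentages, start=1)))
```

If only the top few eigenvalues are available, the denominator covers just that subset, so the percentages overstate each component's share of the total variance.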
## Interpretation Guide
- **PC1** typically captures the largest axis of global differentiation (often Africa vs non-Africa)
- **PC2** separates major continental groups (Europe, East Asia, Americas)
- **PC3** often reveals finer substructure within continental groups
- Confidence ellipses show 2.5 standard deviations around each population cluster
- Your samples are shown as **circles**, SGDP reference samples as **triangles**
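
A 2.5 standard deviation ellipse can be derived from the covariance of a population's points in the two plotted PCs. The sketch below is one common construction using only the standard library. The names `Ellipse` and `confidence_ellipse` are hypothetical, and the skill may compute its ellipses differently.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class Ellipse(NamedTuple):
    center: Tuple[float, float]
    width: float      # full length along the major axis
    height: float     # full length along the minor axis
    angle_deg: float  # rotation of the major axis from the x-axis


def confidence_ellipse(
    points: Sequence[Tuple[float, float]], n_std: float = 2.5
) -> Ellipse:
    """Ellipse spanning n_std standard deviations along each principal axis."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to estimate a covariance")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)

    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    lam_major = half_trace + spread
    lam_minor = max(half_trace - spread, 0.0)  # guard against rounding below 0

    # Orientation of the major axis.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return Ellipse(
        center=(mx, my),
        width=2.0 * n_std * math.sqrt(lam_major),
        height=2.0 * n_std * math.sqrt(lam_minor),
        angle_deg=math.degrees(angle),
    )


# Three collinear points on y = x: the major axis should sit at 45 degrees.
print(confidence_ellipse([(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)]))
```

The returned fields line up with the arguments of `matplotlib.patches.Ellipse(xy, width, height, angle=...)` if you want to draw the result. Under a bivariate normal assumption, a 2.5 SD ellipse encloses about 1 − exp(−2.5²/2) ≈ 95.6% of points.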
## Citation
If you use this skill in a publication, please cite:
- Mallick, S. et al. (2016). The Simons Genome Diversity Project. Nature, 538, 201-206.
- Corpas, M. (2026). ClawBio. https://github.com/ClawBio/ClawBio
Related Skills
remote-openclaw-deploy
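General-purpose remote deployment for OpenClaw Agent projects. Supports arbitrary custom agent teams, macOS and Linux, multiple channels (Feishu/Telegram/Discord), and declarative configuration injection via deploy.json. A single script handles the whole process from scratch to a working deployment.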
polyclaw
> Multi-strategy aggregated trading: a multi-strategy trade execution engine for Polymarket/CLOB
clawbio-pharmgx-reporter
Pharmacogenomic report from DTC genetic data (23andMe/AncestryDNA)
openclaw-master-skills
> OpenClaw master skill set: core management skills such as team management, agent scheduling, and system configuration
openclaw-inter-instance
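Communication between OpenClaw instances. Use this skill when you need to pass messages, synchronise data, or execute commands remotely across multiple OpenClaw instances. Covers agent-to-agent messaging, remote execution via nodes.run, file-level communication, and other methods.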
openclaw-config-helper
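OpenClaw configuration helper. Before modifying any OpenClaw configuration, you must consult the official documentation to make sure the format is correct and to avoid system crashes or malfunctions. Enforces the workflow: check schema → check docs → confirm → modify.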
openclaw-browser-chain-debug
Diagnose OpenClaw browser control failures including browser start timeouts, Chrome CDP startup failures, missing DISPLAY, browser profile launch issues, and gateway/browser environment mismatches. Use when browser automation, browser-based cron jobs, or profile openclaw fails to start, times out, or returns Request was aborted after browser steps. Also use when deciding whether a task should run with a profile browser versus an attach browser: prefer profile for unattended automation and recurring jobs; prefer attach when a human's already-open logged-in tab or manual cooperation is required.
moneyclaw
> Financial analysis tool: aggregation and analysis of personal and business financial data
clawrouter
Smart LLM router — save 67% on inference costs. Routes every request to the cheapest capable model across 41 models from OpenAI, Anthropic, Google, DeepSeek, and xAI.
claw-semantic-sim
Semantic Similarity Index for disease research literature using PubMedBERT embeddings
claw-metagenomics
Shotgun metagenomics profiling — taxonomy, resistome, and functional pathways
wemp-operator
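> Full-featured WeChat Official Account operations: API wrappers for drafts, publishing, comments, users, media assets, mass messaging, statistics, menus, and QR codes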