claw-metagenomics

Shotgun metagenomics profiling — taxonomy, resistome, and functional pathways


by aAAaqwq


Best use case

claw-metagenomics is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using claw-metagenomics should expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/claw-metagenomics/SKILL.md --create-dirs "https://raw.githubusercontent.com/aAAaqwq/AGI-Super-Team/main/skills/claw-metagenomics/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/claw-metagenomics/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How claw-metagenomics Compares

Feature / Agent	claw-metagenomics	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Shotgun metagenomics profiling — taxonomy, resistome, and functional pathways

Where can I find the source code?

You can find the source code in the aAAaqwq/AGI-Super-Team repository on GitHub (https://github.com/aAAaqwq/AGI-Super-Team), under skills/claw-metagenomics/.

SKILL.md Source

# Shotgun Metagenomics Profiler

Comprehensive shotgun metagenomics analysis combining taxonomic classification, antimicrobial resistance gene detection, and functional pathway profiling from paired-end FASTQ files.

## What it does

1. Takes paired-end FASTQ files (R1, R2) or a single concatenated FASTQ as input
2. Runs **Kraken2** taxonomic classification against a standard database (e.g., Standard-8, PlusPF)
3. Refines abundances with **Bracken** at species level (read re-estimation)
4. Detects antimicrobial resistance genes with **RGI** against the **CARD** database
5. Classifies detected ARGs by **WHO critical priority pathogen** association
6. Optionally runs **HUMAnN3** for functional pathway profiling (MetaCyc + UniRef)
7. Generates three publication-quality figures:
   - **Figure 1**: Taxonomy bar chart — top 20 species by relative abundance
   - **Figure 2**: Resistome heatmap — ARG families by drug class with abundance
   - **Figure 3**: WHO-critical ARG summary — priority-tier breakdown of detected resistance genes
8. Produces a full reproducibility bundle (commands.sh, environment.yml, checksums.sha256)
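Step 8 lists a `checksums.sha256` file but does not say how it is produced. A minimal sketch, assuming the manifest uses the `sha256sum` text format (hex digest, two spaces, relative path) so it can be verified with `sha256sum -c`; `write_checksums` and `sha256_of` are hypothetical names, not functions from the skill's source.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(report_dir: Path, manifest_name: str = "checksums.sha256") -> Path:
    """Hypothetical helper: write '<digest>  <relative path>' for every file under report_dir."""
    manifest = report_dir / manifest_name
    # Exclude the manifest itself so reruns do not hash a stale copy of it.
    files = sorted(p for p in report_dir.rglob("*") if p.is_file() and p != manifest)
    lines = [f"{sha256_of(p)}  {p.relative_to(report_dir).as_posix()}" for p in files]
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

Running `sha256sum -c checksums.sha256` from inside the report directory would then re-verify every output. Chunked hashing keeps memory flat even for multi-gigabyte FASTQ files.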

## Why this exists

If you ask a general AI to "analyse a metagenome," it will:
- Not know which Kraken2 database to use or how to set confidence thresholds
- Hallucinate Bracken parameters for read length and taxonomic level
- Miss the connection between detected ARGs and WHO priority pathogen lists
- Skip HUMAnN3 entirely (or misconfigure its database paths)
- Produce a single bar chart with no resistance context
- Not provide a reproducibility bundle

This skill encodes the correct methodological decisions:
- Kraken2 confidence threshold of 0.2 (reduces false positives in environmental samples)
- Bracken re-estimation at species level with a minimum of 10 reads per species
- RGI MAIN with "Perfect" and "Strict" hit criteria only (no "Loose" hits)
- WHO Critical Priority Pathogen list mapped to detected ARG families
- HUMAnN3 with MetaCyc stratification for pathway-level functional context
- Thread count auto-detected from available CPUs
- Full reproducibility bundle for every run
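The Kraken2, Bracken, and thread-count decisions translate directly into command-line flags. A sketch that only builds argument lists (nothing is executed), assuming output files are named from a per-sample prefix; the helper names and file-naming scheme are hypothetical, while the flags are standard Kraken2 (`--db`, `--threads`, `--confidence`, `--paired`, `--report`, `--output`) and Bracken (`-d`, `-i`, `-o`, `-r`, `-l`, `-t`) options.

```python
import os
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def available_threads() -> int:
    """Number of CPUs this process may use (honours taskset/cpuset pinning on Linux)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def kraken2_command(db: PathLike, r1: PathLike, r2: PathLike, prefix: str,
                    threads: int, confidence: float = 0.2) -> List[str]:
    """Build a paired-end Kraken2 call; confidence 0.2 drops low-support assignments."""
    return [
        "kraken2",
        "--db", str(db),
        "--threads", str(threads),
        "--confidence", str(confidence),
        "--paired",
        "--report", f"{prefix}_kraken2_report.txt",   # hypothetical naming
        "--output", f"{prefix}_kraken2_output.txt",
        str(r1), str(r2),
    ]


def bracken_command(db: PathLike, report: PathLike, out_tsv: PathLike,
                    read_length: int = 150, level: str = "S", min_reads: int = 10) -> List[str]:
    """Build a species-level Bracken re-estimation call with a 10-read minimum."""
    return [
        "bracken",
        "-d", str(db),
        "-i", str(report),
        "-o", str(out_tsv),
        "-r", str(read_length),
        "-l", level,
        "-t", str(min_reads),
    ]
```

Each list could be passed to `subprocess.run(cmd, check=True)`. `os.sched_getaffinity` is preferred over `os.cpu_count()` where available because it counts only the CPUs the process is allowed to run on, which matters on shared HPC nodes.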

## Validated On

The skill works with any shotgun metagenome but has been validated on:
- **Peru sewage metagenomics study** (6 samples, 3 collection sites: Lima, Cusco, Iquitos)
- Environmental sewage samples with mixed microbial communities
- Read depths ranging from 2M to 15M paired-end reads per sample

## WHO-Critical ARG Detection

A key feature is the classification of detected resistance genes by WHO priority tier, following the 2017 WHO priority pathogens list:

| Priority | Pathogen | Resistance |
|----------|----------|------------|
| Critical | *Acinetobacter baumannii* | Carbapenem-resistant |
| Critical | *Pseudomonas aeruginosa* | Carbapenem-resistant |
| Critical | *Enterobacteriaceae* | Carbapenem-resistant, 3rd-gen cephalosporin-resistant |
| High | *Enterococcus faecium* | Vancomycin-resistant |
| High | *Staphylococcus aureus* | Methicillin-resistant, vancomycin-intermediate and vancomycin-resistant |
| High | *Helicobacter pylori* | Clarithromycin-resistant |
| High | *Campylobacter* spp. | Fluoroquinolone-resistant |
| High | *Salmonella* spp. | Fluoroquinolone-resistant |
| High | *Neisseria gonorrhoeae* | 3rd-gen cephalosporin-resistant, fluoroquinolone-resistant |
| Medium | *Streptococcus pneumoniae* | Penicillin-non-susceptible |
| Medium | *Haemophilus influenzae* | Ampicillin-resistant |
| Medium | *Shigella* spp. | Fluoroquinolone-resistant |
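The table lists pathogen/resistance pairs, but a read-based resistome yields genes and their drug classes. A naive sketch of mapping an RGI `Drug Class` value onto the table's tiers, assuming CARD drug-class terms such as `carbapenem` and `glycopeptide antibiotic`; the keyword map and `who_tier` are hypothetical simplifications, not the skill's actual mapping.

```python
from typing import Optional

# Lower rank = higher WHO priority.
TIER_RANK = {"Critical": 0, "High": 1, "Medium": 2}

# Hypothetical map: each CARD drug-class term gets the highest tier whose
# "Resistance" column in the table above names a drug from that class.
DRUG_CLASS_TIER = {
    "carbapenem": "Critical",
    "cephalosporin": "Critical",
    "glycopeptide antibiotic": "High",      # vancomycin
    "fluoroquinolone antibiotic": "High",
    "macrolide antibiotic": "High",         # clarithromycin
    "penam": "High",                        # methicillin; also penicillin, ampicillin
}


def who_tier(drug_class_field: str) -> Optional[str]:
    """Highest WHO tier implied by an RGI 'Drug Class' value, or None if no term matches."""
    terms = (t.strip().lower() for t in drug_class_field.split(";"))
    tiers = [DRUG_CLASS_TIER[t] for t in terms if t in DRUG_CLASS_TIER]
    return min(tiers, key=TIER_RANK.__getitem__) if tiers else None
```

Because every Medium-tier drug (penicillin, ampicillin, fluoroquinolones) shares a drug class with a higher-tier entry, a class-only map like this can never return Medium, and it ranks any cephalosporin hit Critical regardless of generation or host. Pathogen-aware tiers require linking each gene to its host organism.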

## Usage

```bash
# Full pipeline (taxonomy + resistome + functional)
python metagenomics_profiler.py \
    --r1 sample_R1.fastq.gz \
    --r2 sample_R2.fastq.gz \
    --output metagenomics_report

# Skip HUMAnN3 (faster — taxonomy + resistome only)
python metagenomics_profiler.py \
    --r1 sample_R1.fastq.gz \
    --r2 sample_R2.fastq.gz \
    --output metagenomics_report \
    --skip-functional

# Single concatenated FASTQ
python metagenomics_profiler.py \
    --input combined.fastq.gz \
    --output metagenomics_report

# Specify Kraken2 database path
python metagenomics_profiler.py \
    --r1 sample_R1.fastq.gz \
    --r2 sample_R2.fastq.gz \
    --output metagenomics_report \
    --kraken2-db /path/to/kraken2_db \
    --read-length 150
```
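The script's source is not reproduced on this page. A hypothetical `argparse` reconstruction of the flags used above, plus `--humann-db` and `--demo`, which appear elsewhere in this document; the validation rules (paired reads given together, paired and single-file input mutually exclusive) are assumptions about intended behaviour.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the documented flags and the 150 bp read-length default."""
    p = argparse.ArgumentParser(prog="metagenomics_profiler.py")
    p.add_argument("--r1", help="forward reads (FASTQ, optionally gzipped)")
    p.add_argument("--r2", help="reverse reads")
    p.add_argument("--input", help="single concatenated FASTQ")
    p.add_argument("--output", required=True, help="report directory")
    p.add_argument("--kraken2-db", help="Kraken2 database directory (else $KRAKEN2_DB)")
    p.add_argument("--humann-db", help="HUMAnN3 database directory (else $HUMANN_DB)")
    p.add_argument("--read-length", type=int, default=150, help="Bracken read length")
    p.add_argument("--skip-functional", action="store_true", help="do not run HUMAnN3")
    p.add_argument("--demo", action="store_true", help="use pre-computed demo data")
    return p


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.demo:
        return args  # demo mode needs no reads
    paired = args.r1 is not None or args.r2 is not None
    if paired and args.input:
        parser.error("use either --r1/--r2 or --input, not both")
    if paired and not (args.r1 and args.r2):
        parser.error("--r1 and --r2 must be given together")
    if not paired and not args.input:
        parser.error("provide --r1/--r2, --input, or --demo")
    return args
```

`parser.error` prints the usage line and exits with status 2, so malformed invocations fail before any long-running tool starts.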

### Demo (works out of the box)

```bash
python metagenomics_profiler.py --demo --output demo_report
```

The demo uses pre-computed results from the Peru sewage metagenomics study (6 samples, 3 sites) and generates all figures and reports instantly without requiring external tools.

## Example Output

```
Metagenomics Profiler — ClawBio
================================
Mode: demo (pre-computed Peru sewage data)
Samples: 6 (3 sites: Lima, Cusco, Iquitos)

Taxonomy (Kraken2 + Bracken):
  Total classified: 94.2%
  Top species: Escherichia coli (12.3%), Klebsiella pneumoniae (8.7%),
               Pseudomonas aeruginosa (5.1%), Acinetobacter baumannii (3.9%)

Resistome (RGI/CARD):
  Total ARG hits: 247 (Perfect: 89, Strict: 158)
  Drug classes: 14
  WHO-Critical ARGs detected: 23
    - Carbapenem resistance: NDM-1, OXA-48, KPC-3
    - 3rd-gen cephalosporin resistance: CTX-M-15, CTX-M-27

Functional Pathways (HUMAnN3):
  Total pathways: 312
  Top: PWY-7219 (adenosine ribonucleotides de novo biosynthesis)

Figures saved to: demo_report/figures/
  taxonomy_barplot.png (300 dpi)
  resistome_heatmap.png (300 dpi)
  who_critical_args.png (300 dpi)

Reproducibility:
  commands.sh | environment.yml | checksums.sha256
```
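The resistome lines of this summary (hits split into Perfect and Strict, number of drug classes) can be derived from RGI's tab-delimited report. A minimal sketch, assuming the `Cut_Off` and `Drug Class` column headers of RGI's `.txt` output (check them against your RGI version); `summarise_rgi` is a hypothetical name.

```python
import csv
from collections import Counter
from typing import Dict, Iterable

KEPT_CUTOFFS = {"Perfect", "Strict"}  # "Loose" hits are dropped, as described above


def summarise_rgi(lines: Iterable[str]) -> Dict[str, int]:
    """Count Perfect/Strict hits and distinct drug classes in an RGI tab-delimited report."""
    cutoffs: Counter = Counter()
    drug_classes = set()
    for row in csv.DictReader(lines, delimiter="\t"):
        cut = row["Cut_Off"].strip()
        if cut not in KEPT_CUTOFFS:
            continue
        cutoffs[cut] += 1
        # "Drug Class" is a semicolon-separated list, e.g. "carbapenem; cephalosporin"
        drug_classes.update(c.strip() for c in row["Drug Class"].split(";") if c.strip())
    return {
        "total": sum(cutoffs.values()),
        "perfect": cutoffs["Perfect"],
        "strict": cutoffs["Strict"],
        "drug_classes": len(drug_classes),
    }
```

For example, `with open("rgi_results.txt", newline="") as fh: print(summarise_rgi(fh))`.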

## Pipeline Architecture

```
FASTQ R1 + R2
     |
     v
[Kraken2] --> kraken2_report.txt
     |
     v
[Bracken] --> bracken_species.tsv   --> Figure 1: Taxonomy bar chart
     |
     v
[RGI MAIN] --> rgi_results.txt      --> Figure 2: Resistome heatmap
     |                                --> Figure 3: WHO-critical ARG summary
     v
[HUMAnN3] --> pathabundance.tsv     (optional, --skip-functional to omit)
     |
     v
[Report] --> report.md + figures/ + reproducibility/
```
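Figure 1 is drawn from `bracken_species.tsv`. A sketch of selecting the top 20 species, assuming Bracken's standard output columns (including `name` and `fraction_total_reads`); `top_species` is a hypothetical helper.

```python
import csv
from typing import Iterable, List, Tuple


def top_species(lines: Iterable[str], n: int = 20) -> List[Tuple[str, float]]:
    """Return the n most abundant taxa in Bracken output as (name, percent) pairs."""
    reader = csv.DictReader(lines, delimiter="\t")
    ranked = sorted(
        ((row["name"], float(row["fraction_total_reads"]) * 100) for row in reader),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:n]
```

Bracken computes `fraction_total_reads` over the reads it re-estimated at the chosen level, so these percentages describe shares of species-assigned reads, not of all sequenced reads.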

## Database Requirements

| Tool | Database | Size | Notes |
|------|----------|------|-------|
| Kraken2 | Standard-8 or PlusPF | 8-70 GB | Set via `--kraken2-db` or `$KRAKEN2_DB` |
| Bracken | (built from Kraken2 DB) | included | Read-length specific (default: 150 bp) |
| RGI | CARD | ~500 MB | Auto-downloaded via `rgi auto_load` |
| HUMAnN3 | ChocoPhlAn + UniRef90 | ~15 GB | Set via `--humann-db` or `$HUMANN_DB` |
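Both `--kraken2-db` and `--humann-db` fall back to an environment variable. A sketch of that precedence (flag first, then `$KRAKEN2_DB` or `$HUMANN_DB`), plus a pre-flight check for the files a Kraken2 + Bracken database normally contains (`hash.k2d`, `opts.k2d`, `taxo.k2d`, and `databaseNNNmers.kmer_distrib` for read length NNN); `resolve_db` and `missing_kraken2_files` are hypothetical names.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def resolve_db(flag_value: Optional[str], env_var: str,
               env: Mapping[str, str] = os.environ) -> Path:
    """Prefer the command-line value, else the environment variable; fail early if neither works."""
    value = flag_value or env.get(env_var)
    if not value:
        raise SystemExit(f"no database given: pass the flag or set ${env_var}")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise SystemExit(f"database directory not found: {path}")
    return path


def missing_kraken2_files(db: Path, read_length: int = 150) -> List[str]:
    """List expected Kraken2/Bracken index files that are absent from db."""
    expected = ["hash.k2d", "opts.k2d", "taxo.k2d",
                f"database{read_length}mers.kmer_distrib"]
    return [name for name in expected if not (db / name).exists()]
```

Checking for the read-length-specific Bracken file up front catches a mismatched `--read-length` before Kraken2 spends time classifying.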

## Citations

If you use this skill in a publication, please cite:

- Wood, D.E., Lu, J. & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20, 257.
- Lu, J. et al. (2017). Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science, 3, e104.
- Alcock, B.P. et al. (2023). CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic Acids Research, 51(D1), D690-D699.
- Beghini, F. et al. (2021). Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife, 10, e65088.
- Corpas, M. (2026). ClawBio. https://github.com/ClawBio/ClawBio
