bio-metagenomics-metaphlan

Marker gene-based taxonomic profiling using MetaPhlAn 4. Provides accurate species-level relative abundances using clade-specific markers. Use when accurate taxonomic profiling is needed and computational resources are limited, or for comparison with HMP/other MetaPhlAn studies.

by FreedomIntelligence

Best use case

bio-metagenomics-metaphlan is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-metagenomics-metaphlan should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/bio-metagenomics-metaphlan/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-metagenomics-metaphlan/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bio-metagenomics-metaphlan/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How bio-metagenomics-metaphlan Compares

Feature / Agent	bio-metagenomics-metaphlan	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

SKILL.md Source

## Version Compatibility

Reference examples tested with: Bowtie2 2.5.3+, MetaPhlAn 4.1+, minimap2 2.26+, pandas 2.2+, scanpy 1.10+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
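
The CLI version check described above can be scripted. The following is a minimal sketch, assuming the tool prints a dotted version number somewhere in its `--version` output; `parse_version`, `meets_minimum`, and `tool_version` are hypothetical helper names, not part of any package listed here.

```python
import re
import subprocess


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(installed, minimum):
    """Compare two version tuples, padding the shorter one with zeros."""
    width = max(len(installed), len(minimum))

    def pad(version):
        return version + (0,) * (width - len(version))

    return pad(installed) >= pad(minimum)


def tool_version(tool):
    """Run `<tool> --version` and parse the result (assumes the tool is on PATH)."""
    result = subprocess.run([tool, "--version"], capture_output=True, text=True)
    # Some tools print the version on stderr, so both streams are checked.
    return parse_version(result.stdout + result.stderr)
```

For example, `meets_minimum(tool_version("metaphlan"), (4, 1))` would indicate whether the installed MetaPhlAn is at least the version the examples were tested with.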

# MetaPhlAn 4 Profiling

**"Profile the species composition of my metagenome"** → Determine species-level relative abundances from shotgun metagenomic reads using clade-specific marker gene alignment.
- CLI: `metaphlan sample.fastq --input_type fastq -o profile.txt`

MetaPhlAn 4 uses ~5M clade-specific markers from 26,970 species-level genome bins. Supports both short reads (bowtie2) and long reads (minimap2).

## Basic Profiling

```bash
# Profile single sample
metaphlan sample.fastq.gz \
    --input_type fastq \
    --output_file profile.txt
```

## Paired-End Reads

```bash
# MetaPhlAn does not use pairing information: pass both mates as a
# comma-separated list and keep --mapout set when giving several input files
metaphlan reads_R1.fastq.gz,reads_R2.fastq.gz \
    --input_type fastq \
    --output_file profile.txt \
    --mapout sample.map.bz2
```
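
A small sketch of how the paired-end call above could be assembled for many samples. It assumes mates are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` (a convention borrowed from the example, not a MetaPhlAn requirement); `pair_mates` and `paired_cmd` are hypothetical helpers that only build argument lists and do not run anything.

```python
from collections import defaultdict
from pathlib import Path


def pair_mates(paths):
    """Group FASTQ paths by sample name, assuming *_R1/_R2.fastq.gz naming."""
    samples = defaultdict(dict)
    for path in paths:
        name = Path(path).name
        for mate in ("R1", "R2"):
            suffix = f"_{mate}.fastq.gz"
            if name.endswith(suffix):
                samples[name[: -len(suffix)]][mate] = path
    # Keep only samples for which both mates were found.
    return {s: (m["R1"], m["R2"]) for s, m in sorted(samples.items()) if len(m) == 2}


def paired_cmd(sample, r1, r2):
    """Build the argument list for one paired-end sample (flags as used above)."""
    return [
        "metaphlan", f"{r1},{r2}",
        "--input_type", "fastq",
        "--output_file", f"{sample}_profile.txt",
        "--mapout", f"{sample}.map.bz2",
    ]
```

Each list returned by `paired_cmd` can be passed to `subprocess.run(cmd, check=True)` once the paths have been checked.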

## Save Mapping Output for Reuse

```bash
# First run - save intermediate mapping
metaphlan sample.fastq.gz \
    --input_type fastq \
    --mapout sample.map.bz2 \
    --output_file profile.txt

# Rerun with different settings without realigning
metaphlan sample.map.bz2 \
    --input_type mapout \
    --output_file profile_v2.txt
```
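
The reuse pattern above can be wrapped in a helper that picks the input automatically. This is a sketch under the assumption that a non-empty mapping file from an earlier run is still valid for the current database; `profile_cmd` is a hypothetical name, and the flags are the ones shown in this section.

```python
from pathlib import Path


def profile_cmd(fastq, mapout, output, extra=()):
    """Build a metaphlan argument list, reusing `mapout` when it already exists."""
    mapout = Path(mapout)
    if mapout.is_file() and mapout.stat().st_size > 0:
        # Mapping already saved: skip alignment and re-profile from it.
        cmd = ["metaphlan", str(mapout), "--input_type", "mapout"]
    else:
        # First run: align the reads and save the mapping for later reuse.
        cmd = ["metaphlan", str(fastq), "--input_type", "fastq",
               "--mapout", str(mapout)]
    return cmd + ["--output_file", str(output), *extra]
```

Passing changed settings through `extra`, for example `extra=["--stat_q", "0.1"]`, re-profiles from the saved mapping without realigning.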

## Long-Read Support (MetaPhlAn 4.2+)

```bash
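# Long reads are aligned with minimap2 instead of bowtie2.
# Assumption: long-read mode is switched on with --long_reads (MetaPhlAn 4.2+);
# confirm the flag name with `metaphlan --help` for your install.
metaphlan long_reads.fastq.gz \
    --input_type fastq \
    --long_reads \
    --output_file profile.txt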
```

## Common Options

```bash
# --nproc: CPU threads
# --tax_lev: taxonomic level to report (a, k, p, c, o, f, g, s, t)
# --min_cu_len: minimum total nucleotide length of a clade's markers
# --stat_q: quantile used for the robust average
metaphlan sample.fastq.gz \
    --input_type fastq \
    --nproc 8 \
    --tax_lev s \
    --min_cu_len 2000 \
    --stat_q 0.2 \
    --output_file profile.txt \
    --mapout sample.map.bz2
```

## Install Database

```bash
# Download database (done automatically on first run)
metaphlan --install

# Or specify database location
metaphlan --install --db_dir /path/to/db
```

## Analysis Types

```bash
# Relative abundances (default)
metaphlan sample.fastq.gz --input_type fastq -t rel_ab

# Relative abundances with read counts
metaphlan sample.fastq.gz --input_type fastq -t rel_ab_w_read_stats

# Marker presence/absence
metaphlan sample.fastq.gz --input_type fastq -t marker_pres_table

# Marker abundances
metaphlan sample.fastq.gz --input_type fastq -t marker_ab_table
```

## Multiple Samples

```bash
# Process each sample (create the output directories first)
mkdir -p profiles mapout
for fq in samples/*.fastq.gz; do
    sample=$(basename "$fq" .fastq.gz)
    metaphlan "$fq" \
        --input_type fastq \
        --nproc 4 \
        --output_file "profiles/${sample}_profile.txt" \
        --mapout "mapout/${sample}.map.bz2"
done

# Merge profiles
merge_metaphlan_tables.py profiles/*_profile.txt > merged_abundance.txt
```
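
For downstream work in Python, the merge step can also be approximated without the bundled script. The sketch below is not a reimplementation of `merge_metaphlan_tables.py`; it assumes each profile has a `#clade_name ...` header line naming its columns, and it fills clades missing from a sample with 0.0. `read_profile` and `merge_profiles` are hypothetical names.

```python
import csv


def read_profile(handle, value_col="relative_abundance"):
    """Return {clade_name: abundance} from an open MetaPhlAn profile."""
    columns, table = None, {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if row[0].startswith("#"):
            # The '#clade_name' line names the columns; other '#' lines are skipped.
            if row[0] == "#clade_name":
                columns = ["clade_name"] + row[1:]
            continue
        if columns is None:
            raise ValueError("no '#clade_name' header line found")
        table[row[0]] = float(row[columns.index(value_col)])
    return table


def merge_profiles(profiles):
    """Merge {sample: {clade: abundance}} into (samples, {clade: [values]})."""
    samples = sorted(profiles)
    clades = sorted({clade for table in profiles.values() for clade in table})
    merged = {c: [profiles[s].get(c, 0.0) for s in samples] for c in clades}
    return samples, merged
```

The returned mapping has one list of values per clade, ordered like `samples`, which is straightforward to hand to `pandas.DataFrame.from_dict(merged, orient="index", columns=samples)`.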

## Filter by Taxonomic Level

```bash
# Species only
metaphlan sample.fastq.gz --input_type fastq --tax_lev s -o species.txt

# Genus only
metaphlan sample.fastq.gz --input_type fastq --tax_lev g -o genus.txt

# All levels (default)
metaphlan sample.fastq.gz --input_type fastq --tax_lev a -o all_levels.txt
```
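
Because the default all-levels output already contains every rank, a single profile can also be filtered after the fact instead of rerunning MetaPhlAn once per level. A minimal sketch, assuming clade names use the `k__|p__|...` prefixes shown in this document; `deepest_rank` and `filter_rank` are hypothetical helpers.

```python
def deepest_rank(clade):
    """Return the one-letter rank code of the last entry in a lineage string."""
    last = clade.split("|")[-1]
    prefix, sep, _ = last.partition("__")
    # Entries without a rank prefix (for example an unclassified row) give None.
    return prefix if sep and len(prefix) == 1 else None


def filter_rank(rows, rank):
    """Keep (clade, abundance) pairs whose deepest rank equals `rank`."""
    return [(clade, ab) for clade, ab in rows if deepest_rank(clade) == rank]
```

Calling `filter_rank(rows, "g")` on rows read from an all-levels profile keeps the genus rows only, which mirrors `--tax_lev g` without a second run.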

## Output Format

```
#SampleID	sample
#clade_name	relative_abundance
k__Bacteria	100.0
k__Bacteria|p__Proteobacteria	65.23
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria	62.15
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales	58.42
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae	55.21
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia	52.33
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia_coli	52.33
```

The listing above is abbreviated to the clade name and relative abundance. MetaPhlAn 4 `rel_ab` profiles are tab-separated with four columns (`clade_name`, `NCBI_tax_id`, `relative_abundance`, `additional_species`), preceded by `#` header lines that record the database version and the command used.

## Parse Output in Python

```python
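import pandas as pd

# Take column names from the '#clade_name ...' header line (MetaPhlAn 4 adds
# NCBI_tax_id and additional_species columns around relative_abundance)
with open('profile.txt') as fh:
    header = next(line for line in fh if line.startswith('#clade_name'))
cols = header.lstrip('#').rstrip('\n').split('\t')
profile = pd.read_csv('profile.txt', sep='\t', comment='#', header=None, names=cols)

# Keep rows whose deepest rank is species; this skips the t__ (SGB) rows
leaf = profile['clade_name'].str.split('|').str[-1]
species = profile[leaf.str.startswith('s__')].copy()
species['species'] = leaf[species.index].str.replace('s__', '', regex=False)
species.sort_values('relative_abundance', ascending=False).head(20)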
```
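
When several ranks are needed side by side, it can help to split each lineage string into named fields before building a table. A sketch, assuming the rank prefixes listed under `--tax_lev` (k, p, c, o, f, g, s, t); `RANKS` and `split_lineage` are hypothetical names.

```python
RANKS = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species", "t": "sgb",
}


def split_lineage(clade):
    """Map a 'k__A|p__B|...' lineage string to {rank_name: taxon_name}."""
    fields = {}
    for entry in clade.split("|"):
        prefix, sep, name = entry.partition("__")
        # Entries without a known rank prefix are ignored.
        if sep and prefix in RANKS:
            fields[RANKS[prefix]] = name
    return fields
```

A list such as `[split_lineage(c) for c in clades]`, where `clades` is any iterable of lineage strings, can be passed to `pd.DataFrame` to get one column per rank, with missing ranks left empty.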

## Extract SGBs (t__ level)

```bash
# Report only the t__ level, which in MetaPhlAn 4 holds species-level genome bins (SGBs)
metaphlan sample.fastq.gz \
    --input_type fastq \
    --tax_lev t \
    --output_file profile_with_sgb.txt
```

## Sample Metadata in Output

```bash
# Add sample ID to output
metaphlan sample.fastq.gz \
    --input_type fastq \
    --sample_id sample_name \
    --output_file profile.txt
```

## Key Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| --input_type | fastq | Input format (fastq, mapout) |
| --nproc | 4 | CPU threads |
| --tax_lev | a | Taxonomic level (a=all) |
| --stat_q | 0.2 | Quantile value |
| --min_cu_len | 2000 | Min clade length |
| -t | rel_ab | Analysis type |
| --mapout | none | Save mapping output |
| --db_dir | default | Database directory |

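Note: Unknown species estimation is enabled by default in MetaPhlAn 4.2+. The `--mapout` and `--db_dir` names used in this document also follow MetaPhlAn 4.2; earlier 4.x releases are understood to call these `--bowtie2out` and `--bowtie2db`, so confirm with `metaphlan --help`.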

## Analysis Types (-t)

| Type | Description |
|------|-------------|
| rel_ab | Relative abundances (%) |
| rel_ab_w_read_stats | With read statistics |
| marker_pres_table | Marker presence/absence |
| marker_ab_table | Marker abundances |
| clade_specific_strain_tracker | Strain tracking |

## Related Skills

- kraken-classification - Alternative k-mer based classification
- abundance-estimation - Bracken for Kraken2 abundances
- metagenome-visualization - Visualize profiles
