bio-metagenomics-abundance
Species abundance estimation using Bracken with Kraken2 output. Redistributes reads from higher taxonomic levels to species for more accurate estimates. Use when accurate species-level abundances are needed from Kraken2 classification output.
Best use case
bio-metagenomics-abundance is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using bio-metagenomics-abundance should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/bio-metagenomics-abundance/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How bio-metagenomics-abundance Compares
| Feature / Agent | bio-metagenomics-abundance | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
## Version Compatibility
Reference examples tested with: Bracken 2.9+, Kraken2 2.1+, MetaPhlAn 4.1+, pandas 2.2+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
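The version check above can be scripted. The following is a minimal sketch, assuming the tool prints a dotted version number somewhere in its `--version` output; the helper names `parse_version` and `meets_minimum` are illustrative and not part of any of the tools listed.

```python
import re


def parse_version(text):
    """Extract the first dotted version number (e.g. '2.1.3') as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(reported, minimum):
    """True if the reported version is at least the minimum version."""
    return parse_version(reported) >= parse_version(minimum)


# Illustrative strings; real --version output may be formatted differently.
print(meets_minimum("Kraken version 2.1.3", "2.1"))
print(meets_minimum("2.0.8", "2.1"))
```

In practice the `reported` string would come from capturing the tool's `--version` output, for example with `subprocess.run(..., capture_output=True, text=True)`.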
# Abundance Estimation with Bracken
**"Get species-level abundances from my Kraken2 results"** → Redistribute reads assigned to higher taxonomic levels down to species using Bracken's Bayesian re-estimation for more accurate abundance profiles.
- CLI: `bracken -d db -i kraken2.report -o bracken.output -r 150 -l S`
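To make the redistribution idea concrete, here is a toy sketch. It is not Bracken's implementation: it assumes a single genus with some reads stuck at genus level, and a hypothetical per-species probability `p_genus_given_species` (the chance that a read from that species classifies only to the genus). Bracken derives comparable quantities from the database k-mer distribution. All names are illustrative.

```python
def redistribute_genus_reads(genus_reads, species_reads, p_genus_given_species):
    """Toy Bayesian-style redistribution of genus-level reads to species.

    Each species gets a weight proportional to
    P(read stays at genus | species) * reads already assigned to that species,
    i.e. a likelihood times a prior estimated from the species-level counts.
    """
    weights = {
        sp: p_genus_given_species[sp] * n for sp, n in species_reads.items()
    }
    total = sum(weights.values())
    if total == 0:
        # Nothing to base a redistribution on; leave species counts unchanged.
        return dict(species_reads)
    return {
        sp: species_reads[sp] + genus_reads * weights[sp] / total
        for sp in species_reads
    }


example = redistribute_genus_reads(
    genus_reads=100,
    species_reads={"species_A": 80, "species_B": 20},
    p_genus_given_species={"species_A": 0.5, "species_B": 0.5},
)
print(example)  # {'species_A': 160.0, 'species_B': 40.0}
```

With equal probabilities, the 100 genus-level reads are split 80:20, following the species-level evidence. The total read count is preserved.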
## Basic Abundance Estimation
```bash
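# Run Bracken on a Kraken2 report
# -r: read length the Bracken database was built for (e.g., 100, 150, 200, 250, 300)
# -l: taxonomic level to estimate (S = species)
# Comments must not follow a trailing backslash, or the line continuation breaks
bracken -d /path/to/kraken2_db \
    -i kraken_report.txt \
    -o bracken_output.txt \
    -r 150 \
    -l S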
```
## Full Workflow with Kraken2
```bash
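# Step 1: Classify with Kraken2
# --output sends per-read classifications to a file instead of stdout
kraken2 --db /path/to/kraken2_db \
    --threads 8 \
    --paired \
    --report sample_kraken_report.txt \
    --output sample_kraken_output.txt \
    reads_R1.fastq.gz reads_R2.fastq.gz

# Step 2: Estimate abundances with Bracken
# -o: abundance table; -w: Kraken-style report with re-estimated counts
bracken -d /path/to/kraken2_db \
    -i sample_kraken_report.txt \
    -o sample_bracken_species.txt \
    -w sample_bracken_report.txt \
    -r 150 \
    -l S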
```
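The two-step workflow can also be driven from Python. This is a sketch under stated assumptions: the function names, the `dry_run` switch, and the output file naming are hypothetical, and `--output` is included so Kraken2 writes per-read classifications to a file. Only flags shown in this document plus `--output` are used.

```python
import shlex
import subprocess


def kraken2_cmd(db, r1, r2, report, output, threads=8):
    """Build the paired-end Kraken2 command for Step 1."""
    return [
        "kraken2", "--db", db, "--threads", str(threads), "--paired",
        "--report", report, "--output", output, r1, r2,
    ]


def bracken_cmd(db, report, out, read_len=150, level="S", new_report=None):
    """Build the Bracken command for Step 2."""
    cmd = ["bracken", "-d", db, "-i", report, "-o", out,
           "-r", str(read_len), "-l", level]
    if new_report is not None:
        cmd += ["-w", new_report]
    return cmd


def run_pipeline(db, r1, r2, sample, dry_run=True):
    """Print both commands; execute them only when dry_run is False."""
    report = f"{sample}_kraken_report.txt"
    steps = [
        kraken2_cmd(db, r1, r2, report, f"{sample}_kraken_output.txt"),
        bracken_cmd(db, report, f"{sample}_bracken_species.txt",
                    new_report=f"{sample}_bracken_report.txt"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return steps


steps = run_pipeline("/path/to/kraken2_db", "reads_R1.fastq.gz",
                     "reads_R2.fastq.gz", "sample")
```

Passing argument lists to `subprocess.run` avoids shell quoting problems with unusual file names, and `check=True` prevents Bracken from running on a report that Kraken2 failed to produce.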
## Different Taxonomic Levels
```bash
# Species level (default)
bracken -d db -i report.txt -o species.txt -r 150 -l S
# Genus level
bracken -d db -i report.txt -o genus.txt -r 150 -l G
# Family level
bracken -d db -i report.txt -o family.txt -r 150 -l F
# Phylum level
bracken -d db -i report.txt -o phylum.txt -r 150 -l P
```
## Build Bracken Database
```bash
# Build Bracken database for specific read lengths
# Run AFTER building Kraken2 database
bracken-build -d /path/to/kraken2_db -t 8 -l 150
# Build for multiple read lengths
bracken-build -d /path/to/kraken2_db -t 8 -l 100
bracken-build -d /path/to/kraken2_db -t 8 -l 250
```
## Output Format
```
name taxonomy_id taxonomy_lvl kraken_assigned_reads added_reads new_est_reads fraction_total_reads
Escherichia coli 562 S 5234 1245 6479 0.52
Staphylococcus aureus 1280 S 2156 456 2612 0.21
```
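A quick way to sanity-check a Bracken table is to confirm that `new_est_reads` equals `kraken_assigned_reads + added_reads` on every row. The sketch below uses only the standard library and the two illustrative rows shown above; `read_bracken` and `inconsistent_rows` are hypothetical helper names, and the file is assumed to be tab-separated with the header shown.

```python
import csv
import io

# The two example rows from above, in tab-separated form.
SAMPLE = (
    "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\t"
    "added_reads\tnew_est_reads\tfraction_total_reads\n"
    "Escherichia coli\t562\tS\t5234\t1245\t6479\t0.52\n"
    "Staphylococcus aureus\t1280\tS\t2156\t456\t2612\t0.21\n"
)


def read_bracken(handle):
    """Parse a Bracken output table into a list of dicts with numeric fields."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for col in ("kraken_assigned_reads", "added_reads", "new_est_reads"):
            row[col] = int(row[col])
        row["fraction_total_reads"] = float(row["fraction_total_reads"])
        rows.append(row)
    return rows


def inconsistent_rows(rows):
    """Names of taxa whose read columns do not add up."""
    return [
        r["name"] for r in rows
        if r["kraken_assigned_reads"] + r["added_reads"] != r["new_est_reads"]
    ]


rows = read_bracken(io.StringIO(SAMPLE))
print(inconsistent_rows(rows))  # [] when every row adds up
```

For a real file, replace `io.StringIO(SAMPLE)` with `open('bracken_output.txt', newline='')`.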
## Filter Low-Abundance Taxa
```bash
# -t sets the minimum number of Kraken2-assigned reads a taxon needs
# before Bracken includes it in re-estimation
bracken -d db \
    -i report.txt \
    -o bracken.txt \
    -r 150 \
    -l S \
    -t 10
```
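The `-t` threshold works on raw read counts. A complementary step that is sometimes applied downstream is to drop taxa below a minimum relative abundance and renormalize the rest. The sketch below illustrates that idea; the function name and the `min_fraction` cutoff are hypothetical choices, not Bracken options.

```python
def filter_and_renormalize(abundances, min_fraction=0.001):
    """Drop taxa below min_fraction of total reads, then rescale to sum to 1.

    abundances maps taxon name -> estimated reads (e.g. new_est_reads).
    """
    total = sum(abundances.values())
    if total == 0:
        return {}
    kept = {
        name: n for name, n in abundances.items() if n / total >= min_fraction
    }
    kept_total = sum(kept.values())
    return {name: n / kept_total for name, n in kept.items()}


# Illustrative counts: taxon_C is 0.1% of reads and falls below the 1% cutoff.
filtered = filter_and_renormalize(
    {"taxon_A": 900, "taxon_B": 99, "taxon_C": 1}, min_fraction=0.01
)
print(sorted(filtered))  # ['taxon_A', 'taxon_B']
```

Renormalizing changes the reported fractions, so the cutoff should be recorded alongside any results derived from the filtered table.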
## Combine Multiple Samples
```bash
# Run Bracken on each sample
mkdir -p bracken
for report in kraken_reports/*_kraken_report.txt; do
    # Strip the directory and suffix to get the sample name
    sample=$(basename "$report" _kraken_report.txt)
    bracken -d db -i "$report" -o "bracken/${sample}_species.txt" -r 150 -l S
done

# Combine into abundance matrix
combine_bracken_outputs.py --files bracken/*_species.txt -o combined_abundance.txt
```
## Parse Bracken Output in Python
```python
import pandas as pd

# Bracken output is tab-separated with a header row
bracken = pd.read_csv('bracken_output.txt', sep='\t')

# Show the 20 most abundant taxa
bracken_sorted = bracken.sort_values('new_est_reads', ascending=False)
print(bracken_sorted[['name', 'fraction_total_reads']].head(20))

# Relative abundance (%) among the taxa reported at this level
total_reads = bracken['new_est_reads'].sum()
bracken['relative_abundance'] = bracken['new_est_reads'] / total_reads * 100
```
## Convert to Relative Abundance
```python
import pandas as pd
df = pd.read_csv('bracken_output.txt', sep='\t')
total = df['new_est_reads'].sum()
df['relative_abundance'] = df['new_est_reads'] / total * 100
df.to_csv('bracken_relative_abundance.txt', sep='\t', index=False)
```
## Create Abundance Matrix
**Goal:** Merge per-sample Bracken outputs into a single species-by-sample abundance matrix for downstream statistical analysis.
**Approach:** Load each Bracken output, extract species names and read counts, iteratively outer-merge on species name, and fill missing values with zero.
```python
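import os
import pandas as pd

# Collect per-sample Bracken outputs (sorted for a stable column order)
files = sorted(f for f in os.listdir('bracken') if f.endswith('_species.txt'))
dfs = []
for f in files:
    sample = f.replace('_species.txt', '')
    df = pd.read_csv(f'bracken/{f}', sep='\t')
    # Keep species name and estimated reads; name the count column after the sample
    df = df[['name', 'new_est_reads']].rename(columns={'new_est_reads': sample})
    dfs.append(df)

# Outer-merge on species name so taxa missing from a sample are kept
merged = dfs[0]
for df in dfs[1:]:
    merged = merged.merge(df, on='name', how='outer')
merged = merged.fillna(0)  # taxa absent from a sample get zero reads
merged.to_csv('abundance_matrix.txt', sep='\t', index=False)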
```
## Key Parameters
| Parameter | Description |
|-----------|-------------|
| -d | Kraken2 database path |
| -i | Input Kraken2 report |
| -o | Output abundance file |
| -w | Output updated report (optional) |
| -r | Read length used |
| -l | Taxonomic level |
| -t | Minimum read threshold |
## Taxonomic Levels
| Level | Code | Description |
|-------|------|-------------|
| Domain | D | Bacteria, Archaea, Eukaryota |
| Kingdom | K | e.g., Fungi, Metazoa |
| Phylum | P | Major divisions |
| Class | C | Class level |
| Order | O | Order level |
| Family | F | Family level |
| Genus | G | Genus level |
| Species | S | Species level |
## Read Length Options
Pre-built databases typically include: 50, 75, 100, 150, 200, 250, 300 bp
Choose the length closest to your actual read length.
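Choosing the closest length can be automated by looking at which k-mer distribution files exist in the database directory. This sketch assumes the `database<N>mers.kmer_distrib` naming that `bracken-build` typically produces; the helper names are hypothetical, and ties are broken toward the shorter length as an arbitrary choice. The temporary directory stands in for a real database path.

```python
import re
import tempfile
from pathlib import Path


def available_read_lengths(db_dir):
    """Read lengths that have a Bracken k-mer distribution file in db_dir."""
    pattern = re.compile(r"database(\d+)mers\.kmer_distrib$")
    lengths = []
    for path in Path(db_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            lengths.append(int(match.group(1)))
    return sorted(lengths)


def closest_read_length(actual, available):
    """Pick the built length nearest to the actual read length."""
    if not available:
        raise ValueError("no Bracken k-mer distribution files found")
    # Sort key: distance first, then the length itself (shorter wins ties).
    return min(available, key=lambda n: (abs(n - actual), n))


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for n in (100, 150, 250):
        (Path(tmp) / f"database{n}mers.kmer_distrib").touch()
    built = available_read_lengths(tmp)

chosen = closest_read_length(140, built)
print(built, chosen)  # [100, 150, 250] 150
```

The chosen value is what would be passed to `bracken -r`. If no suitable length exists, run `bracken-build` for the required length first.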
## Related Skills
- kraken-classification - Generate Kraken2 report
- metaphlan-profiling - Alternative profiling method
- metagenome-visualization - Visualize abundances