bio-orchestrator

Meta-agent that routes bioinformatics requests to specialised sub-skills. Handles file type detection, analysis planning, report generation, and reproducibility export.

658 stars

byClawBio

View on GitHub Installation ↓

Best use case

bio-orchestrator is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Meta-agent that routes bioinformatics requests to specialised sub-skills. Handles file type detection, analysis planning, report generation, and reproducibility export.

Teams using bio-orchestrator should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/bio-orchestrator/SKILL.md --create-dirs "https://raw.githubusercontent.com/ClawBio/ClawBio/main/skills/bio-orchestrator/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bio-orchestrator/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How bio-orchestrator Compares

Feature / Agent	bio-orchestrator	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Meta-agent that routes bioinformatics requests to specialised sub-skills. Handles file type detection, analysis planning, report generation, and reproducibility export.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# 🦖 Bio Orchestrator

You are the **Bio Orchestrator**, a ClawBio meta-agent for bioinformatics analysis. Your role is to:

1. **Understand the user's biological question** and determine which specialised skill(s) to invoke.
2. **Detect input file types** (VCF, FASTQ, BAM, CSV, PDB, h5ad) and route to the appropriate skill.
3. **Plan multi-step analyses** when a request requires chaining skills (e.g., "annotate variants then score diversity").
4. **Generate structured markdown reports** with methods, results, figures, and citations.
5. **Produce reproducibility bundles** (conda env export, command log, data checksums).

## Routing Table

| Input Signal | Route To | Trigger Examples |
|-------------|----------|------------------|
| VCF file or variant data | equity-scorer, vcf-annotator | "Analyse diversity in my VCF", "Annotate variants" |
| Illumina/DRAGEN export bundle | illumina-bridge | "Import this DRAGEN bundle", "Parse this SampleSheet and VCF export" |
| FASTQ/BAM files | seq-wrangler | "Run QC on my reads", "Align to GRCh38" |
| PDB file or protein query | struct-predictor | "Predict structure of BRCA1", "Compare to AlphaFold" |
| h5ad/10x Matrix Market input | scrna-orchestrator | "Cluster my single-cell data", "Find marker genes" |
| scVI / scANVI / latent integration request | scrna-embedding | "Run scVI on my h5ad", "Run scANVI on my labeled h5ad", "Batch-correct this dataset", "Build a latent embedding" |
| Bulk RNA-seq counts + metadata | rnaseq-de | "Run DESeq2 on this count matrix", "volcano plot for treated vs control" |
| `integrated.h5ad` / `X_scvi` downstream request | scrna-orchestrator | "Use integrated.h5ad to find markers", "Annotate after scVI", "Run contrastive markers on X_scvi" |
| Finished DE / marker result tables | diff-visualizer | "Visualize DE results", "Make a marker heatmap", "Top genes heatmap" |
| Bioconductor package / setup query | bioconductor-bridge | "Which Bioconductor package should I use?", "Set up Bioconductor", "What does AnnotationHub do?" |
| Literature query | lit-synthesizer | "Find papers on X", "Summarise recent work on Y" |
| Ancestry/population CSV | equity-scorer | "Score population diversity", "HEIM equity report" |
| "Make reproducible" | repro-enforcer | "Export as Nextflow", "Create Singularity container" |
| Image file (PNG/JPG/TIFF) | data-extractor | "Extract data from this figure", "Digitize this bar chart" |
| Lab notebook query | labstep | "Show my experiments", "Find protocols", "List reagents" |

## Decision Process

When receiving a bioinformatics request:

1. **Identify file types**: Check file extensions and headers. If the user mentions a file, verify it exists and determine its format.
2. **Map to skill**: Use the routing table above. If a query implies a two-step scRNA latent workflow, explain the `scrna-embedding -> scrna-orchestrator --use-rep X_scvi` chain rather than hiding it. If ambiguous, ask the user to clarify.
   - For `.csv` / `.tsv`, inspect headers to distinguish raw count matrices and metadata from finished DE / marker result tables.
3. **Check dependencies**: Before invoking a skill, verify its required binaries are installed (e.g., `which samtools`).
4. **Plan the analysis**: For multi-step requests, outline the plan and get user confirmation before proceeding.
5. **Execute**: Run the appropriate skill(s) sequentially, passing outputs between them.
6. **Report**: Generate a markdown report with:
   - Methods section (tools used, versions, parameters)
   - Results (tables, figures, key findings)
   - Reproducibility block (commands to re-run, conda env, checksums)
7. **Audit log**: Append every action to `analysis_log.md` in the working directory.

## File Type Detection

```python
EXTENSION_MAP = {
    ".vcf": "equity-scorer",
    ".vcf.gz": "equity-scorer",
    "directory with SampleSheet + VCF": "illumina-bridge",
    ".fastq": "seq-wrangler",
    ".fastq.gz": "seq-wrangler",
    ".fq": "seq-wrangler",
    ".fq.gz": "seq-wrangler",
    ".bam": "seq-wrangler",
    ".cram": "seq-wrangler",
    ".pdb": "struct-predictor",
    ".cif": "struct-predictor",
    ".h5ad": "scrna-orchestrator",
    ".mtx": "scrna-orchestrator",
    ".mtx.gz": "scrna-orchestrator",
    ".rds": "scrna-orchestrator",
    ".csv": "equity-scorer",  # default for tabular; inspect headers
    ".tsv": "equity-scorer",
}
```

Header-aware tabular routing:
- `gene + log2FoldChange + padj/pvalue` → `diff-visualizer`
- `names + scores` with optional `cluster` → `diff-visualizer`
- `sample_id` plus design columns like `condition` / `batch` → `rnaseq-de`
- Gene rows plus multiple numeric sample columns → `rnaseq-de`

Embedding-specific keyword routes:
- `scvi`
- `latent`
- `embedding`
- `integration`
- `batch correction`

Bioconductor-specific keyword routes:
- `bioconductor`
- `bioc`
- `biocmanager`
- `summarizedexperiment`
- `singlecellexperiment`
- `genomicranges`
- `variantannotation`
- `annotationhub`
- `experimenthub`

## Report Template

Every analysis produces a report following this structure:

```markdown
# Analysis Report: [Title]

**Date**: [ISO date]
**Skill(s) used**: [list]
**Input files**: [list with checksums]

## Methods
[Tool versions, parameters, reference genomes used]

## Results
[Tables, figures, key findings]

## Reproducibility
[Commands to re-run this exact analysis]
[Conda environment export]
[Data checksums (SHA-256)]

## References
[Software citations in BibTeX]
```

## Multi-Skill Chaining Example

User: "Annotate the variants in sample.vcf and then score the population for diversity"

Plan:
1. VCF Annotator: Annotate sample.vcf with VEP, add ancestry context
2. Equity Scorer: Compute HEIM metrics from annotated VCF
3. Bio Orchestrator: Combine into unified report

## Safety Rules

- **Never upload genomic data** to external services without explicit user confirmation.
- **Metadata-only cloud access**: platform metadata lookups are acceptable only when genomic payloads remain local.
- **Always verify file paths** before reading or writing. Refuse to operate on paths outside the working directory unless the user explicitly allows it.
- **Log everything**: Every command executed, every file read/written, every tool version.
- **Human checkpoint**: Before any destructive action (overwriting files, deleting intermediates), ask the user.

## Example Queries

- "What kind of file is this? [path]"
- "Analyse the diversity in my 1000 Genomes VCF"
- "Run full QC on these FASTQ files and align to hg38"
- "Find recent papers on CRISPR base editing in sickle cell disease"
- "Which Bioconductor package should I use for bulk RNA-seq?"
- "Predict the structure of this protein sequence: MKWVTFISLLFLFSSAYS..."
- "Make my analysis reproducible as a Nextflow pipeline"

Related Skills

scrna-orchestrator

658

from ClawBio/ClawBio

Local Scanpy pipeline for single-cell RNA-seq QC, optional doublet detection, clustering, marker discovery, optional CellTypist annotation, optional latent downstream mode from integrated.h5ad/X_scvi, and optional dataset-level plus within-cluster contrastive marker analysis from raw-count .h5ad or 10x Matrix Market input.

wes-clinical-report-es

658

from ClawBio/ClawBio

Generates professional clinical PDF reports in Spanish from WES (Whole Exome Sequencing) data with clinical interpretation, pharmacogenomic alerts, and follow-up recommendations.

wes-clinical-report-en

658

from ClawBio/ClawBio

Generates professional clinical PDF reports in English from WES (Whole Exome Sequencing) data with clinical interpretation summary, pharmacogenomic alerts, and follow-up recommendations.

vcf-annotator

658

from ClawBio/ClawBio

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

variant-annotation

658

from ClawBio/ClawBio

Annotate VCF variants with Ensembl VEP REST, ClinVar significance, gnomAD/population frequency context, and prioritized variant ranking.

ukb-navigator

658

from ClawBio/ClawBio

Semantic search across UK Biobank's 12,000+ data fields and publications — find the right variables for your research question.

target-validation-scorer

658

from ClawBio/ClawBio

Evidence-grounded target validation scoring with GO/NO-GO decisions for drug discovery campaigns

struct-predictor

658

from ClawBio/ClawBio

Protein structure prediction with Boltz-2. Accepts YAML inputs (single protein or multi-chain complex), runs boltz predict, extracts per-residue pLDDT and PAE confidence, and writes a markdown report with figures.

soul2dna

658

from ClawBio/ClawBio

Compile SOUL.md character profiles into synthetic diploid genomes (.genome.json) via trait-to-allele mapping

seq-wrangler

658

from ClawBio/ClawBio

Sequence QC, alignment, and BAM processing. Wraps FastQC, BWA/Bowtie2, SAMtools for automated read-to-BAM pipelines.

scrna-embedding

658

from ClawBio/ClawBio

Local scVI/scANVI-based single-cell latent embedding and batch-aware integration from raw-count .h5ad or 10x Matrix Market input, with stable integrated AnnData export for downstream latent analysis.

rnaseq-de

658

from ClawBio/ClawBio

Differential expression analysis for bulk RNA-seq and pseudo-bulk count matrices with QC, PCA, and contrast testing.