galaxy-bridge

Galaxy tool discovery, intelligent recommendation, and execution — 8,000+ bioinformatics tools from usegalaxy.org with multi-signal scoring and workflow suggestions

1,802 stars

byFreedomIntelligence

View on GitHub Installation ↓

Best use case

galaxy-bridge is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Galaxy tool discovery, intelligent recommendation, and execution — 8,000+ bioinformatics tools from usegalaxy.org with multi-signal scoring and workflow suggestions

Teams using galaxy-bridge should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/galaxy-bridge/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/galaxy-bridge/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/galaxy-bridge/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How galaxy-bridge Compares

Feature / Agent	galaxy-bridge	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Galaxy tool discovery, intelligent recommendation, and execution — 8,000+ bioinformatics tools from usegalaxy.org with multi-signal scoring and workflow suggestions

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Galaxy Bridge

**ClawBio's gateway to the Galaxy ecosystem — 1,770+ production bioinformatics tools, discoverable and executable through natural language.**

## Why This Exists

Galaxy (usegalaxy.org) hosts the world's largest collection of curated bioinformatics tools — 1,770+ on the main server alone, covering everything from FASTQ QC to metagenomics to protein structure prediction. But discovering the right tool requires knowing exact tool IDs, navigating nested ToolShed categories, and understanding parameter schemas.

Galaxy Bridge makes these tools **agent-accessible**: search by natural language, execute via CLI, and chain Galaxy tools with ClawBio's local skills for cross-platform workflows that neither system can do alone.

## Core Capabilities

1. **Intelligent tool recommendation** — describe a task in plain English; multi-signal scoring across 7 dimensions returns the best Galaxy tool with explanations
2. **Workflow suggestions** — 8 pre-defined pipeline templates (RNA-seq DE, metagenomics, variant calling, WES germline, ChIP-seq, nanopore, genome assembly, variant annotation)
3. **Input format awareness** — provide your file extension (.fastq, .bam, .vcf) for format-aware recommendations
4. **Version deduplication** — 8,182 catalog entries collapse to ~2,300 unique tools; latest version preferred, version count as maturity signal
5. **EDAM ontology resolution** — 108 EDAM topic/operation IDs resolved to human-readable labels for richer matching
6. **Natural language search** — keyword-based search across 8,000+ Galaxy tools by name, description, section, and EDAM terms
7. **Remote execution** — run Galaxy tools on usegalaxy.org via BioBlend API
8. **Category browsing** — explore 86 ToolShed categories with tool counts
9. **Tool detail inspection** — view inputs, outputs, and parameter schemas
10. **Offline demo mode** — FastQC demo with pre-cached results (no API key needed)
11. **Cross-platform chaining** — Galaxy VEP → ClawBio PharmGx, Galaxy Kraken2 → ClawBio metagenomics

## Input Formats

| Format | Extension | Required Fields | Example |
|--------|-----------|----------------|---------|
| FASTQ | `.fq`, `.fastq`, `.fq.gz` | Sequence reads | Illumina paired-end reads |
| VCF | `.vcf`, `.vcf.gz` | Variant calls | Annotated VCF for VEP |
| BAM | `.bam` | Aligned reads | BWA-MEM2 output |
| FASTA | `.fa`, `.fasta` | Sequences | Reference genome |
| Tabular | `.tsv`, `.csv` | Varies by tool | Gene expression matrix |

## Workflow

1. **Search** — User describes what they need → bridge searches local catalog + Galaxy API
2. **Select** — Ranked results with descriptions, versions, and categories
3. **Configure** — Show tool inputs/outputs schema; user provides files and parameters
4. **Execute** — Upload input to Galaxy, run tool, poll for completion
5. **Retrieve** — Download outputs to local directory
6. **Bundle** — Generate reproducibility package (commands.sh, environment.yml, checksums)

## CLI Reference

```bash
# Intelligent tool recommendation (new in v0.2.0)
python galaxy_bridge.py --recommend "quality control on my sequencing reads"
python galaxy_bridge.py --recommend "classify microbial species" --format .fastq
python galaxy_bridge.py --recommend "call variants" --format .bam
python galaxy_bridge.py --recommend "annotate variants from WES" --format .vcf

# Workflow / pipeline suggestions (new in v0.2.0)
python galaxy_bridge.py --workflow "RNA-seq differential expression"
python galaxy_bridge.py --workflow "metagenomics"
python galaxy_bridge.py --workflow "whole exome sequencing"

# Search for tools by keyword
python galaxy_bridge.py --search "metagenomics profiling"
python galaxy_bridge.py --search "variant annotation"
python galaxy_bridge.py --search "RNA-seq differential expression"

# Browse Galaxy ToolShed categories
python galaxy_bridge.py --list-categories

# View tool details (inputs, outputs, parameters)
python galaxy_bridge.py --tool-details toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.74+galaxy1

# Run a tool on Galaxy (requires GALAXY_API_KEY)
python galaxy_bridge.py --run fastqc --input reads.fq.gz --output /tmp/qc_results

# Demo mode (works offline, no API key needed)
python galaxy_bridge.py --demo
```

## Recommendation Engine

The `--recommend` flag uses **multi-signal scoring** across 7 dimensions to rank tools:

| Signal | Max Points | Description |
|--------|-----------|-------------|
| Section match | 30 | Tool's Galaxy category matches the detected task |
| Preferred tool | 20 | Tool is a known best-in-class for the task |
| Exact name match | 15 | Tool name appears in the query |
| Keyword match | 15 | Query words found in tool name/description |
| EDAM ontology | 10 | EDAM topic/operation IDs match the task |
| Format compatibility | 10 | Tool accepts the specified input format |
| Version maturity | 5 | Tools with more versions score higher (log scale) |

**15 task categories** are recognised: Quality Control, Read Mapping, Variant Calling, Variant Annotation, WES/WGS, RNA-seq, Metagenomics, Genome Assembly, Genome Annotation, Phylogenetics, ChIP-seq, Single-cell, Proteomics, Nanopore, BAM Processing.

**8 workflow templates**: WES Germline, WES Annotation, RNA-seq DE, Metagenomics Profiling, Variant Calling, ChIP-seq, Nanopore Assembly, Genome Assembly.

## Demo

Running `--demo` executes a simulated FastQC analysis using pre-cached results:

```
$ python galaxy_bridge.py --demo

Galaxy Bridge — Demo Mode (offline)
====================================
Tool: FastQC v0.74+galaxy1
Input: demo/demo_reads.fq (bundled synthetic FASTQ, 1000 reads)
Output: demo/fastqc_demo_output.html

Result: PASS — Per base sequence quality ✓
        PASS — Per sequence quality scores ✓
        WARN — Per base sequence content (normal for Illumina)
        PASS — Sequence length distribution ✓

Reproducibility bundle written to demo/reproducibility/
```

## Galaxy Tool Categories

The bridge indexes tools across all 56 Galaxy ToolShed categories, including:

- **Sequence Analysis** (~30 tools): FastQC, Trimmomatic, Cutadapt, fastp
- **Metagenomics** (~25 tools): Kraken2, MetaPhlAn, HUMAnN, QIIME2
- **Variant Analysis** (~25 tools): VEP, SnpSift, BCFtools, FreeBayes
- **RNA** (~20 tools): HISAT2, StringTie, featureCounts, DESeq2
- **Proteomics** (~15 tools): MaxQuant, SearchGUI, PeptideShaker
- **Phylogenetics** (~15 tools): IQ-TREE, RAxML, MAFFT, MUSCLE
- **Genome Annotation** (~15 tools): Prokka, Augustus, MAKER
- **Assembly** (~15 tools): SPAdes, Flye, Unicycler, MEGAHIT
- **Single Cell** (~10 tools): Scanpy, CellRanger, Seurat
- **ChIP-seq/Epigenetics** (~10 tools): MACS2, deepTools, DiffBind
- **GWAS** (~10 tools): PLINK, REGENIE, BOLT-LMM
- **Nanopore** (~10 tools): NanoPlot, Medaka, minimap2

## Output Structure

```
output_dir/
├── report.md              # Analysis summary with methods and results
├── result.json            # Machine-readable: tool ID, version, parameters, output paths
├── galaxy_outputs/        # Raw outputs downloaded from Galaxy
│   ├── fastqc_report.html
│   └── ...
└── reproducibility/
    ├── commands.sh        # Galaxy API calls to reproduce
    ├── environment.yml    # Tool versions and Galaxy server info
    └── checksums.sha256   # SHA-256 of all inputs and outputs
```

## Dependencies

**Required:**
- Python 3.9+
- bioblend (Galaxy Python SDK)

**Optional (for execution):**
- `GALAXY_URL` environment variable (default: `https://usegalaxy.org`)
- `GALAXY_API_KEY` environment variable (register at usegalaxy.org)

## Safety

- **Local-first search**: Tool discovery uses the bundled `galaxy_catalog.json` — no API calls needed
- **API key optional**: Demo mode and search work without credentials
- **No data retention**: Uploaded files are deleted from Galaxy after output retrieval
- **Reproducibility**: Every execution generates a full provenance bundle
- **Disclaimer**: ClawBio is a research and educational tool. It is not a medical device and does not provide clinical diagnoses. Consult a healthcare professional before making any medical decisions.

## Integration with Bio Orchestrator

**Triggers when**: User mentions "galaxy", "usegalaxy", "tool shed", "run on galaxy", "NGS pipeline", or references a Galaxy tool ID.

**Chaining partners**:
- `pharmgx-reporter` — Galaxy VEP annotates variants → PharmGx generates dosage report
- `claw-metagenomics` — Galaxy Kraken2 → ClawBio metagenomics profiling
- `equity-scorer` — Galaxy VCF processing → HEIM equity scoring
- `vcf-annotator` — Galaxy VEP/SnpSift ↔ ClawBio annotation

## Citations

- [Galaxy Project](https://galaxyproject.org/) — Afgan et al. (2018) Nucleic Acids Research
- [BioBlend](https://bioblend.readthedocs.io/) — Sloggett et al. (2013) Bioinformatics
- [usegalaxy.org](https://usegalaxy.org/) — Main Galaxy public server
- [Galaxy ToolShed](https://toolshed.g2.bx.psu.edu/) — Community tool repository

Related Skills

zinc-database

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

zarr-python

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

xlsx

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.

writing-skills

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use when creating new skills, editing existing skills, or verifying skills work before deployment

writing-plans

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use when you have a spec or requirements for a multi-step task, before touching code

wikipedia-search

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Search and fetch structured content from Wikipedia using the MediaWiki API for reliable, encyclopedic information

wellally-tech

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Integrate digital health data sources (Apple Health, Fitbit, Oura Ring) and connect to WellAlly.tech knowledge base. Import external health device data, standardize to local format, and recommend relevant WellAlly.tech knowledge base articles based on health data. Support generic CSV/JSON import, provide intelligent article recommendations, and help users better manage personal health data.

weightloss-analyzer

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

分析减肥数据、计算代谢率、追踪能量缺口、管理减肥阶段

<!--

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

# COPYRIGHT NOTICE

verification-before-completion

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always

vcf-annotator

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

vaex

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.