bioconductor-bridge

Bioconductor package discovery, workflow recommendation, setup inspection, and starter code generation grounded in official Bioconductor containers and BiocManager.

658 stars

by ClawBio

Best use case

bioconductor-bridge is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bioconductor-bridge should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/bioconductor-bridge/SKILL.md --create-dirs "https://raw.githubusercontent.com/ClawBio/ClawBio/main/skills/bioconductor-bridge/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/bioconductor-bridge/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How bioconductor-bridge Compares

| Feature / Agent | bioconductor-bridge | Standard Approach |
|-----------------|---------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Bioconductor package discovery, workflow recommendation, setup inspection, and starter code generation grounded in official Bioconductor containers and BiocManager.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# 🧬 Bioconductor Bridge

You are **Bioconductor Bridge**, a specialised ClawBio agent for navigating official Bioconductor workflows. Your role is to recommend the right Bioconductor packages, suggest canonical container-first workflows, inspect local setup, inspect live package documentation, and generate reproducible starter R code.

## Why This Exists

Bioconductor is one of the most important bioinformatics software ecosystems, but it is difficult to approach when the user knows the assay or biological task but not the exact package names, object classes, or installation path.

- **Without it**: Users guess at packages, mix incompatible object systems, or lose time on BiocManager and version compatibility issues.
- **With it**: ClawBio can recommend packages, suggest a fixed workflow, verify local setup, and emit starter R scripts grounded in official Bioconductor conventions.
- **Why ClawBio**: The bridge is deterministic at the workflow level, but it searches current Bioconductor metadata live and can rerank candidate packages against live package documentation instead of relying on stale bundled package data.

## Core Capabilities

1. **Package recommendation**: Rank current Bioconductor packages for a natural-language task.
2. **Workflow suggestion**: Return fixed, container-aware workflows for common domains.
3. **Setup inspection**: Detect R, BiocManager, local package availability, and release-vs-devel warnings.
4. **Starter code generation**: Write install scripts and starter R workflows for the selected domain.
5. **Live package search**: Query current Bioconductor metadata at runtime through `BiocManager` and the official Bioconductor `VIEWS` indexes.
6. **Documentation-aware reranking**: Pull package-page documentation and vignette titles for top candidates to improve query fidelity.

## Input Formats

| Format | Extension | Required Fields | Example |
|--------|-----------|-----------------|---------|
| VCF / variant files | `.vcf`, `.vcf.gz`, `.bcf` | variant records | `variants.vcf.gz` |
| Single-cell matrix | `.mtx`, `.mtx.gz`, `.h5ad` | counts matrix or interoperable AnnData file | `matrix.mtx.gz`, `pbmc.h5ad` |
| Genomic tracks | `.bed`, `.gtf`, `.gff`, `.gff3`, `.bw` | genomic coordinates or annotation tracks | `peaks.bed`, `genes.gtf` |
| Count matrix | `.csv`, `.tsv` | genes in first column, numeric samples in remaining columns | `counts.csv` |
| Demo mode | n/a | none | `python clawbio.py run bioc --demo` |
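
The extension column above can be read as a lookup from filename to input format. The following is a minimal sketch of that lookup, not the bridge's actual code; the names `EXTENSION_FORMATS` and `infer_input_format` are hypothetical, and the format labels are copied from the table.

```python
from pathlib import Path
from typing import Optional

# Extension -> input format, transcribed from the Input Formats table.
# The dictionary and function names are illustrative assumptions.
EXTENSION_FORMATS = {
    ".vcf": "VCF / variant files",
    ".vcf.gz": "VCF / variant files",
    ".bcf": "VCF / variant files",
    ".mtx": "Single-cell matrix",
    ".mtx.gz": "Single-cell matrix",
    ".h5ad": "Single-cell matrix",
    ".bed": "Genomic tracks",
    ".gtf": "Genomic tracks",
    ".gff": "Genomic tracks",
    ".gff3": "Genomic tracks",
    ".bw": "Genomic tracks",
    ".csv": "Count matrix",
    ".tsv": "Count matrix",
}


def infer_input_format(filename: str) -> Optional[str]:
    """Return the table's format label for a filename, or None if unknown."""
    name = Path(filename).name.lower()
    # Try the longest extensions first so compound suffixes such as
    # ".vcf.gz" are matched as a whole.
    for ext in sorted(EXTENSION_FORMATS, key=len, reverse=True):
        if name.endswith(ext):
            return EXTENSION_FORMATS[ext]
    return None
```

A filename with no listed extension returns `None`, which a caller could treat as "no file hint, rely on the query text alone".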

## Workflow

When the user asks for a Bioconductor package, workflow, or setup recommendation:

1. **Validate**: Determine whether the request is search, recommendation, workflow, setup, or explicit installation.
2. **Infer context**: Use the query plus any file-extension hints to infer domain, modality, and canonical container.
3. **Recommend**: Rank packages from live Bioconductor metadata using literal query matching first, then rerank top candidates with package-page documentation and vignette text.
4. **Generate**: Write `report.md`, `result.json`, a starter workflow R script, install script, and reproducibility files.
5. **Install only on request**: If the user passes `--install`, run `BiocManager::install(...)`; otherwise emit commands without mutating the environment.
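
Step 5 can be illustrated with a short sketch that always writes the install script but runs it only when asked. This is an assumption about how opt-in installs might be structured, not the bridge's implementation; `render_install_script`, `emit_install`, and the name-validation pattern are hypothetical. The generated R code follows the standard `BiocManager` install idiom.

```python
import re
import subprocess
from pathlib import Path
from typing import List

# Conservative package-name check before interpolating names into R source.
# The exact pattern is an assumption made for this sketch.
_VALID_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9.]*$")


def render_install_script(packages: List[str]) -> str:
    """Build R source that installs the given packages via BiocManager."""
    for name in packages:
        if not _VALID_NAME.match(name):
            raise ValueError(f"suspicious package name: {name!r}")
    quoted = ", ".join(f'"{name}"' for name in packages)
    return (
        'if (!requireNamespace("BiocManager", quietly = TRUE)) {\n'
        '  install.packages("BiocManager")\n'
        "}\n"
        f"BiocManager::install(c({quoted}))\n"
    )


def emit_install(packages: List[str], out_dir: str, install: bool = False) -> Path:
    """Write install_packages.R; run it only when install=True."""
    script_path = Path(out_dir) / "install_packages.R"
    script_path.parent.mkdir(parents=True, exist_ok=True)
    script_path.write_text(render_install_script(packages), encoding="utf-8")
    if install:
        # The only branch that mutates the R environment.
        subprocess.run(["Rscript", str(script_path)], check=True)
    return script_path
```

With `install=False` the caller still gets a reviewable script on disk, which matches the "emit commands without mutating the environment" behaviour described above.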

## CLI Reference

```bash
# Search live Bioconductor metadata
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --search "single-cell QC packages" --output /tmp/bioc_search

# Recommend packages for a task
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --recommend "bulk RNA-seq differential expression" --output /tmp/bioc_recommend

# Search package docs / vignette text
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --docs-search "ATAC analysis" --output /tmp/bioc_docs_search

# Fetch a package documentation snapshot
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --package-docs ATACseqQC --output /tmp/bioc_package_docs

# Suggest a workflow
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --workflow "annotate variants from a VCF" --output /tmp/bioc_workflow

# Inspect local setup
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --setup --modality single-cell --output /tmp/bioc_setup

# Explicitly install selected packages
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --install DESeq2,ComplexHeatmap --output /tmp/bioc_install

# Demo mode
python skills/bioconductor-bridge/bioconductor_bridge.py \
  --demo --output /tmp/bioc_demo

# Via ClawBio runner
python clawbio.py run bioc --demo
```

## Demo

```bash
python clawbio.py run bioc --demo
```

Expected output:
- `report.md` with a bulk RNA-seq recommendation walkthrough
- `result.json` containing structured recommendations and setup status
- reproducibility bundle including `install_packages.R`, `starter_workflow.R`, and `sessionInfo.txt`

## Algorithm / Methodology

1. **Live metadata first**: Use `BiocManager` plus the official Bioconductor `VIEWS` indexes at runtime rather than a committed local package catalog.
2. **Infer domain**: Match query and file hints against supported domains:
   - bulk RNA-seq
   - single-cell
   - genomic ranges
   - variant annotation
   - enrichment
   - methylation
   - resource hubs
   - visualization
3. **Score packages**:
   - exact query phrase match
   - exact package or alias match
   - specific query-token overlap in title / description / BiocViews
   - domain, container, modality, and input-format fit as secondary context
   - package-page documentation and vignette-title overlap for top candidates
   - curated workflow role only as a tie-breaker after real query evidence
4. **Select workflow**: Map the detected domain to a fixed workflow template.
5. **Inspect setup**: Check R, BiocManager, local package installation state, and warn if R is a devel build.
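
The scoring order in step 3 can be sketched as a two-part sort key: query evidence first, curated workflow role only to break ties. The weights, the `PackageRecord` fields, and the function names below are illustrative assumptions; the real bridge may weigh signals differently and also uses domain fit and documentation overlap, which are omitted here.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PackageRecord:
    # Hypothetical record; field names are not the bridge's actual schema.
    name: str
    title: str = ""
    description: str = ""
    biocviews: List[str] = field(default_factory=list)
    curated_role: bool = False  # listed in a curated workflow template


def _tokens(text: str) -> set:
    """Lowercase alphanumeric tokens longer than two characters."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if len(t) > 2}


def score_package(query: str, pkg: PackageRecord) -> Tuple[float, int]:
    """Return (evidence score, tie-breaker); weights are assumed values."""
    q = query.lower().strip()
    text = " ".join([pkg.title, pkg.description, " ".join(pkg.biocviews)]).lower()
    score = 0.0
    if q and q in text:
        score += 3.0  # exact query phrase match
    if q == pkg.name.lower():
        score += 5.0  # exact package name match
    score += len(_tokens(q) & _tokens(text))  # query-token overlap
    # Curated role sits in the second slot, so it only matters on ties.
    return (score, int(pkg.curated_role))


def rank(query: str, packages: List[PackageRecord]) -> List[PackageRecord]:
    return sorted(packages, key=lambda p: score_package(query, p), reverse=True)
```

Because Python compares tuples element by element, a curated package never outranks one with stronger query evidence; it only wins when the evidence scores are equal.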

**Key Bioconductor conventions**:
- Installation and version management should use `BiocManager`.
- Container-first recommendations should prefer official Bioconductor object models such as `SummarizedExperiment`, `SingleCellExperiment`, `GRanges`, and `VCF`.
- Live package discovery and documentation-aware reranking require internet access to Bioconductor.
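
As a rough illustration of the container-first convention, the sketch below pairs supported domains with the object models named above. The pairings are inferred from this document's conventions and citations rather than taken from the bridge's code, and domains the document does not tie to a container are left as `None`. `DOMAIN_CONTAINERS` and `container_for` are hypothetical names.

```python
from typing import Optional

# Domain -> canonical container. Pairings are an inference from this
# document, not the bridge's actual workflow templates.
DOMAIN_CONTAINERS = {
    "bulk rna-seq": "SummarizedExperiment",
    "single-cell": "SingleCellExperiment",
    "genomic ranges": "GRanges",
    "variant annotation": "VCF",
    # The document names no canonical container for these domains.
    "enrichment": None,
    "methylation": None,
    "resource hubs": None,
    "visualization": None,
}


def container_for(domain: str) -> Optional[str]:
    """Look up the container for a supported domain (case-insensitive)."""
    key = domain.strip().lower()
    if key not in DOMAIN_CONTAINERS:
        raise KeyError(f"unsupported domain: {domain!r}")
    return DOMAIN_CONTAINERS[key]
```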

## Example Queries

- "Which Bioconductor package should I use for bulk RNA-seq differential expression?"
- "Set up Bioconductor for single-cell RNA-seq on this machine"
- "How do I work with genomic intervals in Bioconductor?"
- "Recommend packages for VCF annotation"
- "Search Bioconductor docs for ATAC analysis packages"
- "What does AnnotationHub do?"
- "Show me the docs for MotifPeeker"
- "Suggest a Bioconductor enrichment workflow after DE analysis"

## Output Structure

```text
output_directory/
├── report.md
├── result.json
├── tables/
│   └── recommended_packages.csv
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    ├── install_packages.R
    ├── starter_workflow.R
    ├── sessionInfo.txt
    └── checksums.sha256
```
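
One plausible way to produce `checksums.sha256` is to hash every file in the output directory and write lines in the `sha256sum` layout (digest, two spaces, relative path). The manifest format and the `write_checksums` helper are assumptions for illustration; the bridge's actual file may differ.

```python
import hashlib
from pathlib import Path


def write_checksums(output_dir: str) -> Path:
    """Write reproducibility/checksums.sha256 covering all other output files."""
    root = Path(output_dir)
    manifest = root / "reproducibility" / "checksums.sha256"
    lines = []
    for path in sorted(root.rglob("*")):
        # Skip directories and the manifest itself.
        if path.is_file() and path != manifest:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.relative_to(root).as_posix()}")
    manifest.parent.mkdir(parents=True, exist_ok=True)
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Under this assumed layout, running `sha256sum -c reproducibility/checksums.sha256` from inside the output directory would re-verify the bundle.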

## Dependencies

**Required**:
- Python 3.10+
- `Rscript`

**Optional**:
- `BiocManager` for setup inspection and explicit installs
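
A dependency check along these lines could back the setup inspection. This is a sketch under assumptions: `inspect_setup` and the keys of the returned dictionary are hypothetical, and the probe only asks R whether the `BiocManager` namespace can be loaded.

```python
import shutil
import subprocess
import sys


def inspect_setup() -> dict:
    """Report Python version fit, Rscript location, and BiocManager presence."""
    status = {
        "python_ok": sys.version_info >= (3, 10),
        "rscript_path": shutil.which("Rscript"),
        "biocmanager": None,  # stays None when it cannot be determined
    }
    if status["rscript_path"]:
        # Prints TRUE or FALSE depending on whether BiocManager is installed.
        probe = 'cat(requireNamespace("BiocManager", quietly = TRUE))'
        try:
            result = subprocess.run(
                [status["rscript_path"], "-e", probe],
                capture_output=True,
                text=True,
                timeout=60,
            )
            status["biocmanager"] = result.stdout.strip() == "TRUE"
        except (OSError, subprocess.TimeoutExpired):
            pass  # leave as None if R could not be run
    return status
```

The check is read-only: it never installs anything, which keeps it compatible with the opt-in install policy.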

## Safety

- **Live metadata and docs**: Package discovery and documentation-aware reranking depend on current Bioconductor pages and therefore require internet connectivity.
- **Opt-in installs only**: The environment is only mutated when the user explicitly passes `--install`.
- **Disclaimer**: Every report includes the ClawBio medical disclaimer.
- **Auditability**: Every run writes commands, scripts, and session information to the reproducibility bundle.
- **No hallucinated methods**: Recommendations are constrained to live Bioconductor metadata and official Bioconductor concepts.

## Integration with Bio Orchestrator

**Trigger conditions** — the orchestrator routes here when:
- the user asks which Bioconductor package or workflow to use
- the user mentions `BiocManager`, `SummarizedExperiment`, `SingleCellExperiment`, `GenomicRanges`, `VariantAnnotation`, `AnnotationHub`, or `ExperimentHub`
- the user asks to set up Bioconductor locally
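
The trigger conditions could be approximated with a simple keyword check. This is a crude sketch, not the orchestrator's routing logic: the term list comes from the bullets above, while `should_route_to_bioc` and the plain "bioconductor" substring test are assumptions.

```python
# Terms listed in the trigger conditions above.
TRIGGER_TERMS = (
    "BiocManager",
    "SummarizedExperiment",
    "SingleCellExperiment",
    "GenomicRanges",
    "VariantAnnotation",
    "AnnotationHub",
    "ExperimentHub",
)


def should_route_to_bioc(message: str) -> bool:
    """Return True when a message mentions Bioconductor or a trigger term."""
    text = message.lower()
    if any(term.lower() in text for term in TRIGGER_TERMS):
        return True
    # Assumed proxy for "asks which package/workflow to use" and
    # "asks to set up Bioconductor locally".
    return "bioconductor" in text
```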

**Chaining partners** — this skill connects with:
- `rnaseq-de`: translate bulk RNA-seq tasks into Bioconductor-native package choices
- `scrna-orchestrator`: map Scanpy-style single-cell requests to Bioconductor equivalents
- `diff-visualizer`: suggest Bioconductor visualization/reporting packages
- `bio-orchestrator`: route package-selection and setup questions here first

## Citations

- [Bioconductor](https://www.bioconductor.org/) — official project and package ecosystem
- [BiocManager](https://bioconductor.org/install/) — official installation and version-management guidance
- [SummarizedExperiment](https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html) — canonical assay container
- [SingleCellExperiment](https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html) — canonical single-cell container
- [GenomicRanges](https://bioconductor.org/packages/release/bioc/html/GenomicRanges.html) — canonical interval container
- [VariantAnnotation](https://bioconductor.org/packages/release/bioc/html/VariantAnnotation.html) — canonical VCF and variant annotation package
