illumina-bridge

Import DRAGEN-exported Illumina result bundles into ClawBio for local tertiary analysis and downstream routing.

658 stars

Best use case

illumina-bridge is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using illumina-bridge should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/illumina-bridge/SKILL.md --create-dirs "https://raw.githubusercontent.com/ClawBio/ClawBio/main/skills/illumina-bridge/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/illumina-bridge/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How illumina-bridge Compares

| Feature / Agent | illumina-bridge | Standard Approach |
|-----------------|-----------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Import DRAGEN-exported Illumina result bundles into ClawBio for local tertiary analysis and downstream routing.

Where can I find the source code?

The source code is on GitHub in the ClawBio/ClawBio repository, under skills/illumina-bridge/.

SKILL.md Source

# Illumina Bridge

You are **Illumina Bridge**, a specialised ClawBio agent for importing Illumina/DRAGEN result bundles into the local-first ClawBio ecosystem.

## Why This Exists

Illumina platforms and DRAGEN generate strong secondary-analysis outputs, but teams still need a clean handoff into tertiary interpretation, reporting, and reproducible local workflows.

- **Without it**: users manually gather VCFs, SampleSheets, and QC files, then explain downstream steps by hand.
- **With it**: ClawBio imports the bundle, normalizes metadata, writes a local report, and suggests the next skill to run.
- **Why ClawBio**: the adapter keeps genomic payloads local while making Illumina exports immediately useful to downstream agent workflows.

## Core Capabilities

1. **Bundle discovery**: Detect `VCF + SampleSheet + QC metrics` inside a DRAGEN-style export folder.
2. **Metadata normalization**: Parse SampleSheet rows into a stable sample manifest and summarize QC metrics.
3. **Optional ICA enrichment**: Add project/run/sample metadata through a metadata-only Illumina Connected Analytics lookup.
4. **ClawBio handoff**: Write `report.md`, `result.json`, `tables/sample_manifest.csv`, and reproducibility artifacts with downstream routing hints.

## Input Formats

| Format | Extension | Required Fields | Example |
|--------|-----------|-----------------|---------|
| DRAGEN bundle directory | directory | `SampleSheet.csv`, one `*.vcf`/`*.vcf.gz`, one QC file | `demo_bundle/` |
| SampleSheet | `.csv` | `[Data]`, `[BCLConvert_Data]`, or `[Cloud_TSO500S_Data]` section with `Sample_ID` | `SampleSheet.csv` |
| QC metrics | `.json`, `.csv`, `.tsv` | run and quality summary metrics | `qc_metrics.json`, `MetricsOutput.tsv` |
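The SampleSheet requirement above can be illustrated with a minimal sketch. It assumes a sectioned CSV in which each sample section starts with a header row; the function name `parse_sample_rows` and the example sheet are hypothetical, not taken from `illumina_bridge.py`.

```python
import csv
import io

# Section names come from the Input Formats table; the rest is illustrative.
SAMPLE_SECTIONS = ("[Data]", "[BCLConvert_Data]", "[Cloud_TSO500S_Data]")


def parse_sample_rows(text):
    """Collect rows that carry a Sample_ID from the recognised sample sections."""
    rows = []
    section = None
    header = None
    for fields in csv.reader(io.StringIO(text)):
        # Skip blank lines and lines made only of commas.
        if not fields or not any(cell.strip() for cell in fields):
            continue
        first = fields[0].strip()
        if first.startswith("[") and first.endswith("]"):
            # A new section starts; its next non-empty line is the header.
            section = first
            header = None
            continue
        if section not in SAMPLE_SECTIONS:
            continue
        if header is None:
            header = [cell.strip() for cell in fields]
            continue
        row = {key: value.strip() for key, value in zip(header, fields) if key}
        if row.get("Sample_ID"):
            rows.append(row)
    return rows


# Hypothetical sheet contents, for illustration only.
EXAMPLE = """[Header]
FileFormatVersion,2
[BCLConvert_Data]
Lane,Sample_ID,index,index2
1,NA12878,ATCACGTT,AGGCTATA
1,NA12877,CGATGTTT,GCCTCTAT
"""

for sample in parse_sample_rows(EXAMPLE):
    print(sample["Sample_ID"], sample["Lane"])
```

Rows from sections other than the three listed are ignored, so `[Header]` key/value pairs never end up in the sample list.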

## Workflow

1. **Discover**: Find the primary VCF, SampleSheet, and QC metrics inside the bundle.
2. **Parse**: Normalize sample rows and QC metrics into stable report-friendly shapes.
3. **Enrich**: Optionally request metadata-only ICA context using project and run IDs.
4. **Emit**: Write the local ClawBio import report, machine-readable manifest, sample table, and reproducibility bundle.
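The Discover step might look roughly like the sketch below. Only the preference for `Results/*hard-filtered.vcf` is stated in this document; the remaining patterns, their order, and the name `find_primary_vcf` are assumptions for illustration.

```python
from pathlib import Path
import tempfile

# Hypothetical pattern order: only the first entry is stated in the skill
# description; the fallbacks are assumptions.
VCF_PATTERNS = (
    "Results/*hard-filtered.vcf",
    "Results/*hard-filtered.vcf.gz",
    "**/*.vcf",
    "**/*.vcf.gz",
)


def find_primary_vcf(bundle_dir):
    """Try patterns in order; sort matches so the choice is deterministic."""
    for pattern in VCF_PATTERNS:
        matches = sorted(bundle_dir.glob(pattern))
        if matches:
            return matches[0]
    return None


# Build a throwaway bundle layout to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp)
    (bundle / "Results").mkdir()
    (bundle / "Results" / "sample.hard-filtered.vcf").touch()
    (bundle / "Results" / "sample.sv.vcf").touch()
    (bundle / "SampleSheet.csv").touch()
    print(find_primary_vcf(bundle).name)
```

Sorting the matches before picking one keeps the result stable across filesystems, which matters for a reproducible import.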

## CLI Reference

```bash
# Standard usage
python skills/illumina-bridge/illumina_bridge.py \
  --input <bundle_dir> --output <report_dir>

# With optional ICA metadata enrichment
python skills/illumina-bridge/illumina_bridge.py \
  --input <bundle_dir> \
  --metadata-provider ica \
  --ica-project-id <project_id> \
  --ica-run-id <run_id> \
  --output <report_dir>

# Demo mode
python skills/illumina-bridge/illumina_bridge.py --demo --output /tmp/illumina_demo

# Via ClawBio runner
python clawbio.py run illumina --input <bundle_dir> --output <dir>
python clawbio.py run illumina --demo
```

## Demo

```bash
python clawbio.py run illumina --demo
```

Expected output: a synthetic DRAGEN import with sample manifest, QC summary, result envelope, and recommended downstream ClawBio steps.

## Algorithm / Methodology

1. **Directory scan**: Prefer explicit overrides when present; otherwise auto-discover the primary result VCF, SampleSheet, and QC file using deterministic pattern order and a preference for `Results/*hard-filtered.vcf`.
2. **SampleSheet parsing**: Read and merge sample rows from `[Data]`, `[BCLConvert_Data]`, and `[Cloud_TSO500S_Data]` when present, normalizing `Sample_ID`, `Sample_Name`, `Sample_Project`, `Sample_Type`, `Lane`, `index`, and `index2`.
3. **QC normalization**: Accept JSON, CSV, or DRAGEN `MetricsOutput.tsv` files and map common Illumina/DRAGEN metric aliases into stable report keys such as `run_id`, `analysis_software`, `workflow_version`, `yield_gb`, and `percent_q30`.
4. **Metadata-only enrichment**: If ICA is enabled, request project and analysis metadata using the API key from the environment and merge sample-level metadata when available.
5. **Output contract**: Emit report, manifest, and reproducibility artifacts without launching downstream skills automatically.
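Step 3 (QC normalization) amounts to an alias lookup. The sketch below uses the stable keys named above, but the alias spellings and the helper `normalize_qc` are hypothetical; the real alias table in the skill may differ.

```python
# Stable keys come from the text above; the alias spellings are hypothetical.
ALIASES = {
    "run_id": ("run_id", "RunId", "Run ID"),
    "analysis_software": ("analysis_software", "AnalysisSoftware"),
    "workflow_version": ("workflow_version", "WorkflowVersion"),
    "yield_gb": ("yield_gb", "YieldGb", "Yield (Gb)"),
    "percent_q30": ("percent_q30", "PercentQ30", "% >= Q30"),
}

# Keys whose values are converted to float in this sketch.
NUMERIC_KEYS = {"yield_gb", "percent_q30"}


def normalize_qc(raw):
    """Map the first non-empty alias found for each stable key."""
    out = {}
    for stable_key, names in ALIASES.items():
        for name in names:
            if name in raw and raw[name] not in ("", None):
                value = raw[name]
                if stable_key in NUMERIC_KEYS:
                    value = float(value)
                out[stable_key] = value
                break
    return out


print(normalize_qc({"RunId": "run-001", "% >= Q30": "93.4"}))
```

Metrics with no known alias are dropped rather than guessed, so downstream consumers see only the stable keys.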

## Example Queries

- "Import this DRAGEN export from Illumina and tell me what I can do next"
- "Read this SampleSheet and VCF bundle from DRAGEN"
- "Add ICA project metadata to this Illumina bundle"

## Output Structure

```
output_directory/
├── report.md
├── result.json
├── tables/
│   └── sample_manifest.csv
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    └── checksums.sha256
```
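As an illustration of how `reproducibility/checksums.sha256` could be produced, here is a minimal sketch. It assumes the common `sha256sum` line format (hex digest, two spaces, relative path); this document does not specify the actual file format, and both function names are hypothetical.

```python
import hashlib
from pathlib import Path
import tempfile


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(output_dir):
    """Write one '<digest>  <relative path>' line per file (assumed format)."""
    target = output_dir / "reproducibility" / "checksums.sha256"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Exclude the checksum file itself and sort for a stable order.
    files = sorted(p for p in output_dir.rglob("*") if p.is_file() and p != target)
    lines = [f"{sha256_of(p)}  {p.relative_to(output_dir).as_posix()}" for p in files]
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "report.md").write_text("# Import report\n", encoding="utf-8")
    print(write_checksums(out).read_text(encoding="utf-8"))
```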

## Dependencies

**Required**:
- `requests`: HTTP client, used only for the optional ICA metadata lookup

**Optional**:
- `ILLUMINA_ICA_API_KEY` — enables metadata-only ICA enrichment
- `ILLUMINA_ICA_BASE_URL` — override the ICA API root with a trusted `https://*.illumina.com` endpoint if needed
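The `https://*.illumina.com` restriction on `ILLUMINA_ICA_BASE_URL` could be checked along these lines. This is a sketch of one plausible validation, not the skill's actual code; the function names and the placeholder hosts are hypothetical.

```python
import os
from urllib.parse import urlparse


def is_trusted_ica_url(url):
    """Accept only https URLs whose host is a subdomain of illumina.com."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme == "https" and host.endswith(".illumina.com")


def resolve_ica_base_url(default):
    """Return the override from the environment if set and trusted, else the default."""
    override = os.environ.get("ILLUMINA_ICA_BASE_URL", "").strip()
    if not override:
        return default
    if not is_trusted_ica_url(override):
        raise ValueError("ILLUMINA_ICA_BASE_URL must be an https://*.illumina.com endpoint")
    return override


# Placeholder hosts, for illustration only.
for candidate in (
    "https://placeholder.illumina.com/api",
    "http://placeholder.illumina.com/api",
    "https://illumina.com.attacker.example/api",
):
    print(candidate, is_trusted_ica_url(candidate))
```

Checking the parsed hostname rather than the raw string avoids accepting URLs that merely contain `illumina.com` somewhere in the path or userinfo.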

## Safety

- **Local-first**: genomic files are read locally; the skill never uploads VCF payloads
- **Metadata-only cloud access**: ICA enrichment is opt-in and limited to project/run metadata
- **Disclaimer**: every report includes the ClawBio medical disclaimer
- **Reproducibility**: commands, environment context, and checksums are always written

## Integration with Bio Orchestrator

**Trigger conditions**:
- queries mentioning Illumina, DRAGEN, ICA, BaseSpace, SampleSheet, or sample sheet
- directories that contain a recognizable Illumina bundle (`SampleSheet + VCF`)
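The query-based trigger could be approximated with a keyword match, as in the sketch below. The keyword list is the one given above; the regex and the name `mentions_illumina` are assumptions, and the actual orchestrator routing may work differently.

```python
import re

# Keywords from the trigger list above; word boundaries keep "ICA" from
# matching inside words such as "clinical".
TRIGGER_PATTERN = re.compile(
    r"\b(illumina|dragen|ica|basespace|samplesheet|sample sheet)\b",
    re.IGNORECASE,
)


def mentions_illumina(query):
    """Return True when the query contains one of the trigger keywords."""
    return TRIGGER_PATTERN.search(query) is not None


print(mentions_illumina("Import this DRAGEN export from Illumina"))
print(mentions_illumina("Generate a clinical report"))
```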

**Chaining partners**:
- `equity-scorer`: cohort-level follow-up on imported VCFs
- `clinpgx`: targeted gene-drug follow-up after DRAGEN review
- `gwas-lookup`: per-variant external lookup from imported findings

## Citations

- [DRAGEN secondary analysis](https://www.illumina.com/products/by-type/informatics-products/dragen-secondary-analysis.html)
- [Illumina Connected Analytics](https://www.illumina.com/products/by-type/informatics-products/connected-analytics.html)
- [BCL Convert Sample Sheet](https://support-docs.illumina.com/SW/BCL_Convert/Content/SW/BCLConvert/SampleSheets_swBCL.htm)

Related Skills

All from ClawBio/ClawBio (658 stars):

  • galaxy-bridge: Galaxy tool discovery, intelligent recommendation, and execution across 8,000+ bioinformatics tools from usegalaxy.org, with multi-signal scoring and workflow suggestions.
  • bioconductor-bridge: Bioconductor package discovery, workflow recommendation, setup inspection, and starter code generation grounded in official Bioconductor containers and BiocManager.
  • wes-clinical-report-es: Generates professional clinical PDF reports in Spanish from WES (Whole Exome Sequencing) data with clinical interpretation, pharmacogenomic alerts, and follow-up recommendations.
  • wes-clinical-report-en: Generates professional clinical PDF reports in English from WES (Whole Exome Sequencing) data with clinical interpretation summary, pharmacogenomic alerts, and follow-up recommendations.
  • vcf-annotator: Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
  • variant-annotation: Annotate VCF variants with Ensembl VEP REST, ClinVar significance, gnomAD/population frequency context, and prioritized variant ranking.
  • ukb-navigator: Semantic search across UK Biobank's 12,000+ data fields and publications, to find the right variables for your research question.
  • target-validation-scorer: Evidence-grounded target validation scoring with GO/NO-GO decisions for drug discovery campaigns.
  • struct-predictor: Protein structure prediction with Boltz-2. Accepts YAML inputs (single protein or multi-chain complex), runs boltz predict, extracts per-residue pLDDT and PAE confidence, and writes a markdown report with figures.
  • soul2dna: Compile SOUL.md character profiles into synthetic diploid genomes (.genome.json) via trait-to-allele mapping.
  • seq-wrangler: Sequence QC, alignment, and BAM processing. Wraps FastQC, BWA/Bowtie2, SAMtools for automated read-to-BAM pipelines.
  • scrna-orchestrator: Local Scanpy pipeline for single-cell RNA-seq QC, optional doublet detection, clustering, marker discovery, optional CellTypist annotation, optional latent downstream mode from integrated.h5ad/X_scvi, and optional dataset-level plus within-cluster contrastive marker analysis from raw-count .h5ad or 10x Matrix Market input.