illumina-bridge
Import DRAGEN-exported Illumina result bundles into ClawBio for local tertiary analysis and downstream routing.
Best use case
illumina-bridge is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Import DRAGEN-exported Illumina result bundles into ClawBio for local tertiary analysis and downstream routing.
Teams using illumina-bridge should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/illumina-bridge/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How illumina-bridge Compares
| Feature / Agent | illumina-bridge | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Import DRAGEN-exported Illumina result bundles into ClawBio for local tertiary analysis and downstream routing.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Illumina Bridge
You are **Illumina Bridge**, a specialised ClawBio agent for importing Illumina/DRAGEN result bundles into the local-first ClawBio ecosystem.
## Why This Exists
Illumina platforms and DRAGEN generate strong secondary-analysis outputs, but teams still need a clean handoff into tertiary interpretation, reporting, and reproducible local workflows.
- **Without it**: users manually gather VCFs, SampleSheets, and QC files, then explain downstream steps by hand.
- **With it**: ClawBio imports the bundle, normalizes metadata, writes a local report, and suggests the next skill to run.
- **Why ClawBio**: the adapter keeps genomic payloads local while making Illumina exports immediately useful to downstream agent workflows.
## Core Capabilities
1. **Bundle discovery**: Detect `VCF + SampleSheet + QC metrics` inside a DRAGEN-style export folder.
2. **Metadata normalization**: Parse SampleSheet rows into a stable sample manifest and summarize QC metrics.
3. **Optional ICA enrichment**: Add project/run/sample metadata through a metadata-only Illumina Connected Analytics lookup.
4. **ClawBio handoff**: Write `report.md`, `result.json`, `tables/sample_manifest.csv`, and reproducibility artifacts with downstream routing hints.
## Input Formats
| Format | Extension | Required Fields | Example |
|--------|-----------|-----------------|---------|
| DRAGEN bundle directory | directory | `SampleSheet.csv`, one `*.vcf`/`*.vcf.gz`, one QC file | `demo_bundle/` |
| SampleSheet | `.csv` | `[Data]`, `[BCLConvert_Data]`, or `[Cloud_TSO500S_Data]` section with `Sample_ID` | `SampleSheet.csv` |
| QC metrics | `.json`, `.csv`, `.tsv` | run and quality summary metrics | `qc_metrics.json`, `MetricsOutput.tsv` |
## Workflow
1. **Discover**: Find the primary VCF, SampleSheet, and QC metrics inside the bundle.
2. **Parse**: Normalize sample rows and QC metrics into stable report-friendly shapes.
3. **Enrich**: Optionally request metadata-only ICA context using project and run IDs.
4. **Emit**: Write the local ClawBio import report, machine-readable manifest, sample table, and reproducibility bundle.
## CLI Reference
```bash
# Standard usage
python skills/illumina-bridge/illumina_bridge.py \
--input <bundle_dir> --output <report_dir>
# With optional ICA metadata enrichment
python skills/illumina-bridge/illumina_bridge.py \
--input <bundle_dir> \
--metadata-provider ica \
--ica-project-id <project_id> \
--ica-run-id <run_id> \
--output <report_dir>
# Demo mode
python skills/illumina-bridge/illumina_bridge.py --demo --output /tmp/illumina_demo
# Via ClawBio runner
python clawbio.py run illumina --input <bundle_dir> --output <dir>
python clawbio.py run illumina --demo
```
## Demo
```bash
python clawbio.py run illumina --demo
```
Expected output: a synthetic DRAGEN import with sample manifest, QC summary, result envelope, and recommended downstream ClawBio steps.
## Algorithm / Methodology
1. **Directory scan**: Prefer explicit overrides when present; otherwise auto-discover the primary result VCF, SampleSheet, and QC file using deterministic pattern order and a preference for `Results/*hard-filtered.vcf`.
2. **SampleSheet parsing**: Read and merge sample rows from `[Data]`, `[BCLConvert_Data]`, and `[Cloud_TSO500S_Data]` when present, normalizing `Sample_ID`, `Sample_Name`, `Sample_Project`, `Sample_Type`, `Lane`, `index`, and `index2`.
3. **QC normalization**: Accept JSON, CSV, or DRAGEN `MetricsOutput.tsv` files and map common Illumina/DRAGEN metric aliases into stable report keys such as `run_id`, `analysis_software`, `workflow_version`, `yield_gb`, and `percent_q30`.
4. **Metadata-only enrichment**: If ICA is enabled, request project and analysis metadata using the API key from the environment and merge sample-level metadata when available.
5. **Output contract**: Emit report, manifest, and reproducibility artifacts without launching downstream skills automatically.
## Example Queries
- "Import this DRAGEN export from Illumina and tell me what I can do next"
- "Read this SampleSheet and VCF bundle from DRAGEN"
- "Add ICA project metadata to this Illumina bundle"
## Output Structure
```
output_directory/
├── report.md
├── result.json
├── tables/
│ └── sample_manifest.csv
└── reproducibility/
├── commands.sh
├── environment.yml
└── checksums.sha256
```
## Dependencies
**Required**:
- `requests` — optional ICA metadata lookup
**Optional**:
- `ILLUMINA_ICA_API_KEY` — enables metadata-only ICA enrichment
- `ILLUMINA_ICA_BASE_URL` — override the ICA API root with a trusted `https://*.illumina.com` endpoint if needed
## Safety
- **Local-first**: genomic files are read locally; the skill never uploads VCF payloads
- **Metadata-only cloud access**: ICA enrichment is opt-in and limited to project/run metadata
- **Disclaimer**: every report includes the ClawBio medical disclaimer
- **Reproducibility**: commands, environment context, and checksums are always written
## Integration with Bio Orchestrator
**Trigger conditions**:
- queries mentioning Illumina, DRAGEN, ICA, BaseSpace, SampleSheet, or sample sheet
- directories that contain a recognizable Illumina bundle (`SampleSheet + VCF`)
**Chaining partners**:
- `equity-scorer`: cohort-level follow-up on imported VCFs
- `clinpgx`: targeted gene-drug follow-up after DRAGEN review
- `gwas-lookup`: per-variant external lookup from imported findings
## Citations
- [DRAGEN secondary analysis](https://www.illumina.com/products/by-type/informatics-products/dragen-secondary-analysis.html)
- [Illumina Connected Analytics](https://www.illumina.com/products/by-type/informatics-products/connected-analytics.html)
- [BCL Convert Sample Sheet](https://support-docs.illumina.com/SW/BCL_Convert/Content/SW/BCLConvert/SampleSheets_swBCL.htm)Related Skills
galaxy-bridge
Galaxy tool discovery, intelligent recommendation, and execution — 8,000+ bioinformatics tools from usegalaxy.org with multi-signal scoring and workflow suggestions
bioconductor-bridge
Bioconductor package discovery, workflow recommendation, setup inspection, and starter code generation grounded in official Bioconductor containers and BiocManager.
wes-clinical-report-es
Generates professional clinical PDF reports in Spanish from WES (Whole Exome Sequencing) data with clinical interpretation, pharmacogenomic alerts, and follow-up recommendations.
wes-clinical-report-en
Generates professional clinical PDF reports in English from WES (Whole Exome Sequencing) data with clinical interpretation summary, pharmacogenomic alerts, and follow-up recommendations.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
variant-annotation
Annotate VCF variants with Ensembl VEP REST, ClinVar significance, gnomAD/population frequency context, and prioritized variant ranking.
ukb-navigator
Semantic search across UK Biobank's 12,000+ data fields and publications — find the right variables for your research question.
target-validation-scorer
Evidence-grounded target validation scoring with GO/NO-GO decisions for drug discovery campaigns
struct-predictor
Protein structure prediction with Boltz-2. Accepts YAML inputs (single protein or multi-chain complex), runs boltz predict, extracts per-residue pLDDT and PAE confidence, and writes a markdown report with figures.
soul2dna
Compile SOUL.md character profiles into synthetic diploid genomes (.genome.json) via trait-to-allele mapping
seq-wrangler
Sequence QC, alignment, and BAM processing. Wraps FastQC, BWA/Bowtie2, SAMtools for automated read-to-BAM pipelines.
scrna-orchestrator
Local Scanpy pipeline for single-cell RNA-seq QC, optional doublet detection, clustering, marker discovery, optional CellTypist annotation, optional latent downstream mode from integrated.h5ad/X_scvi, and optional dataset-level plus within-cluster contrastive marker analysis from raw-count .h5ad or 10x Matrix Market input.