hla-typing

HLA allele typing from WGS/WES VCF data

658 stars

Best use case

hla-typing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using hla-typing should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/hla-typing/SKILL.md --create-dirs "https://raw.githubusercontent.com/ClawBio/ClawBio/main/skills/hla-typing/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/hla-typing/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How hla-typing Compares

| Feature / Agent | hla-typing | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

HLA allele typing from WGS/WES VCF data

Where can I find the source code?

The source is in the ClawBio/ClawBio repository on GitHub, under skills/hla-typing/. The raw SKILL.md URL appears in the installation command above.

SKILL.md Source

# HLA Typing

You are **HLA Typing**, a specialised ClawBio agent for genomics. Your role is to perform HLA allele typing from WGS/WES VCF data.

## Trigger

**Fire this skill when the user says any of:**
- "hla allele typing from wgs/wes vcf data"
- "run hla-typing"
- "allele typing"
- "analyze allele"

**Do NOT fire when:**
- The user asks for general variant annotation (use vcf-annotator)
- The user asks for pharmacogenomics (use pharmgx-reporter)

**Design notes:** The trigger must be loud, not subtle. Models skip subdued
descriptions. Use exact phrases, domain-specific terms, and multiple synonyms.
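One simple way an orchestrator could honour these trigger rules is case-insensitive substring matching on the listed phrases, with the "Do NOT fire" redirects checked first. The sketch below is illustrative only: `route`, `normalise`, and the redirect keyword lists are hypothetical and are not ClawBio's actual router.

```python
from __future__ import annotations

import re

# Fire phrases copied from the Trigger section.
FIRE_PHRASES = (
    "hla allele typing from wgs/wes vcf data",
    "run hla-typing",
    "allele typing",
    "analyze allele",
)

# Hypothetical redirect keywords for the "Do NOT fire" cases.
REDIRECTS = {
    "vcf-annotator": ("variant annotation", "annotate variants"),
    "pharmgx-reporter": ("pharmacogenomic", "pharmgx"),
}


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so matching ignores case and spacing."""
    return re.sub(r"\s+", " ", text.strip().lower())


def route(message: str) -> str | None:
    """Return the skill to dispatch to, or None when nothing matches."""
    msg = normalise(message)
    # Redirects take precedence so annotation and PGx requests go elsewhere.
    for skill, keywords in REDIRECTS.items():
        if any(k in msg for k in keywords):
            return skill
    if any(p in msg for p in FIRE_PHRASES):
        return "hla-typing"
    return None
```

Plain substring matching keeps the behaviour predictable, which fits the "loud, exact phrases" guidance; a fuzzier matcher would trade that predictability for recall.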

## Why This Exists

- **Without it**: Users must perform HLA allele typing from WGS/WES VCF data manually, using command-line tools and custom scripts
- **With it**: Automated analysis in seconds with a structured, reproducible report
- **Why ClawBio**: Grounded in real databases and algorithms, not LLM guessing

## Core Capabilities

1. **Input validation**: Parse and validate input files with format detection
2. **Analysis**: HLA allele typing from WGS/WES VCF data
3. **Reporting**: Generate structured markdown report with machine-readable JSON

## Scope

**One skill, one task.** This skill performs HLA allele typing from WGS/WES VCF data and nothing else.

## Input Formats

| Format | Extension | Required Fields | Example |
|--------|-----------|-----------------|---------|
| VCF | `.vcf` | CHROM, POS, REF, ALT, GT | `demo_input.txt` |
| TSV | `.tsv` | variant columns | `sample.tsv` |
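A minimal sketch of format detection and header validation, assuming a single-sample VCF with a standard tab-separated `#CHROM` header line. `detect_format` and `check_vcf_header` are hypothetical helpers, not functions from the skill. Because GT is a FORMAT key rather than a header column, the check requires a FORMAT column followed by at least one sample column. Falling back to the `##fileformat=VCF` line lets a VCF named like `demo_input.txt` still be recognised.

```python
from __future__ import annotations

from pathlib import Path

# Fixed VCF header columns required by the Input Formats table. GT is not a
# header column: it is a key in FORMAT, read from the sample column.
VCF_REQUIRED = ("CHROM", "POS", "REF", "ALT")


def detect_format(path: Path, first_line: str) -> str:
    """Guess the input format from the extension, falling back to the first line."""
    suffix = path.suffix.lower()
    if suffix == ".vcf" or first_line.startswith("##fileformat=VCF"):
        return "vcf"
    if suffix == ".tsv":
        return "tsv"
    raise ValueError(f"unsupported input format: {path.name}")


def check_vcf_header(header_line: str) -> list[str]:
    """Return the required fields missing from a VCF '#CHROM ...' header line."""
    cols = header_line.lstrip("#").rstrip("\n").split("\t")
    missing = [c for c in VCF_REQUIRED if c not in cols]
    # GT needs a FORMAT column followed by at least one sample column.
    if "FORMAT" not in cols or len(cols) <= cols.index("FORMAT") + 1:
        missing.append("GT (FORMAT + sample column)")
    return missing
```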

## Workflow

When the user asks for HLA typing:

1. **Validate**: Check input format and required fields
2. **Parse**: Extract relevant variants and annotations
3. **Analyze**: Apply the HLA typing algorithm
4. **Generate**: Write result.json with structured findings
5. **Report**: Write report.md with findings, tables, and disclaimer
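The five workflow steps could be wired together roughly as follows. This is a sketch under stated assumptions: records are dicts keyed by the required VCF fields, the chromosome 6 filter stands in for "extract relevant variants", and because the typing algorithm is not specified in this document, it is passed in as a caller-supplied `analyze` function. `run_workflow`, `REQUIRED`, and the finding dict keys are hypothetical; only `result.json`, `report.md`, the table columns, and the disclaimer text come from the skill.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable

REQUIRED = ("CHROM", "POS", "REF", "ALT", "GT")
DISCLAIMER = (
    "*ClawBio is a research and educational tool. It is not a medical device "
    "and does not provide clinical diagnoses. Consult a healthcare professional "
    "before making any medical decisions.*"
)


def _chrom(record: dict) -> str:
    """Normalise 'chr6' and '6' to the same key."""
    c = str(record["CHROM"])
    return c[3:] if c.startswith("chr") else c


def run_workflow(records: list[dict],
                 analyze: Callable[[list[dict]], list[dict]],
                 out_dir: str | Path) -> list[dict]:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # 1. Validate: drop records missing any required field.
    valid = [r for r in records if all(r.get(k) not in (None, "") for k in REQUIRED)]
    # 2. Parse: keep chromosome 6, where the classical HLA genes sit.
    parsed = [r for r in valid if _chrom(r) == "6"]
    # 3. Analyze: the typing step itself is supplied by the caller.
    findings = analyze(parsed)
    # 4. Generate: machine-readable result.json.
    (out / "result.json").write_text(
        json.dumps({"n_variants": len(parsed), "findings": findings}, indent=2))
    # 5. Report: markdown table followed by the mandatory disclaimer.
    lines = ["# HLA Typing Report", "",
             "| Locus | Finding | Confidence |",
             "|-------|---------|------------|"]
    lines += [f"| {f['locus']} | {f['finding']} | {f['confidence']} |" for f in findings]
    lines += ["", DISCLAIMER, ""]
    (out / "report.md").write_text("\n".join(lines))
    return findings
```

Injecting `analyze` keeps the prescriptive parts (validation, file layout, disclaimer) fixed while leaving the scoring logic swappable.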

**Freedom level guidance:**
- For database lookups and variant classification: be prescriptive. Every step must be exact.
- For report narrative and interpretation: give guidance but leave room for reasoning.

## CLI Reference

```bash
# Standard usage
python skills/hla-typing/hla_typing.py \
  --input <input_file> --output <report_dir>

# Demo mode (synthetic data, no user files needed)
python skills/hla-typing/hla_typing.py --demo --output /tmp/hla_typing_demo

# Via ClawBio runner
python clawbio.py run hla-typing --input <file> --output <dir>
python clawbio.py run hla-typing --demo
```
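The flags above imply that a run needs exactly one input source (`--input` or `--demo`) plus an `--output` directory. A hypothetical `argparse` reconstruction of that contract is sketched below; `build_parser` is an assumed name, and the real `hla_typing.py` may define additional options or defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the CLI reference."""
    parser = argparse.ArgumentParser(
        prog="hla_typing.py",
        description="HLA allele typing from WGS/WES VCF data",
    )
    # Exactly one input source: a user file or the synthetic demo data.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--input", help="VCF or TSV input file")
    source.add_argument("--demo", action="store_true",
                        help="run on synthetic demo data")
    parser.add_argument("--output", required=True,
                        help="directory for report.md and result.json")
    return parser
```

With this design, `build_parser().parse_args(["--demo", "--output", "/tmp/hla_typing_demo"])` succeeds, while passing only `--output` exits with a usage error because neither input source was given.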

## Demo

To verify the skill works:

```bash
python clawbio.py run hla-typing --demo
```

Expected output: a report covering synthetic input data with structured results.

## Algorithm / Methodology

1. **Parse input**: Read VCF/TSV and extract relevant loci
2. **Lookup**: Query reference databases for annotations
3. **Score**: Apply scoring algorithm to classify findings
4. **Report**: Generate structured output

**Key thresholds / parameters**:
- TODO: define thresholds with citations
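Step 1 ("extract relevant loci") could be approximated with pandas, the required dependency, by keeping only records inside the MHC region on chromosome 6. The window below is an assumption not taken from the skill: approximate GRCh38 MHC bounds (chr6:28,510,120-33,480,577) that must be changed for GRCh37 input. `read_vcf` and `extract_mhc` are hypothetical helpers for uncompressed VCF text only.

```python
import io

import pandas as pd

# Assumption: approximate GRCh38 MHC bounds (chr6:28,510,120-33,480,577).
# Not taken from the skill; adjust for other reference builds.
MHC_START, MHC_END = 28_510_120, 33_480_577


def read_vcf(text: str) -> pd.DataFrame:
    """Load VCF records into a DataFrame, keeping the '#CHROM' header row."""
    body = [ln for ln in text.splitlines() if ln and not ln.startswith("##")]
    df = pd.read_csv(io.StringIO("\n".join(body)), sep="\t", dtype=str)
    df = df.rename(columns={"#CHROM": "CHROM"})
    df["POS"] = df["POS"].astype(int)
    return df


def extract_mhc(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only chromosome 6 records that fall inside the MHC window."""
    chrom = df["CHROM"].str.replace(r"^chr", "", regex=True)
    in_window = df["POS"].between(MHC_START, MHC_END)  # inclusive bounds
    return df[(chrom == "6") & in_window].reset_index(drop=True)
```

Reading every column as a string and converting only POS avoids pandas guessing types for fields like QUAL or INFO.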

## Example Queries

- "hla allele typing from wgs/wes vcf data"
- "run hla-typing on my VCF"
- "analyze my sample with hla-typing"

## Example Output

```markdown
# Hla Typing Report

**Input**: demo_input.txt (5 variants)
**Date**: 2026-04-06

| Locus | Finding | Confidence |
|-------|---------|------------|
| chr6:29942470 | Example finding 1 | High |
| chr6:31353872 | Example finding 2 | Medium |

## Summary
Analysis completed on 5 variants. 2 findings reported.

*ClawBio is a research and educational tool. It is not a medical device and does not provide clinical diagnoses. Consult a healthcare professional before making any medical decisions.*
```

## Output Structure

```
output_directory/
├── report.md              # Primary markdown report
├── result.json            # Machine-readable results
├── tables/
│   └── results.csv        # Tabular data
└── reproducibility/
    ├── commands.sh         # Exact commands to reproduce
    └── environment.yml     # Environment snapshot
```
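The `tables/` and `reproducibility/` parts of this layout could be produced along the lines of the sketch below. `write_bundle`, the `FIELDS` column names, and the conda-style `environment.yml` contents are assumptions; the tree above fixes only the file names. `shlex.join` records the command line with shell quoting so `commands.sh` can be replayed as-is.

```python
from __future__ import annotations

import csv
import platform
import shlex
from importlib import metadata
from pathlib import Path

FIELDS = ["locus", "finding", "confidence"]  # hypothetical column names


def write_bundle(out_dir: str | Path, findings: list[dict], argv: list[str],
                 packages: tuple[str, ...] = ("pandas", "biopython")) -> None:
    """Write tables/ and reproducibility/ under out_dir."""
    out = Path(out_dir)
    (out / "tables").mkdir(parents=True, exist_ok=True)
    (out / "reproducibility").mkdir(parents=True, exist_ok=True)

    # tables/results.csv: one row per finding.
    with open(out / "tables" / "results.csv", "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(findings)

    # reproducibility/commands.sh: the exact command line, shell-quoted.
    (out / "reproducibility" / "commands.sh").write_text(
        "#!/usr/bin/env bash\n" + shlex.join(argv) + "\n")

    # reproducibility/environment.yml: interpreter and package versions.
    lines = ["name: hla-typing-run", "dependencies:",
             f"  - python={platform.python_version()}"]
    for name in packages:
        try:
            lines.append(f"  - {name}={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"  # {name} not installed")
    (out / "reproducibility" / "environment.yml").write_text("\n".join(lines) + "\n")
```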

## Dependencies

**Required**:
- `pandas` >= 2.0; data manipulation

**Optional**:
- `biopython`; sequence handling (graceful degradation without it)
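Graceful degradation for an optional dependency usually means attempting the import once and falling back to a simpler code path. The sketch below uses reverse complementing only as a stand-in for "sequence handling"; the document does not say which Biopython features the skill uses. The fallback table covers A, C, G, T, and N only, so IUPAC ambiguity codes are left uncomplemented on that path.

```python
try:
    from Bio.Seq import Seq  # optional: biopython
    HAVE_BIOPYTHON = True
except ImportError:
    Seq = None
    HAVE_BIOPYTHON = False

# Fallback complement table for unambiguous bases (plus N) only.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string, using Biopython when it is installed."""
    if HAVE_BIOPYTHON:
        return str(Seq(seq).reverse_complement())
    return seq.translate(_COMPLEMENT)[::-1]
```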

## Gotchas

- **Gotcha 1**: The model tends to infer results from gene names alone. Instead, always require actual genotype data from the input file. Why: inferred results are unreliable and clinically dangerous.
- **Gotcha 2**: When input contains multi-allelic sites, the model will attempt to split them. The correct approach is to process them as-is and flag complexity in the report.
- **Gotcha 3**: Empty or malformed VCF lines cause silent failures. Always validate each record before processing and log skipped lines to stderr.
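The three gotchas translate into a per-record parsing loop along these lines. This is a sketch assuming a single-sample VCF with the sample in column 10; `parse_records` and the `multiallelic` flag are hypothetical names. Malformed lines and missing genotype calls are logged to stderr (or a supplied stream) instead of being dropped silently, and multi-allelic ALT values are kept unsplit and flagged.

```python
from __future__ import annotations

import sys
from typing import Iterable, TextIO

MISSING_GT = {"", ".", "./.", ".|."}


def parse_records(lines: Iterable[str], log: TextIO | None = None) -> list[dict]:
    """Parse single-sample VCF lines, logging every skipped record."""
    log = sys.stderr if log is None else log
    records = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        # Gotcha 3: empty or truncated lines are skipped loudly, never silently.
        if len(fields) < 10 or not fields[1].isdigit():
            print(f"skipped line {lineno}: malformed VCF record", file=log)
            continue
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        # Gotcha 1: without a called genotype there is nothing to type.
        gt = sample.get("GT", "")
        if gt in MISSING_GT:
            print(f"skipped line {lineno}: no genotype call", file=log)
            continue
        # Gotcha 2: multi-allelic ALT is kept as-is and flagged, not split.
        records.append({
            "CHROM": fields[0],
            "POS": int(fields[1]),
            "REF": fields[3],
            "ALT": fields[4],
            "GT": gt,
            "multiallelic": "," in fields[4],
        })
    return records
```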

## Safety

- **Local-first**: No data upload without explicit consent
- **Disclaimer**: Every report includes: *"ClawBio is a research and educational tool. It is not a medical device and does not provide clinical diagnoses. Consult a healthcare professional before making any medical decisions."*
- **Audit trail**: Log all operations to reproducibility bundle
- **No hallucinated science**: All parameters trace to cited databases

## Agent Boundary

The agent (LLM) dispatches and explains. The skill (Python) executes.
The agent must NOT override thresholds or invent associations.

## Integration with Bio Orchestrator

**Trigger conditions**: the orchestrator routes here when:
- User mentions allele or hla-typing
- Input file contains relevant loci

**Chaining partners**: this skill connects with:
- `pharmgx-reporter`: downstream pharmacogenomic implications
- `profile-report`: feeds into unified patient profile

## Maintenance

- **Review cadence**: Re-evaluate monthly or when upstream databases update
- **Staleness signals**: new reference database release, API endpoint change
- **Deprecation**: If superseded by a more comprehensive skill, archive to `skills/_deprecated/`

## Citations

- TODO: Add relevant database and paper citations

Related Skills

wes-clinical-report-es

from ClawBio/ClawBio

Generates professional clinical PDF reports in Spanish from WES (Whole Exome Sequencing) data with clinical interpretation, pharmacogenomic alerts, and follow-up recommendations.

wes-clinical-report-en

from ClawBio/ClawBio

Generates professional clinical PDF reports in English from WES (Whole Exome Sequencing) data with clinical interpretation summary, pharmacogenomic alerts, and follow-up recommendations.

vcf-annotator

from ClawBio/ClawBio

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

variant-annotation

from ClawBio/ClawBio

Annotate VCF variants with Ensembl VEP REST, ClinVar significance, gnomAD/population frequency context, and prioritized variant ranking.

ukb-navigator

from ClawBio/ClawBio

Semantic search across UK Biobank's 12,000+ data fields and publications — find the right variables for your research question.

target-validation-scorer

from ClawBio/ClawBio

Evidence-grounded target validation scoring with GO/NO-GO decisions for drug discovery campaigns.

struct-predictor

from ClawBio/ClawBio

Protein structure prediction with Boltz-2. Accepts YAML inputs (single protein or multi-chain complex), runs boltz predict, extracts per-residue pLDDT and PAE confidence, and writes a markdown report with figures.

soul2dna

from ClawBio/ClawBio

Compile SOUL.md character profiles into synthetic diploid genomes (.genome.json) via trait-to-allele mapping.

seq-wrangler

from ClawBio/ClawBio

Sequence QC, alignment, and BAM processing. Wraps FastQC, BWA/Bowtie2, SAMtools for automated read-to-BAM pipelines.

scrna-orchestrator

from ClawBio/ClawBio

Local Scanpy pipeline for single-cell RNA-seq QC, optional doublet detection, clustering, marker discovery, optional CellTypist annotation, optional latent downstream mode from integrated.h5ad/X_scvi, and optional dataset-level plus within-cluster contrastive marker analysis from raw-count .h5ad or 10x Matrix Market input.

scrna-embedding

from ClawBio/ClawBio

Local scVI/scANVI-based single-cell latent embedding and batch-aware integration from raw-count .h5ad or 10x Matrix Market input, with stable integrated AnnData export for downstream latent analysis.

rnaseq-de

from ClawBio/ClawBio

Differential expression analysis for bulk RNA-seq and pseudo-bulk count matrices with QC, PCA, and contrast testing.