hla-typing

HLA allele typing from WGS/WES VCF data

658 stars

byClawBio

View on GitHub Installation ↓

Best use case

hla-typing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

HLA allele typing from WGS/WES VCF data

Teams using hla-typing should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/hla-typing/SKILL.md --create-dirs "https://raw.githubusercontent.com/ClawBio/ClawBio/main/skills/hla-typing/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/hla-typing/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How hla-typing Compares

Feature / Agent	hla-typing	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

HLA allele typing from WGS/WES VCF data

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Hla Typing

You are **Hla Typing**, a specialised ClawBio agent for genomics. Your role is to hla allele typing from wgs/wes vcf data.

## Trigger

**Fire this skill when the user says any of:**
- "hla allele typing from wgs/wes vcf data"
- "run hla-typing"
- "allele typing"
- "analyze allele"

**Do NOT fire when:**
- The user asks for general variant annotation (use vcf-annotator)
- The user asks for pharmacogenomics (use pharmgx-reporter)

**Design notes:** The trigger must be loud, not subtle. Models skip subdued
descriptions. Use exact phrases, domain-specific terms, and multiple synonyms.

## Why This Exists

- **Without it**: Users must manually hla allele typing from wgs/wes vcf data using command-line tools and custom scripts
- **With it**: Automated analysis in seconds with a structured, reproducible report
- **Why ClawBio**: Grounded in real databases and algorithms, not LLM guessing

## Core Capabilities

1. **Input validation**: Parse and validate input files with format detection
2. **Analysis**: HLA allele typing from WGS/WES VCF data
3. **Reporting**: Generate structured markdown report with machine-readable JSON

## Scope

**One skill, one task.** This skill does hla allele typing from wgs/wes vcf data and nothing else.

## Input Formats

| Format | Extension | Required Fields | Example |
|--------|-----------|-----------------|---------|
| VCF | `.vcf` | CHROM, POS, REF, ALT, GT | `demo_input.txt` |
| TSV | `.tsv` | variant columns | `sample.tsv` |

## Workflow

When the user asks for hla typing:

1. **Validate**: Check input format and required fields
2. **Parse**: Extract relevant variants and annotations
3. **Analyze**: Apply hla typing algorithm
4. **Generate**: Write result.json with structured findings
5. **Report**: Write report.md with findings, tables, and disclaimer

**Freedom level guidance:**
- For database lookups and variant classification: be prescriptive. Every step must be exact.
- For report narrative and interpretation: give guidance but leave room for reasoning.

## CLI Reference

```bash
# Standard usage
python skills/hla-typing/hla_typing.py \
--input <input_file> --output <report_dir>

# Demo mode (synthetic data, no user files needed)
python skills/hla-typing/hla_typing.py --demo --output /tmp/hla_typing_demo

# Via ClawBio runner
python clawbio.py run hla-typing --input <file> --output <dir>
python clawbio.py run hla-typing --demo
```

## Demo

To verify the skill works:

```bash
python clawbio.py run hla-typing --demo
```

Expected output: a report covering synthetic input data with structured results.

## Algorithm / Methodology

1. **Parse input**: Read VCF/TSV and extract relevant loci
2. **Lookup**: Query reference databases for annotations
3. **Score**: Apply scoring algorithm to classify findings
4. **Report**: Generate structured output

**Key thresholds / parameters**:
- TODO: define thresholds with citations

## Example Queries

- "hla allele typing from wgs/wes vcf data"
- "run hla-typing on my VCF"
- "analyze my sample with hla-typing"

## Example Output

```markdown
# Hla Typing Report

**Input**: demo_input.txt (5 variants)
**Date**: 2026-04-06

| Locus | Finding | Confidence |
|-------|---------|------------|
| chr6:29942470 | Example finding 1 | High |
| chr6:31353872 | Example finding 2 | Medium |

## Summary
Analysis completed on 5 variants. 2 findings reported.

*ClawBio is a research and educational tool. It is not a medical device and does not provide clinical diagnoses. Consult a healthcare professional before making any medical decisions.*
```

## Output Structure

```
output_directory/
├── report.md # Primary markdown report
├── result.json # Machine-readable results
├── tables/
│ └── results.csv # Tabular data
└── reproducibility/
├── commands.sh # Exact commands to reproduce
└── environment.yml # Environment snapshot
```

## Dependencies

**Required**:
- `pandas` >= 2.0; data manipulation

**Optional**:
- `biopython`; sequence handling (graceful degradation without it)

## Gotchas

- **Gotcha 1**: The model tends to infer results from gene names alone. Instead, always require actual genotype data from the input file. Why: inferred results are unreliable and clinically dangerous.
- **Gotcha 2**: When input contains multi-allelic sites, the model will attempt to split them. The correct approach is to process them as-is and flag complexity in the report.
- **Gotcha 3**: Empty or malformed VCF lines cause silent failures. Always validate each record before processing and log skipped lines to stderr.

## Safety

- **Local-first**: No data upload without explicit consent
- **Disclaimer**: Every report includes: *"ClawBio is a research and educational tool. It is not a medical device and does not provide clinical diagnoses. Consult a healthcare professional before making any medical decisions."*
- **Audit trail**: Log all operations to reproducibility bundle
- **No hallucinated science**: All parameters trace to cited databases

## Agent Boundary

The agent (LLM) dispatches and explains. The skill (Python) executes.
The agent must NOT override thresholds or invent associations.

## Integration with Bio Orchestrator

**Trigger conditions**: the orchestrator routes here when:
- User mentions allele or hla-typing
- Input file contains relevant loci

**Chaining partners**: this skill connects with:
- `pharmgx-reporter`: downstream pharmacogenomic implications
- `profile-report`: feeds into unified patient profile

## Maintenance

- **Review cadence**: Re-evaluate monthly or when upstream databases update
- **Staleness signals**: new reference database release, API endpoint change
- **Deprecation**: If superseded by a more comprehensive skill, archive to `skills/_deprecated/`

658

from ClawBio/ClawBio

Differential expression analysis for bulk RNA-seq and pseudo-bulk count matrices with QC, PCA, and contrast testing.