bio-alignment-validation
Validate alignment quality with insert size distribution, proper pairing rates, GC bias, strand balance, and other post-alignment metrics. Use when verifying alignment data quality before variant calling or quantification.
Best use case
bio-alignment-validation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Validate alignment quality with insert size distribution, proper pairing rates, GC bias, strand balance, and other post-alignment metrics. Use when verifying alignment data quality before variant calling or quantification.
Teams using bio-alignment-validation should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/bio-alignment-validation/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How bio-alignment-validation Compares
| Feature / Agent | bio-alignment-validation | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Validate alignment quality with insert size distribution, proper pairing rates, GC bias, strand balance, and other post-alignment metrics. Use when verifying alignment data quality before variant calling or quantification.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Alignment Validation
Post-alignment quality control to verify alignment quality and identify issues.
## Insert Size Distribution
Insert size should match library preparation protocol.
### samtools stats
```bash
samtools stats input.bam > stats.txt
grep "^IS" stats.txt | cut -f2,3 > insert_sizes.txt
```
### Picard CollectInsertSizeMetrics
```bash
java -jar picard.jar CollectInsertSizeMetrics \
I=input.bam \
O=insert_metrics.txt \
H=insert_histogram.pdf
```
### Expected Insert Sizes
| Library Type | Expected Size |
|--------------|---------------|
| Standard WGS | 300-500 bp |
| PCR-free | 350-550 bp |
| RNA-seq | 150-300 bp |
| ChIP-seq | 150-300 bp |
| ATAC-seq | Multimodal |
### Python Insert Size Analysis
```python
import pysam
import numpy as np
import matplotlib.pyplot as plt
def get_insert_sizes(bam_file, max_reads=100000):
sizes = []
bam = pysam.AlignmentFile(bam_file, 'rb')
for i, read in enumerate(bam.fetch()):
if i >= max_reads:
break
if read.is_proper_pair and not read.is_secondary and read.template_length > 0:
sizes.append(read.template_length)
bam.close()
return sizes
sizes = get_insert_sizes('sample.bam')
print(f'Median insert size: {np.median(sizes):.0f}')
print(f'Mean insert size: {np.mean(sizes):.0f}')
print(f'Std dev: {np.std(sizes):.0f}')
plt.hist(sizes, bins=100, range=(0, 1000))
plt.xlabel('Insert Size')
plt.ylabel('Count')
plt.savefig('insert_size_dist.pdf')
```
## Proper Pairing Rate
Percentage of reads correctly paired.
### samtools flagstat
```bash
samtools flagstat input.bam
samtools flagstat input.bam | grep "properly paired"
```
### Calculate Pairing Rate
```bash
proper=$(samtools view -c -f 2 input.bam)
mapped=$(samtools view -c -F 4 input.bam)
rate=$(echo "scale=4; $proper / $mapped * 100" | bc)
echo "Proper pairing rate: ${rate}%"
```
### Expected Rates
| Metric | Good | Marginal | Poor |
|--------|------|----------|------|
| Proper pair | > 90% | 80-90% | < 80% |
| Mapped | > 95% | 90-95% | < 90% |
| Singletons | < 5% | 5-10% | > 10% |
## GC Bias
GC content correlation with coverage.
### Picard CollectGcBiasMetrics
```bash
java -jar picard.jar CollectGcBiasMetrics \
I=input.bam \
O=gc_bias_metrics.txt \
CHART=gc_bias_chart.pdf \
S=gc_summary.txt \
R=reference.fa
```
### deepTools computeGCBias
```bash
computeGCBias \
-b input.bam \
--effectiveGenomeSize 2913022398 \
-g hg38.2bit \
-o gc_bias.txt \
--biasPlot gc_bias.pdf
```
### Interpret GC Bias
| Issue | Symptom |
|-------|---------|
| Under-representation | Low GC coverage drops |
| Over-representation | High GC coverage elevated |
| PCR bias | Strong correlation |
## Strand Balance
Forward and reverse strand should be balanced.
### Calculate Strand Ratio
```bash
forward=$(samtools view -c -F 16 input.bam)
reverse=$(samtools view -c -f 16 input.bam)
echo "Forward: $forward"
echo "Reverse: $reverse"
ratio=$(echo "scale=4; $forward / $reverse" | bc)
echo "F/R ratio: $ratio"
```
### Check Strand Bias per Chromosome
```bash
for chr in chr1 chr2 chr3; do
fwd=$(samtools view -c -F 16 input.bam $chr)
rev=$(samtools view -c -f 16 input.bam $chr)
echo "$chr: F=$fwd R=$rev ratio=$(echo "scale=2; $fwd/$rev" | bc)"
done
```
## Mapping Quality Distribution
### Extract MAPQ Distribution
```bash
samtools view input.bam | cut -f5 | sort -n | uniq -c | sort -k2 -n
```
### Calculate Mean MAPQ
```bash
samtools view input.bam | awk '{sum+=$5; count++} END {print "Mean MAPQ:", sum/count}'
```
### MAPQ Thresholds
| MAPQ | Meaning |
|------|---------|
| 0 | Multi-mapper |
| 1-10 | Low confidence |
| 20-30 | Moderate |
| 40+ | High confidence |
| 60 | Unique (BWA) |
## Chromosome Coverage Balance
### Calculate Per-Chromosome Coverage
```bash
samtools idxstats input.bam | awk '{print $1, $3/$2}' | head -25
```
### Check for Aneuploidy/Contamination
```bash
samtools idxstats input.bam | awk '$3 > 0 {
sum += $3
len[$1] = $2
reads[$1] = $3
} END {
for (chr in reads) {
expected = len[chr] / sum * reads[chr]
ratio = reads[chr] / expected
if (ratio < 0.8 || ratio > 1.2) print chr, ratio
}
}'
```
## Mismatch Rate
### Picard CollectAlignmentSummaryMetrics
```bash
java -jar picard.jar CollectAlignmentSummaryMetrics \
I=input.bam \
R=reference.fa \
O=alignment_summary.txt
```
### Key Metrics
| Metric | Description | Good Value |
|--------|-------------|------------|
| PCT_PF_READS_ALIGNED | Mapped % | > 95% |
| PF_MISMATCH_RATE | Mismatches | < 1% |
| PF_INDEL_RATE | Indels | < 0.1% |
| STRAND_BALANCE | Strand ratio | ~0.5 |
## Comprehensive Validation Script
```bash
#!/bin/bash
BAM=$1
REF=$2
NAME=$(basename $BAM .bam)
OUTDIR=${3:-qc}
mkdir -p $OUTDIR
echo "=== Alignment Validation: $NAME ===" | tee $OUTDIR/report.txt
echo -e "\n--- Flagstat ---" | tee -a $OUTDIR/report.txt
samtools flagstat $BAM | tee -a $OUTDIR/report.txt
echo -e "\n--- Mapping Rate ---" | tee -a $OUTDIR/report.txt
mapped=$(samtools view -c -F 4 $BAM)
total=$(samtools view -c $BAM)
rate=$(echo "scale=2; $mapped / $total * 100" | bc)
echo "Mapping rate: ${rate}%" | tee -a $OUTDIR/report.txt
echo -e "\n--- Proper Pairing ---" | tee -a $OUTDIR/report.txt
proper=$(samtools view -c -f 2 $BAM)
pair_rate=$(echo "scale=2; $proper / $mapped * 100" | bc)
echo "Proper pairing: ${pair_rate}%" | tee -a $OUTDIR/report.txt
echo -e "\n--- Insert Size ---" | tee -a $OUTDIR/report.txt
samtools stats $BAM | grep "insert size average" | tee -a $OUTDIR/report.txt
echo -e "\n--- Strand Balance ---" | tee -a $OUTDIR/report.txt
fwd=$(samtools view -c -F 16 $BAM)
rev=$(samtools view -c -f 16 $BAM)
strand_ratio=$(echo "scale=3; $fwd / $rev" | bc)
echo "Forward: $fwd, Reverse: $rev, Ratio: $strand_ratio" | tee -a $OUTDIR/report.txt
echo -e "\n--- Chromosome Coverage ---" | tee -a $OUTDIR/report.txt
samtools idxstats $BAM | head -25 | tee -a $OUTDIR/report.txt
echo -e "\nReport: $OUTDIR/report.txt"
```
## Python Validation Module
```python
import pysam
import numpy as np
from collections import Counter
class AlignmentValidator:
def __init__(self, bam_file):
self.bam = pysam.AlignmentFile(bam_file, 'rb')
def mapping_rate(self):
stats = self.bam.get_index_statistics()
mapped = sum(s.mapped for s in stats)
unmapped = sum(s.unmapped for s in stats)
return mapped / (mapped + unmapped) * 100
def proper_pair_rate(self, sample_size=100000):
proper = 0
paired = 0
for i, read in enumerate(self.bam.fetch()):
if i >= sample_size:
break
if read.is_paired:
paired += 1
if read.is_proper_pair:
proper += 1
return proper / paired * 100 if paired > 0 else 0
def mapq_distribution(self, sample_size=100000):
mapqs = []
for i, read in enumerate(self.bam.fetch()):
if i >= sample_size:
break
if not read.is_unmapped:
mapqs.append(read.mapping_quality)
return Counter(mapqs)
def strand_balance(self, sample_size=100000):
forward = 0
reverse = 0
for i, read in enumerate(self.bam.fetch()):
if i >= sample_size:
break
if not read.is_unmapped:
if read.is_reverse:
reverse += 1
else:
forward += 1
return forward / (forward + reverse) if (forward + reverse) > 0 else 0.5
def report(self):
print(f'Mapping rate: {self.mapping_rate():.1f}%')
print(f'Proper pairing: {self.proper_pair_rate():.1f}%')
print(f'Strand balance: {self.strand_balance():.3f}')
mapq_dist = self.mapq_distribution()
high_qual = sum(v for k, v in mapq_dist.items() if k >= 30)
total = sum(mapq_dist.values())
print(f'High MAPQ (>=30): {high_qual/total*100:.1f}%')
def close(self):
self.bam.close()
validator = AlignmentValidator('sample.bam')
validator.report()
validator.close()
```
## Quality Thresholds Summary
| Metric | Good | Warning | Fail |
|--------|------|---------|------|
| Mapping rate | > 95% | 90-95% | < 90% |
| Proper pairing | > 90% | 80-90% | < 80% |
| Duplicate rate | < 10% | 10-20% | > 20% |
| Strand balance | 0.48-0.52 | 0.45-0.55 | Outside |
| Mean MAPQ | > 40 | 30-40 | < 30 |
| GC bias | < 1.2x | 1.2-1.5x | > 1.5x |
## Related Skills
- bam-statistics - Basic flagstat and depth
- duplicate-handling - Mark/remove duplicates
- alignment-filtering - Filter low-quality reads
- chipseq-qc - ChIP-specific metricsRelated Skills
date-validation
Use when editing Planning Hubs, timelines, calendars, or any file with day-name + date combinations (Wed Nov 12), relative dates (tomorrow), or countdowns (18 days until) - validates day-of-week accuracy, relative date calculations, and countdown math with two-source ground truth verification before allowing edits
spring-validation
Bean Validation (Jakarta Validation) with Spring Boot. Custom validators, validation groups, cross-field validation, and internationalized error messages.
fullstack-validation
Comprehensive validation methodology for multi-component applications including backend, frontend, database, and infrastructure
bio-alignment-io
Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
api-validation
Apply when validating API request inputs: body, query params, path params, and headers. This skill covers Zod v4 patterns.
api-request-validation
A skill for implementing robust API request validation in Python web frameworks like FastAPI using Pydantic. Covers Pydantic models, custom validators (email, password), field-level and cross-field validation, query/file validation, and structured error responses. Use when you need to validate incoming API requests.
api-contracts-and-zod-validation
Generate Zod schemas and TypeScript types for forms, API routes, and Server Actions with runtime validation. Use this skill when creating API contracts, validating request/response payloads, generating form schemas, adding input validation to Server Actions or route handlers, or ensuring type safety across client-server boundaries. Trigger terms include zod, schema, validation, API contract, form validation, type inference, runtime validation, parse, safeParse, input validation, request validation, Server Action validation.
api-contracts-and-validation
Define and validate API contracts using Zod
api-contract-validation
Detect breaking changes in API contracts (OpenAPI/Swagger specs)
android-playstore-api-validation
Create and run validation script to test Play Store API connection
agent-ops-validation
Pre-commit and pre-merge validation checks. Use before committing changes or declaring work complete to ensure all quality gates pass.
bgo
Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.