bio-alignment-indexing

Create and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.

Best use case

bio-alignment-indexing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-alignment-indexing should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/bio-alignment-indexing/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/tools/bio-alignment-indexing/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/bio-alignment-indexing/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How bio-alignment-indexing Compares

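| Feature / Agent | bio-alignment-indexing | Standard Approach |
|-----------------|------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |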

SKILL.md Source

# Alignment Indexing

Create indices for random access to alignment files using samtools and pysam.

## Index Types

| Index | Extension | Use Case |
|-------|-----------|----------|
| BAI | `.bai` | Standard BAM index, chromosomes < 512 Mbp |
| CSI | `.csi` | Large chromosomes, custom bin sizes |
| CRAI | `.crai` | CRAM index |
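
The 512 Mbp limit in the table comes from the BAI binning scheme, which addresses positions up to 2^29 bases. As a minimal sketch, the choice between BAI and CSI can be made from the contig lengths in the header. The names `needs_csi` and `BAI_MAX_CONTIG_LENGTH` are hypothetical helpers, not part of samtools or pysam, and the long contig length is illustrative rather than real data.

```python
# Hypothetical helper: decide whether an alignment file needs a CSI index.
BAI_MAX_CONTIG_LENGTH = 2 ** 29  # 536,870,912 bp, the BAI coordinate limit


def needs_csi(contig_lengths):
    """Return True if any contig is too long for a BAI index.

    contig_lengths maps contig name -> length in bp, for example
    dict(zip(bam.references, bam.lengths)) on a pysam.AlignmentFile.
    """
    return any(length > BAI_MAX_CONTIG_LENGTH
               for length in contig_lengths.values())


human_like = {'chr1': 248956422, 'chr2': 242193529}
long_contig = {'chr3B': 830000000}  # illustrative length only

print(needs_csi(human_like))   # False
print(needs_csi(long_contig))  # True
```

When the function returns True, `samtools index -c` is the appropriate command.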

## samtools index

### Create BAI Index
```bash
samtools index input.bam
# Creates input.bam.bai
```

### Create CSI Index
```bash
samtools index -c input.bam
# Creates input.bam.csi
```

### Specify Output Name
```bash
samtools index input.bam output.bai
```

### Multi-threaded Indexing
```bash
samtools index -@ 4 input.bam
```

### Index CRAM
```bash
samtools index input.cram
# Creates input.cram.crai
```

## Index Requirements

Indexing requires coordinate-sorted files:
```bash
# Check sort order
samtools view -H input.bam | grep "^@HD"
# Should show SO:coordinate

# Sort if needed, then index
samtools sort -o sorted.bam input.bam
samtools index sorted.bam
```
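
The same sort-order check can be scripted. The sketch below parses header text such as the output of `samtools view -H` and returns the `SO` value from the `@HD` line. `sort_order` is a hypothetical helper name, and the header string is made up for illustration.

```python
def sort_order(header_text):
    """Return the SO value from the @HD header line, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith('@HD'):
            # Fields after the record type are TAG:VALUE pairs separated by tabs
            for field in line.split('\t')[1:]:
                tag, _, value = field.partition(':')
                if tag == 'SO':
                    return value
    return None


header = '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n'
print(sort_order(header))  # coordinate
```

Any result other than `coordinate` (including `None`) means the file should be sorted before indexing.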

## Using Indices for Region Access

### samtools view with Region
```bash
# Requires index file present
samtools view input.bam chr1:1000000-2000000
```

### Multiple Regions
```bash
samtools view input.bam chr1:1000-2000 chr2:3000-4000
```

### Regions from BED File
```bash
# Without -M, -L filters while scanning the whole file; -M uses the index
samtools view -M -L regions.bed input.bam
```
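
A rough Python counterpart to the BED-based command is to read the intervals and fetch each one. BED coordinates are 0-based and half-open, the same convention pysam's `fetch()` uses, so they can be passed through unchanged. The sketch below covers only the parsing step; `read_bed_regions` is a hypothetical helper and the BED lines are invented for illustration.

```python
def read_bed_regions(lines):
    """Yield (chrom, start, end) from BED lines, skipping headers and blanks."""
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith('#'):
            continue  # blank line or comment
        if fields[0] in ('track', 'browser'):
            continue  # UCSC header lines
        yield fields[0], int(fields[1]), int(fields[2])


bed_lines = [
    '# targets',
    'track name=example',
    'chr1\t999\t2000\ttarget_a',
    '',
    'chr2\t2999\t4000\ttarget_b',
]
print(list(read_bed_regions(bed_lines)))
# [('chr1', 999, 2000), ('chr2', 2999, 4000)]
```

Each tuple can then be passed as `bam.fetch(chrom, start, end)` on an indexed file.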

## pysam Python Alternative

### Create Index
```python
import pysam

pysam.index('input.bam')
# Creates input.bam.bai
```

### Create CSI Index
```python
pysam.index('-c', 'input.bam')  # arguments are passed through to `samtools index`
# Creates input.bam.csi
```

### Fetch with Index
```python
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    # fetch() requires index
    for read in bam.fetch('chr1', 1000000, 2000000):
        print(read.query_name)
```
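
One detail worth spelling out: samtools region strings are 1-based and inclusive, while the `start` and `end` arguments of `fetch()` are 0-based and half-open. A minimal conversion sketch follows. `region_to_fetch_args` is a hypothetical helper, and it assumes contig names without a colon (names such as HLA alleles would need extra handling).

```python
def region_to_fetch_args(region):
    """Convert a samtools-style region to (contig, start, end) for fetch().

    'chr1:1000-2000' is 1-based inclusive; the result is 0-based half-open.
    Assumes the contig name itself contains no colon.
    """
    contig, sep, span = region.rpartition(':')
    if not sep:
        return region, None, None  # whole contig, no coordinates given
    start_text, _, end_text = span.replace(',', '').partition('-')
    start = int(start_text) - 1              # shift the start to 0-based
    end = int(end_text) if end_text else None  # inclusive end == exclusive end
    return contig, start, end


print(region_to_fetch_args('chr1:1000000-2000000'))
# ('chr1', 999999, 2000000)
```

pysam's `fetch()` also accepts a `region=` keyword that takes the samtools-style string directly, which avoids the manual conversion.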

### Check if Indexed
```python
import pysam
from pathlib import Path

def is_indexed(bam_path):
    bam_path = Path(bam_path)
    candidates = [
        Path(str(bam_path) + '.bai'),   # input.bam.bai
        bam_path.with_suffix('.bai'),   # input.bai
        Path(str(bam_path) + '.csi'),   # input.bam.csi
    ]
    return any(path.exists() for path in candidates)

if not is_indexed('input.bam'):
    pysam.index('input.bam')
```

### Fetch Multiple Regions
```python
regions = [('chr1', 1000, 2000), ('chr1', 5000, 6000), ('chr2', 1000, 2000)]

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for chrom, start, end in regions:
        count = sum(1 for _ in bam.fetch(chrom, start, end))
        print(f'{chrom}:{start}-{end}: {count} reads')
```

### Count Reads in Region
```python
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    count = bam.count('chr1', 1000000, 2000000)
    print(f'Reads in region: {count}')
```

### Get Reads Covering Position
```python
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for read in bam.fetch('chr1', 1000000, 1000001):
        if read.reference_start <= 1000000 < read.reference_end:
            print(f'{read.query_name} covers position 1000000')
```

## Index File Locations

samtools looks for indices in two locations:
```
input.bam.bai   # Standard location
input.bai       # Alternative location
```

For CRAM:
```
input.cram.crai
```
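
As a sketch of how these locations might be checked in a script, the helper below returns the first index file that exists and also flags an index that is older than its alignment file (samtools warns in that situation, since the index may no longer match the data). `find_index` and `index_is_stale` are hypothetical names, the `.csi` candidate is an added assumption beyond the two BAI locations listed above, and the demo uses empty placeholder files rather than real BAM data.

```python
import os
import tempfile


def find_index(aln_path):
    """Return the first existing index path for a BAM/CRAM file, or None."""
    root, ext = os.path.splitext(aln_path)
    if ext == '.cram':
        candidates = [aln_path + '.crai']
    else:
        candidates = [aln_path + '.bai', root + '.bai', aln_path + '.csi']
    for path in candidates:
        if os.path.exists(path):
            return path
    return None


def index_is_stale(aln_path, index_path):
    """True if the index was last modified before the alignment file."""
    return os.path.getmtime(index_path) < os.path.getmtime(aln_path)


with tempfile.TemporaryDirectory() as tmp:
    bam = os.path.join(tmp, 'input.bam')
    bai = bam + '.bai'
    for path in (bam, bai):
        open(path, 'wb').close()      # empty placeholder files
    os.utime(bai, (1000, 1000))       # pretend the index is old
    os.utime(bam, (2000, 2000))       # and the BAM was rewritten later
    print(find_index(bam) == bai)     # True
    print(index_is_stale(bam, bai))   # True
```

A stale index is best fixed by re-running `samtools index` on the file.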

## idxstats - Index Statistics

### Get Per-Chromosome Counts
```bash
samtools idxstats input.bam
```

Output format:
```
chr1    248956422    5000000    0
chr2    242193529    4500000    0
*       0            0          10000
```

Columns: reference name, length, mapped reads, unmapped reads
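
For downstream use, the four columns can be parsed into records. The sketch below reuses the example output shown above; `parse_idxstats` is a hypothetical helper, and in practice the text would come from running `samtools idxstats`.

```python
def parse_idxstats(text):
    """Parse samtools idxstats output into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        name, length, mapped, unmapped = line.split()
        rows.append({
            'contig': name,
            'length': int(length),
            'mapped': int(mapped),
            'unmapped': int(unmapped),
        })
    return rows


example = """chr1    248956422    5000000    0
chr2    242193529    4500000    0
*       0            0          10000"""

rows = parse_idxstats(example)
total_mapped = sum(row['mapped'] for row in rows)
total_unmapped = sum(row['unmapped'] for row in rows)
print(total_mapped, total_unmapped)  # 9500000 10000
```

The `*` row collects reads that have no reference assigned, which is why its length and mapped count are zero.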

### Sum Total Mapped Reads
```bash
samtools idxstats input.bam | awk '{sum += $3} END {print sum}'
```

### pysam idxstats
```python
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for stat in bam.get_index_statistics():
        print(f'{stat.contig}: {stat.mapped} mapped, {stat.unmapped} unmapped')
```

## FASTA Index (faidx)

Related but distinct: `samtools faidx` indexes a reference FASTA for random access:

```bash
samtools faidx reference.fa
# Creates reference.fa.fai

# Fetch region from indexed FASTA
samtools faidx reference.fa chr1:1000-2000
```

### pysam FastaFile
```python
with pysam.FastaFile('reference.fa') as ref:
    seq = ref.fetch('chr1', 1000, 2000)
    print(seq)
```
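
To show why the `.fai` file enables random access, here is a sketch of the offset arithmetic. Each `.fai` line holds five tab-separated columns: name, length, byte offset of the first base, bases per line, and bytes per line. `parse_fai_line` and `byte_offset` are hypothetical helpers, and the example line is invented (it assumes a 6-byte `>chr1` header line and 60-base lines with a one-byte newline).

```python
def parse_fai_line(line):
    """Split one .fai line into (name, length, offset, linebases, linewidth)."""
    name, length, offset, linebases, linewidth = line.split('\t')[:5]
    return name, int(length), int(offset), int(linebases), int(linewidth)


def byte_offset(fai_entry, pos):
    """Byte offset in the FASTA file of 0-based position pos on the contig."""
    _, length, offset, linebases, linewidth = fai_entry
    if not 0 <= pos < length:
        raise ValueError('position outside contig')
    # Whole lines before pos cost linewidth bytes each (bases plus newline)
    return offset + (pos // linebases) * linewidth + pos % linebases


entry = parse_fai_line('chr1\t248956422\t6\t60\t61')
print(byte_offset(entry, 0))   # 6
print(byte_offset(entry, 60))  # 67
```

A reader can seek straight to that byte instead of scanning the file from the start.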

## Quick Reference

| Task | samtools | pysam |
|------|----------|-------|
| Create BAI | `samtools index file.bam` | `pysam.index('file.bam')` |
| Create CSI | `samtools index -c file.bam` | `pysam.index('-c', 'file.bam')` |
| Fetch region | `samtools view file.bam chr1:1-1000` | `bam.fetch('chr1', 0, 1000)` |
| Count in region | `samtools view -c file.bam chr1:1-1000` | `bam.count('chr1', 0, 1000)` |
| Index stats | `samtools idxstats file.bam` | `bam.get_index_statistics()` |
| Index FASTA | `samtools faidx ref.fa` | Automatic with FastaFile |

## Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `random alignment retrieval only works for indexed BAM` | Missing index | Run `samtools index file.bam` |
| `file is not sorted` | Unsorted BAM | Sort first with `samtools sort` |
| `chromosome not found` | Wrong chromosome name | Check names with `samtools view -H` |
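
A frequent cause of the chromosome-name error is a mismatch between `chr1`-style and `1`-style naming. The sketch below tries the requested name with the `chr` prefix added or removed. `resolve_contig` is a hypothetical helper, the reference list stands in for something like `bam.references`, and special cases such as `chrM` versus `MT` are not handled.

```python
def resolve_contig(name, references):
    """Return the matching contig name from references, or None.

    Tries the name as given, then with the 'chr' prefix added or removed.
    """
    if name in references:
        return name
    alternative = name[3:] if name.startswith('chr') else 'chr' + name
    return alternative if alternative in references else None


references = ['chr1', 'chr2', 'chrX']  # stand-in for bam.references
print(resolve_contig('1', references))     # chr1
print(resolve_contig('chrY', references))  # None
```

A `None` result means the contig is genuinely absent and the header should be inspected with `samtools view -H`.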

## Related Skills

- sam-bam-basics - View and convert alignment files
- alignment-sorting - Sort BAM files (required before indexing)
- alignment-filtering - Filter by regions using index
- bam-statistics - Use idxstats for quick counts
- sequence-io/read-sequences - Index FASTA with SeqIO.index_db()
