bio-atac-seq-footprinting

Detect transcription factor binding sites through footprinting analysis in ATAC-seq data using TOBIAS. Use when identifying TF occupancy patterns within accessible regions, as TF binding protects DNA from Tn5 cutting.

1,802 stars

byFreedomIntelligence

View on GitHub Installation ↓

Best use case

bio-atac-seq-footprinting is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-atac-seq-footprinting should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/bio-atac-seq-footprinting/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-atac-seq-footprinting/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bio-atac-seq-footprinting/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How bio-atac-seq-footprinting Compares

Feature / Agent	bio-atac-seq-footprinting	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

## Version Compatibility

Reference examples tested with: bedtools 2.31+, matplotlib 3.8+, numpy 1.26+, pandas 2.2+, pyBigWig 0.3+, samtools 1.19+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# TF Footprinting

**"Identify TF binding footprints in my ATAC-seq data"** → Detect protected DNA regions within accessible chromatin where bound transcription factors block Tn5 insertion.
- CLI: `TOBIAS ATACorrect` → `TOBIAS FootprintScores` → `TOBIAS BINDetect`

## TOBIAS Workflow

**Goal:** Identify transcription factor binding footprints within accessible chromatin regions.

**Approach:** Correct Tn5 insertion bias, compute per-base footprint scores, then detect bound/unbound TF motif sites using the three-step TOBIAS pipeline.

```bash
# 1. Correct Tn5 bias
tobias ATACorrect \
    --bam sample.bam \
    --genome genome.fa \
    --peaks peaks.bed \
    --outdir corrected/ \
    --cores 8

# 2. Calculate footprint scores
tobias FootprintScores \
    --signal corrected/sample_corrected.bw \
    --regions peaks.bed \
    --output footprints.bw \
    --cores 8

# 3. Bind TF motifs
tobias BINDetect \
    --motifs JASPAR_motifs.pfm \
    --signals footprints.bw \
    --genome genome.fa \
    --peaks peaks.bed \
    --outdir bindetect_output/ \
    --cores 8
```

## TOBIAS Differential Footprinting

**Goal:** Compare TF binding between two conditions to identify regulators with differential activity.

**Approach:** Provide two bias-corrected signal tracks to BINDetect, which scores each motif site for differential binding between conditions.

```bash
# Compare conditions
tobias BINDetect \
    --motifs JASPAR_motifs.pfm \
    --signals condition1.bw condition2.bw \
    --genome genome.fa \
    --peaks consensus_peaks.bed \
    --outdir differential_footprints/ \
    --cond_names condition1 condition2 \
    --cores 8

# Output includes:
# - Differential binding scores
# - Per-TF statistics
# - Bound/unbound site predictions
```

## Download JASPAR Motifs

```bash
# Download JASPAR motifs
wget https://jaspar.genereg.net/download/data/2022/CORE/JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt
mv JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt JASPAR_motifs.pfm
```

## Prepare Input Files

```bash
# Ensure BAM is sorted and indexed
samtools sort -@ 8 sample.bam -o sample.sorted.bam
samtools index sample.sorted.bam

# Filter peaks (remove blacklist, size filter)
bedtools intersect -v -a peaks.narrowPeak -b blacklist.bed | \
    awk '$3-$2 >= 100 && $3-$2 <= 5000' > filtered_peaks.bed
```

## HINT-ATAC Alternative

```bash
# RGT suite HINT-ATAC
rgt-hint footprinting \
    --atac-seq \
    --organism hg38 \
    --output-prefix sample \
    sample.bam peaks.bed
```

## PIQ Footprinting

```r
# PIQ (another footprinting tool)
library(PIQ)

# Load data
bam <- 'sample.bam'
pwms <- readMotifs('JASPAR_motifs.pfm')

# Run footprinting
piq_results <- piq(bam, pwms, genome='hg38')
```

## Aggregate Footprint Plots

```bash
# TOBIAS PlotAggregate
tobias PlotAggregate \
    --TFBS bindetect_output/*/beds/*_bound.bed \
    --signals corrected/sample_corrected.bw \
    --output aggregate_footprints.pdf \
    --share_y \
    --plot_boundaries
```

## Python: Custom Footprint Analysis

**Goal:** Extract and visualize aggregate ATAC-seq signal around predicted TF binding sites.

**Approach:** Sample bigWig signal values in windows centered on motif sites, average across all sites, and plot the characteristic V-shaped footprint.

```python
import pyBigWig
import numpy as np
import pandas as pd
from pyfaidx import Fasta

def extract_footprint_signal(bigwig_file, bed_file, flank=100):
    '''Extract signal around binding sites.'''
    bw = pyBigWig.open(bigwig_file)

    signals = []
    for line in open(bed_file):
        fields = line.strip().split('\t')
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        center = (start + end) // 2

        try:
            vals = bw.values(chrom, center - flank, center + flank)
            if vals:
                signals.append(vals)
        except:
            continue

    avg_signal = np.nanmean(signals, axis=0)
    return avg_signal

def plot_footprint(signal, output_file):
    '''Plot aggregate footprint.'''
    import matplotlib.pyplot as plt

    x = np.arange(-len(signal)//2, len(signal)//2)

    plt.figure(figsize=(8, 4))
    plt.plot(x, signal, 'b-', linewidth=2)
    plt.axvline(0, color='red', linestyle='--', alpha=0.5)
    plt.xlabel('Distance from motif center (bp)')
    plt.ylabel('ATAC-seq signal')
    plt.title('Aggregate Footprint')
    plt.savefig(output_file, dpi=150)
    plt.close()
```

## Scan for Motifs

```bash
# Find motif occurrences in peaks
# Using FIMO (MEME suite)
fimo --oc fimo_output motifs.meme peaks.fa

# Or HOMER
findMotifsGenome.pl peaks.bed hg38 motif_analysis/ -find motif.motif
```

## Interpret Footprint Depth

| Footprint Depth | Interpretation |
|-----------------|----------------|
| Deep footprint | Strong TF binding |
| Shallow footprint | Weak/transient binding |
| No footprint | No binding or wrong motif |
| Shoulders only | Nucleosome positioning |

## Quality Considerations

```bash
# Footprinting requires:
# - High read depth (>50M reads)
# - NFR-enriched signal (filter for <100bp fragments)
# - Good Tn5 bias correction

# Extract NFR reads
samtools view -h sample.bam | \
    awk 'substr($0,1,1)=="@" || ($9>0 && $9<100) || ($9<0 && $9>-100)' | \
    samtools view -b > nfr.bam
```

## Differential TF Activity

```python
def compare_footprints(tf_name, cond1_bw, cond2_bw, motif_bed):
    '''Compare TF footprints between conditions.'''
    sig1 = extract_footprint_signal(cond1_bw, motif_bed)
    sig2 = extract_footprint_signal(cond2_bw, motif_bed)

    # Calculate footprint depth
    depth1 = np.nanmean(sig1[:30]) - np.nanmin(sig1[40:60])
    depth2 = np.nanmean(sig2[:30]) - np.nanmin(sig2[40:60])

    diff = depth2 - depth1

    return {
        'TF': tf_name,
        'depth_cond1': depth1,
        'depth_cond2': depth2,
        'difference': diff
    }
```

## TOBIAS Output Files

| File | Description |
|------|-------------|
| *_corrected.bw | Bias-corrected signal |
| *_footprints.bw | Footprint scores |
| *_bound.bed | Predicted bound sites |
| *_unbound.bed | Predicted unbound sites |
| *_overview.txt | Per-TF statistics |

## Related Skills

- atac-seq/atac-peak-calling - Generate peaks
- atac-seq/atac-qc - Verify data quality
- chip-seq/peak-annotation - Annotate binding sites
- sequence-manipulation/motif-search - Find motifs

Related Skills

datacommons-client

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Work with Data Commons, a platform providing programmatic access to public statistical data from global sources. Use this skill when working with demographic data, economic indicators, health statistics, environmental data, or any public datasets available through Data Commons. Applicable for querying population statistics, GDP figures, unemployment rates, disease prevalence, geographic entity resolution, and exploring relationships between statistical entities.

bio-single-cell-scatac-analysis

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Single-cell ATAC-seq analysis with Signac (R/Seurat) and ArchR. Process 10X Genomics scATAC data, perform QC, dimensionality reduction, clustering, peak calling, and motif activity scoring with chromVAR. Use when analyzing single-cell ATAC-seq data.

bio-atac-seq-nucleosome-positioning

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Extract nucleosome positions from ATAC-seq data using NucleoATAC, ATACseqQC, and fragment analysis. Use when analyzing chromatin organization, identifying nucleosome-free regions at promoters, or characterizing nucleosome occupancy patterns from ATAC-seq fragment size distributions.

bio-atac-seq-motif-deviation

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Analyze transcription factor motif accessibility variability using chromVAR. Use when identifying which TF motifs show variable accessibility across samples or conditions in ATAC-seq data.

bio-atac-seq-differential-accessibility

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Find differentially accessible chromatin regions between conditions using DiffBind or DESeq2. Use when comparing chromatin accessibility between treatment groups, cell types, or developmental stages in ATAC-seq experiments.

bio-atac-seq-atac-qc

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Quality control metrics for ATAC-seq data including fragment size distribution, TSS enrichment, FRiP, and library complexity. Use when assessing ATAC-seq library quality before or after peak calling to identify problematic samples.

bio-atac-seq-atac-peak-calling

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Call accessible chromatin regions from ATAC-seq data using MACS3 with ATAC-specific parameters. Use when identifying open chromatin regions from aligned ATAC-seq BAM files, different from ChIP-seq peak calling.

zinc-database

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

zarr-python

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

xlsx

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.

writing-skills

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use when creating new skills, editing existing skills, or verifying skills work before deployment

writing-plans

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use when you have a spec or requirements for a multi-step task, before touching code