bio-read-qc-quality-filtering

Filter reads by quality scores, length, and N content using Trimmomatic and fastp. Apply sliding window trimming, remove low-quality bases from read ends, and discard reads below thresholds. Use when reads have poor quality tails or require minimum quality for downstream analysis.

1,802 stars

Best use case

bio-read-qc-quality-filtering is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-read-qc-quality-filtering can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/bio-read-qc-quality-filtering/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-read-qc-quality-filtering/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/bio-read-qc-quality-filtering/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How bio-read-qc-quality-filtering Compares

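| Feature / Agent | bio-read-qc-quality-filtering | Standard Approach |
|-----------------|-------------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |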

Frequently Asked Questions

What does this skill do?

Filter reads by quality scores, length, and N content using Trimmomatic and fastp. Apply sliding window trimming, remove low-quality bases from read ends, and discard reads below thresholds. Use when reads have poor quality tails or require minimum quality for downstream analysis.

Where can I find the source code?

The source is on GitHub in the FreedomIntelligence/OpenClaw-Medical-Skills repository, under skills/bio-read-qc-quality-filtering/SKILL.md.

SKILL.md Source

## Version Compatibility

Reference examples tested with: Trimmomatic 0.39+, cutadapt 4.4+, fastp 0.23+

Before using code patterns, verify installed versions match. If versions differ:
- CLI: `<tool> --version` (Trimmomatic uses `trimmomatic -version`) then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Quality Filtering

Trim low-quality bases and filter reads using Trimmomatic sliding window or fastp quality filtering.

**"Filter reads by quality"** → Remove low-quality bases and discard reads below quality/length thresholds.
- CLI: `trimmomatic PE` with SLIDINGWINDOW and MINLEN options
- CLI: `fastp --qualified_quality_phred 20 --length_required 50`

## Trimmomatic Quality Operations

### Single-End Mode

```bash
trimmomatic SE -phred33 \
    input.fastq.gz output.fastq.gz \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
```

### Paired-End Mode

```bash
trimmomatic PE -phred33 -threads 4 \
    input_R1.fastq.gz input_R2.fastq.gz \
    output_R1_paired.fastq.gz output_R1_unpaired.fastq.gz \
    output_R2_paired.fastq.gz output_R2_unpaired.fastq.gz \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
```

### Trimmomatic Operations

| Operation | Syntax | Description |
|-----------|--------|-------------|
| LEADING | LEADING:Q | Remove leading bases below quality Q |
| TRAILING | TRAILING:Q | Remove trailing bases below quality Q |
| SLIDINGWINDOW | SLIDINGWINDOW:W:Q | Cut when W-bp window average < Q |
| MINLEN | MINLEN:L | Discard reads shorter than L |
| CROP | CROP:L | Cut read to max length L |
| HEADCROP | HEADCROP:N | Remove first N bases |
| AVGQUAL | AVGQUAL:Q | Drop read if average quality < Q |
| MAXINFO | MAXINFO:L:S | Balance length and quality |
| TOPHRED33 | TOPHRED33 | Convert to Phred33 encoding |
| TOPHRED64 | TOPHRED64 | Convert to Phred64 encoding |

### Common Trimmomatic Recipes

```bash
# Standard quality trimming
trimmomatic SE input.fq output.fq \
    SLIDINGWINDOW:4:20 MINLEN:36

# Aggressive 3' trimming
trimmomatic SE input.fq output.fq \
    TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:36

# Trim both ends, strict filtering
trimmomatic SE input.fq output.fq \
    LEADING:10 TRAILING:10 SLIDINGWINDOW:4:25 MINLEN:50

# Keep fixed length (for some tools)
trimmomatic SE input.fq output.fq \
    CROP:100 MINLEN:100

# Remove first 10 bases (e.g., random primers)
trimmomatic SE input.fq output.fq \
    HEADCROP:10 MINLEN:36
```

### SLIDINGWINDOW Details

```bash
SLIDINGWINDOW:<windowSize>:<requiredQuality>

# Scan from 5' to 3'
# Cut when average quality in window drops below threshold
# Common settings: 4:15, 4:20, 5:20

# Conservative (keep more, lower quality)
SLIDINGWINDOW:4:15

# Moderate
SLIDINGWINDOW:4:20

# Strict (keep less, higher quality)
SLIDINGWINDOW:4:25
```
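
The scan described above can be illustrated on a single quality string. This is a simplified sketch, not Trimmomatic's own implementation: the helper name `sliding_window_keep` is hypothetical, Phred+33 encoding is assumed, and the cut is placed at the start of the first failing window, which may differ from Trimmomatic's exact cut position.

```bash
#!/usr/bin/env bash
# Simplified sliding-window scan over one Phred+33 quality string.
# Hypothetical helper for illustration; not Trimmomatic's own code.
# Prints how many leading bases survive the cut.
sliding_window_keep() {
    local qual="$1" window="$2" min_q="$3"
    printf '%s\n' "$qual" | awk -v w="$window" -v minq="$min_q" '
    BEGIN {
        # Lookup table: printable ASCII character -> code point
        for (c = 33; c <= 126; c++) ord[sprintf("%c", c)] = c
    }
    {
        n = length($0)
        keep = n                      # default: no window failed
        for (start = 1; start + w - 1 <= n; start++) {
            total = 0
            for (j = start; j < start + w; j++)
                total += ord[substr($0, j, 1)] - 33
            if (total / w < minq) {   # window mean below threshold
                keep = start - 1      # keep only bases before this window
                break
            }
        }
        print keep
    }'
}

# Eight Q40 bases ("I") followed by four Q2 bases ("#")
sliding_window_keep 'IIIIIIII####' 4 20
```

With a 4-base window and a Q20 threshold, the first window whose mean falls below 20 starts at base 8, so the sketch reports 7 surviving bases. Raising the threshold moves the cut earlier, which is why 4:25 keeps less than 4:15.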

## fastp Quality Filtering

### Basic Quality Filtering

```bash
# Quality filtering (default Q15)
fastp -i in.fq -o out.fq

# Custom quality threshold
fastp -i in.fq -o out.fq -q 20

# Sliding window from 5' end
fastp -i in.fq -o out.fq --cut_front --cut_front_window_size 4 --cut_front_mean_quality 20

# Sliding window from 3' end
fastp -i in.fq -o out.fq --cut_tail --cut_tail_window_size 4 --cut_tail_mean_quality 20

# Aggressive right-side trimming (recommended)
fastp -i in.fq -o out.fq --cut_right --cut_right_window_size 4 --cut_right_mean_quality 20
```

### fastp Quality Options

```bash
# Global mean quality filter
fastp -i in.fq -o out.fq -q 20 -e 25
# -q: per-base quality threshold
# -e: average quality threshold for entire read

# Unqualified bases threshold
fastp -i in.fq -o out.fq --unqualified_percent_limit 40
# Discard if >40% bases below quality threshold

# N base filtering
fastp -i in.fq -o out.fq -n 5
# Discard reads with >5 N bases
```
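
To make the relationship between `-q` and `--unqualified_percent_limit` concrete, the sketch below computes the percentage of bases in one quality string that fall below a per-base threshold. It assumes Phred+33 encoding, and `unqualified_percent` is a hypothetical helper, not part of fastp.

```bash
#!/usr/bin/env bash
# Percentage of bases below a per-base quality threshold in one
# Phred+33 quality string. Hypothetical helper; not fastp code.
unqualified_percent() {
    local qual="$1" min_q="$2"
    printf '%s\n' "$qual" | awk -v minq="$min_q" '
    BEGIN {
        # Lookup table: printable ASCII character -> code point
        for (c = 33; c <= 126; c++) ord[sprintf("%c", c)] = c
    }
    {
        n = length($0)
        low = 0
        for (i = 1; i <= n; i++)
            if (ord[substr($0, i, 1)] - 33 < minq) low++
        printf "%.1f\n", (n > 0 ? 100 * low / n : 0)
    }'
}

# Six Q40 bases ("I") and four Q2 bases ("#"): 4 of 10 are below Q20
unqualified_percent 'IIIIII####' 20
```

This read sits at exactly 40%, so under the rule in the comments above (discard if more than 40% of bases are below threshold) it would still pass a limit of 40.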

### Paired-End with fastp

```bash
fastp -i R1.fq -I R2.fq -o out_R1.fq -O out_R2.fq \
    --cut_right \
    --cut_right_window_size 4 \
    --cut_right_mean_quality 20 \
    -q 20 -l 36
```

### Length Filtering

```bash
# Trimmomatic
trimmomatic SE input.fq output.fq MINLEN:50

# fastp
fastp -i in.fq -o out.fq -l 50          # min length
fastp -i in.fq -o out.fq --length_limit 150  # max length: discard reads longer than 150
```

## Cutadapt Quality Trimming

```bash
# Trim 3' end below Q20
cutadapt -q 20 -o out.fq in.fq

# Trim both ends
cutadapt -q 20,20 -o out.fq in.fq

# With minimum length
cutadapt -q 20 -m 36 -o out.fq in.fq

# Paired-end
cutadapt -q 20 -m 36 -o R1.fq -p R2.fq in_R1.fq in_R2.fq
```

## Combined Adapter + Quality Trimming

### Trimmomatic Full Pipeline

```bash
# The final ILLUMINACLIP field (keepBothReads) takes a boolean: True or False
trimmomatic PE -threads 4 -phred33 \
    R1.fq.gz R2.fq.gz \
    R1_paired.fq.gz R1_unpaired.fq.gz \
    R2_paired.fq.gz R2_unpaired.fq.gz \
    ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:2:True \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36
```

### Cutadapt Full Pipeline

```bash
cutadapt \
    -a AGATCGGAAGAGC -A AGATCGGAAGAGC \
    -q 20 -m 36 \
    -o R1_trimmed.fq.gz -p R2_trimmed.fq.gz \
    R1.fq.gz R2.fq.gz
```

## Poly-G Trimming (NovaSeq/NextSeq)

NextSeq and NovaSeq use two-color chemistry, in which a base with no signal is called as G. Reads that lose signal toward the 3' end therefore show poly-G artifacts at read ends.

```bash
# Force poly-G trimming (fastp enables it automatically for NextSeq/NovaSeq data)
fastp -i in.fq -o out.fq --trim_poly_g

# Disable poly-G trimming
fastp -i in.fq -o out.fq --disable_trim_poly_g

# Trimmomatic (manual approach)
# Add poly-G to adapter file
```
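
Before choosing a poly-G setting, it can help to check how many reads actually end in a run of Gs. The sketch below is one rough way to do that. It assumes an uncompressed FASTQ with four lines per record; `count_polyg_tails` is a hypothetical helper, and the 10-base cutoff is an illustrative choice rather than a value taken from this guide.

```bash
#!/usr/bin/env bash
# Count reads whose sequence ends in at least 10 consecutive Gs.
# Hypothetical helper; assumes uncompressed 4-line FASTQ records.
count_polyg_tails() {
    # Keep only sequence lines (line 2 of each record), then count
    # those ending in ten Gs. "|| true" keeps a zero count from
    # being treated as a failure under "set -e".
    awk 'NR % 4 == 2' "$1" | grep -c 'GGGGGGGGGG$' || true
}

# Small demonstration file: read1 has a poly-G tail, read2 does not
demo=$(mktemp)
cat > "$demo" <<'EOF'
@read1
ACGTACGTACGGGGGGGGGGGG
+
IIIIIIIIIIIIIIIIIIIIII
@read2
ACGTACGTACGTACGTACGTAC
+
IIIIIIIIIIIIIIIIIIIIII
EOF
count_polyg_tails "$demo"
rm -f "$demo"
```

Restricting the match to sequence lines matters because `G` is also a valid Phred+33 quality character, so a plain `grep` over the whole file could count quality lines by mistake.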

## Quality Thresholds

| Phred | Error Rate | Use Case |
|-------|------------|----------|
| Q10 | 10% | Very lenient |
| Q15 | ~3.2% | fastp default |
| Q20 | 1% | Common threshold |
| Q25 | ~0.3% | Strict |
| Q30 | 0.1% | Very strict |
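
The error rates in this table follow from the standard Phred relation P = 10^(-Q/10). A minimal sketch of that conversion, using a hypothetical helper named `phred_to_error`:

```bash
#!/usr/bin/env bash
# Convert a Phred quality score to an error probability: P = 10^(-Q/10).
# Hypothetical helper for illustration.
phred_to_error() {
    awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}

# Print the probability for each threshold listed in the table
for q in 10 15 20 25 30; do
    printf 'Q%s\t%s\n' "$q" "$(phred_to_error "$q")"
done
```

Each 10-point increase in Q reduces the expected error rate tenfold, so Q20 corresponds to 0.0100 and Q30 to 0.0010.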

## Related Skills

- adapter-trimming - Remove adapters before quality filtering
- quality-reports - Check quality before/after filtering
- fastp-workflow - All-in-one preprocessing

Related Skills

All from FreedomIntelligence/OpenClaw-Medical-Skills:

  • bio-variant-calling-filtering-best-practices: Comprehensive variant filtering including GATK VQSR, hard filters, bcftools expressions, and quality metric interpretation for SNPs and indels. Use when filtering variants using GATK best practices.
  • bio-read-sequences: Read biological sequence files (FASTA, FASTQ, GenBank, EMBL, ABI, SFF) using Biopython Bio.SeqIO. Use when parsing sequence files, iterating multi-sequence files, random access to large files, or high-performance parsing.
  • bio-read-qc-umi-processing: Extract, process, and deduplicate reads using Unique Molecular Identifiers (UMIs) with umi_tools. Use when library prep includes UMIs and accurate molecule counting is needed, such as in single-cell RNA-seq, low-input RNA-seq, or targeted sequencing to distinguish PCR from biological duplicates.
  • bio-read-qc-quality-reports: Generate and interpret quality reports from FASTQ files using FastQC and MultiQC. Assess per-base quality, adapter content, GC bias, duplication levels, and overrepresented sequences. Use when performing initial QC on raw sequencing data or validating preprocessing results.
  • bio-read-qc-fastp-workflow: All-in-one read preprocessing with fastp including adapter trimming, quality filtering, deduplication, base correction, and HTML report generation. Use when preprocessing Illumina data and wanting a single fast tool instead of separate Cutadapt, Trimmomatic, and FastQC steps.
  • bio-read-qc-contamination-screening: Detect sample contamination and cross-species reads using FastQ Screen. Screen reads against multiple reference genomes to identify bacterial, viral, adapter, or sample swap contamination. Use when suspecting cross-contamination or working with samples prone to microbial contamination.
  • bio-read-qc-adapter-trimming: Remove sequencing adapters from FASTQ files using Cutadapt and Trimmomatic. Supports single-end and paired-end reads, Illumina TruSeq, Nextera, and custom adapter sequences. Use when FastQC shows adapter contamination or before alignment of short reads.
  • bio-longread-structural-variants: Detect structural variants from long-read alignments using Sniffles, cuteSV, and SVIM. Use when detecting deletions, insertions, inversions, translocations, or complex rearrangements from ONT or PacBio data, especially those missed by short-read methods.
  • bio-longread-qc: Quality control for long-read sequencing data using NanoPlot, NanoStat, and chopper. Generate QC reports, filter reads by length and quality, and visualize read characteristics. Use when assessing ONT or PacBio run quality or filtering reads before assembly or alignment.
  • bio-longread-medaka: Polish assemblies and call variants from Oxford Nanopore data using medaka. Uses neural networks trained on specific basecaller versions. Use when improving ONT-only assemblies or calling variants from Nanopore data without short-read polishing.
  • bio-longread-alignment: Align long reads using minimap2 for Oxford Nanopore and PacBio data. Supports various presets for different read types and applications. Use when aligning ONT or PacBio reads to a reference genome for variant calling, SV detection, or coverage analysis.
  • bio-long-read-sequencing-nanopore-methylation: Calls DNA methylation from Oxford Nanopore sequencing data using signal-level analysis. Use when detecting 5mC or 6mA modifications directly from nanopore reads without bisulfite conversion.