bio-read-qc-adapter-trimming
Remove sequencing adapters from FASTQ files using Cutadapt and Trimmomatic. Supports single-end and paired-end reads, Illumina TruSeq, Nextera, and custom adapter sequences. Use when FastQC shows adapter contamination or before alignment of short reads.
Best use case
bio-read-qc-adapter-trimming is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using bio-read-qc-adapter-trimming can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/bio-read-qc-adapter-trimming/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How bio-read-qc-adapter-trimming Compares
| Feature / Agent | bio-read-qc-adapter-trimming | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
## Version Compatibility
Reference examples tested with: FastQC 0.12+, Trimmomatic 0.39+, cutadapt 4.4+, fastp 0.23+
Before using code patterns, verify installed versions match. If versions differ:
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
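The version check described above can be sketched as a small helper. This is a minimal sketch, not part of the skill: `report_version` is a hypothetical name, and it assumes each tool answers `--version` (with `-version` for Trimmomatic).

```bash
# Report the installed version of a tool, or flag it as missing.
# Assumption: the tool prints its version as the first line of output.
report_version() {
    local tool=$1 flag=${2:---version}
    if command -v "$tool" >/dev/null 2>&1; then
        # Some tools print the version on stderr, so capture both streams.
        printf '%s: %s\n' "$tool" "$("$tool" "$flag" 2>&1 | head -n 1)"
    else
        printf '%s: not installed\n' "$tool"
    fi
}

report_version fastqc
report_version cutadapt
report_version fastp
report_version trimmomatic -version   # assumption: single-dash flag
```

Compare the reported versions with the "tested with" list before copying flags from the examples.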
# Adapter Trimming
Remove sequencing adapters from reads using Cutadapt (precise, flexible) or Trimmomatic (paired-end optimized).
**"Trim adapters from reads"** → Remove sequencing adapter sequences from FASTQ reads to prevent adapter contamination in downstream alignment.
- CLI: `cutadapt -a ADAPTER -o out.fq in.fq` or `trimmomatic PE` with ILLUMINACLIP
- CLI: `fastp -i in.fq -o out.fq` (auto-detects adapters)
## Common Adapter Sequences
| Platform/Kit | Adapter | Sequence |
|--------------|---------|----------|
| Illumina TruSeq | Read 1 3' | AGATCGGAAGAGCACACGTCTGAACTCCAGTCA |
| Illumina TruSeq | Read 2 3' | AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT |
| Nextera | Transposase | CTGTCTCTTATACACATCT |
| Small RNA | 3' adapter | TGGAATTCTCGGGTGCCAAGG |
| Poly-A | Poly-A tail | AAAAAAAAAAAAAAAA |
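Before trimming, it can help to estimate how many reads carry one of these sequences. The sketch below counts reads whose sequence line contains an exact adapter prefix. `count_adapter_reads` is a hypothetical helper, and it assumes unwrapped four-line FASTQ records. Exact matching undercounts reads with sequencing errors, so treat the result as a rough lower bound.

```bash
# Count reads whose sequence line contains a given adapter prefix.
# Assumes 4 lines per FASTQ record; reads gzipped input via stdin.
count_adapter_reads() {
    local fastq=$1 adapter=$2
    gzip -cdf < "$fastq" |
        awk -v a="$adapter" 'NR % 4 == 2 && index($0, a) { n++ } END { print n + 0 }'
}

# Tiny demo file: two reads, one carrying the TruSeq prefix.
demo=$(mktemp)
printf '@r1\nACGTAGATCGGAAGAGCAC\n+\nIIIIIIIIIIIIIIIIIII\n@r2\nACGTACGTACGT\n+\nIIIIIIIIIIII\n' | gzip > "$demo"
count_adapter_reads "$demo" AGATCGGAAGAGC   # prints 1
rm -f "$demo"
```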
## Cutadapt
### Single-End Reads
```bash
# 3' adapter (most common)
cutadapt -a AGATCGGAAGAGC -o trimmed.fastq.gz sample.fastq.gz
# 5' adapter
cutadapt -g ACGTACGT -o trimmed.fastq.gz sample.fastq.gz
# Both ends
cutadapt -a ADAPTER1 -g ADAPTER2 -o trimmed.fastq.gz sample.fastq.gz
# Multiple adapters (only the best-matching one is removed per read; add -n 2 to remove more)
cutadapt -a ADAPTER1 -a ADAPTER2 -a ADAPTER3 -o trimmed.fastq.gz sample.fastq.gz
```
### Paired-End Reads
```bash
# Basic paired-end
cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
-A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
-o trimmed_R1.fastq.gz -p trimmed_R2.fastq.gz \
sample_R1.fastq.gz sample_R2.fastq.gz
# Short form: the 13-bp prefix shared by both Illumina TruSeq adapters
cutadapt -a AGATCGGAAGAGC -A AGATCGGAAGAGC \
-o trimmed_R1.fastq.gz -p trimmed_R2.fastq.gz \
sample_R1.fastq.gz sample_R2.fastq.gz
```
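For several samples, the paired-end command can be wrapped in a loop. This is a sketch under stated assumptions: files are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, and `trim_pairs`, `DRY_RUN`, and the output directory layout are illustrative names, not part of Cutadapt. By default it only prints the commands so they can be reviewed first.

```bash
# Full-length TruSeq adapters for read 1 and read 2.
R1_ADAPTER=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
R2_ADAPTER=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

# Print (DRY_RUN=1, the default) or run one cutadapt command per R1/R2 pair.
trim_pairs() {
    local in_dir=$1 out_dir=$2 r1 r2 sample
    for r1 in "$in_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue                 # glob matched nothing
        r2=${r1%_R1.fastq.gz}_R2.fastq.gz        # derive the mate's file name
        sample=$(basename "$r1" _R1.fastq.gz)
        if [ ! -e "$r2" ]; then
            echo "skip $sample: missing $r2" >&2
            continue
        fi
        # Build the command as positional parameters so it can be echoed or run.
        set -- cutadapt -a "$R1_ADAPTER" -A "$R2_ADAPTER" -m 20 \
            -o "$out_dir/${sample}_R1.fastq.gz" -p "$out_dir/${sample}_R2.fastq.gz" \
            "$r1" "$r2"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$*"
        else
            "$@"
        fi
    done
}

# Preview:  trim_pairs raw trimmed
# Execute:  mkdir -p trimmed && DRY_RUN=0 trim_pairs raw trimmed
```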
### Adapter Options
```bash
# Maximum error rate (default 0.1: up to 10% of the matched length may be mismatches or indels)
cutadapt -a ADAPTER -e 0.15 -o out.fq in.fq
# Minimum overlap (default 3)
cutadapt -a ADAPTER -O 5 -o out.fq in.fq
# No indels in adapter alignment
cutadapt -a ADAPTER --no-indels -o out.fq in.fq
# Trim Ns from ends
cutadapt --trim-n -o out.fq in.fq
# Anchored 3' adapter (must end exactly at the 3' end of the read; quote the $)
cutadapt -a 'ADAPTER$' -o out.fq in.fq
```
### Linked Adapters
```bash
# 5' adapter followed by 3' adapter (same read)
cutadapt -a ADAPTER1...ADAPTER2 -o out.fq in.fq
# Anchored 5' linked to 3'
cutadapt -a ^ADAPTER1...ADAPTER2 -o out.fq in.fq
```
### Filtering After Trimming
```bash
# Minimum length (discard shorter)
cutadapt -a ADAPTER -m 20 -o out.fq in.fq
# Maximum length
cutadapt -a ADAPTER -M 150 -o out.fq in.fq
# Maximum N content
cutadapt -a ADAPTER --max-n 0.1 -o out.fq in.fq
# Discard trimmed reads
cutadapt -a ADAPTER --discard-trimmed -o out.fq in.fq
# Discard untrimmed reads
cutadapt -a ADAPTER --discard-untrimmed -o out.fq in.fq
```
### Paired-End Filtering
```bash
# Both reads must pass minimum length
cutadapt -a ADAPT1 -A ADAPT2 -m 20 \
-o R1.fq -p R2.fq in_R1.fq in_R2.fq
# Output too-short reads separately
cutadapt -a ADAPT1 -A ADAPT2 -m 20 \
--too-short-output short_R1.fq --too-short-paired-output short_R2.fq \
-o R1.fq -p R2.fq in_R1.fq in_R2.fq
```
### Action Options
```bash
# Mask adapter instead of trim (replace with N)
cutadapt -a ADAPTER --action=mask -o out.fq in.fq
# Retain adapter but lowercase
cutadapt -a ADAPTER --action=lowercase -o out.fq in.fq
# Just find adapters, don't modify
cutadapt -a ADAPTER --action=none -o out.fq in.fq
```
## Trimmomatic
### Single-End Mode
```bash
trimmomatic SE -phred33 \
input.fastq.gz output.fastq.gz \
ILLUMINACLIP:adapters.fa:2:30:10
```
### Paired-End Mode
```bash
trimmomatic PE -phred33 -threads 4 \
input_R1.fastq.gz input_R2.fastq.gz \
output_R1_paired.fastq.gz output_R1_unpaired.fastq.gz \
output_R2_paired.fastq.gz output_R2_unpaired.fastq.gz \
ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10
```
### ILLUMINACLIP Parameters
```bash
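ILLUMINACLIP:<fastaWithAdapters>:<seed>:<palindrome>:<simple>[:<minAdapterLength>:<keepBothReads>]
# Parameters:
# seed - max mismatches in 16bp seed (usually 2)
# palindrome - threshold for palindrome match (usually 30)
# simple - threshold for simple match (usually 10)
# minAdapterLength - optional; shortest adapter detected in palindrome mode (default 8)
# keepBothReads - optional; True keeps the reverse read after palindrome clipping (default False)
# Example with all options
ILLUMINACLIP:adapters.fa:2:30:10:2:True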
### Built-in Adapter Files
Trimmomatic includes adapter files:
- `TruSeq2-SE.fa` - TruSeq v2 single-end
- `TruSeq2-PE.fa` - TruSeq v2 paired-end
- `TruSeq3-SE.fa` - TruSeq v3 single-end
- `TruSeq3-PE.fa` - TruSeq v3 paired-end
- `TruSeq3-PE-2.fa` - TruSeq v3 paired-end, with extra sequences for simple-mode matching
- `NexteraPE-PE.fa` - Nextera paired-end
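For custom adapters, ILLUMINACLIP takes any FASTA file in place of the built-in ones. The sketch below writes one for simple-mode matching. The file name and record names are illustrative, and the sequences are taken from the Common Adapter Sequences table.

```bash
# Write a custom adapter FASTA for ILLUMINACLIP simple-mode matching.
# File name and record names are illustrative, not required by Trimmomatic.
cat > custom_adapters.fa <<'EOF'
>Nextera_transposase
CTGTCTCTTATACACATCT
>SmallRNA_3p
TGGAATTCTCGGGTGCCAAGG
EOF

# Sanity check: number of adapter records in the file.
grep -c '^>' custom_adapters.fa   # prints 2
```

The file can then be passed as `ILLUMINACLIP:custom_adapters.fa:2:30:10`, or to Cutadapt as `-a file:custom_adapters.fa`.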
### Find Trimmomatic Adapters
```bash
# Find adapter directory (assumes a conda-style layout next to the trimmomatic wrapper)
ADAPTER_DIR=$(echo "$(dirname "$(command -v trimmomatic)")"/../share/trimmomatic-*/adapters)
ls "$ADAPTER_DIR"
# Or with conda
ls "$CONDA_PREFIX"/share/trimmomatic-*/adapters/
```
## Performance
```bash
# Cutadapt with multiple cores
cutadapt -j 8 -a ADAPTER -o out.fq in.fq
# Trimmomatic threads
trimmomatic PE -threads 8 ...
```
## Verify Trimming
```bash
# Check adapter removal with FastQC
fastqc trimmed.fastq.gz
# Count reads before/after (a FASTQ record is 4 lines)
echo $(( $(zcat input.fastq.gz | wc -l) / 4 ))
echo $(( $(zcat trimmed.fastq.gz | wc -l) / 4 ))
```
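A FASTQ record spans four lines, so read counts are line counts divided by four. The sketch below turns that into a one-line retention summary. `count_reads` and `retention` are hypothetical helper names, and unwrapped four-line records are assumed.

```bash
# Count FASTQ records (4 lines each); reads gzipped input via stdin.
count_reads() {
    gzip -cdf < "$1" | awk 'END { print int(NR / 4) }'
}

# Print reads before, reads after, and the percentage retained.
retention() {
    local before after
    before=$(count_reads "$1")
    after=$(count_reads "$2")
    awk -v b="$before" -v a="$after" \
        'BEGIN { printf "%d -> %d reads (%.1f%% retained)\n", b, a, (b ? 100 * a / b : 0) }'
}

# Usage: retention input.fastq.gz trimmed.fastq.gz
```

A large drop in retained reads usually points to an aggressive `-m` threshold or a wrong adapter sequence, so it is worth checking before alignment.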
## Related Skills
- quality-reports - Check adapter content with FastQC
- quality-filtering - Quality trimming after adapter removal
- fastp-workflow - Combined adapter and quality trimming
Related Skills
bio-read-sequences
Read biological sequence files (FASTA, FASTQ, GenBank, EMBL, ABI, SFF) using Biopython Bio.SeqIO. Use when parsing sequence files, iterating multi-sequence files, random access to large files, or high-performance parsing.
bio-read-qc-umi-processing
Extract, process, and deduplicate reads using Unique Molecular Identifiers (UMIs) with umi_tools. Use when library prep includes UMIs and accurate molecule counting is needed, such as in single-cell RNA-seq, low-input RNA-seq, or targeted sequencing to distinguish PCR from biological duplicates.
bio-read-qc-quality-reports
Generate and interpret quality reports from FASTQ files using FastQC and MultiQC. Assess per-base quality, adapter content, GC bias, duplication levels, and overrepresented sequences. Use when performing initial QC on raw sequencing data or validating preprocessing results.
bio-read-qc-quality-filtering
Filter reads by quality scores, length, and N content using Trimmomatic and fastp. Apply sliding window trimming, remove low-quality bases from read ends, and discard reads below thresholds. Use when reads have poor quality tails or require minimum quality for downstream analysis.
bio-read-qc-fastp-workflow
All-in-one read preprocessing with fastp including adapter trimming, quality filtering, deduplication, base correction, and HTML report generation. Use when preprocessing Illumina data and wanting a single fast tool instead of separate Cutadapt, Trimmomatic, and FastQC steps.
bio-read-qc-contamination-screening
Detect sample contamination and cross-species reads using FastQ Screen. Screen reads against multiple reference genomes to identify bacterial, viral, adapter, or sample swap contamination. Use when suspecting cross-contamination or working with samples prone to microbial contamination.
bio-longread-structural-variants
Detect structural variants from long-read alignments using Sniffles, cuteSV, and SVIM. Use when detecting deletions, insertions, inversions, translocations, or complex rearrangements from ONT or PacBio data, especially those missed by short-read methods.
bio-longread-qc
Quality control for long-read sequencing data using NanoPlot, NanoStat, and chopper. Generate QC reports, filter reads by length and quality, and visualize read characteristics. Use when assessing ONT or PacBio run quality or filtering reads before assembly or alignment.
bio-longread-medaka
Polish assemblies and call variants from Oxford Nanopore data using medaka. Uses neural networks trained on specific basecaller versions. Use when improving ONT-only assemblies or calling variants from Nanopore data without short-read polishing.
bio-longread-alignment
Align long reads using minimap2 for Oxford Nanopore and PacBio data. Supports various presets for different read types and applications. Use when aligning ONT or PacBio reads to a reference genome for variant calling, SV detection, or coverage analysis.
bio-long-read-sequencing-nanopore-methylation
Calls DNA methylation from Oxford Nanopore sequencing data using signal-level analysis. Use when detecting 5mC or 6mA modifications directly from nanopore reads without bisulfite conversion.
bio-long-read-sequencing-isoseq-analysis
Analyze PacBio Iso-Seq data for full-length isoform discovery and quantification. Use when characterizing transcript diversity or identifying novel splice variants.