bio-read-qc-quality-reports

Generate and interpret quality reports from FASTQ files using FastQC and MultiQC. Assess per-base quality, adapter content, GC bias, duplication levels, and overrepresented sequences. Use when performing initial QC on raw sequencing data or validating preprocessing results.

1,802 stars

Best use case

bio-read-qc-quality-reports is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Generate and interpret quality reports from FASTQ files using FastQC and MultiQC. Assess per-base quality, adapter content, GC bias, duplication levels, and overrepresented sequences. Use when performing initial QC on raw sequencing data or validating preprocessing results.

Teams using bio-read-qc-quality-reports should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/bio-read-qc-quality-reports/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-read-qc-quality-reports/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/bio-read-qc-quality-reports/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How bio-read-qc-quality-reports Compares

Feature / Agentbio-read-qc-quality-reportsStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Generate and interpret quality reports from FASTQ files using FastQC and MultiQC. Assess per-base quality, adapter content, GC bias, duplication levels, and overrepresented sequences. Use when performing initial QC on raw sequencing data or validating preprocessing results.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

## Version Compatibility

Reference examples tested with: pandas 2.2+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Quality Reports

Generate quality reports for FASTQ files using FastQC and aggregate multiple reports with MultiQC.

**"Run quality control on FASTQ files"** → Generate per-base quality, adapter content, and duplication plots, then aggregate across samples.
- CLI: `fastqc *.fastq.gz` then `multiqc .`

## FastQC - Single Sample Reports

### Basic Usage

```bash
# Single file
fastqc sample.fastq.gz

# Multiple files
fastqc *.fastq.gz

# Specify output directory
fastqc -o qc_reports/ sample_R1.fastq.gz sample_R2.fastq.gz

# Set threads
fastqc -t 4 *.fastq.gz
```

### Output Files

FastQC produces two files per input:
- `sample_fastqc.html` - Interactive HTML report
- `sample_fastqc.zip` - Data files and images

### Key Modules

| Module | What It Shows | Warning Signs |
|--------|---------------|---------------|
| Per base sequence quality | Quality scores across read | Drop below Q20 at 3' end |
| Per sequence quality | Quality score distribution | Bimodal distribution |
| Per base sequence content | Nucleotide composition | Imbalance at start (normal) |
| Per sequence GC content | GC distribution | Secondary peak (contamination) |
| Per base N content | Unknown bases | High N content |
| Sequence length distribution | Read lengths | Unexpected variation |
| Sequence duplication | Duplicate reads | High duplication (PCR) |
| Overrepresented sequences | Common sequences | Adapter contamination |
| Adapter content | Adapter sequences | Visible adapter curves |

### Extract Data from ZIP

```bash
# Unzip to access raw data
unzip sample_fastqc.zip

# View summary
cat sample_fastqc/summary.txt

# Get per-base quality
cat sample_fastqc/fastqc_data.txt | grep -A 50 ">>Per base sequence quality"
```

## MultiQC - Aggregate Reports

### Basic Usage

```bash
# Aggregate all FastQC reports in current directory
multiqc .

# Specify input and output
multiqc qc_reports/ -o multiqc_output/

# Custom report name
multiqc . -n my_project_qc

# Force overwrite
multiqc . -f
```

### Common Options

```bash
# Flat directory (no sample subdirs)
multiqc --flat .

# Export data as TSV
multiqc . --export

# Only specific modules
multiqc . -m fastqc

# Exclude patterns
multiqc . --ignore '*_trimmed*'

# Include patterns
multiqc . --ignore-samples '*negative*'
```

### Output Files

- `multiqc_report.html` - Interactive HTML report
- `multiqc_data/` - Directory with data tables
  - `multiqc_fastqc.txt` - FastQC metrics
  - `multiqc_general_stats.txt` - Summary statistics
  - `multiqc_sources.txt` - Source files used

### Extract Data Programmatically

```python
import pandas as pd

general_stats = pd.read_csv('multiqc_data/multiqc_general_stats.txt', sep='\t')
print(general_stats.columns)

fastqc_data = pd.read_csv('multiqc_data/multiqc_fastqc.txt', sep='\t')
```

## Batch Processing

### Process Multiple Samples

```bash
# All FASTQ files in parallel
fastqc -t 8 -o qc_reports/ raw_data/*.fastq.gz

# Then aggregate
multiqc qc_reports/ -o multiqc_output/
```

### Before and After Trimming

```bash
# Create separate directories
mkdir -p qc_reports/raw qc_reports/trimmed

# QC raw reads
fastqc -o qc_reports/raw/ raw_data/*.fastq.gz

# After trimming (using fastp, cutadapt, etc.)
fastqc -o qc_reports/trimmed/ trimmed_data/*.fastq.gz

# Compare with MultiQC
multiqc qc_reports/ -o qc_comparison/
```

## Interpretation Guide

### Quality Scores

| Phred Score | Error Rate | Interpretation |
|-------------|------------|----------------|
| Q40 | 0.0001 | Excellent |
| Q30 | 0.001 | Good (Illumina target) |
| Q20 | 0.01 | Acceptable |
| Q10 | 0.1 | Poor |

### Common Issues

| Issue | Likely Cause | Action |
|-------|--------------|--------|
| Low quality at 3' end | Normal degradation | Trim 3' end |
| Adapter contamination | Short inserts | Trim adapters |
| GC bias | Library prep | Consider correction |
| High duplication | Low complexity, PCR | Mark/remove duplicates |
| Overrepresented seqs | Adapters, primers | Check sequences |

## Configuration

### Custom Adapters

Create `~/.fastqc/Configuration/adapter_list.txt`:
```
Custom_Adapter_Name    ACGTACGTACGT
```

### Custom Limits

Create `~/.fastqc/Configuration/limits.txt` to customize thresholds:
```
# Warn if mean quality below 25
quality_sequence    warn    25
quality_sequence    error   20
```

## Related Skills

- adapter-trimming - Remove adapters detected by FastQC
- fastp-workflow - All-in-one QC and trimming
- sequence-io/read-sequences - FASTQ file reading/writing

Related Skills

clinical-reports

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Write comprehensive clinical reports including case reports (CARE guidelines), diagnostic reports (radiology/pathology/lab), clinical trial reports (ICH-E3, SAE, CSR), and patient documentation (SOAP, H&P, discharge summaries). Full support with templates, regulatory compliance (HIPAA, FDA, ICH-GCP), and validation tools.

bio-read-sequences

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Read biological sequence files (FASTA, FASTQ, GenBank, EMBL, ABI, SFF) using Biopython Bio.SeqIO. Use when parsing sequence files, iterating multi-sequence files, random access to large files, or high-performance parsing.

bio-read-qc-umi-processing

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Extract, process, and deduplicate reads using Unique Molecular Identifiers (UMIs) with umi_tools. Use when library prep includes UMIs and accurate molecule counting is needed, such as in single-cell RNA-seq, low-input RNA-seq, or targeted sequencing to distinguish PCR from biological duplicates.

bio-read-qc-quality-filtering

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Filter reads by quality scores, length, and N content using Trimmomatic and fastp. Apply sliding window trimming, remove low-quality bases from read ends, and discard reads below thresholds. Use when reads have poor quality tails or require minimum quality for downstream analysis.

bio-read-qc-fastp-workflow

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

All-in-one read preprocessing with fastp including adapter trimming, quality filtering, deduplication, base correction, and HTML report generation. Use when preprocessing Illumina data and wanting a single fast tool instead of separate Cutadapt, Trimmomatic, and FastQC steps.

bio-read-qc-contamination-screening

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Detect sample contamination and cross-species reads using FastQ Screen. Screen reads against multiple reference genomes to identify bacterial, viral, adapter, or sample swap contamination. Use when suspecting cross-contamination or working with samples prone to microbial contamination.

bio-read-qc-adapter-trimming

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Remove sequencing adapters from FASTQ files using Cutadapt and Trimmomatic. Supports single-end and paired-end reads, Illumina TruSeq, Nextera, and custom adapter sequences. Use when FastQC shows adapter contamination or before alignment of short reads.

bio-longread-structural-variants

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Detect structural variants from long-read alignments using Sniffles, cuteSV, and SVIM. Use when detecting deletions, insertions, inversions, translocations, or complex rearrangements from ONT or PacBio data, especially those missed by short-read methods.

bio-longread-qc

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Quality control for long-read sequencing data using NanoPlot, NanoStat, and chopper. Generate QC reports, filter reads by length and quality, and visualize read characteristics. Use when assessing ONT or PacBio run quality or filtering reads before assembly or alignment.

bio-longread-medaka

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Polish assemblies and call variants from Oxford Nanopore data using medaka. Uses neural networks trained on specific basecaller versions. Use when improving ONT-only assemblies or calling variants from Nanopore data without short-read polishing.

bio-longread-alignment

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Align long reads using minimap2 for Oxford Nanopore and PacBio data. Supports various presets for different read types and applications. Use when aligning ONT or PacBio reads to a reference genome for variant calling, SV detection, or coverage analysis.

bio-long-read-sequencing-nanopore-methylation

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Calls DNA methylation from Oxford Nanopore sequencing data using signal-level analysis. Use when detecting 5mC or 6mA modifications directly from nanopore reads without bisulfite conversion.