When should I trim my Illumina reads and how should I do it?

Should I trim adapters from my Illumina reads?
This depends on the objective of your experiments.
For counting applications such as differential gene expression (DGE) RNA-seq analysis, ChIP-seq, or ATAC-seq, read trimming is generally no longer required when using modern aligners. Such studies should use local aligners or pseudo-aligners. Modern “local aligners” such as STAR, BWA-MEM, and HISAT2 will “soft-clip” non-matching sequences. Pseudo-aligners such as Kallisto or Salmon also have no problem with reads containing adapter sequences.
However, if the data are used for variant analyses, genome annotation, or genome or transcriptome assembly, we recommend read trimming, including both adapter and quality trimming.
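
To see how much of a read an aligner actually soft-clipped, you can inspect the CIGAR field of the SAM record, where soft-clipped bases are encoded with the operation S. The following is a minimal Python sketch; the function name and the example CIGAR strings are illustrative assumptions and not output from any particular aligner.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside the soft clips; they are ignored here.
    ops = [(n, op) for n, op in ops if op != "H"]
    leading = ops[0][0] if ops and ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return leading, trailing


# Hypothetical example: a 150 bp read whose last 32 bases did not match
# the reference (for instance because they are adapter sequence).
print(soft_clipped_bases("118M32S"))
```

A large number of reads with long trailing soft clips may indicate many short inserts with adapter read-through.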

How should I adapter trim my Illumina reads?
Paired-end-read sequencing data should be trimmed using algorithms that make use of the paired-end nature to enable the most precise trimming. This mode will not require any knowledge of the adapter sequences.
Recommended tools include, for example, BBduk, Skewer, HTStream, and FASTP in their dedicated paired-end modes. Among these, Skewer is likely the easiest to use.
Trimming of single-end-read sequencing data requires knowledge of the adapter sequences (please see below). Recommended tools include Scythe, Cutadapt, Trimmomatic, HTStream, and BBduk in their single-end modes.
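
The reason paired-end trimming can work without adapter sequences is that, when the sequenced insert is shorter than the read length, read 1 and the reverse complement of read 2 cover the same insert, and everything beyond it must be adapter. The Python sketch below illustrates this general idea only; it is not the algorithm of any of the tools named above, and the function names, thresholds (min_overlap, max_mismatch_rate) and the example insert are assumptions made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_insert_length(read1, read2, min_overlap=20, max_mismatch_rate=0.1):
    """Return the insert length if the pair shows adapter read-through, else None.

    For an insert of length L shorter than the reads, read1[:L] equals the
    reverse complement of read2[:L]. Candidate lengths are tried from long to short.
    """
    for length in range(min(len(read1), len(read2)), min_overlap - 1, -1):
        candidate = reverse_complement(read2[:length])
        if hamming(read1[:length], candidate) <= max_mismatch_rate * length:
            return length
    return None


def trim_pair(read1, read2, **kwargs):
    """Cut both reads back to the detected insert length (no-op if none found)."""
    length = find_insert_length(read1, read2, **kwargs)
    if length is None:
        return read1, read2
    return read1[:length], read2[:length]


# Hypothetical 30 bp insert sequenced with 50 bp reads, so both reads run
# into the TruSeq adapters listed on this page.
insert = "ACGTTGCAAGCTTAGGCATCCGATTACGGA"
read1 = (insert + "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA")[:50]
read2 = (reverse_complement(insert) + "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT")[:50]
print(find_insert_length(read1, read2))  # 30
print(trim_pair(read1, read2))
```

Note that the adapter sequences appear only in the construction of the example reads; the trimming functions themselves never use them.
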

DNA and RNA sequencing:
TruSeq forward read: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
TruSeq reverse read: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

DNA sequencing:
Nextera: CTGTCTCTTATACACATCT

For small RNA/miRNA sequencing data, please use this sequence, but also see this FAQ: How should the miRNA/smallRNA data be trimmed?
TruSeq Small RNA: TGGAATTCTCGGGTGCCAAGG

Please see also this page from Illumina: What sequences do I use for adapter trimming? 
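
To illustrate what adapter trimming with a known sequence does to a single-end read, here is a minimal Python sketch. It removes a 3' adapter that is either fully contained in the read or present as a partial prefix at the read end. It uses exact matching only, whereas real trimmers tolerate sequencing errors; the function name and the min_overlap default are assumptions for illustration.

```python
TRUSEQ_FORWARD = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"


def trim_3prime_adapter(read, adapter, min_overlap=5):
    """Remove a 3' adapter (exact match only) and everything after it."""
    # Case 1: the complete adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends inside the adapter, so only an adapter prefix
    # is present. Try the longest possible prefix first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read


# Hypothetical read: a 30 bp insert followed by the first 12 adapter bases.
insert = "ACGTTGCAAGCTTAGGCATCCGATTACGGA"
print(trim_3prime_adapter(insert + TRUSEQ_FORWARD[:12], TRUSEQ_FORWARD))
```

The min_overlap setting is a trade-off: lower values remove shorter adapter remnants but also clip more genuine bases that match the adapter start by chance.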

Quality trimming:
For counting applications, the same considerations as for adapter trimming (above) apply to quality trimming: it can be omitted when using suitable aligners.
For other applications, we recommend combining gentle quality trimming at a threshold quality score of Q15 with a read length filter that retains only reads longer than 35 bp.
Quality trimming tools: e.g. Sickle, Trimmomatic, HTStream, BBduk.
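
The combination of a Q15 threshold and a length filter can be sketched in Python as follows. This is a deliberately simple version that only removes low-quality bases from the 3' end; the tools listed above use more refined strategies such as sliding windows. It assumes Phred+33 quality encoding, and the function names are illustrative.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def quality_trim_3prime(seq, qual, min_quality=15):
    """Drop trailing bases whose quality is below min_quality."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return seq[:end], qual[:end]


def passes_length_filter(seq, min_length=35):
    """Keep only reads longer than min_length bases."""
    return len(seq) > min_length


# Hypothetical 40 bp read: 36 bases at Q40 ('I') followed by 4 bases at Q2 ('#').
seq, qual = quality_trim_3prime("A" * 40, "I" * 36 + "#" * 4)
print(len(seq), passes_length_filter(seq))  # 36 True
```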

References:
Williams et al. 2016. Trimming of sequence reads alters RNA-Seq gene expression estimates. BMC Bioinformatics. 2016;17:103. Published 2016 Feb 25. doi:10.1186/s12859-016-0956-2

