Where can I find the UMIs in the Tag-Seq data? When and how should I trim my Tag-Seq data? What is the low complexity stretch in the Tag-Seq data?

By default, we will generate Tag-Seq and Batch-Tag-Seq gene expression profiling data that incorporate Unique Molecular Identifiers (UMIs) in the sequence reads.
(This FAQ provides information on the usage of UMIs: https://dnatech.genomecenter.ucdavis.edu/faqs/should-i-remove-pcr-duplicates-from-my-rna-seq-data/ ).
Please note that the UMIs enable additional, optional data analysis steps; for many applications, the UMI information can be safely ignored. UMIs are especially beneficial for low RNA input situations as well as ultra-deep sequencing.
The UMI occupies the first 6 bases of the read. It is followed by a constant 4 bp spacer, then by the 12 bp random priming sequence of the QuantSeq kit, and finally by the insert sequence.
If you are using a local aligner (like STAR and most other RNA-seq aligners as well as BWA-MEM), you do not need to trim these sequences.
If you are using a global aligner, first transfer the 6 bp UMI sequence into the read header (we recommend UMI-tools or a custom script for this), then trim the first 22 bases (6 bp UMI + 4 bp spacer + 12 bp random priming sequence) from the read. As with any other RNA-seq data, you should also trim any poly-A stretches (expected at the 3' end) for global aligners.
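As an illustration of the "custom script" route, here is a minimal Python sketch that moves the UMI into the read header and trims the first 22 bases. It is not the facility's pipeline. The function names `read_fastq` and `move_umi_and_trim` are hypothetical, and the choice to append the UMI to the read ID after an underscore is an assumption modeled on the convention UMI-tools uses; adjust it to whatever your downstream tools expect.

```python
from typing import Iterator, TextIO, Tuple

# Read layout described above: 6 bp UMI + 4 bp spacer + 12 bp random primer.
UMI_LEN = 6
SPACER_LEN = 4
PRIMER_LEN = 12
TRIM_LEN = UMI_LEN + SPACER_LEN + PRIMER_LEN  # 22 bases in total

Record = Tuple[str, str, str, str]  # (header, sequence, plus line, qualities)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an open text handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def move_umi_and_trim(record: Record) -> Record:
    """Append the UMI to the read ID and drop the first 22 bases.

    Assumption: the UMI is attached to the read ID as "_<UMI>", before any
    whitespace-separated comment in the header.
    """
    header, seq, plus, qual = record
    umi = seq[:UMI_LEN]
    read_id, _, comment = header.partition(" ")
    new_header = f"{read_id}_{umi}"
    if comment:
        new_header += f" {comment}"
    # Sequence and quality strings must be trimmed by the same amount.
    return new_header, seq[TRIM_LEN:], plus, qual[TRIM_LEN:]
```

Reads shorter than 22 bases come out with an empty sequence in this sketch, so you would typically filter those out afterwards. Poly-A trimming at the 3' end is a separate step that is not covered here.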
Tag-Seq data are strand-specific and have a “sense-strand” orientation.
Please note that a FastQC report of the Tag-Seq data will clearly show a low-complexity stretch at bases 7 to 10 (sequence TATA). This is the spacer sequence between the UMI and the random priming sequence and is expected.
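If you want to confirm that your reads have the expected layout, you can count how often the TATA spacer appears at bases 7 to 10. The following is a minimal sketch; the function name `spacer_fraction` is hypothetical, and it takes plain sequence strings that you have already read from your FASTQ file.

```python
from typing import Iterable

SPACER = "TATA"
SPACER_START = 6   # 0-based index of base 7
SPACER_END = 10    # exclusive end, i.e. through base 10


def spacer_fraction(sequences: Iterable[str]) -> float:
    """Return the fraction of reads carrying the spacer at bases 7 to 10."""
    total = 0
    hits = 0
    for seq in sequences:
        total += 1
        if seq[SPACER_START:SPACER_END] == SPACER:
            hits += 1
    # Avoid division by zero when no reads were supplied.
    return hits / total if total else 0.0
```

A fraction close to 1.0 on a sample of reads suggests the UMI and spacer are where this FAQ describes them. A much lower value may indicate that the reads were already trimmed or were generated with a different library layout.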

Category: 06 Sequencing Data
