Tag-Seq Gene Expression Profiling

3’Tag-Seq is a protocol to generate low-cost and exceptionally low-noise gene expression profiling data. The protocol is also known as TagSeq, 3’Tag-RNA-Seq, Digital RNA-seq, Quant-Seq, etc. (please note that most of these names have previously also been used for a variety of other protocols). In contrast to traditional RNA-Seq, which generates sequencing libraries covering the whole transcripts, 3’Tag-Seq generates only a single initial library molecule per transcript, complementary to sequences near the 3′ end. For human samples, for example, restricting the library to a small part of each transcript reduces the number of sequencing reads required at least fivefold. In most cases more than 48 samples can be sequenced per HiSeq 4000 lane.

The vast majority of RNA-seq studies are analyzed exclusively for differential gene expression (DGE). Conventional full-transcript RNA-seq protocols generate more data than needed for this specific purpose. The complexity of standard RNA-seq data is not an advantage if the aim of the project is only DGE analysis; for this application 3’Tag-Seq might actually be the superior tool. In our experience, 3’Tag-Seq data have so far shown exceptionally low noise as well as insensitivity to variations in RNA sample quality.

  • We offer 3’Tag-Seq as full-service packages (Batch-Tag-Seq, please see below) including library preps, sequencing, and optionally also DGE data analysis. These packages are priced at easy-to-budget per-sample recharge rates.
  • For demanding sample types that require protocol modifications we offer custom Tag-Seq sequencing; please inquire.

Advantages of 3’Tag-Seq:

  • low noise gene expression profiling
  • less sensitive to RNA sample quality/integrity variations (compared to poly-A enrichment protocols)
  • requires significantly lower numbers of sequencing reads
  • single read sequencing is sufficient
  • simpler library prep protocol
  • costs about half or less compared to standard RNA-seq
  • costs lower than, or comparable to, microarray analysis
  • much higher dynamic range compared to microarrays
  • we routinely sequence 48 libraries per HiSeq lane; for some applications 96 libraries per lane are sufficient
  • for very low input or high-depth sequencing of 3’Tag-Seq libraries, UMIs (unique molecular identifiers) can be incorporated
  • soon: simple pricing scheme and simplified planning of experiments

Disadvantages of 3’Tag-Seq:

  • data do not contain any transcript-splicing information
  • data analysis requires a reference genome with good annotation (also of the UTRs)
  • only applicable to eukaryotic samples
  • protocol is a bit more sensitive to chemical contaminants (spin-column-cleaned RNA samples are recommended)

For high-throughput 3’Tag-Seq library generation we require 15 ul of each pure total RNA sample at a concentration of 50 to 100 ng/ul. For custom 3’Tag-Seq library preps the input amounts can be as low as 10 ng total. The RNA samples for this protocol need to be isolated or cleaned up by spin-column protocols. Please also see the sample requirements page.

3’Tag-Seq libraries are sequenced on Illumina HiSeq 4000 or NextSeq 500 sequencers with single-end 80 or 90 bp reads (SE80 or SE90). Please note that 3’Tag-Seq libraries generate lower read numbers on the HiSeq 4000 (about 320 million reads per lane) compared to standard RNA-seq libraries. Since the DGE analysis of Tag-Seq data requires much lower read numbers, this is usually not a problem.

Please note that for some analysis pipelines it is recommended to trim off the first 12 bases from the reads. We will provide the full-length data. Trimming is not necessary if you are using a local aligner (like STAR or BBMap). The sequences can be trimmed easily, for example with the “reformat” command from BBTools. Please also see the 3’Tag-Seq data analysis recommendations and this note on working with degraded RNA samples.
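
As a rough planning aid, the lane yield quoted above (about 320 million reads per HiSeq 4000 lane) can be divided by the number of pooled libraries and compared with the roughly 2 million reads per sample that this page gives as sufficient for typical DGE analyses. The Python sketch below does only this arithmetic. It assumes perfectly even pooling, which real runs will not achieve, and the function and constant names are our own illustration.

```python
# Approximate yield of one HiSeq 4000 lane for 3'Tag-Seq libraries (figure from this page).
READS_PER_LANE = 320_000_000
# Reads per sample this page quotes as sufficient for typical DGE analyses.
REQUIRED_READS = 2_000_000


def reads_per_sample(libraries_per_lane: int, lane_yield: int = READS_PER_LANE) -> float:
    """Average reads per library, assuming perfectly even pooling (an idealization)."""
    if libraries_per_lane <= 0:
        raise ValueError("libraries_per_lane must be positive")
    return lane_yield / libraries_per_lane


if __name__ == "__main__":
    for n in (48, 96):
        depth = reads_per_sample(n)
        print(f"{n} libraries per lane: ~{depth / 1e6:.1f} M reads each, "
              f"{depth / REQUIRED_READS:.1f}x the ~2 M typically needed")
```

Under these assumptions, 48 libraries per lane give about 6.7 million reads each and 96 libraries about 3.3 million, both above the 2 million mark.
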
In case UMIs are incorporated, the first 6 bases of the forward read represent the UMI, followed by a common linker with the sequence “TATA”, followed by the 12 bp random priming sequence. It is recommended to transfer the UMI sequence information to the read header and trim the first 22 bases from each read with UMI-tools or custom scripts. The same software can be used to remove PCR duplicates after alignment.
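
For readers who prefer a custom script, the read layout described above (6-base UMI, “TATA” linker, 12 bp random priming sequence, 22 bases in total) can be handled roughly as in the following Python sketch. It is a minimal illustration, not the Core's pipeline: the function names are hypothetical, the UMI is appended to the read name with an underscore (the convention we understand UMI-tools to use, treated here as an assumption), and reads without the linker at the expected position are simply skipped in the demo. The helper trim_left(record, 12) would also cover the plain 12-base trimming mentioned for libraries without UMIs.

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record: header, sequence, separator ("+"), quality string.
FastqRecord = Tuple[str, str, str, str]

UMI_LEN = 6          # first 6 bases of the forward read
LINKER = "TATA"      # common linker following the UMI
RANDOM_LEN = 12      # random priming sequence
TRIM_LEN = UMI_LEN + len(LINKER) + RANDOM_LEN  # 6 + 4 + 12 = 22


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group FASTQ lines into 4-line records; an incomplete last record is dropped."""
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping one iterator with itself four times yields consecutive groups of 4 lines.
    return zip(stripped, stripped, stripped, stripped)


def trim_left(record: FastqRecord, n: int) -> FastqRecord:
    """Remove the first n bases and their quality values."""
    header, seq, sep, qual = record
    return header, seq[n:], sep, qual[n:]


def has_linker(record: FastqRecord) -> bool:
    """True if the linker sits directly after the UMI."""
    return record[1][UMI_LEN:UMI_LEN + len(LINKER)] == LINKER


def move_umi_to_header(record: FastqRecord) -> FastqRecord:
    """Append the UMI to the read name, then trim UMI, linker and random primer."""
    header, seq, sep, qual = record
    umi = seq[:UMI_LEN]
    # Only the read name (text before the first space) receives the UMI suffix.
    name, space, comment = header.partition(" ")
    return trim_left((f"{name}_{umi}{space}{comment}", seq, sep, qual), TRIM_LEN)


if __name__ == "__main__":
    example = [
        "@read1 1:N:0:ATCACG\n",
        "ACGTAC" "TATA" "GGCCTTAAGGCC" "GATTACAGATTACA\n",
        "+\n",
        "I" * 36 + "\n",
    ]
    for rec in read_fastq(example):
        if has_linker(rec):
            print("\n".join(move_umi_to_header(rec)))
```

With the UMI stored in the read name, duplicate removal after alignment can then group reads by mapping position and UMI.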

This example MDS plot shows an analysis of 3’Tag-Seq data of macrophage cells exposed to three types of bacterial infections and mock-infections at two time points. The analysis distinguishes the responses to the individual bacterial species and the duration of the infections. Even the reactions to the mock-infections are clustered by time points.

[Figure: example MDS plot of the 3’Tag-Seq data]

Batch-Tag-Seq  –  Gene Expression Profiling Simplified  –  With and Without Data Analysis

3’-Tag-Seq is a protocol to generate exceptionally low-noise and low-cost gene expression profiling data.  For more details on the technology please see our FAQs on 3′-Tag-RNA-Seq.

The Batch-Tag-Seq packages include 3′-Tag-RNA-Seq library preparation, sequencing, and optional data analysis at low per-sample rates. The differential gene expression (DGE) data analysis is performed by the Bioinformatics Core. For Batch-Tag-Seq we will collect samples, process them in larger batches, and sequence the barcoded libraries together, which allows us to offer affordable recharge rates on a per-sample basis. This also simplifies the budgeting and planning of experiments, since scientists will not have to adjust their experiments to the capacity of the sequencers.

Please note: In order to enable low-cost DGE data analyses and short turnaround times, we will process the Batch-Tag-Seq samples in a high-throughput fashion. We will not spend time customizing the protocol for individual samples (for example, we will not run sample cleanups, concentrate samples, or repeat the library preps or PCR amplification with varying cycle numbers). The 3′-Tag-RNA-Seq library prep protocol is very robust, so no problems should be expected as long as the RNA samples fulfill the sample requirements. Please note that the customer is responsible for the sample quality.

Optional Bioinformatics:  DGE Analysis
A prerequisite for bioinformatic data analysis is the availability of a well-annotated reference genome (including UTRs; to be provided by the customer). The deliverables will include data QC, read-counts-per-gene tables, and DGE analysis for simple comparisons. At least 3 biological replicate samples need to be sequenced for each condition. Depending on the nature of the samples and the phenotypes, meaningful experiments might require higher replicate numbers. Please inquire with the Bioinformatics Core staff (bioinformatics.core@ucdavis.edu) for details.

The DGE analysis service (if selected) will include:

  • QC and pre-processing of data
  • Mapping to genome reference
  • Analysis in Bioconductor/R: normalized read counts
  • QA/QC metrics from preprocessing and mapping
  • Single-factor DGE analysis
  • MDS plot and any other QA/QC plots from DE analysis
  • Differential Expression Table
  • Full description of processing method

Further considerations:
– 3′-Tag-RNA-Seq is only suitable for eukaryotic total RNA samples (A-tailed transcripts).
– Each Batch-Tag-Seq project requires a minimum of 12 samples.
– For each condition at least 3 biological replicate samples need to be sequenced to allow for a meaningful DGE analysis. As with any other DGE study, statistically meaningful experiments might require higher replicate numbers, depending on the nature of the samples and the phenotypes. Both the DNA Tech Core and the Bioinformatics Core (bioinformatics.core@ucdavis.edu) offer free consultations to discuss such details before starting projects.
– The RNA samples need to be dissolved in molecular-biology-grade H2O or EB buffer. As always, RNA-seq samples need to be DNA-free.
– The sample concentration must be determined by fluorometry (e.g. Qubit; plate reader with RiboGreen), as spectrophotometric quantifications (e.g. Nanodrop) are very unreliable.
– To assure the chemical purity of the samples, the absorbance ratios should be between 1.8 and 2.1 (260/280 nm ratio) and above 1.5 (260/230 nm ratio).
– If RNA isolation protocols involving TRIzol are used, the RNA should then be purified with a spin-column kit (e.g. Zymo RNA Clean & Concentrator) to remove any solvent traces.
– 3′-Tag-Seq has a relatively high tolerance for RNA integrity variation. Nevertheless, we do not recommend using RNA samples with a RIN score lower than 6 for batch processing. We cannot guarantee the outcome of the library prep or the data for lower-quality samples or for samples without Bioanalyzer QC.
– Customers should provide Bioanalyzer traces (or equivalent). Alternatively, we can run the RNA sample QC on the Bioanalyzer or the LabChip GX for an additional fee.
– You will receive around 3 to 6 million reads per sample. For typical experiments about 2 million reads per sample are required for the DGE analysis of highly and medium-expressed genes.
– The availability of a well-annotated reference genome (including UTRs; to be provided by the customer) is a prerequisite for bioinformatic data analysis.
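
Before submitting a batch, the numeric requirements listed on this page can be checked against a sample sheet. The Python sketch below is one hypothetical way to do so. The thresholds are taken from the text (15 ul at 50 to 100 ng/ul for high-throughput preps, 260/280 between 1.8 and 2.1, 260/230 above 1.5, RIN of at least 6, a minimum of 12 samples, and at least 3 replicates per condition), while the class, field, and function names are our own assumptions and not part of any Core software.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class RnaSample:
    name: str
    condition: str
    conc_ng_per_ul: float  # measured by fluorometry (e.g. Qubit)
    volume_ul: float
    a260_280: float
    a260_230: float
    rin: float


def sample_issues(s: RnaSample) -> List[str]:
    """Return the batch-processing requirements that a single sample misses."""
    issues = []
    if not 50 <= s.conc_ng_per_ul <= 100:
        issues.append("concentration outside 50-100 ng/ul")
    if s.volume_ul < 15:
        issues.append("volume below 15 ul")
    if not 1.8 <= s.a260_280 <= 2.1:
        issues.append("260/280 ratio outside 1.8-2.1")
    if s.a260_230 <= 1.5:
        issues.append("260/230 ratio not above 1.5")
    if s.rin < 6:
        issues.append("RIN below 6")
    return issues


def project_issues(samples: List[RnaSample]) -> List[str]:
    """Return project-level problems: batch size and replicates per condition."""
    issues = []
    if len(samples) < 12:
        issues.append(f"only {len(samples)} samples; minimum is 12")
    per_condition = Counter(s.condition for s in samples)
    for condition, n in sorted(per_condition.items()):
        if n < 3:
            issues.append(f"condition '{condition}' has {n} replicate(s); at least 3 needed")
    return issues


if __name__ == "__main__":
    good = [RnaSample(f"s{i}", "infected" if i % 2 else "mock", 80, 15, 2.0, 2.1, 8.5)
            for i in range(12)]
    print(project_issues(good))                                           # no project-level issues
    print(sample_issues(RnaSample("x", "mock", 30, 10, 1.6, 1.2, 4.0)))   # several issues
```

A sample that fails these checks may still be a candidate for the custom protocol, which tolerates lower inputs and degraded RNA.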

Custom 3’-Tag RNA-Seq:
In contrast to the batch processing described above, we can adjust the library prep parameters for 3’-Tag-Seq when running custom library preps. The custom protocols can generate usable data for inputs as low as 10 ng total and also work with degraded RNA samples. Custom protocols do require additional labor. Please contact us with a description of your samples.
