FAQ


04 Library Preparation and QC

The HiSeq 4000 sequencer is the most demanding Illumina sequencer with regard to library insert sizes.  Nevertheless, the majority of existing Illumina sequencing libraries can be sequenced as is on the HiSeq 4000:
The libraries should have no visible adapter dimers, and the library fragments should be mostly shorter than 670 bases.
Other sequencing libraries can be made compatible by size selection (removing both adapter-dimer traces and fragments of more than 670 bases, if the latter are numerous).
We suggest removing library fragments longer than 670 bases from libraries intended for HiSeq 4000 sequencing if there are considerable amounts of these.  The size selection does not need to be “perfect”.
The size selection can be achieved by carrying out a simple “upper cut” with Ampure beads (please see the protocol below), by doing a BluePippin size selection, or by manual gel extraction.   We offer the first two of these options.
If your libraries are homogeneously sized, you could pool the libraries and carry out the size selection only once with the pool.
Ampure XP bead “upper cut” protocol to remove fragments longer than 670 bases:
Please note: It is recommended to verify this protocol first with your batch of Ampure XP beads or similar beads from other manufacturers.   This selection protocol will also remove adapter dimers, if they are not dominating the library.
Bead-based size selection cannot carry out precise “cuts”;  thus, you will also lose some of the library in the size ranges that you intend to keep.
      1. If not mentioned explicitly, follow the standard Ampure XP handling instructions from the manufacturer (e.g. equilibrate the beads to room temperature before use; vortex the beads before use; details of the bead washes and elution, …)
      2. If the sample volume is smaller than 50 ul, add EB buffer up to 50 ul to each sample.
      3.  Add 0.55x the sample volume in Ampure beads (e.g. 27.5 ul beads to a 50 ul sample) to your sample, mix, incubate for 5 minutes at RT.
      4. Collect the beads on a magnet.
      5. Transfer the supernatant to a new tube.
      6. Add another 1x of the original volume in Ampure beads (e.g. 50 ul beads for a sample with a 50 ul starting volume) to the supernatant; mix; incubate for 5 minutes
      7. Collect the beads on a magnet and remove the supernatant
      8. Carry out the two regular 80% ethanol washes of the beads and elute the samples from the beads according to Agencourt Ampure XP protocol.
      9. Verify the success of the size selection by running an aliquot on a Bioanalyzer or equivalent instrument.
Ampure XP bead “upper & lower cut” protocol to remove fragments longer than 670 bases and shorter than 400 bases:

This protocol is identical to the one above, except that a smaller volume of beads is added at step 6 for the final binding of the library to the beads. At the reduced bead buffer concentration of this step, short fragments bind less efficiently, so the lower size cutoff shifts to longer fragments than in the protocol above.
Please note: It is recommended to verify this protocol first with your batch of Ampure XP beads or similar beads from other manufacturers.  Bead-based size selection cannot carry out precise “cuts”;  thus, you will also lose some of the library in the size ranges that you intend to keep.

  1. If not mentioned explicitly, follow the standard Ampure XP handling instructions from the manufacturer (e.g. equilibrate the beads to room temperature before use; vortex the beads before use; details of the bead washes and elution, …)
  2. If the sample volume is smaller than 50 ul, add EB buffer up to 50 ul to each sample.
  3.  Add 0.55x the sample volume in Ampure beads (e.g. 27.5 ul beads to a 50 ul sample) to your sample, mix, incubate for 5 minutes at RT.
  4. Collect the beads on a magnet.
  5. Transfer the supernatant to a new tube.
  6. Add another 0.25x of the original volume in Ampure beads (e.g. 12.5 ul beads for a sample with a 50 ul starting volume) to the supernatant; mix; incubate for 5 minutes
  7. Collect the beads on a magnet and remove the supernatant
  8. Carry out the two regular 80% ethanol washes of the beads and elute the samples from the beads according to Agencourt Ampure XP protocol.
  9. Verify the success of the size selection by running an aliquot on a Bioanalyzer or equivalent instrument.
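
As a rough aid for the two bead protocols above, the following Python sketch computes the EB top-up and the two bead additions from the sample volume. The function name and its parameters are hypothetical; they only encode the ratios stated in the steps (0.55x at step 3, then 1x or 0.25x of the starting volume at step 6). It is not a validated protocol calculator, and the ratios should still be verified with your own bead batch.

```python
def bead_volumes(sample_ul, first_ratio=0.55, second_ratio=1.0, min_volume_ul=50.0):
    """Return (EB top-up, first bead addition, second bead addition) in ul.

    Assumptions (taken from the protocol steps, names are illustrative):
    - samples below 50 ul are topped up to 50 ul with EB buffer (step 2)
    - both bead additions are multiples of this starting volume
    """
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    start_ul = max(sample_ul, min_volume_ul)      # volume after the EB top-up
    top_up_ul = start_ul - sample_ul              # EB buffer to add in step 2
    first_ul = round(first_ratio * start_ul, 2)   # step 3: long fragments bind, beads are discarded
    second_ul = round(second_ratio * start_ul, 2) # step 6: library binds, beads are kept
    return top_up_ul, first_ul, second_ul


# "upper cut": 0.55x followed by 1x of the starting volume
print(bead_volumes(50.0))
# "upper & lower cut": 0.55x followed by 0.25x of the starting volume
print(bead_volumes(50.0, second_ratio=0.25))
```

For a 30 ul sample the same call first reports the 20 ul of EB buffer needed to reach the 50 ul starting volume.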

Beckman/Agencourt also sells beads dedicated to size selections, named SPRIselect; however, these are very likely identical to the Ampure XP beads. The SPRIselect manual provides a lot of additional information and protocols that can be applied to Ampure XP and other beads. Please see here: Beckman SPRIselect Ampure beads

BTW, our favorite magnetic separator for 96-well plates is this one from EdgeBio.


Is PCR-free library preparation still advantageous?
In general, the original concerns about library PCR amplification (presented in papers from 2008) are no longer very relevant.  This is due to the use of modern polymerases that are designed for complex samples, like Kapa HiFi, NEB Q5, or QIAseq HiFi polymerase.  The previous “standard”, the high-fidelity Phusion enzyme, had tremendous disadvantages for complex samples (Quail et al. 2012: Optimal enzymes for amplifying sequencing libraries. Nature Methods 9: 10–11. https://www.nature.com/articles/nmeth.1814 ).

PCR-free libraries also have disadvantages, since they require significantly higher library QC efforts. Thus, we are charging a PCR-free Add-On fee for the preparation of PCR-free libraries.

What are your recommendations?
A great alternative to preparing the libraries completely PCR-free is the use of a single PCR cycle instead. This combines the advantages of both: It creates fully double-stranded library molecules that do not cause any problems in the library QC. In addition there will be no or only an extremely low PCR-bias introduced. Our recommendation is to submit the same amount of  DNA sample as for PCR-free library preps (e.g. 1 ug) and then request the single PCR cycle library amplification.


Quality and quantity of DNA and RNA are critical for high-quality sequencing output. Please make sure your DNA is not degraded and is free of RNA contamination. RNA samples should always be assessed on the Bioanalyzer for degradation and for the absence of gDNA contamination (gDNA can be removed with a DNaseI treatment followed by a column clean-up; e.g. Zymo “RNA Clean and Concentrator”). Preferentially determine the concentrations of your DNA and RNA samples using fluorometry (e.g. with a Qubit or plate reader). The sample purity should be assessed by spectrophotometry (e.g. Nanodrop).  Please see this page for a comprehensive table of sample requirements for sample QC, library preps, or your self-made libraries. Please see the Library Prep Page for details on the library prep processes.  For submission information, including submission forms and shipping details, please visit the Sample Submission & Scheduling page. If you are submitting DNA for PacBio libraries, please follow the PacBio Guidelines for Shipping and Handling.
The Real-time PCR core can carry out DNA as well as RNA extractions for you.


3’Tag-Seq is a protocol to generate low-cost and low-noise gene expression profiling data.   The protocol is also known as TagSeq, 3’Tag RNA-Seq, Digital RNA-seq, or Quant-Seq (please note that most of these names have also been used for a variety of other protocols previously). In contrast to traditional RNA-Seq, which generates sequencing libraries from the whole transcripts, 3’Tag-Seq only generates a single initial library molecule per transcript, complementary to 3′-end sequences. For human samples, for example, the restriction to a small part of the transcripts reduces the number of sequencing reads required at least fivefold. In contrast to earlier “digital RNA-seq” protocols that were based on restriction digestions of cDNAs, the current protocol combines reverse transcription priming from the poly-A tail with random priming and adapter placement for the second-strand synthesis. In most cases up to 48 samples can be sequenced per HiSeq 4000 lane.

More than 90% of the RNA-seq studies carried out in our labs are analyzed exclusively for differential gene expression (DGE). The conventional full transcript RNA-seq protocols generate more data than needed for this specific purpose, but they also allow for splicing analyses. The complexity of the standard RNA-seq data is not an advantage if the aim of the project is only DGE analysis – 3’Tag-Seq might actually be the superior tool for this application (DGE). In our experience the 3’Tag-Seq data have so far shown exceptionally low noise as well as insensitivity to RNA sample quality variations.

This example MDS plot shows an analysis of 3’Tag-Seq data of macrophage cells exposed to three types of bacterial infections and mock-infections at two time points. The analysis distinguishes the responses to the individual bacterial species and the duration of the infections. Even the reactions to the mock-infections are clustered by time points.

[Figure: MDS plot]

We are currently offering 3’Tag-Seq as a low cost custom sequencing service but are planning to offer 3’Tag-Seq services soon at simple per-sample recharge rates — including both library preps and sequencing. In the long run the services can also include a basic differential-gene-expression analysis.

Advantages of 3’Tag-Seq:

  • low noise gene expression profiling
  • less sensitive to RNA sample quality/integrity variations (compared to poly-A enrichment protocols)
  • >99% strand-specific; same direction as mRNA transcripts
  • requires significantly lower numbers of sequencing reads
  • single read sequencing is sufficient
  • simpler library prep protocol
  • costs about half or less compared to standard RNA-seq
  • costs lower than, or comparable to, microarray analysis
  • much higher dynamic range compared to microarrays
  • we routinely sequence 48 libraries per HiSeq lane
  • for very low input or high depth sequencing of 3’Tag-Seq libraries, UMIs (unique molecular identifiers) can be incorporated
  • Batch-Tag-Seq packages: simple pricing scheme and simplified planning of experiments

Disadvantages of 3’Tag-Seq:

  • data analysis requires a reference genome with good annotation (including UTRs)
  • only applicable to eukaryotic samples
  • data do not contain any transcript-splicing information
  • protocol is (a bit) more sensitive to chemical contaminants (spin column cleaned RNA samples are recommended)

For high-throughput 3’Tag-Seq library generation we require pure total RNA samples at a concentration of 100 ng/ul (best submit 10 ul at 100 ng/ul).  For custom 3’-Tag-Seq library preps the input amounts can be as low as 10 ng total. The RNA samples for this protocol need to be isolated or cleaned up by spin-column protocols. Please also see the sample requirements page.
3′-Tag-Seq libraries are sequenced by single-end sequencing on the HiSeq 4000 or the NextSeq.

Please note that 3’Tag-Seq libraries generate lower read numbers on the HiSeq 4000 (about 320 million reads per lane) compared to standard RNA-seq libraries. Since the DGE analysis of Tag-Seq data requires much lower read numbers, this is usually not a problem.
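
To put these numbers in perspective, the short Python sketch below divides the approximate lane yield by the number of pooled samples. The function names are hypothetical, the 320 million and 48 figures come from the text above, and the 5 million reads-per-sample target in the second call is only an assumed example value, not a recommendation.

```python
def reads_per_sample(reads_per_lane, samples_per_lane):
    """Average reads per sample, assuming a perfectly balanced pool."""
    if samples_per_lane <= 0:
        raise ValueError("need at least one sample per lane")
    return reads_per_lane / samples_per_lane


def max_samples_per_lane(reads_per_lane, target_reads_per_sample):
    """Largest number of samples that still reaches the target depth on average."""
    if target_reads_per_sample <= 0:
        raise ValueError("target depth must be positive")
    return int(reads_per_lane // target_reads_per_sample)


LANE_YIELD = 320e6  # approximate 3'Tag-Seq reads per HiSeq 4000 lane (from the text)

# 48 samples per lane gives roughly 6.7 million reads per sample
print(f"{reads_per_sample(LANE_YIELD, 48):,.0f} reads per sample")
# assumed example target of 5 million reads per sample
print(max_samples_per_lane(LANE_YIELD, 5e6), "samples per lane")
```

Real pools are never perfectly balanced, so individual samples will land above or below the average.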

The libraries will be sequenced on Illumina HiSeq 4000 or NextSeq 500 sequencers with single-end 80 or 90 bp reads (SE80 or SE90).  Please note that for some analysis pipelines it is recommended to trim off the first 12 bases from the reads.  We will provide the full-length data.  Trimming is not necessary if you are using a local aligner (like STAR or BBMap). The sequences can be trimmed easily, for example with the “reformat” command from BBTools.  In case UMIs are incorporated, the first 6 bases of the forward read represent the UMI, followed by a common linker with the sequence “TATA”, followed by the 12 bp random priming sequence.    It is recommended to transfer the UMI sequence information to the read header and trim the first 22 bases from each read with UMI-tools or custom scripts. The same software can be used to remove PCR duplicates after the alignments.
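
For illustration, a custom script for the UMI handling described above could look like the following Python sketch. It assumes the read layout given in the text (6 bases UMI, the “TATA” linker, 12 bases random priming, 22 bases in total). The function name is hypothetical, and appending the UMI to the read name after an underscore is an assumed convention chosen for this example. It is not a replacement for a tested tool.

```python
def move_umi_to_header(header, seq, qual, umi_len=6, linker="TATA", random_primer_len=12):
    """Move the UMI of one FASTQ record into its header and trim the read start.

    Returns (new_header, trimmed_seq, trimmed_qual, linker_ok).
    linker_ok tells whether the expected linker followed the UMI.
    """
    trim_len = umi_len + len(linker) + random_primer_len  # 6 + 4 + 12 = 22
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) <= trim_len:
        raise ValueError("read is too short to be trimmed")
    umi = seq[:umi_len]
    linker_ok = seq[umi_len:umi_len + len(linker)] == linker
    # keep the comment part of the header (after the first space) untouched
    read_id, sep, rest = header.partition(" ")
    new_header = f"{read_id}_{umi}{sep}{rest}"
    return new_header, seq[trim_len:], qual[trim_len:], linker_ok


# made-up example record: UMI + linker + random primer + 8 bases of cDNA
example_seq = "AACCGG" + "TATA" + "GATTACAGATTA" + "ACGTACGT"
example_qual = "I" * len(example_seq)
print(move_umi_to_header("@read1 1:N:0:1", example_seq, example_qual))
```

Reads for which `linker_ok` is false may be worth counting separately, since a missing linker suggests that the read does not follow the expected layout.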

Please also see the 3’Tag-Seq data analysis recommendations and this note on working with degraded RNA samples.

 

[Figure: Tag-Seq reads on mRNA]

 

Comprehensive Batch-Tag-Seq info for download with UC pricing.


Primer and adapter-dimer contamination in sequencing libraries can lead to serious problems like barcode switching (also called barcode hopping).  Thus, these short molecules should be removed from the libraries as soon as traces of them become visible on the Bioanalyzer or equivalent. Please note that the Bioanalyzer uses double-strand-specific fluorescent dyes which have a very low affinity to single-stranded primers.  Thus, the Bioanalyzer assay severely underestimates the true concentration of free primer molecules.

  • Low concentrations of free primers and adapter-dimers can be removed with a bead cleanup, e.g. by adding 1x the original volume in Ampure XP beads (or equivalent).
  • The more stringent option for primer removal is an Exonuclease VII single-strand digest (e.g. with this ExoVII) at 37 °C for 20 minutes using 1 ul enzyme and the accompanying buffer, followed by a bead cleanup with 1.6x the original volume in Ampure XP beads. This will not remove primer-dimers.

PCR-amplified sequencing libraries frequently display library molecules seemingly about twice the expected size or even bigger.  In most cases, this phenomenon is caused by over-amplification of the libraries.  These PCR artifacts occur when the PCR reactions run out of essential reagents; in most cases the PCR primers will be exhausted.  If primers are no longer available, the PCR products will anneal to each other (the sequencing adapter sequences will be by far the most common sequences available). The resulting annealing products are often called “PCR-bubbles” and are partly double-stranded and partly single-stranded; thus they migrate considerably slower on agarose gels as well as on Bioanalyzer assays.  Please see below.

Since these artifacts are merely annealing products, the resulting libraries are perfectly sequence-able.  However, the quantification of such libraries by fluorometry will not be precise, since the dyes used for these measurements are specific for double-stranded DNA molecules and PCR bubbles contain considerable amounts of single-stranded DNA that will not be measured.   The PCR bubbles can be removed by amplifying the library one more time with a single cycle of PCR (a so-called “Reconditioning PCR”).   For this PCR you could use standard Illumina P5 (AATGATACGGCGACCACCGAGATCT) and P7 (CAAGCAGAAGACGGCATACGAGAT) primers.  The complementary sequences should be located at the very ends of all Illumina sequencing library molecules.  In most cases PCR bubble artifacts cannot be removed by SPRI bead size selections or Blue Pippin size selections; if necessary, a “Reconditioning PCR” is the best option.
However, to avoid unnecessary complexity loss of the library and introduction of polymerase errors, it would be best to optimize the library preparation protocol for a lower number of PCR cycles beforehand.
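
The statement that the P5 and P7 sequences sit at the very ends of the library molecules can be checked on a known library sequence with a few lines of Python. This is a minimal sketch: the primer sequences are the ones quoted above, while the function names and the test molecule are made up for illustration. Read 5′ to 3′ on one strand, a complete molecule starts with the P5 sequence and ends with the reverse complement of the P7 sequence.

```python
P5 = "AATGATACGGCGACCACCGAGATCT"
P7 = "CAAGCAGAAGACGGCATACGAGAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_full_length_ends(molecule):
    """True if the strand starts with P5 and ends with the reverse complement of P7."""
    return molecule.startswith(P5) and molecule.endswith(revcomp(P7))


# made-up library molecule: P5 end, a placeholder insert, P7 end
molecule = P5 + "ACGT" * 10 + revcomp(P7)
print(revcomp(P7))
print(has_full_length_ends(molecule))
```

Both primers can only anneal to molecules that pass this check, which is why a library with truncated adapter ends would not be rescued by the reconditioning PCR.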

[Figure: PCR bubble]

 

Another graphic illustrating PCR bubbles (source Illumina Inc.). Please also see: https://support.illumina.com/bulletins/2019/10/bubble-products-in-sequencing-libraries–causes–identification-.html


Illumina sequencers using the patterned flowcell technology (HiSeq 4000, HiSeq X Ten, NovaSeq, iSeq) can show an increased rate of barcode switching events.  These artifacts are enabled by the exclusion amplification chemistry used on these sequencers; Illumina calls the artifacts “index hopping”.  Please see the index hopping information from Illumina here and in this video from Illumina. We have also posted additional information here.

To best avoid read mis-assignments two measures are required:

  • The use of UDI adapters is highly recommended when sequencing on the HiSeq 4000, and especially on the NovaSeq.  Hereby matching i5 and i7 indices per library must be avoided. UDI adapters can also be used on the MiSeq and the NextSeq, but they do not offer any significant advantages on these sequencers, which employ the bridge-amplification chemistry.
  • Further, any traces of free primers, primer-dimers, and adapter dimers should be removed from the sequencing libraries or the pools.

Both the NextSeq 500 and the MiSeq do NOT require UDI adapters; for these, single indices and combinatorial indexing are fine.
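
Whether a planned pool is uniquely dual-indexed can be verified from the sample sheet before sequencing. The Python sketch below assumes that “unique dual indexing” means that no i7 index and no i5 index is used by more than one library in the pool; the function name and the index sequences in the example are hypothetical placeholders.

```python
from collections import Counter


def libraries_sharing_indices(libraries):
    """Return the names of libraries whose i7 or i5 index is reused in the pool.

    libraries: list of (name, i7_sequence, i5_sequence) tuples.
    An empty result means every i7 and every i5 occurs exactly once.
    """
    i7_counts = Counter(i7 for _, i7, _ in libraries)
    i5_counts = Counter(i5 for _, _, i5 in libraries)
    return sorted(
        name
        for name, i7, i5 in libraries
        if i7_counts[i7] > 1 or i5_counts[i5] > 1
    )


# placeholder pool: lib1 and lib3 share an i7 index (combinatorial, not unique)
pool = [
    ("lib1", "ATCACG", "AGGCTA"),
    ("lib2", "CGATGT", "GCCAAT"),
    ("lib3", "ATCACG", "CTTGTA"),
]
print(libraries_sharing_indices(pool))
```

A pool that returns an empty list here would be acceptable for the patterned-flowcell sequencers, while a combinatorial design like the example would only be suitable for the MiSeq or NextSeq.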

Commercial sources of UDI adapters: TruSeq-style uniquely indexed adapters are available from both Illumina and Bioo Scientific.  Qiagen, NEB, and NuGEN also support their library prep kits with optional UDI adapters.  Nextera-style indices need to be custom ordered from oligo vendors.

The DNA Technologies Core has 96-plex UDI adapter sets in stock that can be added to sequencing libraries by PCR.  These barcode sets are available for both Nextera and TruSeq adapter designs.  Please note that for TruSeq style libraries one will need to ligate a shortened and index-less stub-adapter instead of a standard Illumina adapter.  The indices are then added after the cleanup of the ligation reaction by PCR.

Sequencing libraries prepared by the DNA Technologies Core:
The DNA Technologies Core uses UDI adapters for all library prep protocols that are compatible with dual indexing (e.g. DNA-Seq, RNA-seq, 3′-Tag-Seq, WGBS-Seq, …).  The Core also makes sure to remove any traces of free primers, primer-dimers, and adapter-dimers.


If we prepare the sequencing libraries we require ChIP-seq DNA samples to be submitted after reversal of the cross-linking. Ideally, the fragment lengths should be between 100 and 300 bp, and preferably under 500 bp. The former will result in the tightest peaks.
For ChIP-seq it is common to start with DNA samples with concentrations too low to measure. Otherwise the general DNA sample recommendations apply (buffer should be EB buffer or EBT: http://dnatech.genomecenter.ucdavis.edu/illumina-library-construction/) although more sample should be supplied if available. We highly recommend submitting the samples in low-retention tubes (e.g. Eppendorf LoBind).
General ChIP-seq recommendations:
  • The fragment lengths should be consistent and best be between 100 and 300 bp (up to 400 bp for the majority of molecules is acceptable).  Consistent fragment lengths can best be achieved on a Covaris style closed tube sonicator. We recommend avoiding probe sonicators.
  • Please make sure to run the input controls on a Bioanalyzer or agarose gel beforehand, and email us an image of these.
  • Sequence one “input control” per cell line/sample type.
  • Analyze at least two biological replicates.
  • We highly recommend verifying the enrichment of your regions of interest (e.g. promoter regions) vs. the control samples by qPCR, before submitting the samples for sequencing.
  • For highest accuracy data we can now generate sequencing libraries with UMI-bearing sequencing adapters. UMIs (Unique Molecular Identifiers) allow the accurate detection and removal of PCR duplicate reads. This approach is especially recommended for low input samples. The first nine bases of the forward and reverse reads will contain UMI sequences.

The required read number per sample will vary from target to target. For the study of point source transcription factors the ENCODE project recommends analyzing at least 20 million (uniquely mapping) reads (http://genome.cshlp.org/content/22/9/1813.long#boxed-text-2). Depending on the quality of your preps, perhaps 75% of the reads can be expected to be uniquely mapping. ENCODE tends to err on the high side with their recommendations. Thus about 20 million reads per sample should be acceptable, but this is likely the minimum number.

Zhang et al. 2016 have studied the impact of the sequencing run types on ChIP-seq data analysis. Their data indicate that paired-end sequencing data provide significant advantages over single-end sequencing in ChIP-seq.

References:
Landt et al. 2012: ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Research 22: 1813-1831
Bailey et al. 2013 Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data.  PLOS Computational Biology https://doi.org/10.1371/journal.pcbi.1003326
Zhang et al. 2016: Systematic evaluation of the impact of ChIP-seq read designs on genome coverage, peak identification, and allele-specific binding detection. BMC Bioinformatics volume 17, Article number: 96

MACS — Model-based Analysis of ChIP-Seq https://taoliu.github.io/MACS/
https://hbctraining.github.io/Intro-to-ChIPseq/lessons/05_peak_calling_macs.html


We currently have 96 indices available and can pool up to 96 RNA-seq or genomic sequencing libraries. Bioo Scientific offers NEXTflex barcode sets allowing the pooling of up to 384 libraries. If you are planning to use homebrew versions of indices, please consult with us first, as reduced complexity from incorrectly designed indices may cause failures when sequencing your sample.


If you have access to fluorometric DNA quantification and a Bioanalyzer (or equivalent), library pooling is not difficult. We offer the pooling of sequencing libraries for a small fee. For sequencing libraries generated by the Core, pooling is included in the library preparation service.
Prerequisites for the pooling of customer libraries are: 

  • all libraries were generated using the same protocol and are PCR amplified
  • the library fragment sizes have to be similar for all libraries * (and within Illumina specs) as demonstrated by Bioanalyzer traces (or gel images if correct balancing is not that critical)
  • all libraries have uniquely indexed adapters
  • all libraries have DNA concentrations in the same range
  • PCR-amplified libraries can be quantified based on fluorometric measurements (e.g. Qubit), but PCR-free libraries are best quantified by qPCR.

Library pooling requires precise pipetting of very small volumes. Even with the best libraries, there will be some imbalances. However, we can’t work magic with variably sized samples.
* The clustering efficiency of Illumina sequencing libraries varies with the fragment lengths.  Shorter molecules are more mobile and will always cluster preferentially compared to longer molecules (smaller molecules will “win the race” to the flowcell surface oligos).
Thus, accurate pooling is impossible when combining libraries of varying lengths and we can’t vouch for the results. In some cases, it is advisable to size-select the libraries stringently before quantification and pooling.


We suggest the following procedure when pooling libraries yourself:

  • verify that Bioanalyzer traces of your libraries show the same fragment size distribution
  • quantify each library by fluorometry (Qubit or plate reader)
  • if necessary dilute some of the highly concentrated libraries (to bring them in line with the others)
  • re-quantify the newly diluted libraries (Qubit)
  • pool the same amounts for each library i.e., the same number of femtomoles for each library; the femtomole amount is calculated by multiplying the concentration (in nM) by volume (in ul)
  • quantify the resulting pool by Qubit to verify that it has the expected concentration (we will quantify once more by qPCR before sequencing)

Please note that the combined library concentration of the pool should be 5 nM or higher; this means that the concentration of individual libraries in the pool can and will be considerably lower.
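
The pooling arithmetic from the procedure above can be sketched in Python as follows. The conversion from ng/ul to nM uses the common approximation of 660 g/mol per base pair of double-stranded DNA, which is an assumption not stated in the text, as are all function names and the example concentrations. The femtomole calculation itself follows the rule given above (concentration in nM times volume in ul).

```python
def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Convert a fluorometric concentration to nM (assumes ~660 g/mol per bp of dsDNA)."""
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


def pooling_volumes(concentrations_nM, fmol_per_library):
    """Volume (ul) of each library that contains the requested femtomoles.

    1 nM equals 1 fmol/ul, so volume = fmol / nM.
    """
    return {name: fmol_per_library / conc for name, conc in concentrations_nM.items()}


def pool_concentration_nM(concentrations_nM, volumes_ul):
    """Combined library concentration of the pool in nM."""
    total_fmol = sum(concentrations_nM[name] * vol for name, vol in volumes_ul.items())
    return total_fmol / sum(volumes_ul.values())


# made-up example: three libraries, 50 fmol of each
libraries = {"A": 10.0, "B": 20.0, "C": 5.0}   # nM
volumes = pooling_volumes(libraries, 50.0)      # ul per library
pool_nM = pool_concentration_nM(libraries, volumes)
print(volumes)
print(f"pool: {pool_nM:.2f} nM, above 5 nM: {pool_nM >= 5.0}")
```

Because the pool concentration is a volume-weighted average, a single very dilute library can pull the whole pool below the 5 nM threshold.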

You can get trained and use a Qubit in our lab: http://dnatech.genomecenter.ucdavis.edu/qubit-fluorometer/

 


Strand-Specific RNA-Seq Libraries

RNA-Seq (conventional) after Poly-A enrichment or ribodepletion:
By default we generate strand-specific RNA-seq libraries. Strand-specific (also known as stranded or directional) RNA-seq libraries substantially enhance the value of an RNA-seq experiment. They add information on the originating strand and thus can precisely delineate the boundaries of transcripts in regions with genes on opposite strands.
There are several ways to accomplish strand-specificity.  We incorporate dUTP during the second-strand synthesis of the cDNA.  The dUTP-containing strand will not be amplified by the archaeal polymerase used for library amplification, thus preserving the strand information for RNA-seq.

For single-end sequencing, the resulting data will represent the “anti-sense strand”.   When using paired-end sequencing, the forward read of the resulting sequencing data represents the “anti-sense strand” and the reverse read the “sense strand” of the genes (for Trinity transcriptome assemblies the “--SS_lib_type RF” orientation setting should be used). Illumina paired-end reads are always inward oriented (with the exception of “jumping” or “mate-pair” libraries).

Tag-Seq:
Tag-seq data are strand-specific and have  a “sense-strand” orientation.


Illumina sequencing libraries are usually generated with Y-adapters. These are partly single-stranded and partly double stranded.
A PCR-free library will thus still contain partly single-stranded regions.  These single-stranded regions can lead to several types of Bioanalyzer artifacts.  Most commonly the libraries will appear about 70 to 100 nucleotides longer than expected.  However, we have also encountered PCR-free libraries that ran as shorter molecules as well as dramatically longer molecules.  We have (very rarely) encountered another significant problem: considerable amounts of adapter-dimers were not visible on the Bioanalyzer traces of PCR-free libraries.

To accurately QC PCR-free Illumina libraries we recommend the following approach:
–  Take a 1 ul aliquot of your library and run a short PCR (e.g. 6 cycles) with this aliquot.
–  Clean up the PCR reaction with a spin column ( e.g. Qiagen Qiaquick, Zymo DNA -clean, …); do NOT use Ampure beads.
–  Run the cleaned up PCR product on the Bioanalyzer again as well as the original PCR-free library.
The Bioanalyzer trace of the PCR product will represent the true molecule sizes and the true adapter-dimer content the closest.

Is PCR-free library preparation still advantageous?
If we generate PCR-free libraries in our lab, the described additional QC steps for PCR-free libraries will necessitate significant additional costs for the library preparation. Please see the prices for the PCR-free Add-on.
A great alternative to preparing the libraries completely PCR-free is the use of a single PCR cycle instead. This combines the advantages of both: fully double-stranded library molecules can be used for the library QC and there will be no or only an extremely low PCR-bias introduced. Our recommendation is to submit the same amount of  DNA sample as for PCR-free library preps (1 ug or more) and then only apply the single cycle of amplification.

In general, the original grave concerns about library PCR amplification (presented in papers from 2011) are no longer very relevant.  This is due to the use of modern polymerases that are designed for complex samples, like Kapa HiFi, NEB Q5, or QIAseq HiFi polymerase.  The previous “standard”, the high-fidelity Phusion enzyme, had tremendous disadvantages for complex samples (Quail et al. 2012: Optimal enzymes for amplifying sequencing libraries. Nature Methods 9: 10–11. https://www.nature.com/articles/nmeth.1814 ).

