DNA Sequencing — Illumina library construction services
As sequencing output increases and experimental scales grow, generating libraries for sequencing is often the rate-limiting step. The Core employs liquid handling robots to minimize sample handling variation and to provide fast turnaround times. Our library preparation services include library QC, library quantification, and library pooling. All our libraries are barcoded (single indexed or uniquely-dual-indexed).
We are happy to discuss the options and protocols suitable for your specific research projects. We prepare standard sequencing libraries as well as many specialized libraries, and can also help you with custom projects. We currently offer:
- Genomic DNA libraries (whole-genome shotgun libraries)
- Targeted sequencing: amplicon sequencing, exome capture sequencing, SPE sequencing
- Methyl-Seq: WGBS (Whole Genome Bisulfite Seq) and RRBS (Reduced Representation Bisulfite Seq)
- ChIP-Seq, reduced-representation sequencing, & custom projects
- High-Throughput (HT) library preps
- Mate-Pair libraries
- 10X-Genomics Chromium Genome (“linked-read”) libraries
- Reduced Representation libraries (e.g. for Genotyping By Sequencing [GBS] applications)
- PCR-free libraries
- BluePippin/PippinHT library size selection
- Custom libraries (e.g. Tn-Seq)
We are running automated library preparation systems that can generate up to 96 barcoded libraries using the Perkin Elmer Sciclone NGS G3 robots for consistent quality and rapid turnaround. We can also provide training and access to the robots if you wish to use the instruments yourself for large-scale projects.
DNA/RNA Sample Quantification and Purity
- All samples must be accompanied by appropriate sample QC documentation (e.g. Bioanalyzer traces or agarose gel electrophoresis images) and Nanodrop 260/280 nm and 260/230 nm absorption ratio measurements. Carrying out the QC as early as possible in your lab speeds up the process. However, we are happy to run the sample QC for you, for a fee.
- DNA samples need to be RNA-free (check on an agarose gel) and dissolved in EB buffer, TLE buffer (see bottom of the page for buffer compositions), or molecular biology grade water.
The input DNA and RNA quantities specified below and on this page apply if the samples are quantified by a fluorometric method (e.g. Qubit, PicoGreen, RiboGreen). Fluorometry provides advantages in precision and specificity (e.g. DNA dyes will not bind to/measure RNA). If a spectrophotometer (e.g. Nanodrop) is used, we suggest submitting twice the requested amount of sample, since this type of measurement is often unreliable. In any case, sample amounts higher than the minimum requirements will improve the library complexity. Spectrophotometer readings are nevertheless very useful to assess the purity of samples. For DNA samples the 260/280 ratio should be between 1.8 and 2.0 and the 260/230 ratio should be higher than 2.0. For RNA samples the 260/280 ratio should be between 1.8 and 2.1 and the 260/230 ratio should be higher than 1.5. Values outside of these ranges indicate contamination. The Real-time PCR Core offers DNA and RNA extraction services.
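The purity ranges and the "submit twice the amount" rule above can be turned into a quick pre-submission check. The following is a minimal sketch, not a Core tool: the function names and the sample-type labels are hypothetical, and only the thresholds come from the text above.

```python
# Thresholds from the text: (min 260/280, max 260/280, 260/230 must exceed this)
PURITY_LIMITS = {
    "DNA": (1.8, 2.0, 2.0),
    "RNA": (1.8, 2.1, 1.5),
}


def purity_problems(sample_type, a260_280, a260_230):
    """Return a list of purity problems; an empty list means both ratios pass."""
    low, high, min_230 = PURITY_LIMITS[sample_type]
    problems = []
    if not low <= a260_280 <= high:
        problems.append(f"260/280 = {a260_280} is outside {low}-{high}")
    if a260_230 <= min_230:
        problems.append(f"260/230 = {a260_230} is not above {min_230}")
    return problems


def amount_to_submit(requested_ng, method):
    """Double the requested amount if it was measured by spectrophotometry."""
    # 'spectrophotometer' is a hypothetical label for Nanodrop-type readings.
    return requested_ng * 2 if method == "spectrophotometer" else requested_ng


print(purity_problems("DNA", 1.6, 1.9))            # two problems reported
print(amount_to_submit(1000, "spectrophotometer"))  # 2000
```

A sample that passes this check can still be unsuitable (e.g. degraded or RNA-contaminated DNA), so the gel or Bioanalyzer documentation is still needed.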
Starting material for Illumina library construction can be double-stranded (ds) DNA from any source: genomic DNA, BACs, PCR amplicons, ChIP samples, any type of RNA turned into ds cDNA (mRNA, normalized total RNA, smRNAs), etc. This dsDNA is then fragmented (if it is not already, as in ChIP). The average fragment length should not exceed 600 bp (MiSeq) or 400 bp (HiSeq 4000, NovaSeq). The ends are repaired and ‘A’ tailed, adapters are ligated on, size selection is carried out, and PCR is then performed to generate the final library ready for QC and sequencing. Different library types might vary in the details (e.g. PCR-free libraries).
Please see the Comprehensive Sample Requirements Page and consult our FAQs for technical questions.
High-Throughput Library Preps (HT-Processing)
We offer high-throughput sequencing library preparations for genomic DNA library preps, as well as for RNA-seq library preps.
HT processing implies that the samples are submitted ready-to-use and that we can’t pamper individual samples (e.g. purify or concentrate them). In short, the suitability of the sample is the responsibility of the lab submitting the samples.
- All samples for the high-throughput library preparations (at reduced HT rates) need to be normalized to within ±20% of the average sample concentration.
- Please make sure that all your samples fulfill the sample purity requirements and sample concentration library requirements.
- The HT processing rates do not include re-dos of library preps that fail due to sample inadequacies.
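The 20% normalization window in the list above is easy to verify before shipping a plate. This is a minimal sketch under the assumption that concentrations are available as a mapping from well name to ng/ul; the function name and the example values are hypothetical.

```python
def outside_tolerance(concentrations, tolerance=0.20):
    """Return the samples whose concentration is outside mean +/- tolerance.

    concentrations: dict mapping sample name -> concentration (ng/ul).
    """
    mean = sum(concentrations.values()) / len(concentrations)
    low, high = mean * (1 - tolerance), mean * (1 + tolerance)
    return {name: c for name, c in concentrations.items() if not low <= c <= high}


# Hypothetical plate: mean is 60 ng/ul, so the accepted window is 48-72 ng/ul.
plate = {"A1": 50.0, "A2": 50.0, "A3": 50.0, "A4": 90.0}
print(outside_tolerance(plate))  # {'A4': 90.0}
```

Note that a single outlier shifts the plate average, so it is worth re-running the check after re-normalizing any flagged samples.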
DNA Sequencing Libraries
Guidelines for Submission of Library-Worthy DNA
Provide 1 ug or more of high quality DNA (concentration > 50 ng/ul, OD 260/280 close to 1.8; 260/230 ratio >2.0) in EB or TE buffer (EB buffer preferred), or molecular biology grade water. Library construction can also be attempted from less input material, with caveats. For PCR-free libraries sample amounts of 2 ug DNA are recommended; working with less is possible.
DNA samples need to be RNA-free. On an ethidium bromide stained agarose gel, RNA contamination will be visible as a halo-like smear in the range of 50 to 200 bp. Please submit an agarose gel image of the DNA together with the samples. DNA samples for Illumina sequencing should be isolated with spin-column protocols (e.g. DNeasy; multiple vendors offer similar kits). Such kits are also available in plate format for high-throughput processing.
Whole Genome Shotgun Libraries (HT library preparations)
We employ the Covaris focused sonicator to physically fragment the DNA to a narrow size range suitable for Illumina library preparations. Libraries generated with such ultrasound shearing result in the most even genome coverage. Please aim to provide 1 ug DNA per sample. The latest library preparation protocols allow library preparation from nanogram quantities. However, such very low inputs do have consequences and might not be suitable for your project; please discuss low input projects with us beforehand. Samples need to be normalized to within ±20% of the average sample concentration.
NEW: Pooled Super-High-Throughput Shotgun Libraries (SHT)
This very low-cost library preparation protocol generates libraries for a complete 96-well plate of samples and is ideal for Skim-Seq applications, for example. Up to 960 libraries (10 plates) can be pooled and sequenced together with paired-end 150 bp reads. Each read contains an inline barcode identifying the locations on the plate, with a second barcode identifying the plate. Software for de-multiplexing is available. As part of this process the first 20 bases of the forward read and the first 8 bases of the reverse read should be trimmed; thus the resulting effective genome coverage will be about 10% less than normal (we can do this processing for you).
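The effect of the inline-barcode trimming on usable sequence can be worked out directly from the numbers above. The sketch below is an illustration only: the function names are hypothetical, and it is not the de-multiplexing software mentioned in the text.

```python
def trim_pair(forward, reverse, trim_forward=20, trim_reverse=8):
    """Remove the leading bases from both reads of a pair."""
    return forward[trim_forward:], reverse[trim_reverse:]


def effective_fraction(read_length=150, trim_forward=20, trim_reverse=8):
    """Fraction of sequenced bases of a read pair that remains after trimming."""
    total = 2 * read_length
    return (total - trim_forward - trim_reverse) / total


# Paired-end 150 bp reads: 272 of 300 bases remain per pair.
loss_percent = (1 - effective_fraction()) * 100
print(round(loss_percent, 1))  # 9.3, i.e. roughly the 10% quoted above
```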
The first step of library preparation adds a sample-specific barcode to each sample on a plate, then all samples are pooled for the next steps, meaning the final libraries are only available as a pool of 96 samples. Thus, it is critically important that all samples are in a similar condition: isolated with the same protocol, of high purity overall, and normalized before sample submission. The DNA samples should be isolated with spin column protocols (CTAB-free). Please submit 20 ul of each DNA sample in EB buffer in a securely sealed 96-well plate. The DNA concentration should ideally be 25 ng/ul for each sample measured by fluorometry (Qubit, Quantus, PicoGreen on plate reader), although for high-quality DNA a concentration of 12.5 ng/ul will be sufficient. Please see above for the required sample purity (UV absorbance) metrics.
We offer library construction from chromatin immunoprecipitated material. For these more complex experiments, discussions with Core personnel regarding suitability of starting material and construction strategy are recommended. No guarantees are offered with this library service, other than we’ll do our best! The ChIP-Seq Data Technical Note and ChIP-Seq DataSheet from Illumina provide some background information.
- Please make sure to run the input controls on a Bioanalyzer or agarose gel beforehand and email us an image of these.
- You will also want to sequence one input control per cell line/sample type.
- We highly recommend using qPCR to verify the enrichment of your regions of interest (e.g. promoter regions) relative to the control samples before submitting the samples for sequencing.
The required read number per sample will vary from target to target. For studying point source transcription factors the ENCODE project recommends analyzing at least 20 million (uniquely mapping) reads (http://genome.cshlp.org/content/22/9/1813.long#boxed-text-2). Depending on the quality of your preps perhaps 75% of the reads can be expected to map uniquely. ENCODE tends to err on the high side with their recommendations. Thus, about 20 million reads per sample should be acceptable, but this is likely the minimum number.
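The relationship between total reads and uniquely mapping reads can be sketched with simple arithmetic. The 75% mapping rate is the rough estimate given above and will differ between preps; the function names are hypothetical.

```python
import math


def unique_reads_expected(total_reads, unique_fraction):
    """Number of reads expected to map uniquely."""
    return int(total_reads * unique_fraction)


def total_reads_needed(unique_reads_target, unique_fraction):
    """Total reads to sequence so that the uniquely mapping target is met."""
    return math.ceil(unique_reads_target / unique_fraction)


# 20 million total reads at 75% unique mapping yield about 15 million unique reads.
print(unique_reads_expected(20_000_000, 0.75))  # 15000000
# Meeting the full ENCODE figure of 20 million unique reads needs about 26.7 million.
print(total_reads_needed(20_000_000, 0.75))     # 26666667
```

This is why 20 million reads per sample should be seen as the lower bound: it falls short of the ENCODE figure once non-unique reads are discounted.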
Reduced Representation Sequencing
RR-sequencing methods use restriction enzyme digestions to generate reproducible subsets of sequences that are distributed throughout the genome i.e. reduced representations. In most cases reduced representations of about 1% or 2% of the genome size are sequenced and analyzed for SNP markers. Please see this review for a discussion of methods and applications. We use the methylation sensitive ApeKI enzyme which, for plant genomes, allows exclusion of most repeat sequences from the sequencing. We prepare libraries for 95 genomic DNA samples simultaneously.
If your species of interest has not previously been studied by RR-sequencing we first need to establish that the ApeKI enzyme is suitable. Please email us gel images (1% agarose gel electrophoresis with Ethidium bromide or GelRed staining) of representative samples before shipping any samples. The gel image should show intact DNA samples, as well as the same samples digested with a non-methylation sensitive restriction enzyme (e.g. HindIII). If the samples digest well, submit test samples (~5 ug at >50 ng/ul) so we can do additional tests to see if ApeKI is appropriate (test digestions, adapter ligations, $75). Upon a successful test we require RR samples to be submitted in a 96-well plate. One or two wells should be left empty for negative controls. Sample concentration should be normalized to 50 ng/ul as assayed by an intercalating dye (fluorometry on Qubit or plate reader). The UV absorption ratios should be 260/280nm 1.8 to 2.0, and 260/230 > 2.0. A volume of 20 ul per sample is sufficient. The DNA samples need to be extracted using a CTAB-free protocol (always use spin column protocols), since very precise DNA sample quantification is critical for the success of the protocol. The libraries are typically sequenced on a single HiSeq 4000 lane with single-read 90 bp reads.
Whole Genome Bisulfite Shotgun sequencing: Bisulfite conversion is a method to distinguish between methylated and unmethylated cytosines in genomic DNA. Unmethylated cytosines are deaminated to uracils, which are read as thymines after amplification, altering the sequence. 5-methylcytosine (5mC) is the best studied epigenetic mark in eukaryotes, often indicating gene silencing. We employ post-bisulfite-conversion WGBS library prep protocols that are highly efficient, allowing low inputs of less than 100 ng genomic DNA. The resulting libraries can be sequenced with paired-end sequencing on the NovaSeq. They are typically sequenced to a depth of less than 1x (Crary-Dooley et al. 2017) up to 50x genome coverage, depending on the degree and the precision of the changes in methylation you want to detect.
For custom projects it is also possible to further distinguish between 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC). The latter is considered an intermediate of a demethylation process or an epigenetic mark indicating activation (in contrast to the effects of 5mC). Oxidative bisulfite conversion (oxBS-Seq) is a protocol to distinguish between the two methylcytosine forms. It requires doubling the sequencing effort.
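The principle of bisulfite conversion can be illustrated in silico. This is a simplified sketch that assumes complete conversion and looks at a single strand only; the function names and the toy sequence are hypothetical, and real data are analyzed with dedicated bisulfite aligners.

```python
def bisulfite_convert(sequence, methylated_positions=()):
    """Simulate the sequence read after bisulfite conversion and amplification.

    Unmethylated C is read as T; C at a methylated (0-based) position stays C.
    """
    protected = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(sequence.upper())
    )


def methylated_sites(reference, converted):
    """Positions where a reference C is still read as C, i.e. was protected."""
    return [
        i for i, (ref, obs) in enumerate(zip(reference.upper(), converted))
        if ref == "C" and obs == "C"
    ]


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1})
print(read)                               # ACGTTTGA
print(methylated_sites(reference, read))  # [1]
```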
Reduced Representation Bisulfite Sequencing: This protocol employs the MspI restriction enzyme (target sequence 5’CCGG3’) to generate libraries highly enriched for CpG island and CpG shore regions. The libraries represent about 1% of the genome but still contain the majority of promoter regions – the regions of highest biological relevance for epigenetic studies. The reduced representation libraries are bisulfite converted with the same chemistry described above (WGBS). This allows sequencing of the CpG island regions to high coverage and the detection of even small methylation rate changes.
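How an MspI digest carves a sequence into fragments that begin and end at CCGG sites can be sketched as follows. The recognition sequence is taken from the text above; the cut position (after the first C of the site) and the function name are assumptions made for this illustration, and the toy sequence is made up.

```python
def mspi_fragments(sequence, site="CCGG", cut_offset=1):
    """Split a sequence at every occurrence of the recognition site.

    cut_offset is the number of bases of the site left on the upstream
    fragment (assumed here to be 1, i.e. a cut between the two Cs).
    """
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


print(mspi_fragments("AACCGGTTTTCCGGAA"))
# ['AAC', 'CGGTTTTC', 'CGGAA']
```

Every internal fragment starts with CGG and ends with C, so each fragment end carries a CpG, which is what concentrates the library on CpG-rich regions.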
We employ an efficient RRBS protocol for which six samples (later 12 samples) are pooled after the ligation of barcoded adapters. The majority of the library prep process is carried out on the pooled samples, avoiding sample-specific technical biases. Typically the pool of 12 libraries is sequenced on one HiSeq 4000 lane with single-end reads. Up to 4 million CpGs are covered in the human genome per sample.
Mate Pair Libraries (please note: for the vast majority of projects PacBio or Nanopore sequencing data are superior to mate pair libraries)
The sequencing of Mate Pair libraries generates long-insert paired-end reads. The libraries are generated by self-ligation of long DNA fragments and labeling of the junction sites to generate chimeric library molecules that bring together sequences that were originally 2 kb to 12 kb apart. We are using the Illumina Nextera Mate Pair kit, which employs a transposase enzyme to fragment as well as end-tag the DNA in a single step. The tags are biotinylated and thus allow for the selection of fragments containing junction sites. In contrast to older mate pair library protocols, the Nextera kit is very reliable with the exception of the sizing of the initial fragments. As with all other long DNA fragment analyses, the DNA quality matters. Please email us a gel image before submitting the DNA samples. The samples should run as a band of 20 kb size or longer on agarose gels.
The Nextera kit offers two protocols. The “gel-free” version (1 ug input) is mostly of interest when only little input DNA is available. The sizes of the mate-pair fragments from this protocol usually range from 1.5 kb to 10 kb. Surprisingly, the SSPACE scaffolder can still work with these data.
The “gel-plus” version requires a minimum of 4 ug input DNA (and 4 times the reagents) and uses gel extractions to size select fragments within a range of ±700 bp for shorter mates and within ±2 kb for longer mates of up to 10 to 12 kb. Due to the uncertainties of the fragmentation please submit at least twice the amount of sample.
In theory the fragment sizes resulting from the tagmentation are only dependent on the input DNA amount. In practice the fragment lengths vary considerably between different DNA samples of similar amounts. This variability between samples can be observed even after precise DNA quantification by fluorometry. The reactions are tunable for aliquots of the same sample, though. Especially if very specific size ranges are desired, it is often necessary to repeat the tagmentation reaction with adjusted DNA amounts. We might then combine similar sized gel extraction fractions from two tagmentation reactions to generate libraries of high complexity for the desired size ranges. Please let us know how important specific insert size ranges are for your project.
Because the fragment size ranges are difficult to predict, we quote mate pair recharge rates that include two tagmentation reactions. If we can generate the desired library with a single tagmentation, we charge the lower rate of the one-tagmentation library prep.
Numerous companies provide services and platforms for whole exome capture or target amplification. We offer the Fluidigm Access Array, which employs nanofluidics for cost-effective target selection to generate barcoded amplicon libraries that are ready for Illumina sequencing. Sequence Capture Libraries are those in which particular genomic regions are enriched after indexed library generation and before sequencing. This strategy allows focused, very deep sequencing and can be implemented for a number of applications. Several companies offer platforms that can generate such material, including Illumina, RainDance, Agilent, NimbleGen, and Fluidigm. Technical information on the Agilent, NimbleGen, RainDance, and Qiagen (which uses a PCR based, not hybridization capture, strategy for enrichment) systems is available. We use baits from IDT’s xGen Exome Research Panel v1.0 combined with regular Illumina library prep.
Other Library Considerations
Library Indexing and Pooling
Indexing, also called barcoding, allows for the sequencing of multiple libraries in a single lane, i.e., multiplexing. By default all libraries generated by us have a barcode. Multiplexing is required when the typical lane output (15-25 million reads from the MiSeq, 300-400 million reads from the HiSeq 4000) is greater than required for a single library (e.g., in sequencing BACs, PCR generated fragments, small microbial genomes, transcriptomes, exome, ChIP, and small RNA applications). Multiplexing is also the best way to minimize potential lane-to-lane sequencing variation, as all of your samples are subject to the same sequencing conditions. For example, if you require two sequencing lanes for six samples we recommend 6-plexing and sequencing over two lanes, instead of 3-plexing per lane.
The principle is that short nucleotide “barcodes” are appended to each library using specific adapters containing those sequences. Libraries containing different indexed adapters are constructed, quantified, pooled in equimolar amounts, and sequenced. Deconvoluting the barcodes informatically allows multiple libraries to be sequenced in a single lane at a potential cost and time saving.
To date, two methods have been used for this: commercially available indexing kits (Illumina TruSeq, Nextera, or Bioo Scientific) or synthesizing your own adapter oligos with your own barcodes. Bioo Scientific offers Illumina-compatible barcodes (NEXTflex) with up to 384 barcodes. The Nextera kit (Epicentre/Illumina) uses dual indexing and transposon mediated fragmentation (‘tagmentation’) followed by PCR amplification to integrate barcoded adapters (so a PCR-free library is not an option using the Nextera kit). The dual indexing/adapter tagging strategy (with up to 12 indices available for adapter 1 and up to eight indices for adapter 2) permits up to 96 unique dual index combinations (12 x 8).
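Equimolar pooling, as mentioned above, means combining the same number of library molecules from each library rather than the same mass. The sketch below shows one common way to do the arithmetic; it assumes the widely used approximation of 660 g/mol per base pair of dsDNA, which is not stated in the text, and the function names and example libraries are hypothetical.

```python
def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration to nM (equivalent to fmol/ul).

    Assumes an average mass of 660 g/mol per base pair.
    """
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def equimolar_volumes(libraries, fmol_per_library=40.0):
    """Volume (ul) of each library that contributes the same molar amount.

    libraries: dict mapping name -> (concentration in ng/ul, mean size in bp).
    """
    return {
        name: fmol_per_library / library_molarity_nM(conc, size)
        for name, (conc, size) in libraries.items()
    }


# Two hypothetical libraries of 500 bp at 6.6 and 3.3 ng/ul (20 nM and 10 nM).
pool = equimolar_volumes({"lib1": (6.6, 500), "lib2": (3.3, 500)})
for name, volume in pool.items():
    print(name, round(volume, 2))  # lib1 2.0, lib2 4.0
```

The more dilute library contributes twice the volume, so both end up with the same number of molecules in the pool.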
When sequencing on the HiSeq 4000 and especially on the NovaSeq it is highly recommended to use uniquely-dual-indexed adapters (UDI adapters) to avoid index hopping artifacts. Please see this FAQ.
Common buffers for DNA and RNA samples are:
Molecular Biology grade water (RNAse free, but not DEPC treated)
EB-Buffer: 10mM TRIS (pH=8.0-8.4), e.g. Qiagen EB Buffer. This is the preferred buffer for DNA samples!
EBT-Buffer: 10mM TRIS, 0.1% Tween 20 (pH=8.0-8.4)
TE-Buffer: 10mM TRIS, 1 mM EDTA (pH=8.0-8.4) – Please avoid TE buffer!
TLE-Buffer: 10mM TRIS, 0.1 mM EDTA (pH=8.0-8.4)