DNA Sequencing — Illumina library construction services
As sequencing output increases and experimental scales grow, generating libraries for sequencing is often the rate-limiting step. Our lab uses liquid-handling robots to minimize sample-handling variation and to provide fast turnaround times. Our library preparation services include library QC, library quantification, and library pooling. All our libraries are barcoded (single-indexed or unique dual-indexed).
We are happy to discuss the options and protocols suitable for your specific research projects. We prepare standard sequencing libraries as well as many specialized libraries and can also help you with custom projects. At the moment we are offering:
- Genomic DNA libraries (whole-genome shotgun libraries)
- Targeted sequencing: amplicon sequencing, exome capture sequencing, SPE sequencing
- Methyl-Seq: WGBS (Whole Genome Bisulfite Seq) and RRBS (Reduced Representation Bisulfite Seq)
- ChIP-Seq, reduced-representation sequencing, & custom projects
- High-Throughput (HT) library preps
- Mate-Pair libraries
- 10X-Genomics Chromium Genome (“linked-read”) libraries
- Reduced Representation libraries (e.g. for Genotyping By Sequencing [GBS] applications)
- PCR-free libraries
- BluePippin/PippinHT library size selection
- Custom libraries (e.g. Tn-Seq)
We run automated library preparation systems (PerkinElmer Sciclone NGS G3 robots) that can generate up to 96 barcoded libraries with consistent quality and rapid turnaround. We can also provide training and access to the robots if you wish to operate the instruments yourself for large-scale projects. Please see this page for details on sample packaging and shipping.
DNA/RNA Sample Quantification and Purity
- All samples must be accompanied by appropriate sample QC documentation (e.g. Bioanalyzer traces or agarose gel electrophoresis images) and NanoDrop 260/280 nm and 260/230 nm absorbance ratio measurements. Projects move faster if this QC is carried out in your lab as early as possible. However, we are happy to run the sample QC for you for a fee.
- DNA samples need to be RNA-free (check on an agarose gel) and dissolved in EB buffer or TLE buffer (see the buffer list at the bottom of this page). Molecular biology grade water is also acceptable.
The input DNA and RNA quantities specified below and on this page apply to samples quantified by a fluorometric method (e.g. Qubit, PicoGreen, RiboGreen). Fluorometry is more precise and more specific (e.g. DNA dyes will not bind to, and therefore will not measure, RNA). If a spectrophotometer (e.g. NanoDrop) is used, we suggest submitting twice the requested amount of sample, since this type of measurement is often unreliable for quantification. In any case, sample amounts above the minimum requirements will improve library complexity. Spectrophotometer readings are, however, very useful for assessing sample purity. For DNA samples, the 260/280 ratio should be between 1.8 and 2.0 and the 260/230 ratio should be higher than 2.0. For RNA samples, the 260/280 ratio should be between 1.8 and 2.1 and the 260/230 ratio should be higher than 1.5. Values outside these ranges indicate contamination. The Real-time PCR Core can carry out DNA as well as RNA extractions.
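If you want to screen a batch of readings before submission, the purity ranges and the "submit twice as much" rule above can be encoded in a few lines. This is a minimal sketch: the thresholds are copied from the text, while the function names and the warning format are hypothetical.

```python
# (min 260/280, max 260/280, min 260/230) per sample type, taken from the text.
PURITY_RANGES = {
    "DNA": (1.8, 2.0, 2.0),
    "RNA": (1.8, 2.1, 1.5),
}


def purity_flags(sample_type: str, a260_280: float, a260_230: float) -> list[str]:
    """Return a list of warnings; an empty list means both ratios are in range."""
    lo_280, hi_280, min_230 = PURITY_RANGES[sample_type]
    flags = []
    if not lo_280 <= a260_280 <= hi_280:
        flags.append(f"260/280 = {a260_280} outside {lo_280}-{hi_280}")
    if a260_230 <= min_230:  # the text asks for a ratio *higher* than the limit
        flags.append(f"260/230 = {a260_230} not above {min_230}")
    return flags


def amount_to_submit(required_ng: float, fluorometric: bool) -> float:
    """Double the requested amount when only a spectrophotometer reading exists."""
    return required_ng if fluorometric else 2 * required_ng


print(purity_flags("DNA", 1.85, 2.1))              # []
print(purity_flags("RNA", 2.0, 1.2))               # one 260/230 warning
print(amount_to_submit(1000, fluorometric=False))  # 2000
```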
Starting material for Illumina library construction can be double-stranded (ds) DNA from any source: genomic DNA, BACs, PCR amplicons, ChIP samples, any type of RNA converted into ds cDNA (mRNA, normalized total RNA, smRNAs), etc. Pretty much anything you can think of that ends up as, or can be turned into, dsDNA. This dsDNA is then fragmented (unless it is already fragmented, as in ChIP). The average fragment length should not exceed 600 bp (MiSeq) or 400 bp (HiSeq 4000, NovaSeq). The ends are repaired and ‘A’-tailed, adapters are ligated on, size selection is carried out, and PCR is then performed to generate the final library ready for sequencing. Details vary between library types (for example, PCR-free libraries omit the final PCR step).
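To check a sheared sample against the per-platform limits above, one option is to compute a weighted average from a size profile. This sketch assumes the profile is a list of (size_bp, signal) pairs, for example exported from a Bioanalyzer trace; the input format and names are hypothetical. Note that weighting by fluorescence signal gives a mass-weighted average, which is somewhat higher than a molar average for broad distributions.

```python
MAX_AVG_FRAGMENT_BP = {"MiSeq": 600, "HiSeq 4000": 400, "NovaSeq": 400}


def average_fragment_length(profile: list[tuple[float, float]]) -> float:
    """Signal-weighted mean fragment size of a (size_bp, signal) profile."""
    total_signal = sum(signal for _, signal in profile)
    if total_signal == 0:
        raise ValueError("profile has no signal")
    return sum(size * signal for size, signal in profile) / total_signal


def fits_platform(profile: list[tuple[float, float]], platform: str) -> bool:
    """True if the average fragment length does not exceed the platform limit."""
    return average_fragment_length(profile) <= MAX_AVG_FRAGMENT_BP[platform]


profile = [(300, 1.0), (400, 2.0), (500, 1.0)]  # made-up example trace
print(average_fragment_length(profile))          # 400.0
print(fits_platform(profile, "NovaSeq"))          # True
print(fits_platform([(550, 1.0), (650, 1.0)], "HiSeq 4000"))  # False
```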
Please see the Comprehensive Sample Requirements Page, and consult our FAQs for technical questions.
High-Throughput Library Preps (HT-Processing)
We offer high-throughput library preparation for genomic DNA libraries as well as RNA-seq libraries.
HT processing implies that the samples are submitted ready to use and that we can’t pamper individual samples (e.g. purify or concentrate them). In short, the suitability of the samples is the responsibility of the submitting lab.
- All samples for high-throughput library preparation (at reduced HT rates) need to be normalized to ±20% of the average sample concentration.
- Please make sure that all your samples meet the sample purity and sample concentration requirements.
- The HT processing rates do not include re-dos of library preps that fail due to sample inadequacies.
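A quick way to check the ±20% normalization requirement before shipping a plate is sketched below. The function name and the plate dictionary are hypothetical; concentrations should be fluorometric values in ng/ul.

```python
from statistics import mean


def out_of_range_samples(concs: dict[str, float], tolerance: float = 0.20) -> dict[str, float]:
    """Return samples whose concentration deviates from the plate mean by more
    than `tolerance` (a fraction), mapped to their relative deviation."""
    avg = mean(concs.values())
    outliers = {}
    for name, conc in concs.items():
        deviation = (conc - avg) / avg
        if abs(deviation) > tolerance:
            outliers[name] = round(deviation, 3)
    return outliers


plate = {"A1": 50.0, "A2": 52.0, "A3": 48.0, "A4": 30.0}  # made-up values
print(out_of_range_samples(plate))  # {'A4': -0.333}
```

Because the mean includes any outliers, a single very low or very high sample also shifts the reference point; after re-normalizing flagged wells it is worth running the check again.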
DNA Sequencing Libraries
Guidelines for Submission of Library-Worthy DNA
Provide 1 ug or more of high-quality DNA (concentration > 50 ng/ul, OD 260/280 close to 1.8, 260/230 ratio > 2.0) in EB or TE buffer (EB buffer preferred), or in molecular biology grade water. Library construction can also be attempted with less input material, with caveats. For PCR-free libraries, 2 ug DNA is recommended; working with less is possible.
DNA samples need to be RNA-free. On an ethidium bromide-stained agarose gel, RNA contamination is visible as a halo-like smear in the range of 50 to 200 bp. Please include an agarose gel image of the DNA samples with your submission.
Whole Genome Shotgun Libraries (HT library preparations)
We use physical shearing with the Covaris focused sonicator to randomly fragment the DNA samples to a narrow size range suitable for Illumina library preparation. Libraries generated with such ultrasonic shearing give the most even genome coverage. Please aim to provide 1 ug DNA per sample. The latest library preparation protocols allow library preparation from nanogram quantities. However, such very low inputs have consequences and might not be suitable for your project. Please discuss low-input projects with us beforehand. Samples need to be normalized to ±20% of the average sample concentration.
NEW Pooled Super-High-Throughput Shotgun Libraries — Pooled SHT Libraries
This is a very low-cost library preparation protocol that generates libraries for a complete 96-well plate of samples. Up to 960 libraries (ten plates) can be pooled and sequenced together with paired-end 150 bp reads. Each read contains an inline barcode identifying the well position on the plate and a second barcode identifying the plate. Software for demultiplexing is available. As part of this processing, the first 20 bases of the forward read should be trimmed off; we can do this for you. This protocol is ideal, for example, for Skim-Seq applications.
All samples of a plate are pooled after the first step of library preparation, which already adds the sample-specific barcode indices. The final libraries are therefore only available as a pool of the 96 samples. Thus, it is important that all samples are in a similar condition: the DNA samples should have been isolated with the same protocol and be of uniformly high purity. The DNA concentration has to be normalized across the entire plate before sample submission. Please submit 20 ul of each DNA sample in EB buffer in a securely sealed 96-well plate. The DNA concentration should ideally be 25 ng/ul for each sample, measured by fluorometry (Qubit, Quantus, or PicoGreen on a plate reader). For high-quality DNA samples, a concentration of 12.5 ng/ul is sufficient.
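The sketch below illustrates the two processing steps described above: assigning a read to a plate and well, then trimming the first 20 bases of the forward read. It is not the demultiplexing software mentioned in the text. It assumes, hypothetically, that the inline well barcode is the first 8 bases of the forward read, that the plate barcode is available as a separate index read, and that both must match exactly; the real barcode layout may differ.

```python
WELL_BC_LEN = 8   # hypothetical inline barcode length
TRIM_LEN = 20     # first 20 bases of the forward read are removed (from the text)


def demultiplex(read_seq: str, read_qual: str, index_seq: str,
                well_barcodes: dict[str, str], plate_barcodes: dict[str, str]):
    """Return (plate, well, trimmed_seq, trimmed_qual), or None if unassigned."""
    well = well_barcodes.get(read_seq[:WELL_BC_LEN])
    plate = plate_barcodes.get(index_seq)
    if well is None or plate is None:
        return None
    # Trim sequence and quality string together so they stay the same length.
    return plate, well, read_seq[TRIM_LEN:], read_qual[TRIM_LEN:]


wells = {"ACGTACGT": "A01", "TGCATGCA": "A02"}   # made-up barcodes
plates = {"GGATCC": "plate_1"}
seq = "ACGTACGT" + "N" * 12 + "TTGACCA"
print(demultiplex(seq, "I" * len(seq), "GGATCC", wells, plates))
# ('plate_1', 'A01', 'TTGACCA', 'IIIIIII')
```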
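Normalizing a plate to 25 ng/ul (or 12.5 ng/ul) in 20 ul is a C1·V1 = C2·V2 dilution per well. A minimal sketch, with a hypothetical function name, assuming stock concentrations measured by fluorometry:

```python
def dilution_volumes(stock_ng_per_ul: float, target_ng_per_ul: float = 25.0,
                     final_ul: float = 20.0) -> tuple[float, float]:
    """Return (stock_ul, eb_ul) that give final_ul at target_ng_per_ul."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is below target; concentrate the sample instead")
    stock_ul = target_ng_per_ul * final_ul / stock_ng_per_ul  # C1*V1 = C2*V2
    return round(stock_ul, 2), round(final_ul - stock_ul, 2)


print(dilution_volumes(100.0))       # (5.0, 15.0)
print(dilution_volumes(40.0, 12.5))  # (6.25, 13.75)
```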
ChIP-Seq Libraries
We offer library construction from chromatin-immunoprecipitated material. For these more complex experiments, we recommend discussing the suitability of the starting material and the construction strategy with Core personnel. No guarantees are offered with this library service, other than that we’ll do our best! The ChIP-Seq Data Technical Note and ChIP-Seq DataSheet from Illumina provide some background information.
- Please run the input controls on the Bioanalyzer or on an agarose gel beforehand and email us an image of these.
- You will also want to sequence one input control per cell line/sample type.
- We further recommend verifying the enrichment of your regions of interest (e.g. promoter regions) relative to the control samples by qPCR before submitting the samples for sequencing.
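The text does not prescribe a qPCR analysis method; one common choice is the percent-input method, sketched below. It assumes the input sample represents a known fraction of the chromatin used for the IP (1% here), roughly 100% primer efficiency, and made-up Ct values.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float = 0.01) -> float:
    """Percent of input chromatin recovered in the IP for one primer pair."""
    # Shift the input Ct so that it represents 100% of the chromatin.
    ct_input_adjusted = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (ct_input_adjusted - ct_ip)


# Target region (e.g. a promoter) vs. a negative-control region, same input.
target = percent_input(ct_ip=26.0, ct_input=25.0)
negative = percent_input(ct_ip=30.0, ct_input=25.0)
# Fold enrichment = 2 ** (30 - 26) = 16 for these made-up Ct values.
print(round(target, 4), round(target / negative, 1))
```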
The required number of reads per sample will vary from target to target. For studying point-source transcription factors, the ENCODE project recommends analyzing at least 20 million uniquely mapping reads ( http://genome.cshlp.org/content/22/9/1813.long#boxed-text-2 ). Depending on the quality of your preps, perhaps 75% of the reads can be expected to map uniquely. ENCODE tends to err on the high side with its recommendations. Thus, about 20 million reads per sample should be acceptable, but this is likely the minimum.
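For reference, the arithmetic linking a target number of uniquely mapping reads to the total reads to order, using the 75% uniquely mapping fraction mentioned above as an assumed value:

```python
import math


def total_reads_needed(unique_target: int, unique_fraction: float) -> int:
    """Total reads required so that unique_fraction of them reaches unique_target."""
    return math.ceil(unique_target / unique_fraction)


print(total_reads_needed(20_000_000, 0.75))  # 26666667
```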
Reduced Representation Sequencing
RR-sequencing methods use restriction enzyme digestion to generate reproducible subsets of sequences that are distributed throughout the genome: reduced representations. In most cases, reduced representations of about 1% or 2% of the genome size are sequenced and analyzed for SNP markers. Please see this review for a discussion of methods and applications. We use the methylation-sensitive ApeKI enzyme, which, for plant genomes, allows most repeat sequences to be excluded from sequencing. We prepare libraries for 95 genomic DNA samples simultaneously.
If your species of interest has not been studied by RR-sequencing before, we need to establish that ApeKI is suitable. Before shipping any samples, please email us gel images of representative samples. The gel image should show the intact DNA samples as well as the same samples digested with a non-methylation-sensitive restriction enzyme (e.g. HindIII). Agarose gel electrophoresis can be carried out on 1% gels with ethidium bromide or GelRed staining. If the samples digest well, we would then run a further test experiment here: to see whether ApeKI is appropriate, we carry out test digestions, adapter ligations, and PCR amplification with some test samples (submit about 5 ug at >50 ng/ul; this test costs $75).
Upon a successful test, we require the RRS samples to be submitted in a 96-well plate. One or two wells should remain empty for negative controls. The sample concentrations should be normalized to 50 ng/ul as assayed with an intercalating dye (fluorometry on a Qubit or plate reader). The UV absorbance ratios should be 1.8 to 2.0 for 260/280 nm and > 2.0 for 260/230 nm. A volume of 20 ul per sample is sufficient. The DNA samples have to be extracted using a CTAB-free protocol (preferably a spin-column protocol), since very precise DNA quantification is critical for the success of the protocol. The libraries are typically sequenced on a single HiSeq 4000 lane with single-end 90 bp reads.
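To get a rough idea of how a reference sequence would be reduced by ApeKI, an in silico digest can help. ApeKI recognizes 5'-GCWGC-3' (W = A or T) and cuts after the first G (G^CWGC). This sketch reports top-strand fragment lengths and the fraction of sequence in a size window; the window, the toy sequence, and the function names are hypothetical, and methylation sensitivity is not modeled.

```python
import re

APEKI_SITE = re.compile(r"(?=GC[AT]GC)")  # zero-width lookahead also finds overlapping sites


def apeki_fragments(seq: str) -> list[int]:
    """Top-strand fragment lengths after cutting at every ApeKI site (G^CWGC)."""
    seq = seq.upper()
    cuts = [m.start() + 1 for m in APEKI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def fraction_in_window(seq: str, min_bp: int, max_bp: int) -> float:
    """Fraction of the sequence contained in fragments within [min_bp, max_bp]."""
    kept = sum(n for n in apeki_fragments(seq) if min_bp <= n <= max_bp)
    return kept / len(seq)


toy = "AAAA" + "GCAGC" + "T" * 20 + "GCTGC" + "CCCC"
print(apeki_fragments(toy))             # [5, 25, 8]
print(fraction_in_window(toy, 20, 30))  # 25 of 38 bases
```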
Whole Genome Bisulfite Shotgun sequencing: Bisulfite conversion is a method to distinguish between methylated and unmethylated cytosines in genomic DNA. Unmethylated cytosines are deaminated to uracils, which are replaced by thymines during amplification, altering the sequence; methylated cytosines are protected and remain cytosines. 5-methylcytosine (5mC) is the best-studied epigenetic mark in eukaryotes and often indicates gene silencing. We use post-bisulfite-conversion WGBS library prep protocols that are efficient enough to allow low inputs of less than 100 ng genomic DNA. The resulting libraries can be sequenced paired-end on the NovaSeq, typically to a depth ranging from less than 1x (Crary-Dooley et al. 2017) to 50x genome coverage, depending on the degree and the precision of the methylation changes you want to detect.
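The effect of bisulfite conversion on a read, and the usual per-cytosine methylation estimate, can be illustrated as follows. This is a simplified sketch: methylated positions are passed as a hypothetical set of indices, and real pipelines instead align converted reads to an in silico converted reference.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Top strand after conversion and amplification: unmethylated C reads as T,
    methylated C (5mC) stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def methylation_level(c_count: int, t_count: int) -> float:
    """Methylation at one cytosine: reads showing C / (reads showing C or T)."""
    return c_count / (c_count + t_count)


# Both CpG cytosines (positions 1 and 5) methylated, the non-CpG C at 4 is not.
print(bisulfite_convert("ACGTCCGA", methylated={1, 5}))  # ACGTTCGA
print(methylation_level(18, 6))                          # 0.75
```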
For custom projects, it is also possible to distinguish further between 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC). The latter is considered an intermediate of a demethylation process or an epigenetic mark indicating activation (in contrast to the effects of 5mC). Oxidative bisulfite sequencing (oxBS-Seq) is a protocol to distinguish between the two methylcytosine forms. It requires twice the sequencing effort, because each sample is sequenced after both standard and oxidative bisulfite conversion.
Reduced Representation Bisulfite Sequencing: This protocol uses the MspI restriction enzyme (recognition sequence 5’-CCGG-3’) to generate libraries highly enriched for CpG island and CpG shore regions. The libraries represent about 1% of the genome but still contain the majority of promoter regions, the regions of highest biological relevance for epigenetic studies. The reduced representation libraries are bisulfite converted with the same chemistry described above for WGBS. This allows the CpG island regions to be sequenced to high coverage and even small changes in methylation rate to be detected.
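The paired BS-Seq and oxBS-Seq measurements are typically combined by subtraction: standard bisulfite conversion reads both 5mC and 5hmC as C, whereas oxidative bisulfite conversion reads only 5mC as C. A minimal sketch with made-up methylation levels:

```python
def estimate_5hmc(bs_level: float, oxbs_level: float) -> float:
    """5hmC ~ (5mC + 5hmC from BS-Seq) - (5mC from oxBS-Seq), floored at 0."""
    return max(0.0, bs_level - oxbs_level)


print(estimate_5hmc(bs_level=0.80, oxbs_level=0.65))  # about 0.15
```

The floor at zero is a design choice: with limited coverage, sampling noise can make the oxBS level exceed the BS level at individual sites, and a negative 5hmC level has no meaning.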
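An in silico MspI digest gives a rough preview of the RRBS fragments and the CpGs they carry. MspI cuts C^CGG, so on the top strand every fragment after the first begins with CGG. The 40-220 bp window is an assumption (a range often used for RRBS size selection, not stated in the text), and the toy sequence is made up.

```python
def mspi_fragments(seq: str) -> list[str]:
    """Top-strand fragments after cutting between the two Cs of every CCGG."""
    seq = seq.upper()
    cuts = [i + 1 for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def rrbs_cpgs(seq: str, min_bp: int = 40, max_bp: int = 220) -> int:
    """Count CpG dinucleotides inside fragments that fall in the size window."""
    return sum(frag.count("CG") for frag in mspi_fragments(seq)
               if min_bp <= len(frag) <= max_bp)


toy = "A" * 10 + "CCGG" + "ACGT" * 12 + "CCGG" + "T" * 10
print([len(f) for f in mspi_fragments(toy)])  # [11, 52, 13]
print(rrbs_cpgs(toy))                         # 13
```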
We use an efficient RRBS protocol in which six samples (later 12 samples) are pooled after the ligation of barcoded adapters. The majority of the library prep process is carried out on the pooled samples, avoiding sample-specific technical biases. Typically, the pool of 12 libraries is sequenced on a single HiSeq 4000 lane with single-end reads. Up to 4 million CpGs per sample are covered in the human genome.
Mate Pair Libraries (please note: for the vast majority of projects PacBio or Nanopore sequencing data are superior to mate pair libraries)
Sequencing Mate Pair libraries generates long-insert paired-end reads. The libraries are generated by self-ligation of long DNA fragments and labeling of the junction sites, producing chimeric library molecules that bring together sequences that were originally 2 kb to 12 kb apart. We use the Illumina Nextera Mate Pair kit, which employs a transposase enzyme to fragment and end-tag the DNA in a single step. The tags are biotinylated and thus allow selection of the fragments that contain junction sites. In contrast to older mate pair library protocols, the Nextera kit is very reliable, with the exception of the sizing of the initial fragments. As with all other long-DNA-fragment analyses, DNA quality matters. Please email us a gel image before submitting the DNA samples. The samples should run as a band of 20 kb or larger on agarose gels.
The Nextera kit offers two protocols. The “gel-free” version (1 ug input) is mostly of interest when only little input DNA is available. The sizes of the mate-pair fragments from this protocol usually range from 1.5 kb to 10 kb. Surprisingly, the SSPACE scaffolder can still work with these data.
The “gel-plus” version requires a minimum of 4 ug input DNA (and 4 times the reagents) and uses gel extraction to size-select fragments within a range of ±700 bp for shorter mates and within ±2 kb for longer mates of up to 10 to 12 kb. Because the fragmentation is difficult to predict, please submit at least twice this amount of sample.
In theory, the fragment sizes resulting from tagmentation depend only on the input DNA amount. In practice, the fragment lengths vary considerably between different DNA samples of similar amounts, even after precise DNA quantification by fluorometry. The reactions are, however, tunable for aliquots of the same sample. Especially if very specific size ranges are desired, it is often necessary to repeat the tagmentation reaction with adjusted DNA amounts. We might then combine similarly sized gel extraction fractions from two tagmentation reactions to generate libraries of high complexity for the desired size ranges. Please let us know how important specific insert size ranges are for your project.
Because fragment size ranges are difficult to predict, we quote mate pair recharge rates that include two tagmentation reactions. If we can generate the desired library with a single tagmentation, we charge the lower rate of the one-tagmentation library prep.
Targeted Sequencing Libraries
Numerous companies provide services and platforms for whole-exome capture or target amplification. We offer the Fluidigm Access Array, which uses nanofluidics for cost-effective target selection to generate barcoded amplicon libraries that are ready for Illumina sequencing. Sequence capture libraries are those in which particular genomic regions are enriched after indexed library generation and before sequencing. This strategy allows focused, very deep sequencing and can be implemented for a number of applications. Several companies offer platforms that can generate such material, including Illumina, RainDance, Agilent, NimbleGen, and Fluidigm. Technical information on the Agilent, NimbleGen, RainDance, and Qiagen systems (Qiagen uses a PCR-based rather than a hybridization capture strategy for enrichment) is available but not guaranteed to be up to date. Think of it as a starting point for further investigation (we have contact information for company reps if needed) and as purely informational (no implied endorsement).
Other Library Considerations
Library Indexing and Pooling
Indexing, also called barcoding, allows multiple libraries to be sequenced in a single lane, i.e., multiplexing. By default, all libraries generated by us carry a barcode. Multiplexing is required when the typical lane output (15-25 million reads on the MiSeq, 300-400 million reads on the HiSeq 4000) is greater than a single library requires (e.g., when sequencing BACs, PCR-generated fragments, small microbial genomes, transcriptomes, exomes, ChIP, and small RNA applications). Multiplexing is also the best way to minimize potential lane-to-lane sequencing variation, as all of your samples are subject to the same sequencing conditions. For example, if you require two sequencing lanes for six samples, we recommend 6-plexing and sequencing the pool over two lanes, instead of 3-plexing per lane.
The principle is that short nucleotide “barcodes” are appended to each library using specific adapters containing those sequences. Libraries are constructed with different indexed adapters, then quantified, pooled in equimolar amounts, and sequenced. Deconvoluting the barcodes informatically allows multiple libraries to be sequenced in a single lane, potentially saving cost and time. Two approaches have been used for this: commercially available indexing kits (Illumina TruSeq, Nextera, or Bioo Scientific) or synthesizing your own adapter oligos with your own barcodes. Bioo Scientific offers Illumina-compatible barcodes (NEXTflex) with up to 384 barcodes. The Nextera kit (Epicentre/Illumina) uses dual indexing and transposon-mediated fragmentation (‘tagmentation’) followed by PCR amplification to integrate barcoded adapters (so a PCR-free library is not an option with the Nextera kit). The dual indexing strategy (with up to 12 indices available for adapter 1 and up to eight indices for adapter 2) permits up to 12 × 8 = 96 distinct dual-index combinations.
When sequencing on the HiSeq 4000 and especially on the NovaSeq, we highly recommend using unique dual-indexed (UDI) adapters to avoid index-hopping artifacts. Please see this FAQ.
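Equimolar pooling starts from converting mass concentrations to molarity. The sketch below uses the common approximation of 660 g/mol per base pair of dsDNA and the average library fragment length, then computes how many microliters of each library add the same number of femtomoles (1 nM = 1 fmol/ul). Library names, concentrations, sizes, and the 50 fmol target are made-up inputs.

```python
def molarity_nM(conc_ng_per_ul: float, avg_len_bp: float) -> float:
    """dsDNA molarity in nM: ng/ul * 1e6 / (660 g/mol per bp * length in bp)."""
    return conc_ng_per_ul * 1e6 / (660 * avg_len_bp)


def pooling_volumes(libs: dict[str, tuple[float, float]], fmol_each: float) -> dict[str, float]:
    """ul of each library that contributes fmol_each femtomoles to the pool."""
    return {name: round(fmol_each / molarity_nM(conc, length), 2)
            for name, (conc, length) in libs.items()}


libs = {"lib1": (6.6, 500), "lib2": (3.3, 500), "lib3": (4.0, 400)}  # (ng/ul, bp)
print(pooling_volumes(libs, fmol_each=50))  # {'lib1': 2.5, 'lib2': 5.0, 'lib3': 3.3}
```

As a rough expectation, 12 equimolar libraries on one HiSeq 4000 lane (300-400 million reads) would each receive about 25-33 million reads, assuming even representation.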
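Why unique dual indexes help can be shown with a toy example. In a UDI set no i7 or i5 sequence is reused, so a read whose index hopped shows an i7/i5 combination that was never assigned and can be discarded; with combinatorial dual indexing the same read is silently credited to another sample. The 3-nt index sequences are made up (real indexes are longer), and the function names are hypothetical.

```python
from typing import Optional


def is_unique_dual(pairs: list[tuple[str, str]]) -> bool:
    """True if no i7 and no i5 sequence occurs in more than one pair."""
    i7s = [i7 for i7, _ in pairs]
    i5s = [i5 for _, i5 in pairs]
    return len(set(i7s)) == len(i7s) and len(set(i5s)) == len(i5s)


def assign(read_i7: str, read_i5: str, samples: dict[tuple[str, str], str]) -> Optional[str]:
    """Sample for an exact i7/i5 match, or None for an unexpected combination."""
    return samples.get((read_i7, read_i5))


combinatorial = {("AAA", "CCC"): "s1", ("AAA", "GGG"): "s2",
                 ("TTT", "CCC"): "s3", ("TTT", "GGG"): "s4"}
udi = {("AAA", "CCC"): "s1", ("TTT", "GGG"): "s2"}

print(is_unique_dual(list(combinatorial)), is_unique_dual(list(udi)))  # False True
# A read from s1 (AAA/CCC) whose i5 hopped to GGG:
print(assign("AAA", "GGG", combinatorial), assign("AAA", "GGG", udi))  # s2 None
```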
Common buffers for DNA and RNA samples are:
- Molecular biology grade water (RNase-free, but not DEPC-treated)
- EB buffer: 10 mM Tris (pH 8.0-8.4), e.g. Qiagen EB Buffer. This is the preferred buffer for DNA samples!
- EBT buffer: 10 mM Tris, 0.1% Tween 20 (pH 8.0-8.4)
- TE buffer: 10 mM Tris, 1 mM EDTA (pH 8.0-8.4). Please avoid TE buffer!
- TLE buffer: 10 mM Tris, 0.1 mM EDTA (pH 8.0-8.4)