Genome Center

Illumina Sequencing: All About Libraries

Updated 12/6/11

Returning users: the sample drop-off form is available here as a pdf or doc.

Overview

The starting material for Illumina library construction is usually double-stranded (ds) DNA from any source: genomic DNA, BACs, PCR amplicons, ChIP samples, any type of RNA turned into dsDNA (mRNA, normalized total RNA, smRNAs), etc. Pretty much anything you can think of that ends up as, or can be turned into, dsDNA will work. This dsDNA is then fragmented (if it is not already, as in ChIP or many cDNA applications), the ends are polished and 'A' tailed, adapters are ligated on, size selection is carried out, and PCR is performed to generate the final library ready for sequencing. Different library types might vary in the details (such as PCR-free libraries), but this is the basic workflow.

Libraries: Core Service
Libraries: Make Your Own
Libraries: Quality Control
Libraries: Submission and Turnaround Time

Libraries: Core Service

DNA Based Libraries. Performance specifications of libraries we produce depend on the source material.  Genomic DNA, double strand cDNA libraries, BACs, or other material available in microgram quantities will generate quality libraries nearly every time.

Guidelines for Submission of Library-Worthy DNA. Provide 1 to 5 ug of high quality DNA (concentration > 100 ng/ul, OD 260/280 close to 1.8) in a neutral solution such as EB, TE ([EDTA] = 0.1 mM), or water. Library construction can also be attempted from much less input material, with attendant caveats.
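The guidelines above can be turned into a quick pre-submission check. This is only a sketch: the function name `check_dna_sample` and the `ratio_tolerance` of 0.1 (our reading of "close to 1.8") are assumptions for illustration, not Core policy. Because library construction can be attempted from less material, the function returns warnings rather than rejecting a sample.

```python
def check_dna_sample(conc_ng_per_ul, volume_ul, a260_a280, ratio_tolerance=0.1):
    """Compare a DNA sample against the submission guidelines.

    Returns (total_ug, warnings). ratio_tolerance is an assumed
    interpretation of "close to 1.8", not an official cutoff.
    """
    warnings = []
    total_ug = conc_ng_per_ul * volume_ul / 1000.0  # ng -> ug

    if total_ug < 1.0:
        warnings.append(f"total DNA is {total_ug:.2f} ug; 1 to 5 ug is requested")
    if conc_ng_per_ul <= 100.0:
        warnings.append(f"concentration is {conc_ng_per_ul:.0f} ng/ul; > 100 ng/ul is requested")
    if abs(a260_a280 - 1.8) > ratio_tolerance:
        warnings.append(f"OD 260/280 is {a260_a280:.2f}; close to 1.8 is requested")
    return total_ug, warnings


total, notes = check_dna_sample(conc_ng_per_ul=150, volume_ul=20, a260_a280=1.82)
print(total, notes)  # 3.0 []
```

A sample that triggers warnings is not necessarily unusable; it is a prompt to talk to core personnel before dropping it off.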

ChIP Libraries. We offer library construction from chromatin immunoprecipitated material. For these more complex experiments, discussions with core personnel regarding suitability of starting material and construction strategy are recommended. No guarantees are offered with this library service, other than we'll do our best! For general background, the ChIP-Seq Data Technical Note and ChIP-Seq DataSheet from Illumina may be of interest.

RNA Based Libraries. Something of a misnomer because all the libraries end up as DNA, but this refers to the starting material. We offer mRNA-seq library preparation, with a number of options as described below.

Guidelines for Submission of Library-Worthy RNA. Provide up to 10 ug of total RNA at a concentration of at least 200 ng/ul. Using less starting material is possible, but we have not gone below 1-2 ug of input with the standard mRNA-seq protocol, so we can't guarantee results. We use the Illumina mRNA-seq kit; you can download and take a look at our modified protocol for these preps. Normalized mRNA library construction is a new service offered by the Core (more technical details appear below); the most important point about these libraries is that only 50-200 ng of total RNA is required.

Indexed Libraries allow for the sequencing of multiple libraries in a single lane, i.e., multiplexing. Indexing can be coupled with any type of library prep; it just depends on using particular adapters during the library preparation process. Indexing, sometimes called bar-coding, is a useful strategy when the typical lane output of 20-30 million reads from the GAIIx or 120-150 million reads from the HiSeq is greater than required for a single library (e.g., in sequencing BACs, PCR generated fragments, small microbial genomes, transcriptomes, or certain ChIP applications). With the HiSeq, indexing is pretty much the norm for most situations.

The principle is that short nucleotide "bar codes" are appended to each library using specific adapters containing those sequences. Libraries containing different indexed adapters are constructed, quantified, pooled in equimolar amounts, and sequenced. Deconvoluting the bar codes informatically allows multiple libraries to be sequenced in a single lane at a potential cost and time saving.

To date, two methods have been exploited for this: using the Illumina indexing kits (TruSeq or Nextera) or synthesizing your own adapter oligos with your own bar codes. With the Illumina TruSeq v2 Library Prep Kits A and B you can use up to 24 different barcodes to multiplex up to 24 libraries. The Nextera kit (from Epicentre, now part of Illumina) uses dual indexing and transposon mediated fragmentation ('tagmentation') followed by PCR amplification to integrate barcoded adapters (so a PCR-free library is not an option using the Nextera kit). The dual indexing/adapter tagging strategy (with up to 12 indices available for adapter 1 and up to eight indices for adapter 2) permits up to 96 unique dual index combinations.
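The arithmetic behind multiplexing is simple enough to sketch. The helper names below are made up for illustration, and the per-library read counts assume a perfectly equimolar pool, which real pools only approximate. The pooling helper relies on the unit identity 1 nM = 1 fmol/ul.

```python
from itertools import product


def dual_index_combinations(n_index1=12, n_index2=8):
    """All (index 1, index 2) pairings, numbered from 1."""
    return list(product(range(1, n_index1 + 1), range(1, n_index2 + 1)))


def reads_per_library(lane_reads, n_libraries):
    """Average reads per library if the pool is perfectly equimolar."""
    if n_libraries < 1:
        raise ValueError("need at least one library")
    return lane_reads / n_libraries


def equimolar_pool_volumes(conc_nM, fmol_each):
    """Volume (ul) of each library that contributes fmol_each femtomoles.

    1 nM equals 1 fmol/ul, so volume = fmol / nM.
    """
    if any(c <= 0 for c in conc_nM.values()):
        raise ValueError("concentrations must be positive")
    return {name: fmol_each / c for name, c in conc_nM.items()}


print(len(dual_index_combinations()))  # 96
# 24 libraries in one HiSeq lane of 120-150 million reads:
print(reads_per_library(120e6, 24), reads_per_library(150e6, 24))  # 5000000.0 6250000.0
print(equimolar_pool_volumes({"lib1": 10.0, "lib2": 4.0, "lib3": 2.0}, 20.0))
# {'lib1': 2.0, 'lib2': 5.0, 'lib3': 10.0}
```

The concentrations fed to the pooling helper should come from a reliable quantitation of each library; small errors there translate directly into uneven read counts.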

Homemade indexing has been used successfully by multiple users, but it is important to ensure that the base composition of the indices is balanced to optimize the ability of the image analysis software to identify clusters correctly. That is, the clusters should have a roughly equal representation of A, T, C, and G over the first four cycles. This is especially important for MiSeq runs, unless sufficient PhiX is spiked in (>30%) to create a more diverse set of clusters.
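One way to check a homemade index set before ordering oligos is to tabulate base composition per cycle. The sketch below assumes the indices are the first bases read and that the libraries will be pooled in equal amounts; the `min_fraction` cutoff of 0.1 and the example sequences are arbitrary illustrations, not Illumina specifications.

```python
from collections import Counter


def per_cycle_composition(indices, cycles=4):
    """Return one {base: fraction} dict per cycle for the given index set."""
    if not indices:
        raise ValueError("need at least one index sequence")
    if any(len(seq) < cycles for seq in indices):
        raise ValueError("every index must cover the requested cycles")
    composition = []
    for pos in range(cycles):
        counts = Counter(seq[pos].upper() for seq in indices)
        composition.append({b: counts.get(b, 0) / len(indices) for b in "ACGT"})
    return composition


def unbalanced_cycles(indices, cycles=4, min_fraction=0.1):
    """1-based cycle numbers where some base falls below min_fraction."""
    return [
        i + 1
        for i, fractions in enumerate(per_cycle_composition(indices, cycles))
        if min(fractions.values()) < min_fraction
    ]


balanced = ["ACGT", "CGTA", "GTAC", "TACG"]  # made-up example indices
skewed = ["ACGT", "ACTA", "AGAC", "ATCG"]    # every index starts with A

print(unbalanced_cycles(balanced))  # []
print(unbalanced_cycles(skewed))    # [1, 2]
```

If a planned pool flags any cycle, either swap indices or plan on a larger PhiX spike-in.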

More On RNA Libraries: Normalized vs. Regular, Other Construction Options

In RNA library normalization, abundant RNAs are reduced in the total sequenced pool through biochemical techniques. This process is useful when gene space sequencing or SNP detection is the focus of the project, as opposed to quantitative descriptions of mRNA abundance. Protocols for this are rapidly evolving, and Illumina has released a validated normalization protocol used in conjunction with their mRNA-seq kits. This procedure does not rely on polyA purification, so it can be used on microbial RNAs as well as total RNA (i.e., the abundant rRNA sequences are removed during the nuclease treatment due to their rapid reannealing). You can download this DSN normalization protocol. Results from Illumina indicate that normalized RNA samples still maintain the regulatory profiles of un-normalized samples. A number of companies offer kits for RNA amplification and other treatment procedures used upstream of the standard library prep protocol, e.g., NuGEN's Ovation RNA-Seq System. These kits broaden the utility and applications of RNA sequencing by allowing the use of small amounts of material and by bypassing the need for a polyA tail on the material to be sequenced.

What Else Do We Need From You?

As described in the Sample Submission section, the sample drop-off forms contain fields to fill in important information about library construction parameters. One thing we need to know is the approximate insert size desired. In the absence of specific preferences, we recommend about 200-300 nts for most mRNA and DNA libraries. However, it is important to consult the bioinformaticists on your project to make sure this is OK. For certain applications, such as ChIP-seq, a range of sizes may be desired and we can accommodate that. But again, we strongly recommend verifying the suitability of these values for the experiment you are trying to do.

Not Yet Available, But May Be Depending On Demand

We do not yet offer small or micro RNA library construction services specifically, but can run these libraries on our sequencers if primer is provided. We don't yet make mate pair libraries, useful in de novo assembly projects. We have not yet attempted construction of strand specific libraries or libraries made from enriched or hybrid-selected sequences (more on hybrid selection and targeted sequencing below). We can't make libraries in high throughput fashion. All of these procedures are under discussion and development, so we believe it is only a matter of time before we can offer them to our sequencing community.

Hybridization Selection/Sequence Capture Libraries are those in which particular genomic regions are pre-enriched prior to indexed library generation and sequencing. This strategy allows focused, very deep sequencing and can be implemented for a number of applications. Check out the Illumina publications site to view articles using this strategy. Numerous companies offer services or platforms that can generate such material, including Illumina, RainDance, Agilent, NimbleGen, Febit, and Fluidigm. Technical information on the RainDance, Agilent, NimbleGen, and Qiagen (which uses a PCR based, not hybridization capture, strategy for enrichment) systems is available but not guaranteed to be up to date. Think of it as a starting point for further investigation (we have the contact info for company reps if needed) and solely informational (no implied endorsement etc.).

Libraries: Make Your Own

You can do it! Library construction involves DNA fragmentation (if necessary, depending on the nature of the initial sample), enzymatic treatment of the DNA to repair and A-tail the fragments, ligation of sequencing adapters to these fragments, then subsequent PCR amplification, with or without size selection depending on the application. See below for more information on these various aspects of library construction.

Fragmentation

DNA to be made into a sequencing library must first be converted into small fragments. There are several methods for doing this, each with attendant pros and cons. The original protocols used nebulization for fragmentation; we abandoned this method very early, so anyone wishing to pursue it can contact us for extra nebulization units. A related methodology, Hydroshear, can be accessed through the CAES Genomics facility. We primarily use a Diagenode Bioruptor because we are familiar with its operation from chromatin immunoprecipitation experiments. Access to this instrument is available through the Core, with the usual training and signup guidelines for Core-available equipment in effect. Many protocols and centers rely on and recommend a fragmentation device from Covaris, which uses adaptive focused acoustics to break the DNA into appropriately sized fragments. The Core does not have imminent plans to acquire this device since in our opinion it does not offer a substantial advantage over the Bioruptor.

Another approach is the use of an enzymatic fragmentation product from New England Biolabs called 'Fragmentase'. Two advantages of this product are that it lends itself very well to high throughput construction of libraries, and that controlled digestion can be used to achieve gentle fragmentation of genomic DNA in order to recover very high MW fragments (e.g., 3-10 kb). Large MW fragments are important in the construction of mate pair libraries, and these can be difficult to produce using other methods. Results from a protocol-optimization experiment can be downloaded here (courtesy of Marta Matvienko).

Basic DNA and RNA Library Protocols

We have been using library kits from NEB as a source of the fragment repair, tailing, and amplification enzymes. In addition to the kit used in the above protocol, there are a number of other next-gen related products available through NEB. You can find out more information about NEB products here (we don't receive revenue if you click this link, in case you were wondering). The protocol we have used in workshops, which is also currently in use within the Core, can be downloaded here.

For mRNA-seq libraries we are currently using the standard Illumina kit. New products are out there and we encourage you to do research if RNA-seq libraries are of interest. In particular, normalization protocols from Illumina that integrate with their kit, and novel RNA amplification and library production resources from NuGEN, have expanded the services we offer and will no doubt continue to do so. In other words, keep checking this site to see how things evolve. Meanwhile, here is our version of the Illumina mRNA-seq protocol with certain modifications. Many of our clients use this kit with excellent results, so let us know if this is of interest and we can provide ordering information (price is 2,000 for 8 reactions).

The Illumina Oligonucleotides

Illumina sells the oligonucleotides required for library construction separately, so they can be purchased and used in conjunction with other library preparation reagents. Many users have also successfully ordered these oligos from commercial sources. A highly unscientific survey indicated good results can be obtained using oligos from different sources and purified to varying degrees. The oligonucleotide sequences are available from the excellent sequencing forum seqanswers.com. Two things to note--the "top" adapter, starting with GATC, must be phosphorylated, and the bottom adapter can be synthesized with a special linkage between the 3' terminal T and the preceding C. This is a phosphorothioate linkage, which renders the overhanging T (after annealing the top and bottom adapter oligos) more nuclease resistant and so diminishes the probability of adapter dimers (more on adapter dimers below).

Other resources to check out

The latest versions of Illumina validated protocols for all their kits are at your fingertips:
Go to http://www.illumina.com/ftp.ilmn
Log on with
      Username: guest
      Password: illumina
Select the folder(s) of interest, in this case "Genome Analyzers."

An excellent forum for sequence related questions of all kinds on all platforms is the Seqanswers forum. It can be reached at seqanswers.com.

Libraries: Quality Control

[Figure: Bioanalyzer trace]

Library quality is the single most important determinant of the success of your sequencing run, both in terms of the number of reads generated (quantitation) and the validity of the sequence obtained (content). Methods for construction and analysis continue to evolve; while now somewhat dated, a useful early paper by Quail et al. from the Sanger Inst. lists a number of improvements over the standard Illumina protocols in library preparation and analysis. If you construct your own libraries you may want to download this paper and a supplementary methods table for the many practical issues covered. We strongly recommend two QC measures be carried out on all libraries run in our core--examination on the Agilent Bioanalyzer and quantitation using the Kapa Biosystems library quant kit.

The Bioanalyzer provides a nice visual examination of the libraries. The "perfect" library electropherogram, pictured above, shows a single peak of the expected molecular weight. Common additional forms include primer dimers (at around 80-85 nts), adapter dimers (around 125 nts), and broader bands of higher MW than the expected peak. Primer dimers, minimized by the use of magnetic beads, are not a problem unless they completely dominate the reaction. We have gotten good results from libraries composed of 50% primer dimers. Adapter dimers can be a problem because they will sequence. As a result, whatever proportion of your library consists of adapter dimers will show up as that percentage of reads in your final data files. Adapter dimers can be minimized by adjusting the adapter:insert ratios during library construction and exercising care in gel extraction or other size selection steps. The larger MW, typically more hump shaped forms that are visualized on the Bioanalyzer are probably a result of excess amplification during the final PCR step. While some amount of these is tolerable, if they are too prominent then the library should be re-amplified from the gel extracted material.
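When estimating how many reads adapter dimers will consume, it helps to remember that a proportion measured by mass understates the proportion of molecules, since short fragments contribute more molecules per nanogram. The sketch below converts a mass fraction to a molar fraction under the assumption that the number of molecules scales as mass divided by length, and that reads track the molar fraction. The function name, the 400 bp library size, and the 10% figure are illustrative assumptions, not measurements.

```python
def dimer_molar_fraction(dimer_mass_fraction, dimer_bp=125, library_bp=400):
    """Convert the mass fraction of adapter dimers to a molar fraction.

    Molecule count is taken as proportional to mass / length, so short
    dimers are over-represented relative to their share of the mass.
    """
    if not 0.0 <= dimer_mass_fraction <= 1.0:
        raise ValueError("mass fraction must be between 0 and 1")
    dimer_moles = dimer_mass_fraction / dimer_bp
    library_moles = (1.0 - dimer_mass_fraction) / library_bp
    return dimer_moles / (dimer_moles + library_moles)


# 10% of the mass in 125 bp dimers alongside a 400 bp library:
fraction = dimer_molar_fraction(0.10)
print(f"{fraction:.1%} of molecules are adapter dimers")  # 26.2% of molecules are adapter dimers
```

In this made-up case a peak that looks minor by mass could account for roughly a quarter of the molecules, which is why careful size selection pays off.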

We use a qPCR assay for library quantitation. There have been homemade variants of these around for years, including one we used to run and had posted protocols for. Lately we have been using the Kapa Biosystems qPCR assay on all libraries we run. This has allowed us to provide much more consistent cluster values, which translates to more consistent read numbers. For long runs in particular it's essential to maximize the data recovered given the time and money involved, which is why we recommend this quantification so strongly. Current prices are under 70 dollars for both assays; if you can provide us with your own Bioanalyzer (or Bio-Rad Experion) traces, we can do just the qPCR for half the price.

It can be worth the effort to assess library content before spending the money on the sequencing and bioinformatic analysis. We feel that the Illumina-recommended validation steps of cloning and (Sanger) sequencing a representative sampling of library molecules, with some exceptions, are not sufficiently informative. The exception is when libraries are made using unusual procedures and/or oligonucleotides; in these situations making sure the expected configuration or sequences are recovered is useful. qPCR tests of various kinds can be carried out on the library to ensure that, for example, the amount of ribosomal RNA is reduced to appropriate levels during a normalization procedure, or that libraries made from ChIP maintain the enrichment of the target promoter seen in the original material (this is actually highly recommended prior to sequencing).


Libraries: Submission and Turnaround Time

All sample drop-offs must be accompanied by our submission form; you can obtain and submit either a pdf or doc version. Use the same form for submitting material to be made into libraries as well as for libraries ready for sequencing. Please contact us if you have any questions about the required information, because we need everything filled out. This minimizes the chance for error on these expensive and time consuming experiments.

Once your library, or library raw material, is ready, you should deliver it as soon as possible to get the next available slot in line.  Runs occur as we fill up the seven lanes on a flow cell, and the timing on this can vary depending on service type and core activity. As a result, turnaround time for any sample is difficult to predict very far in advance.  We can generally give some idea when your sample is delivered.   Two things are certain:  (a) the sooner you drop off a library, the sooner you will get data back, and (b) we will stay in contact and let you know the status of your project.

The minimum amount of library we can use is 12 ul of a 1 nM stock. This gives us 2 ul to use for Bioanalyzer and qPCR quality control, then enough left to dilute for the flow cell. As mentioned, it is difficult to accurately quantify libraries. If your sample reads at least 5 ng/ul on the NanoDrop, it is likely greater than 1 nM in concentration. Below that it becomes more questionable and our qPCR assay will be required to give an accurate value.
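The rule of thumb above follows from the usual mass-to-molarity conversion for dsDNA. The sketch below assumes an average of 660 g/mol per base pair (a common approximation) and a mean fragment length that includes the adapters; the function names and the 300 bp example are hypothetical. Spectrophotometer readings count all nucleic acid present, so treat the result as a rough estimate rather than a substitute for qPCR.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair, a common approximation for dsDNA


def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp):
    """Approximate molarity (nM) of a dsDNA library from its mass concentration."""
    if mean_fragment_bp <= 0:
        raise ValueError("fragment length must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def dilution_to_target(stock_nM, target_nM=1.0, final_ul=12.0):
    """Volumes (stock_ul, diluent_ul) needed to make final_ul at target_nM."""
    if stock_nM < target_nM:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = final_ul * target_nM / stock_nM
    return stock_ul, final_ul - stock_ul


print(round(ng_per_ul_to_nM(5.0, 300), 1))  # 25.3
print(dilution_to_target(12.0))             # (1.0, 11.0)
```

By this estimate, 5 ng/ul only drops to 1 nM when the mean fragment length approaches 7.5 kb, far larger than a typical library, which is consistent with the guideline.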


To ship libraries, use the following courier address:

Dr. Ryan Kim
Rm 4212 GBSF
451 Health Sciences Dr.
Univ. CA-Davis
Davis, CA 95616

For libraries and other dsDNA shipments, wet ice or cold packs are usually sufficient. For RNA, dry ice is recommended. We don't favor particular shippers, but of course FedEx is the main courier around here.