PacBio Library Prep & Sequencing

Please also see: Introduction to PacBio Sequencing and General Information

PacBio Sequel Sequencing: Services and Performance

We offer complete PacBio library prep, BluePippin size selection, and sequencing services. Base calling and secondary analyses such as Reads-Of-Insert extraction, HGAP assembly, and Iso-Seq analyses of a single or a small number of SMRT-cells are included in the service.  We will provide the complete data set generated by the PacBio sequencer and the SMRT-LINK analyses for download from our servers.  The Bioinformatics Core offers in-depth analyses of PacBio sequence data.  We now offer HMW-DNA isolation as a service.  Because PacBio is a “single-molecule sequencing” technology, the sequence data quality and yields depend highly on the quality of the DNA and RNA samples.

The PacBio Sequel Sequencer

The Sequel is the second-generation PacBio sequencer.  It now generates up to 14x more sequencing data per SMRT-cell than the RSII. The read length metrics have fully caught up with those of its predecessor and often surpass them. The Sequel chemistry is still undergoing rapid development and the read lengths continue to increase.  The principles of the single-molecule sequencing chemistry are unchanged. The sequence coverage and the sequencing errors do not show any detectable sequence-specific biases.  Pacific Biosciences has demonstrated a high-quality genome assembly for Arabidopsis based on the data of only two Sequel SMRT-cells.

The new PacBio chemistry (V3) enables High-Fidelity Long Reads: The chemistry significantly increases the average lifespan of the polymerase, leading to significantly longer reads, which can be used to sequence molecules of 10 kb to 15 kb length with 99.9% accuracy.  The high-fidelity long read data are calculated from circular-consensus sequencing (CCS) data, making use of 20 hour sequencing runs and the hairpin adapters: sequencing 4 passes of a library molecule is expected to yield Q20 data (99% accuracy) and 9 passes Q30 data (99.9% accuracy).  Please see these slides for more details. The V3 chemistry:

  • Doubles throughput per SMRT-cell from ~10 Gb to ~20 Gb (depending on insert sizes)
  • Increases the average polymerase read length from ~20 kb to ~30 kb (for inserts > 20 kb)
  • Increases the polymerase read length further for shorter insert libraries, with yields of up to 50 Gb and average polymerase read lengths of up to 100 kb per SMRT-cell.
  • 20 Hour Movies:  Due to the improved polymerase performance, it is now suggested to sequence amplicons over 3 kb in length, Iso-Seq libraries, and of course high-fidelity long read projects (10 to 15 kb inserts) with 20 hour movies instead of the standard 10 hour movies.  The gains from the doubled run time for the longest insert libraries (>30 kb) are marginal at the moment, but will become interesting with the V2 Express-library kit in 2019.
  • Please see the information from PacBio and these slides for more details.
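
The 20 hour movie suggestion above can be written down as a small decision rule. This is only a sketch of the rule as stated in the list: the function name and the library type labels ("amplicon", "iso-seq", "hifi") are hypothetical, and actual run planning should be discussed with us.

```python
def suggested_movie_hours(library_type: str, insert_kb: float) -> int:
    """Return the suggested movie length in hours (10 or 20).

    The library type labels are illustrative assumptions; the thresholds
    are the ones given in the list above.
    """
    kind = library_type.lower()
    if kind == "iso-seq":
        return 20  # Iso-Seq libraries
    if kind == "amplicon" and insert_kb > 3:
        return 20  # amplicons over 3 kb in length
    if kind == "hifi" and 10 <= insert_kb <= 15:
        return 20  # high-fidelity long read projects, 10 to 15 kb inserts
    return 10      # standard movie length for everything else


if __name__ == "__main__":
    print(suggested_movie_hours("amplicon", 5.0))
    print(suggested_movie_hours("shotgun", 35.0))
```

Libraries with the longest inserts (>30 kb) fall through to the standard 10 hour movie here because the gains from the doubled run time are described as marginal for them.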

High-fidelity long read quality score data from PacBio (Q20 equals 99% accuracy; Q30 equals 99.9% accuracy):

[Figure: HF-LongReadMetrics]
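
The Q values quoted here follow the standard Phred definition, Q = -10 · log10(error probability). As a minimal sketch (the function names are our own, not part of any PacBio software), the conversion between quality score and accuracy looks like this:

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality score into per-base accuracy (0..1)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Convert per-base accuracy (0..1, below 1) into a Phred quality score."""
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    for q in (20, 30):
        print(f"Q{q}: {phred_to_accuracy(q):.3%} accuracy")
```

Each step of 10 in Q corresponds to a tenfold reduction in the error rate, which is why Q30 data contain ten times fewer errors than Q20 data.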

PacBio advises running 3 or 4 titration SMRT-cells to optimize the loading of the Sequel, which would obviously be very expensive. We find that titration on one cell is now sufficient for most libraries.
Typical Sequel run times are 10 hours per SMRT-cell.  For long-amplicon, Iso-Seq sequencing, and high-fidelity long reads the run times can be extended to 20 hours using special and higher-priced SMRT-cells.
Unlike other technologies, PacBio sequencing adapters have a hairpin structure.  This allows repeated circular sequencing of both strands, for example accumulating 20 reads of a particular library molecule. Read errors can be removed very efficiently from the sequences via Circular-Consensus-Sequencing (CCS) analysis, resulting in the highest sequencing data quality currently possible. In contrast to other technologies, it has been shown that the PacBio sequencing chemistry is not sensitive to extreme GC contents. Even long GCC repeat stretches can be sequenced.

Most Common Applications for PacBio Sequencing

Whole Genome Shotgun Sequencing:  Due to the long read lengths (read length N50 values can exceed 26 kb) and the random nature of the PacBio sequencing errors, PacBio sequencing is the technology enabling the highest quality de novo genome assemblies as a standalone approach.  Typically a genome coverage of about 60X is used for de novo assemblies.  This is a generalization, and experiments will need to be adjusted based on heterozygosity, ploidy, DNA quality, repeat content, etc. The PacBio data are similarly used to identify structural genomic variants.
Given high quality DNA samples and long insert libraries, mean polymerase read lengths of up to 15 kb and mean yields of around 8 Gb per SMRT-cell can be expected after titration.  For the highest quality genomic DNA samples we now see average yields of more than 10 Gb, even for plant samples. Subreads can reach lengths of up to 60 kb and subread N50 values can reach up to 26 kb.  For bacterial genome sequencing, the PacBio data can be analyzed for DNA methylation (the m6A and m4C modifications in prokaryotic DNA).
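
For rough planning, the coverage and yield figures above translate into a number of SMRT-cells as follows. This is a back-of-the-envelope sketch only: the function name is hypothetical, the defaults (60X coverage, 8 Gb per SMRT-cell) are the typical values quoted above, and real projects need adjustment for heterozygosity, ploidy, DNA quality, and repeat content.

```python
import math


def smrt_cells_needed(genome_size_mb: float,
                      target_coverage: float = 60.0,
                      yield_per_cell_gb: float = 8.0) -> int:
    """Estimate how many SMRT-cells reach the target coverage.

    Defaults are the typical figures from the text, not guarantees.
    """
    required_gb = genome_size_mb * target_coverage / 1000.0  # Mb -> Gb
    return math.ceil(required_gb / yield_per_cell_gb)


if __name__ == "__main__":
    # Hypothetical 500 Mb genome: 500 Mb x 60 = 30 Gb of data required.
    print(smrt_cells_needed(500))
```

For the hypothetical 500 Mb genome, 30 Gb of data at 8 Gb per cell gives 3.75, which is rounded up to 4 SMRT-cells.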

Iso-Seq (full length transcript RNA-Seq):  The PacBio long reads enable sequencing of full-length transcripts up to 10 kb (essentially all transcripts), thus eliminating the need for error-prone transcript assembly from short reads. The Iso-Seq bioinformatics pipeline processes the data into high-quality consensus transcript sequences, enabling accurate isoform annotation and open reading frame prediction.  These features make Iso-Seq the method of choice, for example, for de novo gene annotations.  The Sequel chemistry significantly simplifies the Iso-Seq protocol compared to the first generation PacBio sequencer.  Iso-Seq libraries can be barcoded and pooled at different stages of the library prep (after the reverse transcription; after PCR-amplification of cDNAs; after PacBio library prep).  Please note that the transcript lengths can vary significantly between tissues and sample types. Thus, the read numbers per sample cannot be predicted accurately and can vary significantly between pooled Iso-Seq libraries. Pooling the samples earlier in the library prep process will result in lower costs but also in higher variation of read numbers.

Long Amplicon Sequencing:  Please see our FAQ for details.  Amplicons of up to 12 kb length can be sequenced for highest quality CCS data.  We offer two sets of universal barcoded PCR primers (12×12 and 24×24 indices) that you can pick up in the lab (for a fee). These allow the pooling of up to 576 amplicon samples.
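
The pooling capacity follows from combining every forward index with every reverse index: 12×12 gives 144 combinations and 24×24 gives 576. The sketch below enumerates such pairs to build a sample sheet; the index labels (F01, R01, ...) are placeholders and not the names of the actual primers.

```python
from itertools import product


def barcode_pairs(n_forward: int, n_reverse: int) -> list[tuple[str, str]]:
    """List every forward/reverse index combination.

    Labels such as 'F01' and 'R01' are hypothetical placeholders.
    """
    forward = [f"F{i:02d}" for i in range(1, n_forward + 1)]
    reverse = [f"R{i:02d}" for i in range(1, n_reverse + 1)]
    return list(product(forward, reverse))


if __name__ == "__main__":
    print(len(barcode_pairs(12, 12)))
    print(len(barcode_pairs(24, 24)))
```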

The high-fidelity long read approach (10kb to 15 kb CCS sequencing with 20 hour runs) enables the analysis of metagenomes, population samples, as well as the assembly of polyploid or highly heterozygous genomes.

 

Sample Requirements, Sample Submission and Scheduling

PacBio library prep requires microgram amounts of DNA. The recommended amount of input DNA further correlates with the desired read length – see these PacBio guidelines for SMRTbell libraries for details. Since the PacBio technology interrogates single molecules, any defect (e.g. a nick, an abasic site, a DNA adduct) can interfere with the sequencing process. Thus, the integrity and purity of the DNA sample are of utmost importance. The DNA quality and the DNA amount will determine which library insert sizes are feasible and how many SMRT-cells can be sequenced. The DNA samples should fulfill these criteria:

  • Minimal DNA purity: OD 260/280 should be 1.8-2.0; OD 260/230 should be >2.0
  • Has undergone a minimum of freeze-thaw cycles.
  • Has not been exposed to high temps (> 65°C for more than one hour can cause a detectable decrease in sequence quality).
  • Has not been exposed to pH extremes (< 6 or > 9).
  • Does not contain insoluble material.
  • Is RNA-free.
  • Has not been exposed to intercalating fluorescent dyes or ultraviolet radiation.
  • Does not contain divalent metal cations (e.g., Mg2+), denaturants (e.g., guanidinium salts, phenol), detergents, or carryover contaminants from the source material (e.g., heme, humic acid, polyphenols). Amplicon samples should be submitted in EB buffer (EDTA-free).
  • Must be double-stranded. Single-stranded DNA cannot be converted into SMRTbell templates and can interfere with polymerase binding.
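
The spectrophotometric purity criteria from the list above are easy to check before submission. The following is a minimal sketch with a hypothetical function name; it covers only the two OD ratios and none of the other criteria (integrity, handling history, contaminants), which cannot be judged from these numbers.

```python
def purity_issues(od_260_280: float, od_260_230: float) -> list[str]:
    """Return a list of purity problems; an empty list means both ratios pass.

    Thresholds are the ones listed above: 260/280 of 1.8-2.0, 260/230 > 2.0.
    """
    issues = []
    if not 1.8 <= od_260_280 <= 2.0:
        issues.append(f"OD 260/280 is {od_260_280:.2f}, expected 1.8-2.0")
    if not od_260_230 > 2.0:
        issues.append(f"OD 260/230 is {od_260_230:.2f}, expected > 2.0")
    return issues


if __name__ == "__main__":
    for problem in purity_issues(1.65, 1.40):
        print(problem)
```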

The sample requirements will vary strongly depending on genome size. Please contact us to discuss your project.
We do offer HMW-DNA isolation as a service.   The table shows minimum DNA sample requirements:

[Table image: PacBio minimum DNA sample requirements]

One SMRT-seq library can provide sufficient material for sequencing from as few as two to over 20 SMRT cells, depending on the amount and quality of the starting material and the desired size selection cutoffs. For example, for a 20 kb insert library we would ask for 20 ug of high quality DNA to be able to use the BluePippin size selection (and submitting more DNA would be recommended). Lower sample amounts can be used with less stringent size selection, resulting however in reduced average read lengths. DNA samples are best dissolved in TE buffer at a high concentration (~100 ng/ul or higher) and shipped on blue ice packs or wet ice.  Bacterial DNA samples are best isolated from cells in logarithmic growth phase.  Generally, the sequencing of a single SMRT-cell accompanied by BluePippin size selection will be sufficient to assemble bacterial chromosomes as single scaffolds.
As with other single molecule sequencing technologies, the read lengths and the sequencing yields depend on the nucleic acid sample quality.  Every nick or chemical DNA adduct has the potential to abort a read. For difficult DNA samples, especially plant DNA samples with hard-to-remove contaminants (e.g. some polysaccharides), we recommend carrying out a high-salt/phenol/chloroform cleanup (please note that this protocol often leads to a loss of 50% of the sample) or a purification with the BorealGenomics Aurora instrument (discontinued but still available, please inquire).  Before library prep, we will QC your sample by Pulsed-Field Gel Electrophoresis (PFGE; on BioRad CHEF, Pippin Pulse) and spectrometry. A band (not a smear) of fragments 50 kb or longer indicates the high-integrity DNA desired for the generation of long insert size libraries.  Please note that spectrometry and PFGE cannot fully assess the quality and suitability of the DNA samples since they assess the DNA as double-stranded molecules.  For example, single-strand nicks and chemical adducts could escape these methods.  The QC data do, however, allow us to rule out clearly problematic samples.
The final quality assessment of the DNA sample will, however, be the single molecule sequencing process itself (e.g. the average read lengths). Bacterial DNA samples extracted using silica columns will be sheared by the spin columns to fragments of about 20 kb in size. Such bacterial samples tend to generate high quality data and are acceptable. This method is not recommended for eukaryotic samples.

Before submitting samples:

  • Please email us a picture of an agarose gel of the sample, run alongside a marker with at least a 20 kb upper band (e.g. GeneRuler 1 kb Plus DNA Ladder or Lambda DNA/HindIII Digest Marker; a lane with undigested Lambda phage DNA [48 kb, e.g. NEB N3011S] is also suggested). Please run the electrophoresis slowly (e.g. at 80V depending on setup).
  • Assess sample purity via spectrophotometry: the 260/280 ratio should be between 1.8 and 2.0 and the 260/230 ratio should be higher than 2.0. PacBio recommends MoBio PowerClean columns for sample cleanup or the high-salt/phenol/chloroform protocol mentioned above if necessary.
  • Please use fluorometric methods (e.g. Qubit) for DNA quantification if possible. Measure each sample at least 3 times and accept only reproducible measurements (HMW DNA is often not perfectly dissolved).  Spectrophotometry is not reliable for quantifications (especially if the DNA extraction protocol used CTAB).
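
One way to judge whether replicate fluorometric readings are reproducible is the coefficient of variation (standard deviation divided by the mean). The sketch below is an illustration only: the function name is hypothetical and the 10% cutoff is an assumed example value, not a facility requirement.

```python
from statistics import mean, stdev


def replicate_summary(readings_ng_per_ul: list[float],
                      max_cv: float = 0.10) -> tuple[float, float, bool]:
    """Return (mean, coefficient of variation, reproducible?).

    max_cv = 0.10 is an assumed example cutoff, not an official threshold.
    """
    if len(readings_ng_per_ul) < 3:
        raise ValueError("measure each sample at least 3 times")
    avg = mean(readings_ng_per_ul)
    cv = stdev(readings_ng_per_ul) / avg
    return avg, cv, cv <= max_cv


if __name__ == "__main__":
    avg, cv, ok = replicate_summary([100.0, 102.0, 98.0])
    print(f"mean {avg:.1f} ng/ul, CV {cv:.1%}, reproducible: {ok}")
```

Widely scattered readings usually mean the HMW DNA is not fully dissolved; in that case the sample should be allowed to dissolve further and measured again rather than averaged.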

The DNA samples used for making PacBio libraries must be handled with extreme care – if you need to ship your DNA to our facility, please consult the following PacBio guidelines for shipping and handling. More info and the submission form can be found on our Sample Submission & Scheduling page.   We must receive electronic and print copies of the submission form.

Prices

Costs for PacBio sequencing reflect the number of libraries and number of SMRT cells required. Our recharge rates can be viewed here.  The listed fees include all labor and reagents. BluePippin size selection is optional (and carries an additional fee), but is highly recommended. Please note our reduced high throughput (HT) recharge rates; these apply if 10 or more SMRT-cells are sequenced per sample or if 6 or more libraries are generated.

 
