HiSeq 4000 Sequencing

HiSeq3000/HiSeq4000 Technology

The HiSeq 3000/4000 sequencers represent the latest sequencer generation and enable significantly faster sequencing at reduced per-base costs. Please note that for some samples the library preps need to be adjusted for the HiSeq 3000 and HiSeq 4000. The new sequencers are more than three times faster than the HiSeq 2500 and yield about 65% more reads per lane. For example, the run time of a paired-end 100 bp run is only 3 days on the HiSeq 3000 compared to 11 days on the HiSeq 2500. Illumina expects between 260 and 310 million clusters passing filter per lane (official figures for both the HiSeq 3000 and 4000). Our first data indicate that averages of more than 370 million clusters passing filter per lane are possible.  Please see this post for more details, quality scores, and a link to example data.  The maximum HiSeq 3000/4000 read length is 150 bp, compared to 100 bp in the HiSeq 2500 High Output mode. Thus, the maximum yield per lane has more than doubled with the HiSeq 3000/4000 sequencers (about 2.5x).  The HiSeq 2500 is still of great interest since it generates paired-end 250 bp reads (in Rapid mode).
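The 2.5x yield figure follows from combining the read gain with the longer read length. The short Python sketch below reproduces that arithmetic from the numbers quoted above; the variable names are ours and purely illustrative.

```python
# Figures quoted in the paragraph above.
reads_gain = 1.65      # about 65% more reads per lane than the HiSeq 2500
read_len_new = 150     # bp, maximum read length on the new sequencers
read_len_old = 100     # bp, HiSeq 2500 High Output mode

# Yield in bases per lane scales with (number of reads) x (read length).
yield_gain = reads_gain * read_len_new / read_len_old

# Run-time comparison for a paired-end 100 bp run.
days_old, days_new = 11, 3
speedup = days_old / days_new

print(f"yield per lane: about {yield_gain:.1f}x")
print(f"run-time speed-up: about {speedup:.1f}x")
```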

The progress in speed and output is enabled by two new technologies: patterned flow cells and kinetic exclusion amplification (please see the video). In contrast to the random clustering employed in previous HiSeqs, the clusters are now generated in ordered nano-wells to allow for higher cluster densities and unambiguous cluster identification.

Libraries For HiSeq4000 Sequencing

The majority of existing Illumina sequencing libraries can be sequenced as is on the HiSeq 3000/4000 (the libraries should have no visible adapter dimers, and the library fragments should be mostly shorter than 670 bases).  Other sequencing libraries can be made compatible by size selection (removing both adapter-dimer traces and fragments of more than 670 bases, if the latter are numerous).  Any library with TruSeq-style adapters (e.g., Nugen, Bioo, Nimblegen, Agilent adapters) will work just fine as long as it does not fall under the exclusion criteria below.
Illumina does not provide a lot of information on the suitability criteria of libraries for the new sequencers other than recommending only a limited selection of their own library prep kits (TruSeq Nano DNA (350 bp inserts), TruSeq DNA PCR Free (350 bp inserts), TruSeq Stranded mRNA, TruSeq Stranded Total RNA, TruSeq RNA Access, Nextera Rapid Capture Exome).
In our experience almost all sequencing libraries generated using recent protocols are perfectly suitable for the new chemistry.
This includes:
  • Libraries generated for Illumina platforms with recent kits from all major suppliers (e.g. Bioo, Kapa, NEB, Nugen, Qiagen); this includes Illumina Nextera libraries.  Please note the limits on the insert sizes outlined below.  The libraries should show a distinct peak in the Bioanalyzer traces.  The tighter the size distribution, the higher the yields.
  • Depending on protocol specifics, some GBS and RAD-seq libraries can be sequenced with a very low percentage of spike-ins resulting in high yields of high-quality data.
  • Very low complexity libraries (e.g. amplicons) will require a considerable spike-in of PhiX or other balanced libraries; the resulting amplicon data are of high quality.
  • Bisulfite sequencing libraries can be sequenced with spike-ins of 40% of PhiX or other balanced libraries; the quality-scores of the reverse read do not reflect the sequencing accuracy in this case.
 Not suitable are libraries with:
  • A lot of adapter dimers (if present, these should constitute less than 0.5% of the molecules).
  • A considerable percentage of longer fragments (insert sizes over 550 bp, or total lengths over 670 bp). Minor “tails” of longer fragments are still suitable.
  • Single-end adapters (these old adapter designs should have been discarded years ago).
  • A requirement for custom sequencing primers (these are not supported by Illumina on the HiSeq 3000/4000).
  Other important considerations when loading the libraries:
  • The new exclusion amplification on the flow cells favors short library molecules (especially the even shorter adapter dimers) more strongly than the previous bridge amplification protocol. According to Illumina, 1% adapter dimer content can result in more than 6% adapter dimer reads; 10% adapter dimer content resulted in 84% unusable adapter dimer reads.
  • The minimum library concentration has increased to 2 nM. Please submit libraries with concentrations of 5 nM or higher if possible.  We require at least 15 µl of each library sample.
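The criteria above can be collected into a simple pre-submission check. The following Python sketch is only an illustration: the class and function names are hypothetical, and the 10% cut-off for long fragments is our assumption, since the text only speaks of a "considerable percentage" of fragments over 670 bases.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the text above.
MAX_DIMER_FRACTION = 0.005   # adapter dimers should be < 0.5% of the molecules
MIN_CONC_NM = 2.0            # minimum library concentration in nM
MIN_VOLUME_UL = 15.0         # minimum submitted volume in microliters

# ASSUMPTION: the text gives no number for a "considerable percentage" of
# fragments longer than 670 bases; 10% is a placeholder, not a facility rule.
MAX_LONG_FRACTION = 0.10


@dataclass
class Library:
    adapter_dimer_fraction: float   # fraction of molecules that are adapter dimers (0-1)
    long_fragment_fraction: float   # fraction of molecules longer than 670 bases (0-1)
    concentration_nm: float         # library concentration in nM
    volume_ul: float                # available volume in microliters


def check_library(lib: Library) -> List[str]:
    """Return a list of problems; an empty list means no criterion was violated."""
    problems = []
    if lib.adapter_dimer_fraction >= MAX_DIMER_FRACTION:
        problems.append("adapter dimers at or above 0.5% of molecules")
    if lib.long_fragment_fraction > MAX_LONG_FRACTION:
        problems.append("too many fragments longer than 670 bases (assumed cut-off)")
    if lib.concentration_nm < MIN_CONC_NM:
        problems.append("concentration below 2 nM")
    if lib.volume_ul < MIN_VOLUME_UL:
        problems.append("volume below 15 microliters")
    return problems


print(check_library(Library(0.001, 0.02, 5.0, 20.0)))
print(check_library(Library(0.010, 0.02, 1.0, 10.0)))
```

A library that fails only the dimer or long-fragment check can usually be rescued by the size selection described in the next section.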

Sequencing Library Size Selection

Most libraries will not require a special size selection.  At the moment we suggest removing fragments longer than 670 bases from libraries intended for HiSeq 3000/4000 sequencing if there are considerable amounts of these.  The size selection does not need to be “perfect”.
The size selection can be achieved by carrying out a simple “upper cut” with Ampure beads (please see the protocol below), by doing a BluePippin size selection, or by manual gel extraction.   We offer the first two of these options.
If your libraries are homogeneously sized, you could pool the libraries and carry out the size selection only once with the pool.  The BluePippin size selection is the most precise and reproducible option.
Ampure XP bead “upper cut” protocol to remove fragments over 670 bases:
Please note: It is recommended to verify this protocol first with your batch of Ampure beads.  This selection protocol will also remove adapter dimers, if they are not dominating the library.  Bead-based size selection cannot carry out precise “cuts”; thus, you will also lose some of the library in the size ranges that you intend to keep.
      1. If not mentioned explicitly, follow the standard Ampure XP handling instructions from the manufacturer (e.g. equilibrate the beads to room temperature before use; vortex the beads before use; details of the bead washes and elution, …).
      2. Add 0.55x the sample volume in Ampure beads to your sample, mix, and incubate for 5 minutes at RT.
      3. Collect the beads on a magnet.
      4. Transfer the supernatant to a new tube.
      5. Add another 1x original volume of Ampure beads to the supernatant, mix, and incubate for 5 minutes.
      6. Collect the beads on a magnet.
      7. Carry out the regular 70% ethanol washes and the sample elution from the beads according to the Agencourt Ampure XP protocol.
      8. Verify the success of the size selection by running an aliquot on a Bioanalyzer or equivalent instrument.
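The bead volumes for steps 2 and 5 can be worked out as in the small Python sketch below. It assumes that "1x original volume" in step 5 refers to the original sample volume (before the first bead addition); the function name is hypothetical.

```python
def upper_cut_bead_volumes(sample_ul, first_ratio=0.55, second_ratio=1.0):
    """Return (beads for step 2, beads for step 5) in microliters.

    ASSUMPTION: both ratios are relative to the original sample volume.
    """
    first = first_ratio * sample_ul     # step 2: binds the longest fragments
    second = second_ratio * sample_ul   # step 5: added to the transferred supernatant
    return first, second


first, second = upper_cut_bead_volumes(50.0)
print(f"step 2: {first:.1f} ul beads, step 5: {second:.1f} ul beads")
```

For a pooled sample, pass the pool volume as sample_ul.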
Please let us know if we can help with any questions.

HiSeq4000 Run Metrics & Quality Scores

Please note that several run metrics are calculated differently for the HiSeq 3000/4000. The “Clusters Passing Filter” metric is now calculated as a percentage of the number of wells and thus appears considerably lower; values between 55% and 70% are expected.  Each flow cell lane has 482.68 million nanowells.  The demultiplexing metrics are now delivered as “laneBarcode.html” files. The “Top Unknown Barcodes” table therein usually begins with more than 100 million barcode reads of “NNNNNN”. This is not a sign of any sequencing problems. This figure merely reflects the unused nanowells; more than 150 million unused nanowells are fully expected.
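To translate the percentage back into cluster numbers, multiply it by the number of nanowells per lane. The Python sketch below does this for the expected range; the function names are illustrative, and treating every well without a passing-filter cluster as "unused" is a simplification on our part.

```python
WELLS_PER_LANE = 482.68e6   # nanowells per flow cell lane (from the text above)


def clusters_pf(pf_percent):
    """Clusters passing filter per lane for a given '% PF' value."""
    return WELLS_PER_LANE * pf_percent / 100.0


def wells_without_pf_cluster(pf_percent):
    """Wells per lane that did not yield a passing-filter cluster."""
    return WELLS_PER_LANE - clusters_pf(pf_percent)


for pct in (55, 70):
    print(f"{pct}% PF: {clusters_pf(pct) / 1e6:.0f} million clusters, "
          f"{wells_without_pf_cluster(pct) / 1e6:.0f} million wells without a PF cluster")
```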

The read quality scores provided in the fastq files are now binned to reduce the file sizes; the scale (Sanger; Phred+33) stays the same, though.
The quality scores now reach up to the code K (Phred score 42 in the Phred+33 encoding), which is one score higher than was common previously.  Some older bioinformatics tools might have problems with this score.  In this case it is helpful to replace all “K”s in the quality score lines with “J”s.  This will not cause any loss of information.  On Unix/Linux systems with GNU sed you could use, for example, this command on the FASTQ files (thanks: Dylan Storey; the 4~4 address and the bare -i option are GNU extensions, so on macOS a GNU sed installation is needed):   sed -i '4~4 s/\([]{}\|~\\\^_`[K-Za-z]\)/J/g' file.fq
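Where GNU sed is not available, the same capping can be sketched in Python. This is not the command above but an equivalent under two assumptions: the FASTQ file uses strict four-line records, and the encoding is Phred+33. The function names are hypothetical.

```python
def cap_quality_line(qual, max_char="J"):
    """Replace every character above max_char (i.e. 'K' and higher) with max_char."""
    # Line endings have lower code points than 'J' and pass through unchanged.
    return "".join(max_char if c > max_char else c for c in qual)


def cap_fastq_qualities(lines):
    """Yield FASTQ lines with the quality line of each 4-line record capped.

    ASSUMPTION: strict 4-line records, so every 4th line is a quality line.
    """
    for i, line in enumerate(lines):
        yield cap_quality_line(line) if i % 4 == 3 else line


record = ["@read1\n", "ACGTK\n", "+\n", "AFJK~\n"]
print("".join(cap_fastq_qualities(record)), end="")
```

To process a file, pass an open file handle to cap_fastq_qualities and write the yielded lines to a new file; unlike sed -i, this sketch does not edit in place.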

HiSeq Sequencing Calendar and Queue

Please see the HiSeq Sequencing Calendar and Queue for the current HiSeq 4000 and HiSeq 2500 schedules.