HiSeq 4000 Technology
The HiSeq 4000 sequencer represents the latest sequencer generation and enables significantly faster sequencing at reduced per-base costs. Please note that for some samples the library preps need to be adjusted for the HiSeq 4000. The new sequencer is more than three times faster than the HiSeq 2500 and yields about 65% more reads per lane. For example, a paired-end 100 bp run takes only 3 days on the HiSeq 4000, compared to 11 days on the HiSeq 2500. Illumina expects between 260 and 310 million clusters passing filter per lane (official figures for both the HiSeq 3000 and 4000). Our first data indicate that averages of more than 370 million clusters passing filter per lane are possible. Please see this post for more details. The maximum HiSeq 4000 read length is 150 bp, compared to 100 bp in the HiSeq 2500 High Output mode. Thus, the maximum yield per lane has more than doubled on the HiSeq 4000 (about 2.5x). The HiSeq 2500 is still of great interest since it generates paired-end 250 bp reads (in Rapid mode).
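The speed and yield comparisons above are simple arithmetic on the quoted figures. A minimal Python sketch, assuming that the maximum yield per lane scales as reads per lane times maximum read length (the constant and function names are illustrative only):

```python
# Rough HiSeq 4000 vs. HiSeq 2500 comparison using the figures quoted above.
# Assumption: maximum yield per lane ~ reads per lane x maximum read length.

READ_GAIN = 1.65        # HiSeq 4000 yields about 65% more reads per lane
MAX_LEN_2500_HO = 100   # bp, HiSeq 2500 High Output mode
MAX_LEN_4000 = 150      # bp, HiSeq 4000

RUN_DAYS_2500 = 11      # paired-end 100 bp run on the HiSeq 2500
RUN_DAYS_4000 = 3       # paired-end 100 bp run on the HiSeq 4000


def relative_yield(read_gain: float, new_len: int, old_len: int) -> float:
    """Ratio of maximum per-lane yield (new instrument / old instrument)."""
    return read_gain * new_len / old_len


yield_ratio = relative_yield(READ_GAIN, MAX_LEN_4000, MAX_LEN_2500_HO)
speedup = RUN_DAYS_2500 / RUN_DAYS_4000

print(f"Maximum yield per lane: {yield_ratio:.1f}x")
print(f"PE100 run-time speedup: {speedup:.1f}x")
```

With these inputs the yield ratio is 1.65 × 150 / 100 = 2.475, which is where the "about 2.5x" comes from, and 11 / 3 days gives a speedup of roughly 3.7x.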
The progress in speed and output is enabled by two new technologies: patterned flow cells and kinetic exclusion amplification. In contrast to the random clustering used on previous HiSeq models, clusters are now generated in ordered nanowells, which allows higher cluster densities and unambiguous cluster identification.
Libraries For HiSeq 4000 Sequencing
- Libraries generated for Illumina platforms with recent kits from all major suppliers (e.g. Bioo, Kapa, NEB, Nugen, Qiagen) will sequence perfectly fine; this includes Illumina Nextera libraries. Please note the insert-size limits outlined below. The libraries should show a distinct peak in the Bioanalyzer traces; the tighter the size distribution, the higher the yields.
- Depending on protocol specifics, some GBS and RAD-seq libraries can be sequenced with a very low percentage of spike-ins, resulting in high yields of high-quality data.
- Very low complexity libraries (e.g. amplicons) will require a considerable spike-in of PhiX or other balanced libraries; the resulting amplicon data are of high quality.
- Bisulfite sequencing libraries can be sequenced with spike-ins of 40% PhiX or other balanced libraries; in this case, the quality scores of the reverse read do not reflect the sequencing accuracy.
Please avoid libraries with:
- A high proportion of adapter dimers (if present, these should constitute less than 0.5% of the molecules).
- A considerable percentage of longer fragments (insert sizes over 550 bp, or total lengths over 670 bp). Minor “tails” of longer fragments are still acceptable.
- Single-end adapters (these old adapter designs should have been discarded years ago).
- Custom sequencing primers are not supported by Illumina on the HiSeq 4000.
- The new kinetic exclusion amplification on the flow cells favors short library molecules (especially the even shorter adapter dimers) more strongly than the previous bridge amplification protocol. According to Illumina, 1% adapter dimer content can result in more than 6% adapter dimer reads, and 10% adapter dimer content resulted in 84% unusable adapter dimer reads.
- The minimum library concentration has increased to 2 nM. Please submit libraries with concentrations of 5 nM or higher if possible. We require at least 15 µl of each library sample.
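Library concentrations are often measured in ng/µl (e.g. with a fluorometer), while the requirements above are given in nM. A minimal Python sketch of the usual conversion, assuming double-stranded DNA with an average mass of about 660 g/mol per base pair and an average library length (including adapters) read off the Bioanalyzer trace; the function names are hypothetical, and the thresholds are the ones stated above:

```python
# Convert a dsDNA library concentration from ng/ul to nM and check it against
# the submission requirements above (2 nM minimum, 5 nM preferred, 15 ul).
# Assumption: ~660 g/mol per base pair of double-stranded DNA.

BP_MASS_G_PER_MOL = 660.0
MIN_NM = 2.0
PREFERRED_NM = 5.0
MIN_VOLUME_UL = 15.0


def ng_per_ul_to_nm(conc_ng_per_ul: float, avg_length_bp: float) -> float:
    """nM = (ng/ul * 1e6) / (660 g/mol per bp * average length in bp)."""
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * avg_length_bp)


def check_submission(conc_ng_per_ul: float, avg_length_bp: float,
                     volume_ul: float) -> str:
    """Classify a library sample against the stated requirements."""
    if volume_ul < MIN_VOLUME_UL:
        return "insufficient volume"
    molarity = ng_per_ul_to_nm(conc_ng_per_ul, avg_length_bp)
    if molarity < MIN_NM:
        return "below minimum concentration"
    if molarity < PREFERRED_NM:
        return "acceptable (below preferred 5 nM)"
    return "ok"


print(round(ng_per_ul_to_nm(10, 400), 1))
print(check_submission(10, 400, 20))
```

For example, a library at 10 ng/µl with an average length of 400 bp corresponds to roughly 37.9 nM, while 1 ng/µl at the same length (about 3.8 nM) meets the 2 nM minimum but not the preferred 5 nM.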
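The Illumina adapter dimer figures quoted above can be put on a common scale by comparing the odds that a library molecule is an adapter dimer with the odds that a read is one. This is only a back-of-the-envelope odds-ratio model (not Illumina's), sketched in Python with hypothetical function names. With the two quoted data points, the implied advantage of dimers is roughly 6x at 1% dimer content and roughly 47x at 10%, so it is clearly not constant and small dimer fractions should not be extrapolated linearly:

```python
# Back-of-the-envelope comparison of adapter-dimer content in the library
# versus adapter-dimer reads, using the Illumina figures quoted above.
# Assumption: a simple odds-ratio model; this is not Illumina's model.


def odds(fraction: float) -> float:
    """Convert a fraction (0 < f < 1) to odds f / (1 - f)."""
    return fraction / (1.0 - fraction)


def implied_enrichment(library_fraction: float, read_fraction: float) -> float:
    """Factor by which dimer odds increase from library molecules to reads."""
    return odds(read_fraction) / odds(library_fraction)


# 1% dimers in the library -> more than 6% dimer reads (so this is a lower bound)
low = implied_enrichment(0.01, 0.06)
# 10% dimers in the library -> 84% dimer reads
high = implied_enrichment(0.10, 0.84)

print(f"1% dimers:  enrichment of at least {low:.1f}x")
print(f"10% dimers: enrichment of about {high:.0f}x")
```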
Sequencing Library Size Selection
AMPure XP bead “upper cut” protocol to remove fragments longer than 670 bp:
- If not mentioned explicitly, follow the manufacturer's standard AMPure XP handling instructions (e.g. equilibrate the beads to room temperature and vortex them before use; bead washes and elution).
- Add AMPure beads at 0.55x the sample volume to your sample, mix, and incubate for 5 minutes at RT.
- Collect the beads on a magnet.
- Transfer the supernatant to a new tube.
- Add another 1x the original sample volume of AMPure beads to the supernatant, mix, and incubate for 5 minutes.
- Collect the beads on a magnet and remove the supernatant.
- Carry out the regular 70% ethanol washes of the beads and elute the samples from the beads according to the Agencourt AMPure XP protocol.
- Verify the success of the size selection by running an aliquot on a Bioanalyzer or equivalent instrument.
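The bead volumes for the two-step protocol above are easy to get wrong, because the second addition is calculated from the original sample volume, not from the supernatant volume. A small Python helper as a sketch (the function name is hypothetical); it also reports the approximate cumulative bead-to-sample ratio during the second binding step, assuming the whole supernatant, including the bead buffer from the first step, is transferred:

```python
# Bead volumes for the AMPure XP "upper cut" protocol described above.
# Assumption: the complete supernatant (sample + first bead buffer) is
# transferred, so the second binding sees the cumulative bead ratio.


def upper_cut_volumes(sample_ul: float, first_ratio: float = 0.55,
                      second_ratio: float = 1.0) -> dict:
    """Return bead volumes (ul) for both steps and the cumulative ratio."""
    first_beads = first_ratio * sample_ul    # removes fragments over ~670 bp
    second_beads = second_ratio * sample_ul  # based on the ORIGINAL sample volume
    return {
        "first_beads_ul": first_beads,
        "second_beads_ul": second_beads,
        "cumulative_ratio": (first_beads + second_beads) / sample_ul,
    }


# Example: a 50 ul library
for name, value in upper_cut_volumes(50.0).items():
    print(f"{name}: {value:.2f}")
```

For a 50 µl library this gives 27.5 µl of beads in the first step and 50 µl in the second, with a cumulative ratio of about 1.55x.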
HiSeq 4000 Run Metrics & Quality Scores
Please note that several run metrics are calculated differently for the HiSeq 4000. The “Clusters Passing Filter” metric is now calculated as a percentage of the number of nanowells and therefore appears considerably lower; values between 55% and 70% are expected. Each flow cell lane has 482.68 million nanowells. The demultiplexing metrics are now delivered as “laneBarcode.html” files. The “Top Unknown Barcodes” table therein usually begins with more than 100 million barcode reads of “NNNNNN”. This is not a sign of any sequencing problem; the figure merely reflects the unused nanowells, and more than 150 million unused nanowells are fully expected.
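Because the percentage is now relative to the fixed number of nanowells, converting between “% Clusters PF” and absolute cluster counts is a one-line calculation. A Python sketch using the 482.68 million nanowells per lane stated above (the names are illustrative); by this arithmetic, 55% to 70% PF corresponds to roughly 265 to 338 million clusters passing filter per lane:

```python
# Relate the HiSeq 4000 "% Clusters PF" metric to absolute cluster counts.
NANOWELLS_PER_LANE = 482.68e6  # nanowells per HiSeq 4000 lane (from the text)


def percent_pf(clusters_pf: float) -> float:
    """Clusters passing filter as a percentage of all nanowells in the lane."""
    return 100.0 * clusters_pf / NANOWELLS_PER_LANE


def clusters_from_percent(pct_pf: float) -> float:
    """Absolute clusters passing filter for a given % PF value."""
    return pct_pf / 100.0 * NANOWELLS_PER_LANE


def wells_without_pf_cluster(clusters_pf: float) -> float:
    """Nanowells in the lane that did not yield a passing-filter cluster."""
    return NANOWELLS_PER_LANE - clusters_pf


for pct in (55, 70):
    clusters = clusters_from_percent(pct)
    print(f"{pct}% PF = {clusters / 1e6:.1f} M clusters, "
          f"{wells_without_pf_cluster(clusters) / 1e6:.1f} M wells without a PF cluster")
```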
The read quality scores in the FASTQ files are now binned to reduce file sizes; the encoding (Sanger, Phred+33) remains the same, though.
The quality scores now reach up to the character K (Q42), which is one score higher than was common previously (J, Q41). Some older bioinformatics tools might have problems with this score. In this case it helps to replace all “K”s in the quality lines with “J”s. This will not cause any loss of information. In unix/linux/mac environments you could use, for example, this command on the FASTQ files (thanks: Dylan Storey): sed -i '4~4 s/\([]{}\|~\\\^_`[K-Za-z]\)/J/g' reads.fastq (reads.fastq is a placeholder for your file name; the 4~4 address, i.e. every fourth line starting at line 4, requires GNU sed and assumes uncompressed four-line FASTQ records).
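For pipelines that run in Python rather than the shell, the same fix can be sketched as below. It decodes Phred+33 characters (quality = ASCII code minus 33, so “J” is Q41 and “K” is Q42) and caps every quality character above “J”, touching only the fourth line of each record; like the sed command, it assumes plain four-line FASTQ records. The function names are hypothetical:

```python
from typing import Iterable, Iterator, List

PHRED_OFFSET = 33  # Sanger / Phred+33 encoding


def phred_scores(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def cap_quality(qual: str, max_char: str = "J") -> str:
    """Replace every quality character above max_char (e.g. 'K' = Q42) with max_char."""
    return "".join(c if c <= max_char else max_char for c in qual)


def cap_fastq(lines: Iterable[str], max_char: str = "J") -> Iterator[str]:
    """Yield FASTQ lines unchanged, except the quality line (every 4th line)."""
    for i, line in enumerate(lines, start=1):
        if i % 4 == 0:
            stripped = line.rstrip("\n")
            # Cap the quality characters and keep the original line ending.
            yield cap_quality(stripped, max_char) + line[len(stripped):]
        else:
            yield line
```

In a small script, `sys.stdout.writelines(cap_fastq(sys.stdin))` would filter a FASTQ stream. Header and sequence lines are never modified, even if they happen to contain letters above “J”.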
HiSeq Sequencing Calendar and Queue
Please see the HiSeq Sequencing Calendar and Queue for the current HiSeq 4000 and HiSeq 2500 schedules.