Which data will I receive from the PacBio Sequel II sequencer? Will they have quality scores?

We will deliver the complete data set generated by the PacBio Sequel to you securely via Bioshare.

For push-button secondary analyses (combining data from up to 2 SMRT-cells, e.g. for demultiplexing, CCS, long amplicon, or IsoSeq analysis), we can run these on our own server and will also deliver all the resulting data.  Please note that we are not data analysts, and we cannot experiment with analysis parameters.  Thus, we highly recommend working with the Bioinformatics Core for comprehensive data analyses. Alternatively, you can install the SMRT Tools command line programs, which are part of the SMRT-Link package.

The data files generated by the PacBio Sequel differ from those generated previously by the PacBio RSII. All the data that were formerly contained in the bas.h5/bax.h5 and accessory formats are now contained in the .bam, .xml, and .pbi files.  Please see this page for the detailed format specifications (pacbiofileformats.readthedocs.io/en/5.0/index.html).  Specifically, the raw data for each SMRT-cell will be in files named .subreads.bam, .subreads.xml, and .subreads.pbi.

The .bam data can be converted to fastq or fasta files with bamtools (please see the bottom of this page: github.com/PacificBiosciences/PacBioFileFormats/wiki/BAM-recipes) or, preferably, with the PacBio tool bam2fastx.  “bam2fastx” is part of the free SMRT Tools:  pacb.com/support/software-downloads/ .
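As a rough illustration of the conversion step, the sketch below wraps the bam2fastq command (one of the programs shipped with bam2fastx) in a small Python helper. The file name and output prefix are hypothetical, and the sketch assumes that SMRT Tools is installed and on your PATH; the tool may also expect the matching .pbi index next to the .bam file. Please check the tool's own help output for the options of your installed version.

```python
import shutil
import subprocess


def build_bam2fastq_command(bam_path, output_prefix):
    """Return the argument list for converting one BAM file to FASTQ."""
    # "-o" sets the output prefix; the output file name is derived from it.
    return ["bam2fastq", "-o", str(output_prefix), str(bam_path)]


def convert_subreads(bam_path, output_prefix):
    """Run bam2fastq if it is installed; return True if it was run."""
    if shutil.which("bam2fastq") is None:
        print("bam2fastq not found on PATH; install SMRT Tools first")
        return False
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_bam2fastq_command(bam_path, output_prefix), check=True)
    return True


# Example call (hypothetical file name, for illustration only):
# convert_subreads("movie.subreads.bam", "movie_subreads")
```

Building the command as a list rather than a single string avoids shell quoting problems with unusual file names.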

Please note that the quality scores are the same for all bases of the Sequel raw data (PHRED 0, ASCII "!").  PacBio concluded that computing quality scores for the raw data was not worthwhile: apparently they cannot be computed reliably (and consequently they were also ignored in RSII data pipelines).  However, usable PacBio quality scores can be generated from consensus data if the project allows (either by CCS or by other secondary analysis algorithms, e.g. all-vs-all alignments).  In short, determining the quality of individual reads is up to the downstream analysis pipeline (e.g. the assembler).
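To see what "PHRED 0, ASCII !" means in a converted fastq file, here is a minimal sketch that decodes a quality string using the standard Phred+33 fastq encoding. The function names and the example quality strings are made up for illustration; they are not taken from any PacBio tool.

```python
def phred_scores(quality_string, offset=33):
    """Decode a fastq quality string into a list of Phred scores."""
    return [ord(char) - offset for char in quality_string]


def has_uninformative_qualities(quality_string):
    """Return True if every base carries Phred 0 (ASCII '!')."""
    return len(quality_string) > 0 and set(quality_string) == {"!"}


# Hypothetical quality strings, for illustration only.
raw_read_quality = "!!!!!!!!"   # what a Sequel raw subread looks like
consensus_quality = "I5+I"      # what a consensus read might look like

print(phred_scores(raw_read_quality))
print(has_uninformative_qualities(raw_read_quality))
print(has_uninformative_qualities(consensus_quality))
```

A check like this can help confirm whether a given fastq file holds raw subreads, whose qualities should be ignored, or consensus reads with usable scores.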

Category: 06 Sequencing Data
