My FASTQ file contains some “N”s. Is there a problem with my data?

Please note that when you open an Illumina FASTQ file, the first few thousand reads are expected to be of comparatively low quality and to contain “N”s frequently. An “N” means that the Illumina software was not able to make a base call at that position. The reads at the beginning and end of the sequence data files originate from the edges of the flow cell, where imaging is more difficult, so these reads show below-average quality. In many cases it is best to ignore the first and last 100,000 reads of an Illumina dataset, since they are not representative of the rest of the data.
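To see how much of the “N” content comes from the edge reads, one option is to compare the fraction of N-containing reads in the whole file with the fraction that remains after skipping reads at both ends. The following is a minimal sketch, not a replacement for a full quality report. It assumes unwrapped four-line FASTQ records; the function names and the file name in the usage note are hypothetical.

```python
import gzip
from itertools import islice


def read_has_n_flags(lines):
    """Return one flag per read: 1 if the sequence contains an 'N', else 0."""
    flags = bytearray()
    # Assumption: every record spans exactly four lines, so the sequence
    # is the second line of each group of four.
    for seq in islice(lines, 1, None, 4):
        flags.append(1 if "N" in seq.upper() else 0)
    return flags


def n_fraction(flags, skip=0):
    """Fraction of N-containing reads after dropping `skip` reads from each end."""
    kept = flags[skip:len(flags) - skip]
    # If nothing is left after trimming, there is no meaningful fraction.
    return sum(kept) / len(kept) if kept else float("nan")


def summarize(path, skip=100_000):
    """Return (fraction over all reads, fraction without the edge reads)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        flags = read_has_n_flags(handle)
    return n_fraction(flags), n_fraction(flags, skip)
```

For example, `summarize("sample_R1.fastq.gz")` returns the two fractions for a (hypothetical) gzipped file. If the second value is clearly lower than the first, the “N”s are concentrated in the edge reads described above.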
To verify the data quality of the entire dataset, we recommend running a program such as FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). FastQC summarizes the data quality and runs several other analyses on the data. Please note that these additional analyses presuppose that the data were generated by whole-genome shotgun sequencing, so they will usually produce warnings that do not apply to other data types.

Category: 06 Sequencing Data
