Abstract

Next-generation sequencing technologies enable the rapid, cost-effective production of sequence data. To evaluate the performance of these technologies, it is important to investigate the quality of the sequence reads they produce. In this study, we analyzed read quality and SNP detection performance using three commercially available next-generation sequencers: the Roche Genome Sequencer FLX System (FLX), the Illumina Genome Analyzer (GA), and the Applied Biosystems SOLiD system (SOLiD). A common genomic DNA sample obtained from Escherichia coli strain DH1 was sequenced on all three platforms. The resulting reads were aligned to the complete genome sequence of E. coli DH1 to evaluate the accuracy and sequence bias of each sequencing method. We found that the fraction of “junk” data, i.e., reads that could not be aligned to the reference genome, was largest in the SOLiD data set, in which about half of the reads could not be aligned. Among reads that did align to the reference, sequence accuracy was poorest in the GA data sets, suggesting relatively low fidelity of the elongation reaction in the GA method. Furthermore, by aligning the sequence reads to the genome of E. coli strain W3110, we screened for sequence differences between the two E. coli strains using the data sets from the three platforms. The detected sequence differences were similar among the three methods, while the sequence coverage required for detection was significantly lower for the FLX data set. These results provide valuable information on the quality of short sequence reads and the performance of SNP detection on three next-generation sequencing platforms.

Highlights

  • Three next-generation sequencing (NGS) technologies, namely the Roche Genome Sequencer FLX System (FLX), the Illumina Genome Analyzer (GA), and the Applied Biosystems SOLiD system (SOLiD), enable the rapid and cost-effective production of high-quality genome sequence data

  • We evaluated the statistical nature of sequence reads and the single nucleotide polymorphism (SNP) detection performance of three commercially available NGS platforms: FLX, GA, and SOLiD

  • SNP detection performance was similar among the three NGS platforms, while the coverage required for SNP detection was significantly lower for the FLX data set, as expected from the relatively long length and high accuracy of FLX reads

Introduction

Three next-generation sequencing (NGS) technologies, namely the Roche Genome Sequencer FLX System (FLX), the Illumina Genome Analyzer (GA), and the Applied Biosystems SOLiD system (SOLiD), enable the rapid and cost-effective production of high-quality genome sequence data. To analyze NGS data, the millions of short sequence reads must be assembled in order to extract sequence features of the DNA sample, for example to detect single nucleotide polymorphisms (SNPs) or to perform de novo sequencing [3]. For such analyses, the total amount of data and the quality of the sequencing reads, such as the error rate and systematic sequence bias in the short reads, markedly affect the assembly results [4]. SNP detection performance was similar among the three NGS platforms, while the coverage required for SNP detection was significantly lower for the FLX data set, as expected from the relatively long length and high accuracy of FLX reads. These analyses provided valuable information on the quality of short sequence reads and the performance of SNP detection.
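
As a rough illustration of the two read-quality measures discussed above (the fraction of reads that fail to align to the reference, and the mismatch rate among reads that do align), the following Python sketch computes both from a toy list of alignment records. The `ReadAlignment` record layout, the function name `summarize_reads`, and the example reads are assumptions made for illustration only; this is not the pipeline used in the study, and it counts substitutions in ungapped alignments while ignoring insertions and deletions.

```python
from typing import List, NamedTuple, Optional, Tuple


class ReadAlignment(NamedTuple):
    """Hypothetical record: a read and the reference segment it aligned to."""
    read_seq: str
    ref_seq: Optional[str]  # same length as read_seq, or None if unaligned


def summarize_reads(reads: List[ReadAlignment]) -> Tuple[float, float]:
    """Return (unaligned_fraction, mismatch_rate) for a list of reads.

    unaligned_fraction: reads with no alignment / all reads.
    mismatch_rate: mismatched bases / all bases in aligned reads.
    """
    if not reads:
        raise ValueError("no reads given")

    unaligned = 0
    aligned_bases = 0
    mismatches = 0
    for r in reads:
        if r.ref_seq is None:
            unaligned += 1
            continue
        if len(r.read_seq) != len(r.ref_seq):
            raise ValueError("this sketch assumes ungapped alignments")
        aligned_bases += len(r.read_seq)
        # Count positions where the read disagrees with the reference.
        mismatches += sum(a != b for a, b in zip(r.read_seq, r.ref_seq))

    unaligned_fraction = unaligned / len(reads)
    mismatch_rate = mismatches / aligned_bases if aligned_bases else 0.0
    return unaligned_fraction, mismatch_rate


# Toy example (made-up reads, not data from the study).
example = [
    ReadAlignment("ACGT", "ACGT"),  # perfect match
    ReadAlignment("ACGA", "ACGT"),  # one substitution
    ReadAlignment("TTTT", None),    # could not be aligned
    ReadAlignment("GGGG", None),    # could not be aligned
]
print(summarize_reads(example))  # (0.5, 0.125)
```

In practice these quantities would be derived from the output of a read aligner rather than from hand-built records; the sketch only makes the two definitions concrete.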

