Abstract

Background

The emergence of next-generation sequencing (NGS) has provided the means for rapid, high-throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly, and their quality reflects not only the quality of the sequencing technology used but also that of the analysis software employed for assembly and annotation.

Methodology/Principal Findings

In this work, we have explored the quality of microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute (JGI) and finished at the Los Alamos National Laboratory using various combinations of sequencing technologies, reflecting the institute's transition from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has several advantages in terms of total sequence throughput and cost, but it also introduces challenges for downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and their potential sources of error should be considered prior to analysis.

Conclusion

These data follow the evolution of microbial sequencing and downstream processing at the JGI: from draft genome sequences with large gaps corresponding to missing genes of significant biological role, to assemblies with multiple small gaps (Illumina), and finally to assemblies that yield almost complete genomes (Illumina+PacBio).
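The abstract does not list the specific assembly metrics that were used. Purely as an illustration, the sketch below computes a few metrics commonly reported when comparing draft and finished assemblies: number of sequences, total length, N50, and number of gaps. The function names are hypothetical, gaps are assumed to be runs of `N` inside scaffolds, and this is not the authors' pipeline.

```python
import re


def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].strip()
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def n50(lengths):
    """Largest length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_stats(fasta_text):
    """Summary metrics for one assembly; gaps are assumed to be runs of 'N'."""
    seqs = read_fasta(fasta_text)
    lengths = [len(s) for s in seqs.values()]
    gaps = sum(len(re.findall(r"N+", s)) for s in seqs.values())
    return {
        "scaffolds": len(lengths),
        "total_length": sum(lengths),  # includes gap (N) positions
        "n50": n50(lengths),
        "gaps": gaps,
    }


if __name__ == "__main__":
    draft = ">s1\nACGTNNNNACGT\n>s2\nACGTACGT\n>s3\nACG\n"
    finished = ">chr\nACGTACGTACGTACGT\n"
    print(assembly_stats(draft))
    print(assembly_stats(finished))
```

A fragmented draft shows up as more scaffolds, a lower N50 and a non-zero gap count, while a finished genome collapses to a single gap-free sequence.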

Highlights

  • Prior to 2004, most DNA sequencing used the chain-termination method developed by F. Sanger

  • Genomes and technologies surveyed: During the last 7 years, 133 microbial genomes were sequenced to completion at the Department of Energy-Joint Genome Institute (DOE-JGI) (Table S1)

  • Our analyses show that the use of Illumina-based sequencing technologies for microbial genome projects is cost effective but can generate the entire sequence without significant loss of information, to what other studies have shown [15]

Summary

Introduction

Prior to 2004, most DNA sequencing used the chain-termination method developed by F. Sanger. A Sanger sequencing machine yields about 1.5 Mbp/day of high-quality reads with an average length of 500–800 bases. The fragments of DNA to be sequenced must first be cloned and the resulting libraries maintained. Next-generation sequencing (NGS) technologies bypass cloning by immobilizing the DNA fragments and subjecting them to sequential interrogations. Commonly used technologies, such as 454 pyrosequencing [2] and Illumina sequencing-by-synthesis [3], use DNA polymerase to drive their sequencing reactions but do not require cloning, while Pacific Biosciences uses a sequencing-by-synthesis technology applied to single molecules in real time [4]. Read length for the 454 platform exceeds 600 bases; output is 10 Gb per run. The number of available assembled microbial genomes continues to grow rapidly, and their quality reflects the quality of the sequencing technology used and of the analysis software employed for assembly and annotation.
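To put the quoted Sanger throughput in perspective, the sketch below converts it into machine-days for a single genome. The genome size (5 Mbp), target coverage (8x) and mean read length (650 bases, the midpoint of the quoted 500–800 range) are illustrative assumptions, not values from the study, and the estimate ignores failed reads, finishing work and instrument overhead.

```python
SANGER_BP_PER_DAY = 1.5e6  # throughput quoted in the text (high-quality bases/day)


def sanger_run_days(genome_size_bp, coverage, throughput_bp_per_day=SANGER_BP_PER_DAY):
    """Machine-days needed to reach `coverage` on one instrument (overhead ignored)."""
    return genome_size_bp * coverage / throughput_bp_per_day


def reads_per_day(throughput_bp_per_day=SANGER_BP_PER_DAY, mean_read_len=650):
    """Approximate reads per day; 650 bases is an assumed midpoint of 500-800."""
    return throughput_bp_per_day / mean_read_len


if __name__ == "__main__":
    # Hypothetical 5 Mbp microbial genome sequenced to 8x coverage.
    print(f"{sanger_run_days(5_000_000, 8):.1f}")  # prints 26.7
    print(f"{reads_per_day():.0f}")  # prints 2308
```

Under these assumptions, one Sanger instrument needs nearly four weeks for a single draft microbial genome, which illustrates why NGS platforms changed the scale of genome projects.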
