Abstract

Background: The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Third-generation, single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem.

Results: To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These single-library assemblies are also more accurate than typical short-read assemblies and hybrid assemblies of short and long reads.

Conclusions: Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of completed genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.

Highlights

  • The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes

  • Pair libraries were the only sequencing method capable of spanning large repeats, such as the rDNA operon, but the PacBio RS is capable of producing single-molecule reads of the same length. Leveraging this recent development, we present an approach for microbial genome closure that relies on overlapping and assembling single-molecule reads de novo rather than patching and resolving a short-read de Bruijn graph

  • Longer sequences can be aligned more accurately, provided they are long enough to compensate for the error rate [29]. These results demonstrate that high-quality, high-continuity bacterial assemblies can be generated using exclusively single-molecule sequencing data
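
The repeat-spanning argument in the highlights can be illustrated with a minimal sketch. It is not the authors' assembly method: the function names, the coordinates, the 5 kb repeat length, and the `anchor` parameter (unique flanking sequence required on each side of the repeat) are all hypothetical, chosen only to show why a read longer than a repeat can resolve it while short reads cannot.

```python
# Illustrative sketch only: coordinates are half-open [start, end) positions
# on a hypothetical reference; all values below are assumptions.

def spans_repeat(read_start, read_end, repeat_start, repeat_end, anchor=500):
    """Return True if the read covers the whole repeat plus `anchor`
    bases of unique flanking sequence on each side."""
    return (read_start <= repeat_start - anchor
            and read_end >= repeat_end + anchor)


def repeat_is_resolved(reads, repeat_start, repeat_end, anchor=500):
    """A repeat copy counts as resolved if at least one read spans it."""
    return any(
        spans_repeat(start, end, repeat_start, repeat_end, anchor)
        for start, end in reads
    )


# Hypothetical 5 kb repeat copy (roughly the scale of an rDNA operon).
repeat = (100_000, 105_000)

# Short reads only touch the repeat boundaries; none crosses the whole copy.
short_reads = [(99_900, 100_050), (104_950, 105_100)]

# One long single-molecule read covers the repeat and both unique flanks.
long_reads = [(98_000, 107_500)]

short_ok = repeat_is_resolved(short_reads, *repeat)
long_ok = repeat_is_resolved(long_reads, *repeat)
```

Under these assumptions, `short_ok` is `False` and `long_ok` is `True`: only the long read links the unique sequence on both sides of the repeat.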


Summary

Introduction

The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. As the cost of sequencing has dropped, the number of sequencing projects available in the GOLD database [1] has increased 4-fold, from 2,905 in 2007 to 11,472 in 2011 [2]. Many available genomes are heavily fragmented into hundreds or thousands of contigs, and many more are sequenced at low coverage and never submitted. This is in stark contrast to the era before the ‘next-generation’ revolution, when many genomes underwent expensive manual gap-closing and sequence verification (finishing) before submission [3]. Even fewer were ‘finished’ by manually correcting errors and adding annotation [5].

