Abstract

Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly large numbers of usable sequences per instrument run continue to make whole-genome assembly an appealing target application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤100 nucleotides) through a detailed study involving genomic sequences of various lengths and origins, in conjunction with several of the currently popular assembly programs. Our extensive analysis demonstrates that, in addition to sequencing coverage, attributes such as the architecture of the target genome, the choice of assembly program, the average read length and the observed sequencing error rates are powerful variables that affect the best achievable assembly of the target sequence in terms of size and correctness.

Highlights

  • Recent advances in massively parallel genome sequencing provide a cost-effective potential alternative to the traditional Sanger method [1]

  • We studied the performance of six popular genome assembly tools that have been designed to handle short sequencing reads (<50 nt)

  • For the O. sativa 4 Mb dataset, we evaluated the impact of small changes in error rate on the quality of the assembly by generating additional sets of reads with error rates E equal to 0.5%, 1.5%, 2% and 2.5%
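As an illustration of the last highlight, the following is a minimal sketch of how reads with a chosen error rate E could be generated from a reference sequence. It is not the procedure used in the paper: the function name `simulate_reads`, the uniform sampling of read start positions, and the substitution-only error model are all assumptions made for this example.

```python
import random


def simulate_reads(genome, read_len, coverage, error_rate, seed=0):
    """Sample reads uniformly from `genome` and add substitution errors.

    Hypothetical helper for illustration only. Assumptions: reads start at
    uniformly random positions, and each base is independently replaced by a
    different base with probability `error_rate` (no insertions or deletions).
    Returns a list of (start_position, read_sequence) pairs.
    """
    if read_len > len(genome):
        raise ValueError("read_len must not exceed the genome length")
    rng = random.Random(seed)
    # Number of reads needed so that n_reads * read_len ~= coverage * genome length.
    n_reads = int(coverage * len(genome) / read_len)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        bases = list(genome[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Substitute with one of the other three nucleotides.
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads
```

Under these assumptions, the error rates listed above would be passed as fractions, for example `simulate_reads(genome, 36, 50, 0.005)` for E = 0.5%; the read length and coverage in that call are placeholder values, not settings taken from the study.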


Summary

Introduction

Recent advances in massively parallel genome sequencing provide a cost-effective potential alternative to the traditional Sanger method [1]. The number of de novo short read genome assembly tools has been increasing steadily. Some of these tools represent the reads as a graph; this representation allows compact storage and processing of the input, and its size depends on the genome size and the number of k-mers. Representative methods in this category include Euler-SR [7], Velvet [8] and Allpaths-LG [9]. Other schemes are based on a more traditional overlap and contig extension approach and include Edena [10], Sharcgs [11] and Vcake [12]. These assemblers have been designed to handle small genomes, such as those of bacteria, and may not be directly applicable to larger, more complex genomes. In addition to short read assemblers, there are specialized tools for assembling longer pyrosequencing reads (i.e., from the 454 technology), such as CABOG [17].
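To make the remark about graph size concrete, here is a minimal sketch of a de Bruijn-style k-mer graph built from a set of reads. It assumes that this is the kind of graph meant above and is heavily simplified: the function name `build_kmer_graph` is hypothetical, and reverse complements, sequencing errors and k-mer multiplicities are ignored, so it should not be read as the implementation of any of the cited assemblers.

```python
from collections import defaultdict


def build_kmer_graph(reads, k):
    """Build a simplified de Bruijn-style graph from reads.

    Hypothetical helper for illustration only. Nodes are (k-1)-mers and each
    distinct k-mer contributes one directed edge from its prefix to its suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def edge_count(graph):
    """Number of edges, which equals the number of distinct k-mers seen."""
    return sum(len(successors) for successors in graph.values())
```

In this sketch, adding further reads that contain only k-mers already seen leaves the graph unchanged, which is why its size tracks the number of distinct k-mers (and hence the genome size) rather than the number of reads.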

