Abstract

Background: With the rapid development of next generation sequencing (NGS) technology, large quantities of genome sequencing data have been generated. Because of repetitive regions in genomes and other factors, assembly of very short reads remains challenging.

Results: A novel strategy for improving genome assembly from very short reads is proposed. It increases the accuracy of assemblies by integrating de novo contigs, and it produces comparative contigs from multiple references without being limited to genomes of closely related strains. The comparative contigs are then used to scaffold the de novo contigs. Using simulated and real datasets, we show that the strategy effectively improves the quality of assemblies of isolated microbial genomes and metagenomes.

Conclusions: As more reference genomes become available, our strategy will be useful for improving the quality of genome assemblies from very short reads. Scripts for applying the strategy are provided at http://code.google.com/p/cd-hybrid/.

Highlights

  • With the rapid development of next generation sequencing (NGS) technology, large quantities of genome sequencing data have been generated

  • The de novo assembly strategy is to construct genome sequences from a set of sequence reads without the help of reference genomes, either using the overlap-layout-consensus (OLC) approach or an algorithm based on a de Bruijn graph (DBG)

  • Using simulated short read datasets, we show that this method significantly reduces the error rates of de novo assemblies and produces extremely reliable DBG contigs

Summary

Introduction

With the rapid development of next generation sequencing (NGS) technology, large quantities of genome sequencing data have been generated. With the dramatically reduced time and cost of sequencing a genome, thousands of such projects have been finished or are in progress. These projects involve either de novo sequencing or re-sequencing. Genome assembly from very short reads is challenging because of genomic repeats, and it requires intensive computational resources. The de novo assembly strategy constructs genome sequences from a set of sequence reads without the help of reference genomes, using either the overlap-layout-consensus (OLC) approach or an algorithm based on a de Bruijn graph (DBG). Both methods have been well described in previous reports [11,12]. Because DBG-based assemblers can resolve genomic repeats more accurately and with less computation than OLC-based ones, they have been widely adopted by genome sequencing projects [11].
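To make the DBG idea concrete, the following is a minimal Python sketch, not the method used by any particular assembler or by this paper. It assumes error-free reads on a single strand, and the function names `build_de_bruijn` and `branching_nodes` are hypothetical names chosen for illustration. Each k-mer in a read becomes an edge between its (k-1)-mer prefix and its (k-1)-mer suffix; nodes with more than one successor mark places where a repeat makes the path ambiguous.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read.

    Illustrative only: assumes error-free, single-strand reads.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix node to its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def branching_nodes(graph):
    """Return nodes with more than one successor, sorted alphabetically."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Three overlapping reads from a repeat-free toy sequence: a single path.
unique_graph = build_de_bruijn(["ATGGCG", "GGCGTG", "CGTGCA"], k=4)

# A toy read in which "ATG" occurs twice with different continuations.
repeat_graph = build_de_bruijn(["ATGCATGG"], k=4)
```

In the repeat-free example every node has exactly one successor, so the reads collapse into one unambiguous path. In the second example the node `ATG` is followed by both `TGC` and `TGG`, which is the kind of branch that forces an assembler to break a contig or use additional information to resolve it.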
