Abstract

Background: The sequencing depth provided by high-throughput sequencing technologies has allowed a rise in the number of de novo sequenced genomes that could potentially be closed without further sequencing. However, genome scaffolding and closure require costly human supervision, which often results in genomes being published as drafts. A number of automatic scaffolders were recently released, which improved the global quality of genomes published in the last few years. Yet none of them reaches the efficiency of manual scaffolding.

Results: Here, we present WiseScaffolder, an innovative semi-automatic scaffolder that additionally helps with chimera resolution and generates valuable contig maps and outputs for manual improvement of the automatic scaffolding. This software was tested on the newly sequenced marine cyanobacterium Synechococcus sp. WH8103 as well as on two reference datasets used in previous studies, Rhodobacter sphaeroides and Homo sapiens chromosome 14 (http://gage.cbcb.umd.edu/). The quality of the resulting scaffolds was compared with that of three other stand-alone scaffolders: SSPACE, SOPRA and SCARPA. For all three model organisms, WiseScaffolder produced better results than the other scaffolders in terms of contiguity statistics (number of genome fragments, N50, LG50, etc.), and in the case of WH8103, the reliability of the scaffolds was confirmed by whole-genome alignment against a closely related reference genome. We also propose an efficient computer-assisted strategy, using outputs generated by WiseScaffolder, for manual improvement of the scaffolding as well as for genome finishing, which in our hands led to the circularization of the WH8103 genome.

Conclusion: Altogether, WiseScaffolder proved more efficient than three other scaffolders for both prokaryotic and eukaryotic genomes and is thus likely applicable to most genome projects. The scaffolding pipeline described here should be of particular interest to biologists wishing to take advantage of the high added value of complete genomes.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0705-y) contains supplementary material, which is available to authorized users.
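The contiguity statistics used to compare scaffolders follow widely used definitions: N50 is the length of the shortest fragment among the longest fragments that together cover at least half of the assembly, L50 is the number of such fragments, and NG50/LG50 are the same quantities computed against the genome size rather than the assembly size (the convention used, for example, by QUAST). The minimal Python sketch below assumes that fragment (contig or scaffold) lengths and a genome size are available; the function names are illustrative and are not taken from WiseScaffolder.

```python
from typing import Iterable, List, Optional, Tuple


def nx_lx(lengths: Iterable[int], target: float,
          fraction: float = 0.5) -> Tuple[Optional[int], Optional[int]]:
    """Return (Nx, Lx) for the given target length.

    Fragments are sorted longest first; Nx is the length of the fragment at
    which the cumulative length first reaches fraction * target, and Lx is the
    number of fragments needed to get there. With target = assembly length
    this gives N50/L50; with target = genome length it gives NG50/LG50.
    Returns (None, None) if the fragments never reach the threshold.
    """
    threshold = fraction * target
    cumulative = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        cumulative += length
        if cumulative >= threshold:
            return length, count
    return None, None


def contiguity_stats(lengths: Iterable[int], genome_size: int) -> dict:
    """Summarize an assembly with the statistics named in the abstract."""
    lengths: List[int] = list(lengths)
    n50, l50 = nx_lx(lengths, sum(lengths))   # relative to assembly size
    ng50, lg50 = nx_lx(lengths, genome_size)  # relative to genome size
    return {
        "fragments": len(lengths),
        "total_length": sum(lengths),
        "N50": n50, "L50": l50,
        "NG50": ng50, "LG50": lg50,
    }


if __name__ == "__main__":
    print(contiguity_stats([500, 400, 300, 200, 100], genome_size=2000))
```

For fragment lengths of 500, 400, 300, 200 and 100 bp and a 2,000 bp genome, N50 is 400 (L50 = 2) while NG50 is 300 (LG50 = 3). Joining contigs into fewer, longer scaffolds raises N50 and lowers LG50, which is why these statistics are used to rank scaffolders.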

Highlights

  • The sequencing depth provided by high-throughput sequencing technologies has allowed a rise in the number of de novo sequenced genomes that could potentially be closed without further sequencing

  • Comparison of WiseScaffolder with other scaffolders: the recent advent of Next Generation Sequencing (NGS) technologies has triggered the development of a number of stand-alone scaffolders, whose relative efficiency may be tricky for bioanalysts to assess and is highly dataset-dependent

  • Test datasets: the WH8103 genome (42 contigs assembled using the CLC assembler; MP library insert size ~4 kb), as well as two reference datasets retrieved from the GAGE study [14], namely R. sphaeroides (177 Bambus2-assembled contigs [16]; MP library insert size ~3.5 kb) and Homo sapiens Chr.14 (3,541 CABOG-assembled contigs [17]; MP library insert size 2.3-2.8 kb)

Summary

Introduction

Farrant et al., BMC Bioinformatics (2015) 16:281

The sequencing depth provided by high-throughput sequencing technologies has allowed a rise in the number of de novo sequenced genomes that could potentially be closed without further sequencing. While complete genomes were a gold standard until the end of the 2000’s, […] and/or adaptation to specific ecological niches. In this context, bioinformaticians constantly have to adapt algorithms and pipelines to rapidly evolving sequencing technologies, which generate genomic data with ever-increasing read length and sequencing depth, in order to successfully address the issues raised by genome assembly and scaffolding. Although the use of paired reads generally improves the quality of the assembly, the simultaneous use of sequence and insert size information shows some limitations for both library types. While MPs allow this, the generation of such long-insert libraries remains technically challenging and often results in large insert size variability [5,6,7], which leads to an approximate number of Ns within contigs.
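To make the link between insert size variability and an approximate number of Ns concrete, here is a minimal, generic sketch of mate-pair-based gap estimation. It is not presented as WiseScaffolder's procedure (nor that of SSPACE, SOPRA or SCARPA); the `Link` class, the function names and the `min_ns` floor are hypothetical, and reads are assumed to be already mapped and oriented along the scaffold. Each bridging pair gives one gap estimate (mean insert size minus the fragment bases lying on each contig), and averaging n independent pairs reduces the error to roughly the insert size standard deviation divided by the square root of n.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Link:
    """One mate pair bridging contig A (left) to contig B (right).

    Hypothetical representation: 0-based coordinates, assumed already
    oriented along the scaffold, so the sequenced fragment starts at
    a_outer on A and ends at b_outer on B.
    """
    a_outer: int  # outer end of the read mapped to A
    b_outer: int  # outer end of the mate mapped to B


def estimate_gap(links: List[Link], len_a: int,
                 insert_mean: float, insert_sd: float) -> Tuple[float, float]:
    """Return (estimated gap length, approximate standard deviation)."""
    if not links:
        raise ValueError("at least one bridging mate pair is required")
    # Per-link estimate: expected insert minus bases on A and bases on B.
    per_link = [
        insert_mean - (len_a - link.a_outer) - (link.b_outer + 1)
        for link in links
    ]
    gap = sum(per_link) / len(per_link)
    # Assumes independent links and exact mapping positions.
    return gap, insert_sd / math.sqrt(len(per_link))


def gap_filler(gap: float, min_ns: int = 1) -> str:
    """Run of Ns placed between the contigs; min_ns is a hypothetical floor
    used when the estimate is negative (the contigs may overlap)."""
    return "N" * max(min_ns, round(gap))


if __name__ == "__main__":
    links = [Link(8000, 999), Link(7500, 1199), Link(8500, 799)]
    gap, sd = estimate_gap(links, len_a=10_000, insert_mean=4000, insert_sd=800)
    print(f"gap ~ {gap:.0f} bp +/- {sd:.0f} bp; {len(gap_filler(gap))} Ns")
```

With a ~4 kb library and an illustrative standard deviation of 800 bp (not a value reported for any of the datasets above), three bridging pairs still leave an uncertainty of several hundred bases on the gap, which is why the number of Ns written between contigs is only approximate.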
