Abstract

Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. It has also been widely used to study structural variants, phase haplotypes and more. Here, we introduce the assembler SMARTdenovo, a single-molecule sequencing (SMS) assembler that follows the overlap-layout-consensus (OLC) paradigm. SMARTdenovo (RRID: SCR_017622) was designed to be a rapid assembler, which, unlike contemporaneous SMS assemblers, does not require highly accurate raw reads for error correction. It has performed well in the evaluation of congeneric assemblers and has been successfully users for various assembly projects. It is compatible with Canu for assembling high-quality genomes, and several of the assembly strategies in this program have been incorporated into subsequent popular assemblers. The assembler has been in use since 2015; here we provide information on the development of SMARTdenovo and how to implement its algorithms into current projects.

Highlights

  • The development of high-throughput sequencing provides the means to deliver fast, inexpensive, and accurate information for assembling whole genomes

  • Assembling the genome of the fruit fly and evaluating accuracy of the assembly We benchmarked SMARTdenovo against the Single-molecule sequencing (SMS) assemblers Flye, Canu, and Ra using the dataset of the fruit fly, Drosophila melanogaster, and calculated the accuracy of the assembly by aligning it with the reference genome

  • A total of 29.3 Gb Pacific Biosciences (PacBio) reads from the National Center for Biotechnology Information Sequence Read Archive database (SRX499318) [25] was assembled with the command line “smartdenovo.pl -c 1 -t 16 reads.fa > wtasm.mak && make -f wtasm.mak”

Read more

Summary

Introduction

The development of high-throughput sequencing provides the means to deliver fast, inexpensive, and accurate information for assembling whole genomes. Single-molecule sequencing (SMS) technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore, which generate sequencing reads of >20 kb in length, are widely used in whole genome projects. SMARTdenovo (RRID: SCR_017622) is a long-read SMS assembler that follows the overlap-layout-consensus (OLC) paradigm.

Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.