Abstract

High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes. Nevertheless, assembly of complex genomes remains challenging, in part due to the presence of dispersed repeats which introduce ambiguity during genome reconstruction. Transposable elements (TEs) can be particularly problematic, especially for TE families exhibiting high sequence identity, high copy number, or complex genomic arrangements. While TEs strongly affect genome function and evolution, most current de novo assembly approaches cannot resolve long, identical, and abundant families of TEs. Here, we applied a novel Illumina technology called TruSeq synthetic long-reads, which are generated through highly parallel library preparation and local assembly of short read data and which achieve lengths of 1.5–18.5 Kbp with an extremely low error rate (0.03% per base). To test the utility of this technology, we sequenced and assembled the genome of the model organism Drosophila melanogaster (reference genome strain y; cn, bw, sp), achieving an N50 contig size of 69.7 Kbp and covering 96.9% of the euchromatic chromosome arms of the current reference genome. TruSeq synthetic long-read technology enables placement of individual TE copies in their proper genomic locations as well as accurate reconstruction of TE sequences. We entirely recovered and accurately placed 4,229 (77.8%) of the 5,434 annotated transposable elements with perfect identity to the current reference genome. As TEs are ubiquitous features of the genomes of many species, TruSeq synthetic long-reads, and likely other methods that generate long reads, offer a powerful approach to improve de novo assemblies of whole genomes.
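The N50 contig size quoted in the abstract is a standard contiguity statistic: the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of that definition follows. The function name `n50` and the toy contig lengths are illustrative assumptions, not data or code from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths (in bp), not taken from the study.
toy_lengths = [100, 80, 60, 40, 20]
print(n50(toy_lengths))  # 80: the two longest contigs hold 180 of 300 bases
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the abstract also reports reference coverage and TE recovery.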

Highlights

  • Tremendous advances in DNA sequencing technology, computing power, and assembly approaches have enabled the assembly of genomes of thousands of species from the sequences of DNA fragments, but several challenges remain.

  • While future work will be required to investigate the use of the technology for resolving polymorphism in outbred species, our work focuses on the accuracy of assembly of repetitive DNA sequences.

  • Using a pipeline of standard existing tools, we demonstrate the ability of TruSeq synthetic long-reads to facilitate de novo assembly and resolve transposable element (TE) sequences in the genome of the fruit fly Drosophila melanogaster, a key model organism in both classical genetics and molecular biology.

Introduction

Tremendous advances in DNA sequencing technology, computing power, and assembly approaches have enabled the assembly of genomes of thousands of species from the sequences of DNA fragments, but several challenges remain. All assembly approaches are based on the assumption that similar sequence reads originate from the same genomic region, thereby allowing the reads to be overlapped and merged to reconstruct the underlying genome sequence [1]. Deviations from this assumption, including those arising from polymorphism and repeats, complicate assembly and may cause it to fail. By assembling the Drosophila melanogaster genome, our study uses comparison to a high-quality reference to evaluate the application of synthetic long-read technology for de novo assembly. While future work will be required to investigate the use of the technology for resolving polymorphism in outbred species, our work focuses on the accuracy of assembly of repetitive DNA sequences.
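The overlap assumption described above, and the way repeats violate it, can be illustrated with a toy Python sketch. This is not the assembly pipeline used in the study; the function names, the `min_len` threshold, and the short sequences are all hypothetical and chosen only to make the ambiguity visible.

```python
def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def merge(a, b, min_len=3):
    """Merge two reads on their suffix-prefix overlap, or return None."""
    k = suffix_prefix_overlap(a, b, min_len)
    return a + b[k:] if k else None


# Unique sequence: the overlap identifies a single, unambiguous merge.
print(merge("ACGTTGCA", "TGCAGGAT"))  # ACGTTGCAGGAT

# Repeat: a read that ends inside a repeated element (here "GATTACA")
# overlaps equally well with reads from two different genomic copies.
read = "CCCGATTACA"
candidates = ["GATTACATTT", "GATTACAGGG"]
print([suffix_prefix_overlap(read, c) for c in candidates])  # [7, 7]
```

In the second case both candidates overlap the read by the full repeat length, so sequence similarity alone cannot decide which merge is correct. A read long enough to span the whole repeat and its unique flanking sequence removes that ambiguity, which is the motivation for long-read approaches.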
