Abstract

Background

Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations.

Results

We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species.

Conclusions

The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
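
The six performance metrics named in the Results can all be derived from the four confusion-matrix counts. The sketch below uses the standard textbook definitions; it is an illustration, not the benchmarking code from the paper. The names `ConfusionCounts` and `annotation_metrics` are hypothetical, the example counts are invented, and treating each count as a number of annotated units (for instance base pairs) is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for counts of annotated units (e.g. base pairs)."""
    tp: int  # TE sequence correctly annotated as TE
    fp: int  # non-TE sequence wrongly annotated as TE
    tn: int  # non-TE sequence correctly left unannotated
    fn: int  # TE sequence missed by the program


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def annotation_metrics(c):
    """Compute the six metrics from standard confusion-matrix definitions."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "fdr": _ratio(c.fp, c.tp + c.fp),  # equals 1 - precision
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


# Invented counts, for illustration only.
example = ConfusionCounts(tp=80, fp=20, tn=890, fn=10)
metrics = annotation_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because FDR is the complement of precision, and F1 combines precision with sensitivity, a program that over-annotates is penalized on precision, FDR, and F1 even when its sensitivity is high.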

Highlights

  • Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes

  • Based on our benchmarking results, we propose a comprehensive pipeline for the generation of de novo transposable element (TE) libraries that can be used for genome annotation

  • Development of a species-specific TE library is an essential step in the annotation process, which begins with structural identification of major TE classes and can be followed by manual curation

Summary

Introduction

Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. While the low-copy, genic fraction of genomes has assembled well, even with short-read sequencing technology, assemblies of TEs and other repeats have remained incomplete and highly fragmented until quite recently. Long-read sequencing (e.g., PacBio and Oxford Nanopore) and assembly scaffolding (e.g., Hi-C and BioNano) techniques have progressed rapidly within the last few years. These innovations have been critical for high-quality assembly of the repetitive fraction of genomes. Ou et al. [8] demonstrated that the assembly contiguity of repetitive sequences in recent long-read assemblies is even better than that of traditional BAC-based reference genomes. With these developments, inexpensive and high-quality assembly of an entire genome is possible. Unlike the relatively straightforward and comprehensive pipelines established for gene annotation [9,10,11], current methods for TE annotation can be piecemeal, can be inaccurate, and are highly specific to classes of transposable elements.
