Abstract

Retrotransposons comprise a substantial fraction of eukaryotic genomes, reaching the highest proportions in plants. Identification and annotation of retrotransposons are therefore important tasks in studying the regulation and evolution of plant genomes. Most computational tools for mining transposable elements (TEs) are designed for subsequent genome repeat masking and often leave aside the lineage classification of an element and its protein domain composition. Additionally, studies focused on the diversity and evolution of a particular group of retrotransposons often require substantial effort from researchers to adapt existing software to their needs. Here, we developed DARTS (Domain-Associated Retrotransposon Search), a computational pipeline that mines sequences of protein-coding retrotransposons based on the sequences of their conserved protein domains. Using the most abundant group of TEs in plants, long terminal repeat (LTR) retrotransposons (LTR-RTs), we show that DARTS has radically higher sensitivity for LTR-RT identification than the widely accepted tool LTRharvest. DARTS can be easily customized for specific user needs. DARTS returns a set of structurally annotated nucleotide and amino acid sequences that can be readily used in subsequent comparative and phylogenetic analyses. DARTS may assist researchers interested in the discovery and detailed analysis of the diversity and evolution of retrotransposons, LTR-RTs, and other protein-coding TEs.

Highlights

  • Transposable elements (TEs) are important players in the evolution of genomes [1,2,3,4]. The activity of TEs drives genetic diversity, contributes to the establishment of new gene regulatory networks and the rewiring of existing ones, and can give rise to new genes sequestered by the host genome for its functioning [5,6,7].

  • Later, in an independent tBLASTn search using an additional ribonuclease H (aRNH) domain sequence as a query, we found that a substantial fraction of aRNH-containing Tat LTR retrotransposons (LTR-RTs) were underrepresented in the LTRharvest output.

  • We named the pipeline Domain-Associated Retrotransposon Search (DARTS) because the initiation of the screen and the subsequent structural annotation are based on the prediction of conserved protein domains and not on LTR sequences.
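
To make the domain-first idea concrete, the following is a minimal Python sketch of how predicted protein-domain hits could seed candidate element loci before any LTR detection. It is an illustration under stated assumptions and not the DARTS implementation: the `DomainHit` and `Candidate` records, the `candidates_from_hits` function, the choice of RT as the anchor domain, and the `max_gap` and `flank` values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DomainHit:
    """One predicted protein-domain hit on a contig (hypothetical record)."""
    contig: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"
    domain: str  # e.g. "RT", "RNH", "INT"


@dataclass
class Candidate:
    """A padded genomic window around a cluster of domain hits."""
    contig: str
    start: int
    end: int
    strand: str
    domains: List[str]  # domain names in genomic (left-to-right) order


def candidates_from_hits(
    hits: List[DomainHit],
    anchor: str = "RT",      # assumed anchor domain, for illustration only
    max_gap: int = 3000,     # assumed maximum distance between clustered hits
    flank: int = 5000,       # assumed padding in which LTRs could be sought
    contig_lengths: Optional[Dict[str, int]] = None,
) -> List[Candidate]:
    """Cluster nearby same-strand hits and pad clusters that contain `anchor`."""
    contig_lengths = contig_lengths or {}
    ordered = sorted(hits, key=lambda h: (h.contig, h.strand, h.start))

    # Single pass: extend the current cluster while hits stay close together.
    groups: List[List[DomainHit]] = []
    for hit in ordered:
        last = groups[-1] if groups else None
        if (
            last
            and last[-1].contig == hit.contig
            and last[-1].strand == hit.strand
            and hit.start - max(h.end for h in last) <= max_gap
        ):
            last.append(hit)
        else:
            groups.append([hit])

    result: List[Candidate] = []
    for group in groups:
        names = [h.domain for h in group]
        if anchor not in names:
            continue  # clusters without the anchor domain are not reported
        start = max(0, min(h.start for h in group) - flank)
        end = max(h.end for h in group) + flank
        limit = contig_lengths.get(group[0].contig)
        if limit is not None:
            end = min(end, limit)  # do not run past the end of the contig
        result.append(
            Candidate(group[0].contig, start, end, group[0].strand, names)
        )
    return result
```

In this sketch, a window is reported whenever the anchor domain is present, regardless of whether terminal repeats are later found in the flanks. That ordering is the point being illustrated: the search starts from conserved protein domains and not from LTR sequences.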


Introduction

Transposable elements (TEs) are important players in the evolution of genomes [1,2,3,4]. The activity of TEs drives genetic diversity, contributes to the establishment of new gene regulatory networks and the rewiring of existing ones, and can give rise to new genes sequestered by the host genome for its functioning [5,6,7]. The long-term existence and evolution of TEs have resulted in a broad diversity of transposition and replication mechanisms and in a variety of structural variants [8,9]. Retrotransposons, a group of TEs that move through a reverse transcription mechanism, are the most ubiquitous TEs in eukaryotic genomes. Owing to their propensity to increase in copy number, retrotransposons constitute a substantial portion of the host genome, reaching as much as 80% of the total genome size in some plants [10,11]. Despite similarities in the general replication mechanism, LTR-RTs are structurally diverse and encode additional protein domains, which are thought to fine-tune their life cycle [18,19,20]. The evolutionary history of a distinct protein domain in a retrotransposon can differ from that of its core RT domain [21,23,24].
