Abstract
Background
Plant LTR-retrotransposons are classified into two superfamilies, Ty1/copia and Ty3/gypsy. They are further divided into an enormous number of families which, due to the high diversity of their nucleotide sequences, are usually specific to a single species or a group of closely related species. Previous attempts to group these families into broader categories reflecting their phylogenetic relationships were limited either to a narrow range of plant species or to a small number of elements. Furthermore, there is no reference database that allows for similarity-based classification of LTR-retrotransposons.
Results
We have assembled a database of retrotransposon-encoded polyprotein domain sequences extracted from 5410 Ty1/copia elements and 8453 Ty3/gypsy elements sampled from 80 species representing major groups of green plants (Viridiplantae). Phylogenetic analysis of the three most conserved polyprotein domains (RT, RH and INT) led to dividing Ty1/copia and Ty3/gypsy retrotransposons into 16 and 14 lineages, respectively. We also characterized various features of LTR-retrotransposon sequences, including additional polyprotein domains, extra open reading frames and primer binding sites, and found that the occurrence and/or type of these features correlates with phylogenies inferred from the three protein domains.
Conclusions
We have established an improved classification system applicable to LTR-retrotransposons from a wide range of plant species. This system reflects phylogenetic relationships as well as distinct sequence and structural features of the elements. A comprehensive database of retrotransposon protein domains (REXdb) that reflects this classification provides a reference for efficient and unified annotation of LTR-retrotransposons in plant genomes.
REXdb-related tools can be accessed through the RepeatExplorer web server (https://repeatexplorer-elixir.cerit-sc.cz/) or by using a standalone version of REXdb that can be downloaded separately from the RepeatExplorer web page (http://repeatexplorer.org/).
Highlights
Plant long terminal repeat (LTR) retrotransposons are classified into two superfamilies, Ty1/copia and Ty3/gypsy
Since the 5′ LTR and 3′ LTR are identical at the time a new element copy is inserted into the genome, their level of divergence, which is caused by mutations acquired over time, is proportional to the insertion age
In order to be able to compare our data with sequences of previously described elements, additional LTR-retrotransposon nucleotide sequences were added from public databases [39,40,41] and from published studies [11, 24, 33]
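The highlight on LTR divergence and insertion age can be illustrated with a short calculation. The sketch below is not taken from the paper; it assumes the commonly used relation T = K / (2r), where K is the Jukes-Cantor corrected divergence between the aligned 5′ and 3′ LTRs and r is a substitution rate per site per year. The function names, the toy sequences and the rate value are hypothetical placeholders; an appropriate rate has to be chosen for the species under study.

```python
import math


def p_distance(ltr5: str, ltr3: str) -> float:
    """Proportion of differing sites between two aligned LTRs.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTR sequences must be aligned to equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    return mismatches / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance K (substitutions per site)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_age(ltr5: str, ltr3: str, rate_per_site_per_year: float) -> float:
    """Estimated insertion age in years, T = K / (2r).

    The factor 2 reflects that both LTRs accumulate mutations independently.
    """
    k = jukes_cantor(p_distance(ltr5, ltr3))
    return k / (2.0 * rate_per_site_per_year)


if __name__ == "__main__":
    # Toy aligned LTR pair differing at one of 20 sites (illustrative only).
    five_prime = "ACGTACGTACGTACGTACGT"
    three_prime = "ACGTACGTACGAACGTACGT"
    assumed_rate = 1e-8  # hypothetical rate, substitutions per site per year
    age = insertion_age(five_prime, three_prime, assumed_rate)
    print(f"estimated insertion age: {age:.0f} years")
```

Because the estimate scales inversely with the assumed rate, ages obtained this way are best read as relative values when the rate is uncertain.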
Summary
Plant LTR-retrotransposons are classified into two superfamilies, Ty1/copia and Ty3/gypsy. They are further divided into an enormous number of families which, due to the high diversity of their nucleotide sequences, are usually specific to a single species or a group of closely related species. Long terminal repeat (LTR) retrotransposons are a very large and diverse group of transposable elements that are ubiquitous in eukaryotes. They are abundant in plant genomes, making up to 75% of nuclear DNA [1]. Although LTR-retrotransposons are often viewed as genomic parasites, they may be beneficial to their hosts by providing regulatory genetic elements [7], driving rapid genomic changes [8, 9] or being an integral part of specific genome regions such as centromeres [10, 11]. Investigation of these processes is crucial to understanding genome evolution and function. These efforts are complicated by the absence of a general and applicable system of classification for these highly diverse elements.