Abstract
Most common methods for inferring transposable element (TE) evolutionary relationships are based on dividing TEs into subfamilies using shared diagnostic nucleotides. Although originally justified based on the “master gene” model of TE evolution, computational and experimental work indicates that many of the subfamilies generated by these methods contain multiple source elements. This implies that subfamily-based methods give an incomplete picture of TE relationships. Studies on selection, functional exaptation, and predictions of horizontal transfer may all be affected. Here, we develop a Bayesian method for inferring TE ancestry that gives the probability that each sequence was replicative, its frequency of replication, and the probability that each extant TE sequence came from each possible ancestral sequence. Applying our method to 986 members of the newly discovered LAVA family of TEs, we show that there were far more source elements in the history of LAVA expansion than subfamilies identified using the CoSeg subfamily-classification program. We also identify multiple replicative elements in the AluSc subfamily in humans. Our results strongly indicate that a reassessment of subfamily structures is necessary to obtain accurate estimates of mutation processes, phylogenetic relationships and historical times of activity.
Highlights
Repetitive elements may comprise two-thirds or more of most vertebrate genomes [1], and most repeat sequence is derived from transposable elements (TEs)
Previous methods for reconstructing TE evolutionary history were not designed to solve the problem of determining the ancestral source sequence for large numbers of elements
We applied our method to the gibbon-derived LAVA TE family and to the human AluSc subfamily and inferred many more source elements than indicated by previous methods
Summary
Repetitive elements may comprise two-thirds or more of most vertebrate genomes [1], and most repeat sequence is derived from transposable elements (TEs). To obtain an accurate picture of the structure and evolutionary history of vertebrate genomes, it is necessary to have a good understanding of the origins and expansion histories of TEs. Early studies attempted to reconstruct the relationships among TEs by dividing extant TE sequences into subfamilies on the basis of shared high-frequency diagnostic nucleotide variants [2,3,4,5,6,7]. Many of these early studies, in primates, were interpreted as supporting a “master gene model”, in which one or a few source elements produce large numbers of inert copy elements that are incapable of replication [8,9]. No previously established method can accurately reconstruct relationships among thousands of TE sequences.