Abstract
About 8% of the human genome is made up of endogenous retroviruses (ERVs). Though most human endogenous retroviruses (HERVs) are thought to be irrelevant to our biology, notable exceptions include members of the HERV-H family that are necessary for the correct functioning of stem cells. ERVs are commonly found in two forms: the full-length proviral form and the more numerous solo-LTR form, which is thought to result from homologous recombination events. Here we introduce a phylogenetic framework to study ERV insertion and solo-LTR formation. We then apply the framework to site patterns sampled from a set of long alignments covering six primate genomes. Studying six categories of ERVs, we quantitatively recapitulate patterns of insertional activity that are usually described in qualitative terms in the literature. A slowdown in most ERV groups is observed, but we suggest that HERV-K activity may have increased in humans since they diverged from chimpanzees. We find that the rate of solo-LTR formation decreases rapidly as a function of ERV age and that an age-dependent model of solo-LTR formation describes the history of ERVs more accurately than the commonly used exponential decay model. We also demonstrate that HERV-H loci are markedly less likely to form solo-LTRs than ERVs from other families. We conclude that the slower dynamics of HERV-H suggest a host role for the internal regions of these exapted elements and posit that in future it will be possible to use the relationship between full-length proviruses and solo-LTRs to help identify large-scale co-options in distant vertebrate genomes.
Highlights
By definition, endogenous retroviruses (ERVs) are the result of the Mendelian transmission of retroviruses from parent to progeny
A slowdown in most ERV groups is observed, but we suggest that human endogenous retrovirus K (HERV-K) activity may have increased in humans since they diverged from chimpanzees
We investigated the properties of insertions from the four largest families in our sample: ERV9 (245 insertions); HERV-K11 (197 insertions); HERV-H (116 insertions); and HERV-K (59 insertions)
Summary
Endogenous retroviruses (ERVs) are the result of the Mendelian (vertical germ line) transmission of retroviruses from parent to progeny. Over many generations an ERV can become fixed in a host population; in humans, for example, as much as 8% of the genome is thought to be retrovirally derived [1]. Successful retroviral insertions (proviruses) initially share a common structure consisting of viral genes flanked by a pair of identical sequences known as long terminal repeats (LTRs). ERVs that retain this characteristic viral structure are commonly described as full-length ERVs. In addition to full-length ERVs, endogenized viruses are found in a second, dramatically different form, referred to as a solo-LTR. Solo-LTRs are thought to be generated when paired LTRs undergo non-allelic homologous recombination, which results in a deletion and an associated acentric fragment [5], a piece of chromosomal material that lacks a centromere and is unlikely to persist across many cell divisions. Like other genomic DNA, both forms of ERVs are subject to ordinary mutational processes, so that over time they may become degraded or fragmented by point mutations or indel events.
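The abstract contrasts the commonly used exponential decay model of solo-LTR formation with an age-dependent model in which the formation rate falls as an ERV ages. The sketch below illustrates that contrast only. It is not the framework used in the study: the exponentially declining hazard, the function names, and all parameter values are hypothetical choices made for illustration. Each function returns the expected fraction of insertions of a given age that remain full-length.

```python
import math


def full_length_fraction_constant(age, rate):
    """Exponential decay model: solo-LTR formation occurs at a constant rate.

    The probability that a provirus of the given age is still full-length
    is exp(-rate * age).
    """
    return math.exp(-rate * age)


def full_length_fraction_age_dependent(age, initial_rate, decline):
    """Illustrative age-dependent model (assumed form, not from the source).

    The formation rate is taken to be initial_rate * exp(-decline * t) at
    age t. Integrating this rate from 0 to `age` gives the cumulative
    hazard, and the surviving full-length fraction is exp(-cumulative).
    """
    cumulative = initial_rate / decline * (1.0 - math.exp(-decline * age))
    return math.exp(-cumulative)


if __name__ == "__main__":
    # Hypothetical parameters; ages are in arbitrary time units.
    rate, decline = 0.2, 0.1
    for age in (0, 5, 10, 20, 40):
        constant = full_length_fraction_constant(age, rate)
        age_dep = full_length_fraction_age_dependent(age, rate, decline)
        print(f"age={age:>2}  constant={constant:.4f}  age-dependent={age_dep:.4f}")
```

Under the constant-rate model the full-length fraction tends to zero with age, whereas under the assumed declining rate it levels off at exp(-initial_rate / decline), so old insertions that have escaped recombination tend to stay full-length.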