The genomes of mammals contain fingerprints of past infections by ancient retroviruses that invaded the germline of their ancestors. Most of these endogenous retroviruses (ERVs) contain only remnants of the original retrovirus; however, on rare occasions, ERV genes can be co-opted for a beneficial host function. While most studies of co-opted ERVs have focused on envelope genes, including the syncytins that function in placentation, there are also examples of co-opted gag genes, including one that we recently discovered in simian primates. Here, we searched for other intact gag genes in non-primate mammalian lineages. We began by examining the genomes of extant camel species, which represent a basal lineage in the order Artiodactyla. This search identified a gagpol gene with a large open reading frame (ORF) (>3,500 bp) at the same orthologous location across Artiodactyla species but absent from other mammals. Thus, this ERV was fixed in the common ancestor of all Artiodactyla at least 64 million years ago. The amino acid sequence of this gene, termed ARTgagpol, contains recognizable matrix, capsid, nucleocapsid, and reverse transcriptase domains in ruminants, with an RNase H domain in camels and pigs. Phylogenetic analysis and structural prediction of its reverse transcriptase and RNase H domains group ARTgagpol with gammaretroviruses. Transcriptomic analysis shows ARTgagpol expression in multiple tissues, which suggests a co-opted host function. These findings identify the oldest and largest ERV-derived gagpol gene with an intact ORF in mammals, an intriguing milestone in the co-evolution of mammals and retroviruses.

IMPORTANCE Retroviruses are unique among viruses that infect animals in that they integrate their reverse-transcribed double-stranded DNA into host chromosomes. When this happens in a germline cell, such as a sperm or egg cell or their precursors, the integrated retroviral copies can be passed on to the next generation as endogenous retroviruses (ERVs).
On rare occasions, the genes of these ERVs can be domesticated by the host. In this study, we used computational similarity searches to identify an ancient ERV with an intact viral gagpol gene in the genomes of camels. The same ERV is found at the same genomic location in other even-toed ungulates, suggesting that it is at least 64 million years old. Broad tissue expression and predicted preservation of the reverse transcriptase fold of this protein suggest that it may have been domesticated for a host function. This is the oldest known intact gagpol gene of an ancient retrovirus in mammals.