Human endogenous retroviruses (HERVs) make up nearly 8% of the human genome. Among them, the HERV-K subtype HML-2 is a transposable element that plays a critical role in embryonic development and in the pathogenesis of several diseases. Quantifying and characterizing the multiple HML-2 insertions in human chromosomes has been challenging because of their size, their sequence homology with each other, and their repetitive nature. Using long-read DNA sequencing, we examined a cohort of 222 individuals for the HML-2 proviruses 6q14.1 and 7p22.1a, two loci that are capable of producing full-length viral proteins and have previously been implicated in several cancers, autoimmune disorders, and neurodegenerative diseases. Although the reference genome suggests that these two loci are structurally dissimilar, we found that at both loci about 5% of individuals carry a unique tandem repeat-like sequence (three long terminal repeats sandwiching two internal, potentially protein-coding sequences), whereas most individuals carry the standard proviral structure (one internal region sandwiched by two long terminal repeats). Moreover, both proviruses can produce full-length, or nearly full-length, HERV-K proteins in multiple transcriptional orientations. The amino acid sequences from different loci in the same transcriptional orientation share sequence homology with each other. These results demonstrate a clear, previously unreported relationship between HML-2 loci 6q14.1 and 7p22.1a and highlight the utility of long-read sequencing for studying repetitive elements. Future studies are needed to establish whether these polymorphisms determine genetic susceptibility to the diseases associated with these loci.