Abstract

Human Endogenous Retrovirus type K (HERV-K) is the only HERV known to be insertionally polymorphic; that is, not all individuals carry a provirus at a given genomic location. Because people differ in both the number and the genomic location of these retroviruses, HERV-Ks may contribute to human disease. Indeed, viral transcripts, proteins, and antibodies against HERV-K are detected in cancers and in autoimmune and neurodegenerative diseases. However, attempts to link a polymorphic HERV-K with any disease have been frustrated, in part because data on the population prevalence of the HERV-K provirus at each polymorphic site are lacking and because it is challenging to identify closely related elements such as HERV-K from short read sequence data. We present an integrated and computationally robust approach that uses whole genome short read data to determine the occupation status at all sites reported to contain a HERV-K provirus. Our method estimates the proportion of fixed length genomic sequences (k-mers) from whole genome sequence data that match a reference set of k-mers unique to each HERV-K locus, and applies mixture model-based clustering to these values to account for low depth sequence data. Our analysis of 1000 Genomes Project (KGP) data reveals numerous differences among the five KGP super-populations in the prevalence of individual and co-occurring HERV-K proviruses; we provide a visualization tool that depicts the proportion of the KGP populations with any combination of polymorphic HERV-K proviruses. Further, because HERV-K is insertionally polymorphic, the genomic burden of known polymorphic HERV-K varies among humans; this burden is lowest in East Asian (EAS) individuals. Our study also identifies population-specific sequence variation for HERV-K proviruses at several loci. We expect these resources will advance research on HERV-K contributions to human diseases.
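The k-mer matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sequences, and the choice of k = 5 are all assumptions made for illustration, and the proportion is computed here as the fraction of a locus's reference k-mers seen at least once in the reads, which is one plausible reading of the description. Reverse-complement handling and the mixture model-based clustering step are omitted.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def matched_proportion(reads, locus_kmers, k):
    """Fraction of a locus's reference k-mers seen at least once in the reads."""
    if not locus_kmers:
        raise ValueError("reference k-mer set is empty")
    observed = set()
    for read in reads:
        # Keep only the read k-mers that belong to this locus's reference set.
        observed |= kmers(read, k) & locus_kmers
    return len(observed) / len(locus_kmers)


# Toy data, invented for illustration; this is not real HERV-K sequence.
K = 5  # hypothetical k-mer length, far shorter than would be used in practice
locus_reference = kmers("ACGTTGCAAGCT", K)  # 8 distinct reference k-mers
reads = ["ACGTTGCA", "TTTTTTTT", "GCAAGC"]  # stand-ins for short reads

proportion = matched_proportion(reads, locus_reference, K)
print(f"matched proportion: {proportion:.2f}")
```

In this toy case six of the eight reference k-mers are recovered. A value near 1 would suggest that the locus is occupied and a value near 0 that it is empty; the abstract indicates that intermediate values arising from low depth data are resolved by clustering rather than by a fixed threshold.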

Highlights

  • Endogenous retroviruses (ERVs) are derived from infectious retroviruses that integrated into a host germ cell at some time in the evolutionary history of a species [1,2,3,4,5]

  • Human Endogenous Retrovirus type K (HERV-K) is the youngest retrovirus family in the human genome and the only group of endogenous retroviruses with polymorphic members; a locus containing a HERV-K can be occupied in one individual but empty in others

  • We develop an easy-to-use method that reveals considerable variation among global populations in the prevalence of individual and co-occurring polymorphic HERV-K, and in the number of HERV-K that any individual carries in their genome

Introduction

Endogenous retroviruses (ERVs) are derived from infectious retroviruses that integrated into a host germ cell at some time in the evolutionary history of a species [1,2,3,4,5]. ERVs in humans (HERVs) comprise up to 8% of the genome and have contributed important functions to their host [6,7,8]. The infection events that resulted in the contemporary profile of HERVs occurred before the emergence of modern humans, so most HERVs are fixed in human populations and those of closely related primates. Some HERVs are still transcriptionally active and capable of causing new germline insertions, so that individuals differ in the number and genomic location of the sites occupied by an ERV, a situation termed insertional polymorphism [9,10,11]. Among all families of HERVs, HERV-K is the only one known to be insertionally polymorphic in humans. HERV-K genomes are closely related and, as with many repetitive elements, they are difficult to assign accurately to a genomic location using standard mapping approaches [12,13].
