Abstract

Acquisition of genetic material from viruses by their hosts can generate inter-host structural genome variation. We developed computational tools enabling us to study virus-derived structural variants (SVs) in population-scale whole-genome sequencing (WGS) datasets and applied them to 3,332 humans. Although SVs had already been cataloged in these subjects, we found previously overlooked virus-derived SVs. We detected non-germline SVs derived from squirrel monkey retrovirus (SMRV), human immunodeficiency virus 1 (HIV-1), and human T-lymphotropic virus 1 (HTLV-1); these variants are attributable to infection of the sequenced lymphoblastoid cell lines (LCLs) or their progenitor cells and may impact gene expression results and the biosafety of experiments using these cells. In addition, we detected new heritable SVs derived from human herpesvirus 6 (HHV-6) and human endogenous retrovirus-K (HERV-K). We report the first solo-direct repeat (DR) HHV-6 likely to reflect DR rearrangement of a known full-length endogenous HHV-6. To locate previously unknown polymorphic HERV-K loci, we used linkage disequilibrium between single nucleotide variants (SNVs) and variants in reads that align to HERV-K, which often cannot be mapped uniquely using conventional short-read sequencing analysis methods. Some of these loci are tightly linked to trait-associated SNVs, some are in complex genome regions inaccessible by prior methods, and some contain novel HERV-K haplotypes likely derived from gene conversion from an unknown source or introgression. These tools and results broaden our perspective on the coevolution between viruses and humans, including ongoing virus-to-human gene transfer contributing to genetic variation between humans.
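The linkage-disequilibrium idea in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' tool: it assumes each individual has a 0/1/2 dosage for a variant seen in HERV-K-aligned reads, plus 0/1/2 genotypes at mapped SNVs, and it uses the squared Pearson correlation of dosages as a simple stand-in for r². The function names, SNV labels, and data are all hypothetical.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("vectors must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic site carries no linkage information.
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def best_linked_snv(snv_genotypes, variant_dosage):
    """Return the SNV whose genotypes correlate best with the unmapped variant.

    snv_genotypes: dict mapping a (hypothetical) SNV label to per-individual
    0/1/2 genotypes. variant_dosage: per-individual 0/1/2 dosage of the
    variant observed in HERV-K-aligned reads (an assumed input format).
    """
    scores = {name: pearson_r2(g, variant_dosage)
              for name, g in snv_genotypes.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy data for six individuals; positions are made up for illustration.
snvs = {
    "chr1:1000": [0, 1, 2, 0, 1, 2],
    "chr7:5000": [0, 0, 1, 2, 2, 1],
    "chr19:900": [2, 2, 0, 1, 0, 1],
}
herv_variant = [0, 1, 2, 0, 1, 2]

best, scores = best_linked_snv(snvs, herv_variant)
print(best, scores)
```

In this simplified picture, the SNV with the highest score suggests where in the genome the otherwise unmappable HERV-K variant resides. A real analysis would also need to account for population structure, genotyping error, and multiple testing.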

Highlights

  • Human genomes include sequences originating from viruses, but the extent to which these sequences vary in different humans is unknown

Introduction

Union of genomes from discrete biological entities is a major engine of genetic diversity. Movement of genetic information between biological entities apart from sexual reproduction, known as horizontal gene transfer (HGT), has occurred in the human lineage. Some HGT happened so long ago that it is difficult to accurately classify the entity contributing the horizontally transferred sequence according to extant taxonomies. This is the case for the bacteria, acquired millennia ago, that are now represented by our mitochondrial genomes. About 8% of human genetic material is derived from human endogenous retroviruses (HERVs) that entered the human genetic lineage via retroviral infection of the germline and subsequently developed transposon-like intracellular replication cycles; some HERVs integrated recently enough that they can be classified based on homology to extant exogenous retroviruses.
