Abstract

Alu insertions have contributed to >11% of the human genome and ∼30–35 Alu subfamilies remain actively mobile, yet the characterization of polymorphic Alu insertions from short-read data remains a challenge. We build on existing computational methods to combine Alu detection and de novo assembly of WGS data as a means to reconstruct the full sequence of insertion events from Illumina paired-end reads. Comparison with published calls obtained using PacBio long reads indicates a false discovery rate below 5%, at the cost of reduced sensitivity due to the colocation of reference and non-reference repeats. We generate a highly accurate call set of 1614 completely assembled Alu variants from 53 samples from the Human Genome Diversity Project (HGDP) panel. We use the reconstructed alternative insertion haplotypes to genotype 1010 fully assembled insertions, obtaining >99% agreement with genotypes obtained by PCR. In our assembled sequences, we find evidence of premature insertion mechanisms and observe 5′ truncation in 16% of AluYa5 and AluYb8 insertions. The sites of truncation coincide with stem-loop structures and SRP9/14 binding sites in the Alu RNA, implicating L1 ORF2p pausing in the generation of 5′ truncations. Additionally, we identify variable AluJ and AluS elements that likely arose through non-retrotransposition mechanisms.
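The false discovery rate and sensitivity quoted above can be illustrated with a minimal sketch. It assumes each call set is reduced to a set of (chromosome, position) identifiers and that a call counts as validated only on an exact match; a real comparison would more likely match breakpoints within a tolerance window. The function name and the toy coordinates are hypothetical and are not taken from the study.

```python
def fdr_and_sensitivity(calls, truth):
    """Return (false discovery rate, sensitivity) for two sets of site IDs.

    calls: sites reported by the short-read pipeline (hypothetical input)
    truth: sites in the comparison set, e.g. long-read calls (hypothetical input)
    """
    true_pos = len(calls & truth)    # reported and present in the truth set
    false_pos = len(calls - truth)   # reported but absent from the truth set
    # FDR: fraction of reported calls that are not supported by the truth set
    fdr = false_pos / len(calls) if calls else 0.0
    # Sensitivity: fraction of truth-set sites that were recovered
    sensitivity = true_pos / len(truth) if truth else 0.0
    return fdr, sensitivity


# Toy data, invented purely for illustration
short_read_calls = {("chr1", 1000), ("chr1", 5000), ("chr2", 300), ("chr3", 42)}
long_read_calls = {("chr1", 1000), ("chr1", 5000), ("chr2", 300),
                   ("chr2", 900), ("chr4", 77)}

fdr, sens = fdr_and_sensitivity(short_read_calls, long_read_calls)
print(f"FDR = {fdr:.2f}, sensitivity = {sens:.2f}")
```

In this toy case one of four short-read calls is unsupported and three of five truth-set sites are recovered, which shows how a call set can be precise while still missing sites, the trade-off described in the abstract.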

Highlights

  • Mobile elements (MEs) are discrete fragments of nuclear DNA that are capable of copied movement to other chromosomal locations within the genome [1]

  • Over 60 novel Alu insertions have been shown to cause mutations leading to disease [17,18,19]; older, existing Alu insertions in the genome have been shown to facilitate the formation of subsequent rearrangements by providing a template of sequence utilized in non-allelic homologous recombination, replication-template switching and the repair of double-strand breaks [2,18,19,20,21,22,23,24,25,26]

  • We utilized whole genome sequence (WGS) data from a subset of the Human Genome Diversity Project (HGDP) collection, consisting of 2 × 101 bp paired-end libraries from 53 individuals across seven populations exhibiting a cline of diversity reflecting the major migration of humans out of Africa, with a median coverage of ∼7x per genome


Summary

Introduction

Mobile elements (MEs) are discrete fragments of nuclear DNA that are capable of copied movement to other chromosomal locations within the genome [1]. The vast majority of Alu insertions represent events that occurred in the germline or early during embryogenesis [4] millions of years ago and exist as non-functional elements that are highly mutated and no longer capable of mobilization [3]. Subsets of MEs, including Alu and its autonomous partner L1Hs, remain active and continue to contribute to new ME insertions (MEIs), resulting in genomic variation between individuals [5] and between somatic tissues within an individual [6,7]. The human genome contains elements derived from the AluY, AluS and AluJ lineages, which can be further stratified into more than ∼35 subfamilies based on sequence diversity and diagnostic mutations [2,5,8]. Alu insertions continue to shape the genomic landscape and are recognized as profound mediators of genomic structural variation.

