Abstract

Non-allelic homologous recombination (NAHR) is a common mechanism for generating genome rearrangements and is implicated in numerous genetic disorders, but its detection in high-throughput sequencing data poses a serious challenge. We present a probabilistic model of NAHR and demonstrate its ability to find NAHR in low-coverage sequencing data from 44 individuals. We identify NAHR-mediated deletions or duplications in 109 of 324 potential NAHR loci in at least one of the individuals. These calls segregate by ancestry, are more common in closely spaced repeats, often result in duplicated genes or pseudogenes, and affect highly studied genes such as GBA and CYP2E1.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0633-1) contains supplementary material, which is available to authorized users.

Highlights

  • Non-allelic homologous recombination (NAHR) is a biological mechanism for repairing broken chromosomes, which results in gross genome rearrangements

  • We developed a model for detecting NAHR from paired-end read data that addresses many of the issues that typically arise due to repeats

  • Because any individual read will overlap only a small number of variational positions (VPs), mapping algorithms often concordantly map reads generated from novel hybrid repeats to a specific existing repeat in the reference genome, resulting in what we term phantom concordance
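
The phantom concordance effect described in the last highlight can be illustrated with a toy simulation. The sketch below is not the probabilistic model presented in the paper. It assumes that variational positions are the sites at which two aligned repeat copies differ, that a hybrid repeat is one copy up to a crossover point followed by the other copy, and that a read "maps concordantly" when it matches a reference copy exactly at the same offset. All function names and sequences are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: two aligned repeat copies that differ only at a
# few variational positions (VPs). Real repeats are far longer.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mutate(seq, positions):
    """Return seq with the base at each listed position substituted."""
    return "".join(
        COMPLEMENT[base] if i in positions else base
        for i, base in enumerate(seq)
    )


def variational_positions(repeat_a, repeat_b):
    """Indices where two aligned, equal-length repeat copies differ."""
    return [i for i, (a, b) in enumerate(zip(repeat_a, repeat_b)) if a != b]


def hybrid_repeat(repeat_a, repeat_b, breakpoint):
    """Toy NAHR product: copy A before the breakpoint, copy B after it."""
    return repeat_a[:breakpoint] + repeat_b[breakpoint:]


def classify_reads(hybrid, repeat_a, repeat_b, read_len):
    """Count, over all read windows taken from the hybrid, which reference
    copies each window matches exactly (a simplification of real mapping)."""
    counts = Counter()
    for start in range(len(hybrid) - read_len + 1):
        end = start + read_len
        read = hybrid[start:end]
        matches_a = read == repeat_a[start:end]
        matches_b = read == repeat_b[start:end]
        if matches_a and matches_b:
            counts["both"] += 1
        elif matches_a:
            counts["A"] += 1
        elif matches_b:
            counts["B"] += 1
        else:
            counts["neither"] += 1
    return counts


REPEAT_A = "ACGT" * 10                      # 40 bp reference copy A
REPEAT_B = mutate(REPEAT_A, {5, 20, 35})    # copy B differs at three VPs
HYBRID = hybrid_repeat(REPEAT_A, REPEAT_B, breakpoint=28)

if __name__ == "__main__":
    print("VPs:", variational_positions(REPEAT_A, REPEAT_B))
    # Short reads never span two VPs, so each one matches some reference copy.
    print("read_len=10:", dict(classify_reads(HYBRID, REPEAT_A, REPEAT_B, 10)))
    # Longer reads can span VPs on both sides of the breakpoint.
    print("read_len=20:", dict(classify_reads(HYBRID, REPEAT_A, REPEAT_B, 20)))
```

In this toy setting, reads shorter than the spacing between VPs always agree with one of the reference copies, so nothing about any single read signals that the hybrid exists. Only reads long enough to cover VPs on both sides of the breakpoint fail to match either copy.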



Introduction

Non-allelic homologous recombination (NAHR) is a biological mechanism for repairing broken chromosomes, which results in gross genome rearrangements. A significant portion (approximately 10% to 22%) of all genome rearrangements in humans, called structural variations, is thought to be the result of NAHR [1,2,3,4,5]. Understanding and detecting NAHR in individuals provide valuable insight for a wide variety of genomic disorders, disease susceptibilities, and cancers [6,7,8,9,10,11,12,13,14]. Despite its importance and prevalence, NAHR is challenging to detect with either computational or experimental techniques. Detection of NAHR requires a careful treatment of repetitive regions in the human genome. Repetitive regions are a major weakness of biological and computational techniques for
