Abstract

Long-read sequencing technologies have the potential to accurately detect and phase variation in genomic regions that are difficult to fully characterize with conventional short-read methods. These difficult-to-sequence regions include several clinically relevant genes with highly homologous pseudogenes, many of which are prone to gene conversions or other types of complex structural rearrangements. We present PB-Motif, a new method for identifying rearrangements between two highly homologous genomic regions using PacBio long reads. PB-Motif leverages clustering and filtering techniques to efficiently report rearrangements in the presence of sequencing errors and other systematic artifacts. Supporting reads for each high-confidence rearrangement can then be used for copy number estimation and phased variant calling. First, we demonstrate PB-Motif's accuracy on simulated sequence rearrangements of PMS2 and its pseudogene PMS2CL, using simulated reads that sweep over a range of sequencing error rates. We then apply PB-Motif to 26 clinical samples, characterizing CYP21A2 and its pseudogene CYP21A1P as part of a diagnostic assay for congenital adrenal hyperplasia. We successfully identify damaging variation and patient carrier status concordant with the clinical diagnosis obtained from multiplex ligation-dependent probe amplification (MLPA) and Sanger sequencing. The source code is available at github.com/zstephens/pb-motif.

Highlights

  • Next-generation sequencing technologies have become ubiquitous in a wide range of diagnostic assays at many clinical laboratories

  • We present PB-Motif, a new methodology leveraging long reads for the de novo identification of arbitrary structural rearrangements confined to a pair of genomic regions

  • Variance in detection accuracy increases substantially above 7% error, which may be a result of non-uniform motif density across the simulated PMS2/PMS2CL homology

Introduction

Next-generation sequencing technologies have become ubiquitous in a wide range of diagnostic assays at many clinical laboratories. Proximal gene/pseudogene pairs are of particular interest because rearrangements between the two regions, typically from unequal crossing over or gene conversion events, can render the gene nonfunctional (Bischof et al, 2006). This mechanism has been shown to be a driver in many diseases, including Lynch syndrome (van der Klift et al, 2010), Hunter syndrome (Zhang et al, 2011), and chronic granulomatous disease (Moens et al, 2014), among others (Bischof et al, 2006; Sen and Ghosh, 2013).

