Abstract
Background: Detection of genomic inversions remains challenging. Many existing methods primarily target inversions with a non-repetitive breakpoint, leaving inverted repeat (IR) mediated non-allelic homologous recombination (NAHR) inversions largely unexplored.
Result: We present npInv, a novel tool specifically for detecting and genotyping NAHR inversions using sub-alignments of reads from long read sequencing data. We benchmark npInv against other tools on both simulated and real data. We use npInv to generate a whole-genome inversion map for NA12878 consisting of 30 NAHR inversions (of which 15 are novel), including all previously known NAHR mediated inversions in NA12878 with flanking IR less than 7 kb. Our genotyping accuracy on this dataset was 94%. We used PCR to confirm the presence of two of these novel inversions. We show that there is a near linear relationship between the length of flanking IR and the minimum inversion size, without inverted repeats.
Conclusion: The application of npInv shows high accuracy on both simulated and real data. The results give deeper insight into inversions.
Highlights
Inversions can be broadly classified by the mechanism through which they form: non-homologous end joining (NHEJ [2]), non-allelic homologous recombination (NAHR), or fork stalling and template switching (FoSTeS [3])
We present Nanopore inversion (npInv), a novel tool designed for detecting and genotyping non-allelic homologous recombination (NAHR) mediated inversions from long read sequencing data
npInv scans the alignment file for reads that contain pairs of subread alignments mapping to the same chromosome but in different orientations (Fig. 2). npInv records each such subread alignment pair as an inversion signal
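The scan described in the last highlight can be illustrated with a minimal Python sketch. This is not npInv's implementation: the `SubAlignment` record, the `find_inversion_signals` function, and the example coordinates are all hypothetical, and the sketch assumes the sub-alignments have already been parsed from the alignment file into plain records. It only shows the core test, namely two sub-alignments of one read on the same chromosome with different strands.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SubAlignment:
    """One aligned segment of a long read (hypothetical record type)."""
    read_id: str
    chrom: str
    start: int   # reference start of the segment
    end: int     # reference end of the segment
    strand: str  # "+" or "-"


def find_inversion_signals(alignments):
    """Return (read_id, first, second) for every pair of sub-alignments
    of one read that map to the same chromosome on different strands."""
    # Group sub-alignments by the read they came from.
    by_read = defaultdict(list)
    for aln in alignments:
        by_read[aln.read_id].append(aln)

    signals = []
    for read_id, segments in by_read.items():
        for a, b in combinations(segments, 2):
            if a.chrom == b.chrom and a.strand != b.strand:
                # Order the pair by reference start for readability.
                first, second = sorted((a, b), key=lambda s: s.start)
                signals.append((read_id, first, second))
    return signals


# Made-up example records, for illustration only.
example = [
    SubAlignment("read1", "chr1", 1000, 5000, "+"),
    SubAlignment("read1", "chr1", 5200, 9000, "-"),  # orientation flips
    SubAlignment("read2", "chr1", 1000, 9000, "+"),  # single segment
    SubAlignment("read3", "chr1", 100, 900, "+"),
    SubAlignment("read3", "chr2", 100, 900, "-"),    # different chromosome
]
signals = find_inversion_signals(example)
```

In this toy input only `read1` yields a signal. A real tool would go further, for example by filtering on mapping quality and clustering signals into breakpoints, which this sketch does not attempt.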
Summary
Many existing methods primarily target inversions with a non-repetitive breakpoint, leaving inverted repeat (IR) mediated non-allelic homologous recombination (NAHR) inversions largely unexplored. Inversions can be broadly classified by the mechanism through which they form: non-homologous end joining (NHEJ [2]), non-allelic homologous recombination (NAHR), or fork stalling and template switching (FoSTeS [3]). In NHEJ inversions, the inverted sequence ligates directly to the breakpoint without a large homologous sequence [2]. Inversion polymorphisms remain one of the most poorly mapped classes of genetic variation. Inversions can be detected from aberrant linkage disequilibrium (LD) patterns in population single-nucleotide polymorphism (SNP) genotyping data, but this provides limited power to detect inversions smaller than 500 kb or with minor allele frequency less than 25% [7,8,9]. Inversions can be inferred from second-generation sequencing data by abnormal paired-end mapping and split read align-