Abstract
A major challenge in current exome sequencing in autosomal recessive (AR) families is the lack of an effective method to prioritize single-nucleotide variants (SNVs). AR families are generally too small for linkage analysis, and the length of homozygous regions is an unreliable guide to causative variants. Common filtering steps usually yield a list of candidate variants that cannot be narrowed down further or ranked. To prioritize shortlisted SNVs, we consider each homozygous candidate variant together with a set of SNVs flanking it. We compare the resulting array of genotypes between an affected family member and a number of control individuals and argue that, within a family, differences between the affected member and controls should be larger for a pathogenic variant and its flanking SNVs than for a random variant. We assess differences between the arrays of two individuals by the Hamming distance and develop a suitable test statistic, which is expected to be large for a causative variant and its flanking SNVs. We prioritize candidate variants based on this statistic. Applying our approach to six patients with known pathogenic variants, we found these variants to be in the top 2 to 10 percentiles of ranks.
Highlights
A major challenge in current exome sequencing in autosomal recessive (AR) families is the lack of an effective method to prioritize single-nucleotide variants (SNVs)
Homozygosity mapping is often applied to identify long runs of homozygosity[3], which may be interpreted as harboring segments of DNA identical by descent (IBD), but length alone is known to be a poor statistic for this purpose[4]
We developed a novel method to prioritize candidate variants in AR families based on direct comparison of segments of sequence variants between an affected family member and control individuals from the same population; that is, our approach compares a single affected individual with a number of control individuals
Summary
A major challenge in current exome sequencing in autosomal recessive (AR) families is the lack of an effective method to prioritize single-nucleotide variants (SNVs). We assess differences between the arrays of two individuals by the Hamming distance and develop a suitable test statistic, which is expected to be large for a causative variant and its flanking SNVs. We prioritize candidate variants based on this statistic. Applying our approach to six patients with known pathogenic variants, we found these variants to be in the top 2 to 10 percentiles of ranks. Because of the paucity of crossovers very close to the disease locus, SNVs in its vicinity tend to be IBD and homozygous[3]. For this reason, we want to see whether distances between affected and control individuals are larger for true candidate variants than for other candidate variants.
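The comparison described above can be illustrated with a minimal sketch. It makes several assumptions: genotypes are coded as alternate-allele counts (0, 1, 2), the flanking window is a fixed number of SNVs on each side, and the score is the plain mean Hamming distance to the controls. That mean is only a stand-in, since the text does not specify the actual test statistic, and all function names and data here are hypothetical.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length genotype arrays differ."""
    if len(a) != len(b):
        raise ValueError("genotype arrays must have equal length")
    return sum(x != y for x, y in zip(a, b))


def window(genotypes: Sequence[int], center: int, flank: int) -> Sequence[int]:
    """Candidate variant at index `center` plus up to `flank` SNVs on each side."""
    lo = max(0, center - flank)
    hi = min(len(genotypes), center + flank + 1)
    return genotypes[lo:hi]


def mean_distance(affected: Sequence[int], controls: List[Sequence[int]],
                  center: int, flank: int) -> float:
    """Mean Hamming distance between the affected window and each control window.

    Assumption: a simple stand-in score, not the statistic used by the authors.
    """
    aw = window(affected, center, flank)
    dists = [hamming(aw, window(c, center, flank)) for c in controls]
    return sum(dists) / len(dists)


def rank_candidates(affected: Sequence[int], controls: List[Sequence[int]],
                    candidates: List[int], flank: int) -> List[int]:
    """Candidate indices ordered from largest to smallest mean distance."""
    scores = {i: mean_distance(affected, controls, i, flank) for i in candidates}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (made up): one affected individual and two controls, ten SNVs each.
affected = [2, 2, 2, 2, 2, 0, 1, 2, 1, 0]
controls = [
    [0, 1, 0, 1, 0, 0, 1, 2, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 2, 1, 1],
]
candidates = [2, 7]  # indices of homozygous candidate variants in `affected`
ranking = rank_candidates(affected, controls, candidates, flank=2)
```

In this toy data the candidate at index 2 sits in a run of homozygous genotypes that differs from both controls at every position, so it ranks above the candidate at index 7, whose neighborhood closely matches the controls.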