Abstract

Exome sequencing is becoming a standard tool for mapping Mendelian disease-causing (or pathogenic) non-synonymous single nucleotide variants (nsSNVs). Minor allele frequency (MAF) filtering and functional prediction methods are commonly used to identify candidate pathogenic mutations in these studies. Combining multiple functional prediction methods may increase prediction accuracy. Here, we propose using a logit model to combine multiple prediction methods and compute an unbiased probability of a rare variant being pathogenic. In addition, for the first time we assess the power of seven prediction methods (including SIFT, PolyPhen2, CONDEL, and logit) to distinguish pathogenic nsSNVs from other rare variants, which reflects the situation after MAF filtering in exome-sequencing studies. We found that a logit model combining all or some of the original prediction methods outperforms the other methods examined, but is unable to discriminate between autosomal dominant and autosomal recessive disease mutations. Finally, based on the predictions of the logit model, we estimate that around 5% of an individual's rare nsSNVs are pathogenic and that an individual carries at least ∼22 pathogenic derived alleles, which, if made homozygous by consanguineous marriages, may lead to recessive diseases.
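
As a rough illustration of what combining prediction methods with a logit model can look like, the sketch below fits a logistic regression to the scores of several predictors and returns a probability for a new variant. It is not the authors' model: the function names (`fit_logit`, `predict_proba`), the use of plain gradient descent, and the toy scores are all assumptions made for the example, and no attempt is made to reproduce how the paper obtains an unbiased probability.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logit(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    X: one list of predictor scores per variant.
    y: 1 for a pathogenic variant, 0 for another rare variant.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    # Combined probability that a variant with scores x is pathogenic.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Made-up scores from three hypothetical predictors, each scaled to [0, 1].
X = [[0.9, 0.8, 0.95], [0.8, 0.9, 0.7], [0.7, 0.6, 0.9],
     [0.2, 0.3, 0.1], [0.1, 0.4, 0.2], [0.3, 0.2, 0.3]]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_logit(X, y)
p_high = predict_proba(w, b, [0.85, 0.9, 0.8])   # scores like the pathogenic examples
p_low = predict_proba(w, b, [0.15, 0.2, 0.1])    # scores like the other rare variants
```

With these toy inputs, the variant whose scores resemble the pathogenic training examples receives a combined probability above 0.5 and the other a probability below 0.5.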

Highlights

  • Since the first successful application of exome sequencing in finding the causal mutation for a Mendelian disease [1], many such studies have been conducted to identify other Mendelian disease-causing variants

  • We propose the use of a statistical model to combine prediction scores from multiple methods and to estimate the chance of a rare mutation being Mendelian disease-causing

  • We found that although the predictive power of CONDEL, which combines five individual methods, is slightly inferior to that of one individual method (i.e., MutationTaster) in terms of Receiver Operating Characteristic (ROC) AUC, it is better than that of all five individual methods in terms of precision-recall (PR) AUC
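
The last highlight notes that a method can rank lower by ROC AUC yet higher by PR AUC. The sketch below shows how that can happen on a small made-up data set with few positives. The function names are hypothetical, ROC AUC is computed as the fraction of correctly ordered (pathogenic, non-pathogenic) pairs, and PR AUC is approximated by average precision, which may differ from the estimator used in the paper.

```python
def roc_auc(scores, labels):
    # Probability that a random positive outranks a random negative
    # (ties count as half a win).
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    # Mean of the precision values at the rank of each positive.
    # Assumes no tied scores and at least one positive.
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy data: 2 pathogenic variants (label 1) among 10 rare variants.
labels   = [1,    1,   0,    0,   0,   0,   0,   0,   0,   0]
# Method A places both positives near the top, just below one negative.
scores_a = [0.90, 0.8, 0.95, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
# Method B ranks one positive first but the other below three negatives.
scores_b = [0.99, 0.6, 0.90, 0.8, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1]

roc_a, roc_b = roc_auc(scores_a, labels), roc_auc(scores_b, labels)
ap_a, ap_b = average_precision(scores_a, labels), average_precision(scores_b, labels)
# roc_a = 0.875 > roc_b = 0.8125, yet ap_a ≈ 0.583 < ap_b = 0.7
```

On this toy set, method A wins on ROC AUC while method B wins on average precision, because precision-based summaries weight the very top of the ranking more heavily when positives are rare.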



Introduction

Since the first successful application of exome sequencing in finding the causal mutation for a Mendelian disease [1], many such studies have been conducted to identify other Mendelian disease-causing (or pathogenic) variants. Compared to genetic linkage studies of Mendelian diseases, exome sequencing requires a smaller number of affected individuals, who may even be unrelated. With the large number (typically around 8,000–10,000) of nsSNVs in an individual genome and the small number of (usually affected and unrelated) individuals available, standard methods for genetic linkage and association do not work for exome sequencing studies of Mendelian diseases. Instead, such studies typically rely on filtering: variants are rejected if they are not found in multiple cases or if they conflict with the known disease inheritance mode. This approach has successfully reduced the number of mutations to examine in numerous studies, and several tools [3,4,5] have been developed to automate this process.
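
The filtering logic described above can be sketched as follows. This is a minimal illustration and not the procedure of any of the cited tools: the `Variant` class, the `filter_candidates` function, the 0.01 MAF cutoff, and the toy variants are assumptions, and the inheritance rules are deliberately simplified (for example, compound heterozygosity is ignored).

```python
from dataclasses import dataclass

@dataclass
class Variant:
    name: str
    maf: float        # population minor allele frequency
    genotypes: dict   # case id -> count of alternate alleles (0, 1 or 2)

def fits_mode(alt_count, mode):
    # Simplified rules: recessive needs two copies, dominant at least one.
    if mode == "recessive":
        return alt_count == 2
    if mode == "dominant":
        return alt_count >= 1
    raise ValueError(f"unknown inheritance mode: {mode}")

def filter_candidates(variants, mode, min_cases=2, maf_cutoff=0.01):
    """Return the names of variants that survive all three filters."""
    kept = []
    for v in variants:
        if v.maf >= maf_cutoff:
            continue  # too common to be a plausible Mendelian mutation
        carriers = [g for g in v.genotypes.values() if g > 0]
        if len(carriers) < min_cases:
            continue  # not found in multiple cases
        if not all(fits_mode(g, mode) for g in carriers):
            continue  # conflicts with the assumed inheritance mode
        kept.append(v.name)
    return kept

# Made-up variants observed in two unrelated cases.
variants = [
    Variant("v1", 0.001,  {"case1": 1, "case2": 1}),
    Variant("v2", 0.2,    {"case1": 1, "case2": 1}),   # common variant
    Variant("v3", 0.002,  {"case1": 1, "case2": 0}),   # seen in one case only
    Variant("v4", 0.0005, {"case1": 2, "case2": 2}),
]

dominant_hits = filter_candidates(variants, "dominant")
recessive_hits = filter_candidates(variants, "recessive")
```

In this toy example, `v1` and `v4` survive under a dominant model, while only `v4` survives under a recessive model.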
