Abstract

A challenge in medical genomics is to identify variants and genes associated with severe genetic disorders. Building on the premise that severe, early-onset disorders often reduce evolutionary fitness, several statistical methods have been developed to predict pathogenic variants or constrained genes from the signatures of negative selection in human populations. However, we currently lack a statistical framework to jointly predict deleterious variants and constrained genes from both variant-level features and gene-level selective constraints. Here we present such a unified approach, UNEECON, based on deep learning and population genetics. UNEECON treats the contributions of variant-level features and gene-level constraints as a variant-level fixed effect and a gene-level random effect, respectively. The sum of the fixed and random effects is then combined with an evolutionary model to infer the strength of negative selection at both variant and gene levels. Compared with previously published methods, UNEECON shows improved performance in predicting missense variants and protein-coding genes associated with autosomal dominant disorders, and feature importance analysis suggests that both gene-level selective constraints and variant-level predictors are important for accurate variant prioritization. Furthermore, based on UNEECON, we observe a low correlation between gene-level intolerance to missense mutations and gene-level intolerance to loss-of-function mutations, which can be partially explained by the prevalence of disordered protein regions that are highly tolerant to missense mutations. Finally, we show that genes intolerant to both missense and loss-of-function mutations play key roles in the central nervous system and in autism spectrum disorders. Overall, UNEECON is a promising framework for both variant and gene prioritization.

Highlights

  • A fundamental question in biology is to understand how genomic variation contributes to phenotypic variation and disease risk

  • Since early-onset, severe genetic disorders are often associated with a reduction of evolutionary fitness, signatures of negative selection, such as sequence conservation, have been widely used to predict deleterious variants associated with Mendelian disorders [4,5,6,7,8,9,10,11,12]

  • Inspired by classical sequence conservation models [38,39,40,41,42], which use site-specific substitution rate as a proxy of negative selection, we utilize the relative probability of the occurrence of a potential missense mutation in human populations, compared to neutral mutations, as an allele-specific predictor of negative selection
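The allele-specific predictor in the last highlight can be illustrated with a small sketch. It assumes, purely for illustration, that the probability of observing a potential missense mutation factorizes into a neutral occurrence probability `mu` and a relative probability `eta` between 0 and 1. The function names and this factorization are hypothetical and are not taken from the paper.

```python
def occurrence_probability(mu: float, eta: float) -> float:
    """Probability that a potential missense mutation is observed.

    mu  : assumed probability of occurrence if the mutation were neutral
    eta : assumed relative probability of occurrence compared to neutral
          (values near 0 suggest strong negative selection)
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return mu * eta


def relative_probability(p_observed: float, mu: float) -> float:
    """Invert the factorization: ratio of the observed occurrence
    probability to the neutral expectation."""
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    return p_observed / mu
```

Under this reading, a mutation that appears half as often as a comparable neutral mutation has a relative probability of 0.5, and lower values point to stronger negative selection.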


Summary

Introduction

A fundamental question in biology is to understand how genomic variation contributes to phenotypic variation and disease risk. By learning a linear or nonlinear function that maps predictive variant features, such as sequence conservation scores and protein structural features, to the strength of negative selection, evolution-based statistical methods estimate negative selection on observed and potential mutations in the human genome. The estimated strength of negative selection can then be used to prioritize deleterious variants associated with severe genetic disorders. Because these evolutionary approaches are trained on the abundant natural polymorphisms observed in healthy individuals rather than on sparsely annotated pathogenic variants, they have shown good performance in predicting pathogenic variants, frequently matching or outperforming supervised machine learning models trained on disease data [8,10,11,12].
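The mapping from variant features to selection strength can be sketched in a few lines. The sketch below is a simplified stand-in, not the authors' implementation: it uses a linear fixed effect instead of a deep network, adds a per-gene random effect as described in the abstract, and assumes a logistic link and a Bernoulli likelihood for whether each potential mutation is observed. All names (`eta_from_effects`, `log_likelihood`, `mu`) are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def eta_from_effects(features, weights, bias, gene_effect):
    """Relative probability of occurrence for one potential mutation.

    The variant-level fixed effect is a linear function of the features
    (an assumption made here for brevity); the gene-level random effect
    is a single scalar shared by all variants in the same gene.
    """
    fixed_effect = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(fixed_effect + gene_effect)


def log_likelihood(observed: bool, mu: float, eta: float) -> float:
    """Bernoulli log-likelihood of a mutation being present or absent,
    assuming its occurrence probability is mu * eta."""
    p = mu * eta
    if not 0.0 < p < 1.0:
        raise ValueError("mu * eta must lie strictly between 0 and 1")
    return math.log(p) if observed else math.log1p(-p)
```

In this toy setup, a more negative gene-level effect lowers the relative probability for every variant in that gene, which is one way to picture how gene-level constraint and variant-level features could be combined in a single score.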

