Abstract

Among the many types of genetic variation, missense variants are a major class, and a small subset of them may disrupt protein function and ultimately lead to human disease. Various machine learning methods have been proposed to distinguish deleterious from benign missense variants using a large number of features, including structure, sequence, interaction networks, gene-disease associations, and phenotypes. However, a reliable and accurate algorithm for merging heterogeneous information is still needed, because such an algorithm could capture the complex interactions in the networks in which genes participate. In this study, we propose a new method based on non-negative matrix tri-factorization clustering. We describe two versions of the proposed method: a two-source and a three-source algorithm. The two-source algorithm aggregates individual deleteriousness prediction methods and the protein–protein interaction (PPI) network, and the three-source algorithm additionally incorporates gene-disease associations. Four benchmark datasets were used for internal and external validation of both algorithms. The results on all datasets confirmed that our method outperforms most state-of-the-art variant prediction tools. Two key features of our variant effect prediction method are worth mentioning. First, although incorporating gene-disease information in the three-source algorithm can improve prediction performance over the two-source algorithm, our method is not hindered by type 2 circularity error, unlike some recent ensemble-based prediction methods. Type 2 circularity error occurs when a predictor annotates variants on the basis of the genes in which they are located. Second, our predictor outperforms other ensemble-based methods for variants located in genes for which little pathogenicity information is available.
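
To make the core technique concrete, the following is a minimal sketch of generic non-negative matrix tri-factorization, which approximates a non-negative matrix X by the product F S G^T with all three factors non-negative. It is not the authors' two-source or three-source algorithm: the function name `nmtf`, the plain multiplicative update rules, the toy block-structured matrix, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Generic NMTF sketch: X (n x m) ~ F (n x k1) @ S (k1 x k2) @ G.T (k2 x m).

    Uses standard multiplicative updates for the squared Frobenius error.
    Hypothetical helper for illustration, not the method from the paper.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Random strictly positive initialisation keeps every factor non-negative.
    F = rng.random((n, k1)) + eps
    S = rng.random((k1, k2)) + eps
    G = rng.random((m, k2)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update rescales one factor while the other two are held fixed.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors

# Toy input (assumed): two row groups and two column groups plus small noise.
rng = np.random.default_rng(1)
X = np.kron(np.eye(2), np.ones((3, 4))) + 0.01 * rng.random((6, 8))

F, S, G, errors = nmtf(X, k1=2, k2=2)

# A simple clustering readout: assign each row and column to its largest factor.
row_clusters = F.argmax(axis=1)
col_clusters = G.argmax(axis=1)
```

In this sketch the rows of F and G act as soft cluster memberships for the rows and columns of X, while S couples the two sets of clusters. That coupling is what makes tri-factorization a natural building block when several related data sources are to be co-clustered.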

Highlights

  • Among the many types of genetic variation, missense variants are a major class, and a small subset of them may disrupt protein function and lead to human disease

  • Nonsynonymous single nucleotide variants that change the amino acid sequence of the corresponding protein are regarded as missense variants[2]

  • Network features derived from protein–protein interaction (PPI) data have received less attention, yet they play a pivotal role in variant classification because a disturbed protein interactome regularly results in disease


Summary

Introduction

Among the many types of genetic variation, missense variants are a major class, and a small subset of them may disrupt protein function and lead to human disease. Various machine learning methods have been proposed to distinguish deleterious from benign missense variants using a large number of features, including structure, sequence, interaction networks, gene-disease associations, and phenotypes. State-of-the-art nonsynonymous single nucleotide variant (nsSNV) prediction methods integrate associations between diseases and the genes harboring the variants with variant-level information, including sequence-based, structure-based, and network features or available functional predictors of variants[18,19,20]. The hypothesis behind these approaches is that variants located in related genes have similar properties. There is no systematic procedure for aggregating such knowledge that simultaneously takes into account the form of all input data sources and yields an accurate workflow for deleterious variant detection[22].

