Abstract

In recent years, with the development of high-throughput technologies, many computational methods have been proposed for predicting essential proteins based on protein-protein interaction (PPI) networks and biological information about proteins. However, because PPI networks are incomplete, the prediction accuracy of these methods remains unsatisfactory, and designing effective computational models to identify essential proteins is still a challenge. In this manuscript, a novel Prediction Model based on Non-negative Matrix Factorization (PMNMF) is proposed. In PMNMF, an original PPI network is first constructed from PPIs downloaded from a given benchmark database. Then, based on the topological features of protein nodes, the original PPI network is converted into a weighted PPI network. To overcome the incompleteness of PPI networks, the Non-negative Matrix Factorization (NMF) method is applied to the weighted PPI network to obtain a transition probability matrix. Next, by integrating biological information, including gene expression, homology and subcellular localization information, a unique initial score is calculated and assigned to each protein node in the weighted PPI network. Based on these scores, an improved PageRank algorithm is designed to infer potential essential proteins. Finally, to evaluate its performance, PMNMF is compared with 14 state-of-the-art prediction models, and the experimental results show that PMNMF achieves the best identification accuracy.
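The pipeline described above (factorize the weighted network, turn the reconstruction into a transition matrix, then propagate initial scores) can be illustrated with a small sketch. The abstract does not state the edge-weighting scheme, the NMF variant, the way the three biological sources are combined, or how the PageRank algorithm is "improved", so everything specific here is an assumption: the toy 4-protein weights and initial scores are made up, the factorization uses standard Lee-Seung multiplicative updates, and the ranking step is a plain restart-style iteration `s = alpha * P s + (1 - alpha) * s0`. Function names such as `nmf`, `column_normalise` and `rank_proteins` are hypothetical and not taken from the paper.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Approximate V by W @ H with non-negative factors.

    Assumption: Lee-Seung multiplicative updates for the Frobenius
    objective; the paper's actual NMF variant may differ.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H

def column_normalise(M):
    """Scale each column to sum to 1 so M can serve as a transition matrix."""
    n, m = len(M), len(M[0])
    sums = [sum(M[i][j] for i in range(n)) for j in range(m)]
    return [[M[i][j] / sums[j] if sums[j] > 0 else 1.0 / n for j in range(m)]
            for i in range(n)]

def rank_proteins(P, init, alpha=0.5, tol=1e-10, max_iter=1000):
    """Iterate s <- alpha * P s + (1 - alpha) * init until the change is below tol."""
    s = list(init)
    for _ in range(max_iter):
        Ps = [sum(p * x for p, x in zip(row, s)) for row in P]
        new = [alpha * a + (1 - alpha) * b for a, b in zip(Ps, init)]
        if max(abs(a - b) for a, b in zip(new, s)) < tol:
            return new
        s = new
    return s

# Toy symmetric weighted PPI network (made-up weights, 4 proteins).
weighted_ppi = [
    [0.0, 0.8, 0.6, 0.0],
    [0.8, 0.0, 0.7, 0.0],
    [0.6, 0.7, 0.0, 0.3],
    [0.0, 0.0, 0.3, 0.0],
]
# Made-up initial scores standing in for the integrated biological information.
initial = [0.4, 0.3, 0.2, 0.1]

W, H = nmf(weighted_ppi, rank=2)
P = column_normalise(matmul(W, H))   # dense reconstruction fills in missing edges
scores = rank_proteins(P, initial)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

The low-rank product `W @ H` is generally dense, which is one plausible reading of how factorization compensates for missing interactions: protein pairs with no observed edge still receive a small non-zero transition probability. Pure-Python lists are used only to keep the toy example dependency-free; a real network with thousands of proteins would call for a numerical library.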

Highlights

  • Essential proteins are found in large numbers in protein complexes; their absence leads to the loss of function of the related complexes and makes it impossible for organisms to survive or develop

  • To evaluate the predictive performance of PMNMF, we compare it with 14 representative essential protein prediction methods: IC [1], CC [3], DC [4], BC [5], SC [6], NC [7], PeC [12], ION [14], CoEWC [16], POEM [17], CVIM [25], NPRI [30], TEGS [20] and RWHN [27]

  • From the dataset downloaded from the COMPARTMENTS database [46], we obtain the subcellular location information of proteins, of which we keep only the 11 categories of subcellular localization data closely related to essential proteins: Endoplasmic, Cytoskeleton, Golgi, Cytosol, Vacuole, Mitochondrion, Endosome, Plasma, Nucleus, Peroxisome and Extracellular


Summary

Introduction

Essential proteins are found in large numbers in protein complexes; their absence leads to the loss of function of the related complexes and makes it impossible for organisms to survive or develop. With the rapid development of high-throughput techniques, more and more protein-protein interactions (PPIs) have been detected, from which PPI networks have been built and widely applied in designing computational models for inferring essential proteins. Based on the topological characteristic of centrality [1,2] of PPI networks, a series of computational models, including [...] SC [6] and NC (Neighbor Centrality) [7], have been proposed to discover essential proteins. [...] designed an identification model named LAC to identify essential proteins based on the Local Average Connectivity of protein nodes in PPI networks [9]. Qi Yi et al. [10] designed a prediction model to infer essential proteins based on the

