Method for Essential Protein Prediction Based on a Novel Weighted Protein-Domain Interaction Network.

Zixuan Meng, Xueyong Li, Linai Kuang, Zhiping Chen, Yihong Tan, Lei Wang, Zhen Zhang

doi:10.3389/fgene.2021.645932

Abstract

In recent years, a number of computational models based on protein-protein interaction (PPI) networks have been proposed. However, owing to false positives, false negatives, and the incompleteness of PPI networks, designing computational models that infer key proteins with satisfactory predictive accuracy remains challenging. This study proposes a prediction model called WPDINM for detecting key proteins based on a novel weighted protein-domain interaction (PDI) network. In WPDINM, a weighted PPI network is first constructed by combining the gene expression data of proteins with topological information extracted from the original PPI network. In parallel, a weighted domain-domain interaction (DDI) network is constructed based on the original PDI network. Next, the newly obtained weighted PPI network and weighted DDI network are integrated with the original PDI network to construct a weighted PDI network. Then, based on topological features and biological information, including the subcellular localization and orthologous information of proteins, a novel PageRank-based iterative algorithm is designed and applied to the newly constructed weighted PDI network to estimate the criticality of proteins. Finally, to assess the prediction performance of WPDINM, we compared it with 12 competing measures. Experimental results show that WPDINM achieves a predictive accuracy of 90.19%, 81.96%, 70.72%, 62.04%, 55.83%, and 51.13% in the top 1%, 5%, 10%, 15%, 20%, and 25% of ranked proteins, respectively, which exceeds the prediction accuracy achieved by state-of-the-art competing measures. Given its satisfactory identification performance, WPDINM may contribute to the further development of key protein identification.

Highlights

Accumulating evidence indicates that proteins have a tremendous impact on almost all life activities.
The bar chart shows that the identification performance of WPDINM exceeds that of the other measures when forecast accuracy is compared from the top 1% to the top 25% of proteins.
As Figure 2 shows, when prediction accuracy is compared for the top 1% of proteins, WPDINM achieves an accuracy of 90.19%.

Summary

Introduction

Accumulating evidence indicates that proteins have a tremendous impact on almost all life activities. Based on the centrality-lethality rule (Jeong et al., 2001), researchers have successively proposed a series of prediction models to infer potential critical proteins. These include Information Centrality (IC) (Stephenson and Zelen, 1989), Degree Centrality (DC) (Hahn and Kern, 2004), Subgraph Centrality (SC) (Ernesto and Rodriguez-Velazquez, 2005), Closeness Centrality (CC) (Wuchty and Stadler, 2003), Betweenness Centrality (BC) (Jop et al., 2005), Neighbor Centrality (NC) (Wang et al., 2012), and Local Average Connectivity (LAC) (Li et al., 2015). However, these prediction models cannot achieve high identification accuracy, owing to the incompleteness of current PPI networks (Chen and Yuan, 2006).
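
To make the idea behind these topology-based measures concrete, the following minimal sketch computes Degree Centrality (DC), the simplest of the measures listed above, on a toy PPI network and ranks proteins by it. The network, the protein names (P1 to P5), and the helper functions degree_centrality and rank_proteins are hypothetical and chosen only for illustration; this is not the WPDINM algorithm and is not taken from any of the cited implementations.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {protein: number of distinct interaction partners}."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


def rank_proteins(scores):
    """Sort proteins by descending score; ties are broken by name."""
    return sorted(scores, key=lambda p: (-scores[p], p))


# Hypothetical toy PPI network (undirected edges), for illustration only.
toy_edges = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P2", "P3"),
    ("P4", "P5"),
]

dc = degree_centrality(toy_edges)
ranking = rank_proteins(dc)
print(ranking)  # the highest-degree protein comes first
```

Under the centrality-lethality rule, the proteins at the head of such a ranking are the candidates for key proteins. Because the score depends only on the edges present, a missing or spurious interaction changes it directly, which is one way to see why incomplete PPI networks limit the accuracy of purely topological measures.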

Methods

Results

Conclusion