A novel method to predict essential proteins based on tensor and HITS algorithm

Zhihong Zhang, Xueyong Li, Lei Wang, Sai Hu, Yingchun Luo, Bihai Zhao

doi:10.1186/s40246-020-00263-7

Open Access

Journal: Human Genomics
Publication Date: Apr 6, 2020
Citations: 12
License type: open-access

Affiliation: Changsha University, Hunan Children's Hospital

Abstract

Background: Essential proteins are an important part of the cell and are closely related to its life activities. Hitherto, Protein-Protein Interaction (PPI) networks have been adopted by many computational methods to predict essential proteins. Most current approaches focus mainly on the topological structure of PPI networks. However, methods that rely solely on the PPI network have low detection accuracy for essential proteins. Therefore, it is necessary to integrate the PPI network with other biological information to identify essential proteins.

Results: In this paper, we propose a novel random walk method for identifying essential proteins, called HEPT. A three-dimensional tensor is first constructed by combining the PPI network of Saccharomyces cerevisiae with multiple biological data such as gene ontology annotations and protein domains. Then, based on the newly constructed tensor, we extend the Hyperlink-Induced Topic Search (HITS) algorithm from a two-dimensional to a three-dimensional tensor model that can be used to infer essential proteins. Unlike existing state-of-the-art methods, both the importance of proteins and the types of interactions contribute to the essential protein prediction. To evaluate the performance of HEPT, proteins are ranked in descending order of the ranking scores computed by our method and by other competitive methods. A certain number of the top-ranked proteins are then selected as candidates for essential proteins. The number of true essential proteins among these candidates, according to the list of known essential proteins, is used to judge the performance of each method. Experimental results show that our method achieves better prediction performance than nine other state-of-the-art methods in identifying essential proteins.

Conclusions: The analysis and experimental results show that HEPT can effectively improve the prediction accuracy of essential proteins by using the HITS algorithm and by combining network topology with gene ontology annotations and protein domains, which provides new insight into multi-data source fusion.
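The evaluation procedure described in the abstract (rank proteins by score, select a certain number of top-ranked candidates, count how many are known essential proteins) can be illustrated with a minimal Python sketch. The function name count_true_essentials, the protein identifiers, and the scores are hypothetical and chosen only for illustration; they are not taken from the paper, and this is not the authors' implementation.

```python
def count_true_essentials(scores, known_essential, top_k):
    """Count known essential proteins among the top_k highest-scoring proteins.

    scores: dict mapping protein id -> ranking score (any method's output)
    known_essential: set of protein ids known to be essential
    top_k: number of top-ranked proteins selected as candidates
    """
    # Rank in descending order of score; ties are broken by protein id
    # so that the result is deterministic (an assumption of this sketch).
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    candidates = ranked[:top_k]
    return sum(1 for p in candidates if p in known_essential)


# Hypothetical toy data, not from the paper.
scores = {"P1": 0.9, "P2": 0.4, "P3": 0.7, "P4": 0.1, "P5": 0.6}
known_essential = {"P1", "P4", "P5"}

hits_at_3 = count_true_essentials(scores, known_essential, top_k=3)
print(hits_at_3)
```

Comparing this count across methods at the same top_k gives the kind of comparison the abstract describes: the method that places more known essential proteins among its top-ranked candidates is judged to perform better.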

Highlights

Essential proteins are an important part of the cell and closely related to the life activities of the cell
Previously proposed methods have improved the prediction accuracy by integrating Protein-Protein Interaction (PPI) networks and multi-source biological data
Experimental data: computational analysis was performed on a PPI network of Saccharomyces cerevisiae

Summary

Introduction

Essential proteins are an important part of the cell and are closely related to its life activities. Based on the topological properties of PPI networks, many computational methods have been proposed for predicting essential proteins, including degree centrality (DC) [6], information centrality (IC) [7], closeness centrality (CC) [8], betweenness centrality (BC) [9], subgraph centrality (SC) [10], and neighbor centrality (NC) [11]. Ren et al. [13] proposed a prediction model for essential proteins that fuses PPI network topology with protein complex information. Peng et al. [16] proposed a predictive model, called UDoNC, that integrates protein domain information with PPI networks in yeast; it showed that proteins with more types of protein domains tend to be essential. Zhao et al. [19] proposed a predictive model, POEM, that measures the essentiality of proteins by detecting overlapping essential modules based on the modularity of essential proteins.
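As an illustration of the simplest topology-based measure mentioned above, degree centrality (DC), the following Python sketch counts the number of distinct interaction partners of each protein in a small undirected PPI network. The edge list and protein names are hypothetical toy data, and the handling of self-interactions and duplicate edges is an assumption of this sketch, not a detail given in the text.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return a dict mapping each protein to its number of distinct neighbors.

    edges: iterable of (protein_a, protein_b) pairs of an undirected PPI network
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            # Self-interactions are ignored (an assumption of this sketch).
            continue
        # Using sets means duplicate edges are counted only once.
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


# Hypothetical toy network, not from the paper.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
dc = degree_centrality(edges)

# Rank proteins by degree, highest first; ties are broken by name.
ranking = sorted(dc, key=lambda p: (-dc[p], p))
print(ranking)
```

Under this measure, proteins with more interaction partners are ranked as more likely to be essential. It uses only network topology, which is the limitation the integrated methods discussed above aim to address.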

Methods

Results

Conclusion