Abstract

Background

Essential proteins are critical to an organism’s survival and development, and they are important for disease analysis and drug design. Large-scale protein-protein interaction (PPI) data sets are available for Saccharomyces cerevisiae, which provides a valuable opportunity to identify essential proteins from PPI networks. Many network topology-based computational methods have been designed to detect essential proteins. However, these methods are limited by the completeness of the available PPI data. To overcome this limitation, some computational methods integrate PPI networks with multi-source biological data. Despite progress in multi-source data fusion, improving the prediction accuracy of these methods remains challenging.

Results

In this paper, we design a novel iterative model for essential protein prediction, named Randomly Walking in the Heterogeneous Network (RWHN). In RWHN, a weighted protein-protein interaction network and a domain-domain association network are first constructed from the original PPI network and the known protein-domain association network. We then establish a new heterogeneous matrix by combining the two constructed networks with the protein-domain association network. From the heterogeneous matrix, a transition probability matrix is obtained by normalization. Finally, an improved PageRank algorithm is applied to the heterogeneous network to predict essential proteins. To reduce the influence of false negatives, information on orthologous proteins and on the subcellular localization of proteins is integrated to initialize the protein score vector. RWHN thus takes the topological, conservation, and functional features of essential proteins into account in the prediction process. The experimental results show that RWHN clearly outperforms ten other competing methods in predicting essential proteins.

Conclusions

We demonstrated that integrating multi-source data into a heterogeneous network can preserve the complex relationships among multiple kinds of biological data and improve the prediction accuracy for essential proteins. RWHN, our proposed method, is effective for the prediction of essential proteins.
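The pipeline described in the abstract (heterogeneous matrix, normalization into a transition probability matrix, PageRank-style iteration from an initial score vector) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the edge-weighting schemes, the exact normalization, or the update rule of the improved PageRank, so the code below assumes a symmetric block layout, plain column normalization, and a standard random walk with restart. The function names, the damping factor `alpha`, and the toy matrices are all hypothetical.

```python
import numpy as np

def build_heterogeneous_matrix(pn, dn, pdn):
    """Assumed layout: [[PN, PDN], [PDN^T, DN]] for n proteins and m domains."""
    return np.block([[pn, pdn], [pdn.T, dn]])

def column_normalize(m):
    """Scale each column to sum to 1; all-zero columns are left as zeros."""
    m = m.astype(float)
    sums = m.sum(axis=0, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m), where=sums > 0)

def random_walk_with_restart(hm, init, alpha=0.85, tol=1e-10, max_iter=1000):
    """Generic PageRank-style iteration (an assumption, not the paper's rule)."""
    t = column_normalize(hm)          # transition probability matrix
    s0 = init / init.sum()            # normalized initial score vector
    s = s0.copy()
    for _ in range(max_iter):
        nxt = alpha * (t @ s) + (1.0 - alpha) * s0
        if np.abs(nxt - s).sum() < tol:
            return nxt
        s = nxt
    return s

# Toy data (hypothetical): 3 proteins, 2 domains.
pn = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)   # protein network
dn = np.array([[0, 1], [1, 0]], dtype=float)                    # domain network
pdn = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)           # protein-domain links

hm = build_heterogeneous_matrix(pn, dn, pdn)

# Placeholder initial scores; in RWHN the protein entries would come from
# orthology and subcellular localization information.
init = np.ones(hm.shape[0])

scores = random_walk_with_restart(hm, init)
protein_scores = scores[:pn.shape[0]]
ranking = np.argsort(-protein_scores)  # protein indices, highest score first
```

In this sketch only the first `n` entries of the converged vector are used to rank proteins; the domain entries serve to propagate information through the protein-domain links.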

Highlights

  • Essential proteins are critical to an organism’s survival and development, and they are important for disease analysis and drug design

  • Initializing the score vector of proteins and domains: when scoring proteins, both the functional feature derived from subcellular localization information and the conservation feature obtained from homology information are taken into account

  • Random walk for the heterogeneous network: using the three constructed networks, PN, the protein-domain association network (PDN), and DN, our random walk-based prediction model, Randomly Walking in the Heterogeneous Network (RWHN), consists of three steps. Step 1: establishing the heterogeneous matrix HM

Summary

Introduction

Essential proteins are critical to an organism’s survival and development, and they are important for disease analysis and drug design. Many network topology-based computational methods have been designed to detect essential proteins, but these methods are limited by the completeness of the available PPI data. To overcome this limitation, some computational methods integrate PPI networks with multi-source biological data. Jeong et al. [1] proposed the centrality-lethality rule and pointed out that the essentiality of proteins is closely related to the network topology. The strategy based on node deletion [11] is an effective way to measure the importance of nodes. Most of these methods rely solely on the topological features of the network and rarely analyse the intrinsic properties of known essential proteins. There is an urgent need to improve the tolerance of identification algorithms to false positive data in PPI networks.

