Abstract

Protein Function Module (PFM) identification in Protein-Protein Interaction Networks (PPINs) is one of the most important and challenging tasks in computational biology. Quick and accurate detection of PFMs in PPINs can contribute greatly to understanding protein functions, properties, and biological mechanisms, both in research on various diseases and in the development of new medicines. Although existing detection approaches have improved to some extent, their efficiency, accuracy, and robustness can still be enhanced. Based on the uniqueness of the network-clustering problem in the context of PPINs, this study proposes a model based on the Lin-Kernighan-Helsgaun algorithm for detecting PFMs in PPINs. To demonstrate the effectiveness and efficiency of the proposed model, computational experiments are performed using three different categories of species datasets. The computational results show that the proposed model outperforms existing detection techniques on two key performance indices, namely the degree of polymerization inside PFMs (cohesion) and the degree of deviation between PFMs (separation), while remaining fast and robust. The proposed model can help researchers decide whether to conduct further expensive and time-consuming biological experiments and select target proteins from large-scale PPI data for more detailed study.

Highlights

  • Research on detecting Protein Function Modules (PFMs) has become one of the most important research topics in both life sciences and computing sciences since the completion of the human genome project

  • To improve the time complexity, accuracy, and robustness of the PFM detection algorithm, this study proposes a new model, called the Lin-Kernighan-Helsgaun Model (LKHM), which combines the LKH algorithm with biological gene ontology knowledge for detecting PFMs in Protein-Protein Interaction Networks (PPINs)

  • Based on the topological structure of PPINs, clustering proteins in PPINs can be transformed into a search for the optimal tour in a connection graph, where nodes correspond to individual proteins, edges connecting two nodes correspond to interactions between proteins, and the distance between two nodes corresponds to the difference between the two proteins
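
The tour-based view of clustering in the last highlight can be illustrated with a minimal Python sketch. It is not the authors' LKHM: the distance measure (Jaccard distance between closed neighborhoods), the greedy nearest-neighbor ordering used as a simple stand-in for the Lin-Kernighan-Helsgaun search, the cut threshold of 0.5, and all function names are assumptions chosen only for illustration. The sketch also orders the proteins along an open path rather than a closed tour.

```python
def build_neighbors(edges):
    """Map each protein to the set of proteins it interacts with."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def make_distance(neighbors):
    """Assumed 'difference' between two proteins: Jaccard distance
    between their closed neighborhoods (the protein plus its partners)."""
    def dist(u, v):
        a = neighbors[u] | {u}
        b = neighbors[v] | {v}
        return 1.0 - len(a & b) / len(a | b)
    return dist


def nearest_neighbor_order(nodes, dist):
    """Greedy ordering: always move to the closest unvisited protein.
    This is a simple stand-in for LKH, not the LKH algorithm itself."""
    order = [nodes[0]]
    remaining = set(nodes[1:])
    while remaining:
        last = order[-1]
        # sorted() makes tie-breaking deterministic
        nxt = min(sorted(remaining), key=lambda n: dist(last, n))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def split_order(order, dist, threshold):
    """Cut the ordering wherever consecutive proteins differ by more
    than the (hypothetical) threshold; each piece is a candidate module."""
    clusters = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if dist(prev, cur) > threshold:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters


# Toy PPIN: two triangles of proteins joined by a single interaction C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("D", "E"), ("D", "F"), ("E", "F"),
         ("C", "D")]
neighbors = build_neighbors(edges)
dist = make_distance(neighbors)
order = nearest_neighbor_order(sorted(neighbors), dist)
clusters = split_order(order, dist, threshold=0.5)
print(clusters)  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

On this toy network the ordering visits one triangle completely before crossing the single bridging interaction, so cutting at the largest step recovers the two densely connected groups. A real pipeline would replace the greedy ordering with an actual tour optimizer and the assumed distance with one informed by biological knowledge.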


Summary

Introduction

Research on detecting Protein Function Modules (PFMs) has become one of the most important research topics in both life sciences and computing sciences since the completion of the human genome project. Researchers focused mainly on biological experimental technologies, such as the yeast two-hybrid system [3] and affinity purification followed by mass spectrometry [4]. This type of detection method usually predicts the functions of proteins by analyzing their physical interactions, properties, and chemical characteristics. Using methodologies such as machine learning, network analysis, graph theory, and complex network theory to identify clusters of interacting proteins can help researchers gain a deeper understanding of PFMs and their evolutionary relationships. Such computational approaches can not only make up for the shortcomings of biological experimental technologies but also help in understanding complex higher-level cell tissues, predicting the function of unknown proteins, studying the pathogenesis of diseases, and finding new drug targets.

Literature review
Data pre-processing
PPIN connection graph modelling
The shortest path sequencing
Clustering results post-processing
Function information-based PFM optimization
Topology-based PFM optimization
Performance evaluation indices
Analytical results and discussion
Conclusions and recommendations for future studies