Abstract

Detecting protein complexes is one of the keys to understanding cellular organization and processes principles. With high-throughput experiments and computing science development, it has become possible to detect protein complexes by computational methods. However, most computational methods are based on either unsupervised learning or supervised learning. Unsupervised learning-based methods do not need training datasets, but they can only detect one or several topological protein complexes. Supervised learning-based methods can detect protein complexes with different topological structures. However, they are usually based on a type of training model, and the generalization of a single model is poor. Therefore, we propose an Ensemble Learning Framework for Detecting Protein Complexes (ELF-DPC) within protein-protein interaction (PPI) networks to address these challenges. The ELF-DPC first constructs the weighted PPI network by combining topological and biological information. Second, it mines protein complex cores using the protein complex core mining strategy we designed. Third, it obtains an ensemble learning model by integrating structural modularity and a trained voting regressor model. Finally, it extends the protein complex cores and forms protein complexes by a graph heuristic search strategy. The experimental results demonstrate that ELF-DPC performs better than the twelve state-of-the-art approaches. Moreover, functional enrichment analysis illustrated that ELF-DPC could detect biologically meaningful protein complexes. The code/dataset is available for free download from https://github.com/RongquanWang/ELF-DPC.

Highlights

  • Most complex systems, such as biological systems and human society, can be presented as complex networks in the real world

  • A protein-protein interaction (PPI) network is generally described as a weighted graph G = (V, E, W), where V is a set of proteins, E is a set of interactions, and W is a n × n(n = |V|) matrix that represents the reliability of protein pairs in PPI networks

  • 2.3.4.2 The Structural Modularity of Protein Complexes Based on the within-module and between module edges of subgraphs and the size of the subgraph, we present a new formal definition of protein complexes in PPI networks (Wu et al, 2009; Yu et al, 2011; Nepusz et al, 2012; Wang et al, 2019)

Read more

Summary

INTRODUCTION

Most complex systems, such as biological systems and human society, can be presented as complex networks in the real world. Community detection in complex networks is essential in many fields, aiming to identify clusters with high internal connectivity. These clusters are well separated from the rest of the network. A protein complex is a group of proteins formed by interacting simultaneously and in place. With the development of high-throughput experimental methods, many PPI networks have been produced, which usually have small world, scale-free, and modularity characteristics. They could be formulated as graphs where the nodes represent the proteins, and the edges represent the interactions. More details on the related work are introduced in the related work section

Related Work
Observations and Contributions
Datasets
Terminologies
Methods
Neighborhood Affinity
F-Measure
Evaluation Metrics
Jaccard
20: Jaccard
Parameter Selection
Comparison With State-of-the-art Algorithms
Comparison With Functional Enrichment Analysis
Case Study
CONCLUSION
DATA AVAILABILITY STATEMENT
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call