Super.Complex: A supervised machine learning pipeline for molecular complex detection in protein-interaction networks.

Meghana Venkata Palukuri, Edward M Marcotte

doi:10.1371/journal.pone.0262056

Abstract

Characterization of protein complexes, i.e. sets of proteins assembling into a single larger physical entity, is important because such assemblies play many essential roles in cells, such as gene regulation. From networks of protein-protein interactions, potential protein complexes can be identified computationally through the application of community detection methods, which flag groups of entities interacting with each other in certain patterns. Most community detection algorithms are unsupervised and assume that communities are dense network subgraphs, which is not always true, as protein complexes can exhibit diverse network topologies. The few existing supervised machine learning methods are serial and can potentially be improved in terms of accuracy and scalability by using better-suited machine learning models and parallel algorithms. Here, we present Super.Complex, a distributed, supervised AutoML-based pipeline for overlapping community detection in weighted networks. We also propose three new evaluation measures to address the outstanding issue of satisfactorily comparing sets of learned and known communities. Super.Complex learns a community fitness function from known communities using an AutoML method and applies this fitness function to detect new communities. A heuristic local search algorithm finds maximally scoring communities, and a parallel implementation can be run on a computer cluster for scaling to large networks. On a yeast protein-interaction network, Super.Complex outperforms 6 other supervised and 4 unsupervised methods. Application of Super.Complex to a human protein-interaction network with ~8k nodes and ~60k edges yields 1,028 protein complexes, with 234 complexes linked to SARS-CoV-2, the virus that causes COVID-19, and 111 uncharacterized proteins present in 103 learned complexes. Super.Complex is generalizable, and its results can be improved by incorporating domain-specific features. Learned community characteristics can also be transferred from existing applications to detect communities in a new application with no known communities. Code and interactive visualizations of learned human protein complexes are freely available at: https://sites.google.com/view/supercomplex/super-complex-v3-0.
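To make the problem of comparing learned and known communities concrete, the sketch below scores each known community by its best Jaccard overlap with any learned community. This is a generic illustration only and is not one of the three evaluation measures proposed in the paper; the function names and the toy protein labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two node sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match_scores(learned, known):
    """For each known community, the highest Jaccard similarity to any learned one."""
    return [max((jaccard(k, c) for c in learned), default=0.0) for k in known]


# Toy data with made-up protein labels, purely for illustration.
known = [{"P1", "P2", "P3"}, {"P4", "P5"}]
learned = [{"P1", "P2", "P3", "P9"}, {"P5", "P6"}, {"P7", "P8"}]
scores = best_match_scores(learned, known)
print(scores)
```

A per-community best match like this ignores learned communities that match nothing and lets several known communities match the same learned one, which hints at why comparing two sets of overlapping communities satisfactorily is harder than it first appears.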

Highlights

A protein complex is a group of proteins that interact with each other to perform a particular function in a cell, the basic biological unit of all living organisms.
We evaluate the ML binary classifier on the test sets using accuracy, precision, recall, F1 score, average precision score, and PR curves. We compute the same measures on the training set and compare them with the test measures to check the bias and variance of the algorithm, making sure it is neither underfitting nor overfitting the data.
Epsilon-greedy heuristics in conjunction with other heuristics, such as iterative simulated annealing, have not previously been applied to community detection.
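The train-versus-test comparison described in the second highlight can be sketched as follows. This is a minimal illustration in plain Python, not the pipeline's own evaluation code; the function names and the labels (1 = complex, 0 = non-complex) are made up for the example.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def train_test_gap(train_true, train_pred, test_true, test_pred):
    """Return F1 on each split and their difference (train minus test)."""
    f1_train = precision_recall_f1(train_true, train_pred)[2]
    f1_test = precision_recall_f1(test_true, test_pred)[2]
    return f1_train, f1_test, f1_train - f1_test


# Made-up labels purely for illustration.
train_true = [1, 1, 1, 0, 0, 0]
train_pred = [1, 1, 1, 0, 0, 0]
test_true = [1, 1, 0, 0]
test_pred = [1, 0, 1, 0]
f1_train, f1_test, gap = train_test_gap(train_true, train_pred, test_true, test_pred)
print(f1_train, f1_test, gap)
```

A training score far above the test score suggests overfitting (high variance), while low scores on both splits suggest underfitting (high bias).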

Summary

Introduction

A protein complex is a group of proteins that interact with each other to perform a particular function in a cell, the basic biological unit of all living organisms. A common strategy is to select a seed (such as a node or a clique) and grow it into a candidate community by iteratively selecting neighbors to add to the current subgraph, using heuristics such as iterative simulated annealing, until a defined stopping criterion is met for the growth process. SLPC's [15] regression model was implemented on a human PPI network reweighted by breast-cancer-specific PPIs extracted from biomedical literature to detect disease-specific complexes [17]. These methods employ serial candidate community sampling, negatively impacting their scalability to large networks such as hu.MAP [4], a human protein-interaction network with ~8k nodes and ~60k edges. [...] epsilon-greedy heuristic, followed by an additional heuristic such as iterative simulated annealing or pseudo-metropolis, using the learned community fitness function. We apply Super.Complex to hu.MAP to yield 1,028 protein complexes, including high-scoring previously unknown protein complexes that potentially contribute to new biology, and make all data, code, and interactive visualizations openly and freely available at https://sites.google.com/view/supercomplex/super-complex-v3-0
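The seed-and-grow strategy described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Super.Complex implementation: `toy_fitness` (weighted edge density) is a hypothetical stand-in for the learned community fitness function, the acceptance rule is a fixed-temperature Metropolis-style test rather than a full annealing schedule, and all names, parameters, and the toy graph are invented for the example.

```python
import math
import random


def neighbors_of(graph, community):
    """Nodes adjacent to the community but not part of it."""
    return {v for u in community for v in graph[u]} - community


def toy_fitness(graph, community):
    """Hypothetical stand-in for a learned fitness function: weighted edge density."""
    n = len(community)
    if n < 2:
        return 0.0
    # Each internal edge is seen from both endpoints, hence the division by 2.
    total = sum(w for u in community for v, w in graph[u].items() if v in community) / 2
    return total / (n * (n - 1) / 2)


def grow_community(graph, seed, fitness, epsilon=0.1, temperature=0.05,
                   max_steps=20, rng=None):
    """Grow a seed into a candidate community; temperature must be positive."""
    rng = rng or random.Random(0)
    community = set(seed)
    score = fitness(graph, community)
    best, best_score = set(community), score
    for _ in range(max_steps):
        candidates = sorted(neighbors_of(graph, community))
        if not candidates:
            break
        if rng.random() < epsilon:
            node = rng.choice(candidates)  # explore: random neighbor
        else:
            # exploit: neighbor giving the highest-scoring enlarged community
            node = max(candidates, key=lambda v: fitness(graph, community | {v}))
        new_score = fitness(graph, community | {node})
        # Accept a worse move only with probability exp(delta / temperature);
        # a rejected move is the stopping criterion in this sketch.
        if new_score < score and rng.random() >= math.exp((new_score - score) / temperature):
            break
        community.add(node)
        score = new_score
        if score >= best_score:  # ties favor the larger community
            best, best_score = set(community), score
    return best, best_score


# Toy weighted network: a triangle (a, b, c) plus a weakly attached node d.
toy_graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 0.25},
    "d": {"c": 0.25},
}
print(grow_community(toy_graph, {"a"}, toy_fitness))
```

Because each seed is grown independently of the others, runs from different seeds could be distributed across workers, which is the kind of parallelism that serial candidate sampling does not exploit.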

Journal: PLOS ONE	Publication Date: Dec 31, 2021
Citations: 14	License type: CC BY 4.0

Similar Papers

A methodology for detecting the orthology signal in a PPI network at a functional complex level
Pavol Jancura ... Elena Marchiori
BMC Bioinformatics | VOL. 13 | 01 Jun 2012

Identifying protein complexes based on node embeddings obtained from protein-protein interaction networks
Xiaoxia Liu ... Lei Wang
BMC Bioinformatics | VOL. 19 | 21 Sep 2018

Molecular complex detection in protein interaction networks through reinforcement learning
Meghana V Palukuri ... Edward M Marcotte
BMC Bioinformatics | VOL. 24 | 02 Aug 2023

Network simulation reveals significant contribution of network motifs to the age-dependency of yeast protein-protein interaction networks.
Cheng Liang ... Dan Song
Molecular bioSystems | VOL. 10 | 25 Jun 2014
