Abstract

BackgroundRevealing protein complexes are important for understanding principles of cellular organization and function. High-throughput experimental techniques have produced a large amount of protein interactions, which makes it possible to predict protein complexes from protein-protein interaction (PPI) networks. However, the small amount of known physical interactions may limit protein complex detection.MethodsThe new PPI networks are constructed by integrating PPI datasets with the large and readily available PPI data from biomedical literature, and then the less reliable PPI between two proteins are filtered out based on semantic similarity and topological similarity of the two proteins. Finally, the supervised learning protein complex detection (SLPC), which can make full use of the information of available known complexes, is applied to detect protein complex on the new PPI networks.ResultsThe experimental results of SLPC on two different categories yeast PPI networks demonstrate effectiveness of the approach: compared with the original PPI networks, the best average improvements of 4.76, 6.81 and 15.75 percentage units in the F-score, accuracy and maximum matching ratio (MMR) are achieved respectively; compared with the denoising PPI networks, the best average improvements of 3.91, 4.61 and 12.10 percentage units in the F-score, accuracy and MMR are achieved respectively; compared with ClusterONE, the start-of the-art complex detection method, on the denoising extended PPI networks, the average improvements of 26.02 and 22.40 percentage units in the F-score and MMR are achieved respectively.ConclusionsThe experimental results show that the performances of SLPC have a large improvement through integration of new receivable PPI data from biomedical literature into original PPI networks and denoising PPI networks. In addition, our protein complexes detection method can achieve better performance than ClusterONE.

Highlights

  • Revealing protein complexes are important for understanding principles of cellular organization and function

  • PPIExtractor contains four modules: (i) Named Entity Recognition (NER) module which aims to identify the protein names in the biomedical literature; (ii) Normalization module which determines the unique identifier of proteins identified in NER module; (iii) protein-protein interaction (PPI) extraction module which extracts the PPI information in the biomedical literature; (iv) PPI visualization module which displays the extracted PPI information in the form of a graph

  • 127,217 PubMed abstracts were downloaded from PubMed website with the query string “((Saccharomyces cerevisiae) OR yeast) AND protein” and PPIExtractor extracted a total of 126,165 protein interactions from these abstracts

Read more

Summary

Introduction

Revealing protein complexes are important for understanding principles of cellular organization and function. Complex formation is one of the typical patterns in this graph and many cellular functions are performed by these complexes containing multiple protein interaction partners. Many automatic approaches have been proposed to detect the protein complexes from PPI networks, such as CMC [1], COACH [2], MCODE [3], MCL [4], Cfinder [5], and ClusterONE [6]. Most of these methods are based on unsupervised graph clustering methods and predict protein complexes only with pre-defined rules. Compared with them, supervised learning methods [7,8] can utilize the known complexes information and may achieve better performances

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call