Abstract

Recent advances in high-throughput laboratory techniques have produced large-scale protein–protein interaction (PPI) data, making it possible to create detailed maps of protein interaction networks and thus to detect protein complexes from these PPI networks. However, most current state-of-the-art methods still have limitations: they cannot identify overlapping clusters, do not consider the inherent organization within protein complexes, or overlook the biological meaning of complexes. We therefore present a novel overlapping protein complex prediction method based on core–attachment structure and function annotations (CFOCM), which operates in two stages. First, it detects protein complex cores that maximize our defined cluster closeness function and whose proteins are also closely related to at least one common function. Then it appends attachment proteins to the detected cores to form the returned complexes. For performance evaluation, CFOCM and six classical methods were used to identify protein complexes on three different yeast PPI networks, with three sets of real complexes as benchmarks: the Munich Information Center for Protein Sequences (MIPS), the Saccharomyces Genome Database (SGD), and the Catalogues of Yeast protein Complexes (CYC2008). The results show that CFOCM is effective and robust, achieving the highest F-measure values in all tests.

Highlights

  • Most proteins in living organisms, when performing their biological functions or taking part in cellular processes, rarely act as single isolated entities; instead, they form complexes through molecular interactions with other partners [1]

  • A protein complex is commonly modeled as an induced subgraph G of the protein–protein interaction (PPI) network whose proteins are densely connected to one another and sparsely connected to the rest of the network. We introduce a new and effective closeness function to quantify the probability that G is a complex based on network topology: cf(G

  • We have proposed a novel algorithm CFOCM for protein complex identification from the protein–protein interaction network
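
To make the "dense inside, sparse outside" picture of a complex concrete, the following minimal Python sketch computes two simple topological quantities for an induced subgraph: its internal edge density and the number of edges leaving it. These are illustrative stand-ins only and are not the paper's closeness function cf, whose definition is not given here. The function names and the toy network are assumptions made for this example.

```python
def internal_density(adj, nodes):
    """Fraction of possible edges present inside the subgraph induced by `nodes`.

    `adj` maps each protein to the set of its interaction partners.
    This is a generic density measure, not the paper's closeness function.
    """
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Each internal edge is seen from both endpoints, so halve the count.
    m = sum(1 for u in nodes for v in adj[u] if v in nodes) // 2
    return 2 * m / (n * (n - 1))


def boundary_edge_count(adj, nodes):
    """Number of edges linking the induced subgraph to the rest of the network."""
    nodes = set(nodes)
    return sum(1 for u in nodes for v in adj[u] if v not in nodes)


# Hypothetical toy PPI network: a triangle A-B-C with a tail C-D-E.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

triangle = {"A", "B", "C"}
tail = {"C", "D", "E"}
print(internal_density(adj, triangle), boundary_edge_count(adj, triangle))
print(internal_density(adj, tail), boundary_edge_count(adj, tail))
```

In this toy network the triangle is fully connected internally and has a single edge to the outside, so it looks more complex-like than the tail by both measures.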


Summary

Introduction

Most proteins in living organisms, when performing their biological functions or taking part in cellular processes, rarely act as single isolated entities; instead, they form complexes through molecular interactions with other partners [1]. Protein complexes are the key molecular entities that perform cellular functions such as signal transduction, post-translational modification, DNA transcription, and mRNA translation. Damage to protein complexes is one of the main factors inducing severe diseases [2]. Significant progress has been made in high-throughput laboratory techniques, including Tandem Affinity Purification (TAP) [3] and Mass Spectrometry (MS) [4], for discovering protein complexes on a large scale. However, laboratory experiments are expensive and time-consuming, which results in poor coverage of the complete set of protein complexes. Because protein complexes are formed by the physical aggregation of several binding proteins, they are assumed to be functionally and structurally cohesive substructures of a PPI network, and graph clustering methods have been put forward to search for densely connected regions in PPI networks as protein complexes.
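
The two-stage core–attachment idea outlined in the abstract can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the CFOCM algorithm: plain edge density stands in for the paper's closeness function, and the greedy growth rule, the thresholds `min_density` and `min_fraction`, and the toy network and annotations are all hypothetical.

```python
def density(adj, nodes):
    """Edge density of the subgraph induced by `nodes` (stand-in score)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    m = sum(1 for u in nodes for v in adj[u] if v in nodes) // 2
    return 2 * m / (n * (n - 1))


def grow_core(adj, annotations, seed, min_density=0.7):
    """Stage 1 (assumed rule): greedily grow a dense core around `seed`
    whose members all share at least one function annotation."""
    core = {seed}
    shared = set(annotations.get(seed, ()))
    while True:
        candidates = set().union(*(adj[u] for u in core)) - core
        best = None
        for v in sorted(candidates):
            common = shared & set(annotations.get(v, ()))
            d = density(adj, core | {v})
            if common and d >= min_density and (best is None or d > best[0]):
                best = (d, v, common)
        if best is None:
            return frozenset(core)
        _, v, shared = best
        core.add(v)


def attach(adj, core, min_fraction=0.5):
    """Stage 2 (assumed rule): add neighbours linked to at least
    `min_fraction` of the core's proteins."""
    neighbours = set().union(*(adj[u] for u in core)) - core
    return {v for v in neighbours if len(adj[v] & core) >= min_fraction * len(core)}


def predict_complexes(adj, annotations, min_size=3):
    cores = {grow_core(adj, annotations, s) for s in adj}
    return [set(c) | attach(adj, c) for c in cores if len(c) >= min_size]


# Hypothetical toy data: two triangles bridged by an unannotated protein X.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E"), ("D", "F"), ("E", "F"),
         ("X", "A"), ("X", "B"), ("X", "D"), ("X", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
annotations = {p: {"f1"} for p in "ABC"}
annotations.update({p: {"f2"} for p in "DEF"})

for complex_ in sorted(sorted(c) for c in predict_complexes(adj, annotations)):
    print(complex_)
```

Because attachment proteins are added to each core independently, a protein such as X can end up in more than one predicted complex, which is how a core–attachment scheme can yield overlapping clusters.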
