Abstract
The prediction of protein complexes from protein-protein interactions (PPIs) is a well-studied problem in bioinformatics. However, the currently available PPI data are not sufficient to explain all known protein complexes. In this paper, we formulate the problem of determining the minimum number of additional protein-protein interactions required as a graph-theoretic problem, under the constraint that each complex must constitute a connected component in a PPI network. For this problem, we develop two computational methods: one based on integer linear programming (ILPMinPPI) and the other based on an existing greedy-type approximation algorithm (GreedyMinPPI) originally developed in the context of communication and social networks. Since the former method is applicable only to small datasets, we apply the latter to a combination of the CYC2008 protein complex dataset and each of eight PPI datasets (STRING, MINT, BioGRID, IntAct, DIP, BIND, WI-PHI, iRefIndex). The results show that the minimum number of additional PPIs required ranges from 51 (STRING) to 964 (BIND), and that even the four best PPI databases, STRING (51), BioGRID (67), WI-PHI (93) and iRefIndex (85), do not contain enough PPIs to form all CYC2008 protein complexes. We also demonstrate that the proposed problem framework and our solutions can improve the prediction accuracy of existing PPI prediction methods. ILPMinPPI can be freely downloaded from http://sunflower.kuicr.kyoto-u.ac.jp/~nakajima/.
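The optimization problem can be illustrated with a small Python sketch. It assumes that "each complex constitutes a connected component" means the subgraph induced by a complex's proteins must be connected. It also uses a common greedy rule for this kind of connectivity-constraint problem: repeatedly add the unobserved protein pair that merges the largest number of separate pieces, summed over all complexes. This is an illustration under those assumptions, not the published GreedyMinPPI code. The names `greedy_min_ppi` and `DisjointSet` are hypothetical, and the returned count is an upper bound on the minimum rather than a guaranteed optimum.

```python
from itertools import combinations


class DisjointSet:
    """Union-find over hashable items, tracking components inside one complex."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def greedy_min_ppi(complexes, known_ppis):
    """Hypothetical greedy sketch: add PPIs until every complex is connected.

    complexes  -- list of sets of protein names
    known_ppis -- list of (protein, protein) pairs already observed
    Returns the added pairs (a heuristic upper bound, not necessarily optimal).
    """
    # One union-find per complex, seeded with the known PPIs inside that complex.
    forests = []
    for members in complexes:
        ds = DisjointSet(members)
        for u, v in known_ppis:
            if u in members and v in members:
                ds.union(u, v)
        forests.append(ds)

    known = {frozenset(e) for e in known_ppis}
    # Candidate new PPIs: unobserved pairs of proteins that share a complex.
    candidates = {
        frozenset(p)
        for members in complexes
        for p in combinations(sorted(members), 2)
        if frozenset(p) not in known
    }

    added = []
    while True:
        best, best_gain = None, 0
        # Sorted iteration makes tie-breaking deterministic.
        for pair in sorted(candidates, key=sorted):
            u, v = sorted(pair)
            # Gain = number of complexes in which this pair joins two components.
            gain = sum(
                1
                for members, ds in zip(complexes, forests)
                if u in members and v in members and ds.find(u) != ds.find(v)
            )
            if gain > best_gain:
                best, best_gain = pair, gain
        if best is None:  # no pair helps: every complex is already connected
            return added
        u, v = sorted(best)
        for members, ds in zip(complexes, forests):
            if u in members and v in members:
                ds.union(u, v)
        candidates.discard(best)
        added.append((u, v))


if __name__ == "__main__":
    toy_complexes = [{"A", "B", "C"}, {"B", "C", "D"}, {"D", "E"}]
    toy_ppis = [("A", "B")]
    print(greedy_min_ppi(toy_complexes, toy_ppis))  # [('B', 'C'), ('B', 'D'), ('D', 'E')]
```

In the toy input, the pair B–C is chosen first because it connects pieces in two complexes at once. Choosing pairs that are shared between overlapping complexes in this way is what lets the greedy rule use fewer additional PPIs than treating each complex independently.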
Highlights
Identification of protein complexes is important for understanding cellular mechanisms because many proteins carry out their functions by forming complexes.
To assess whether the proposed framework can improve PPI prediction, we examine a combination of GreedyMinPPI with each of four state-of-the-art prediction methods for weighted protein-protein interactions (PPIs), Struct2Net [26], ENTS [27], PIP [28] and iWRAP [29], using four PPI datasets extracted from STRING [15], MINT [16], WI-PHI [20] and IntAct [21].
We evaluated the performance of ILPMinPPI and GreedyMinPPI using both synthetic data and real protein-protein interaction data.
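On small synthetic instances, an exact minimum can serve as a reference against which a greedy answer is checked. ILPMinPPI obtains that minimum with an integer linear program. The Python sketch below instead uses plain exhaustive search, so it is a stand-in for the idea of an exact baseline and not the ILP formulation itself. It assumes the same induced-subgraph connectivity reading as above, and `exact_min_ppi` and `is_connected` are hypothetical names. Because it tries subsets of candidate pairs in order of increasing size, the first feasible subset it finds is a minimum, but its cost grows combinatorially with the number of candidates. This makes concrete why exact approaches are restricted to small datasets.

```python
from itertools import combinations


def is_connected(members, edges):
    """Return True if the subgraph induced by `members` under `edges` is connected."""
    members = set(members)
    if len(members) <= 1:
        return True
    adj = {m: set() for m in members}
    for u, v in edges:
        if u in members and v in members:
            adj[u].add(v)
            adj[v].add(u)
    # Depth-first search from an arbitrary member.
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == members


def exact_min_ppi(complexes, known_ppis):
    """Hypothetical exhaustive search for the fewest added pairs (toy sizes only)."""
    known = {frozenset(e) for e in known_ppis}
    # Candidate new PPIs: unobserved pairs of proteins that share a complex.
    candidates = sorted(
        {tuple(sorted(p)) for c in complexes for p in combinations(c, 2)
         if frozenset(p) not in known}
    )
    base = [tuple(e) for e in known_ppis]
    # Trying subsets by increasing size means the first hit is a minimum.
    for k in range(len(candidates) + 1):
        for extra in combinations(candidates, k):
            edges = base + list(extra)
            if all(is_connected(c, edges) for c in complexes):
                return list(extra)
    return None  # unreachable: adding every candidate makes each complex a clique


if __name__ == "__main__":
    toy_complexes = [{"A", "B", "C"}, {"B", "C", "D"}, {"D", "E"}]
    toy_ppis = [("A", "B")]
    print(exact_min_ppi(toy_complexes, toy_ppis))  # [('B', 'C'), ('B', 'D'), ('D', 'E')]
```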
Summary
Are the current PPI data enough to explain all known protein complexes? If not, how many additional PPIs are required? The main purpose of this paper is to tackle this fundamental question.