Predicting protein-protein interactions in unbalanced data using the primary structure of proteins.

Chi-Yuan Yu, Lih-Ching Chou, Darby Tien-Hao Chang

doi:10.1186/1471-2105-11-167

Open Access

Abstract

Background

Elucidating protein-protein interactions (PPIs) is essential to constructing protein interaction networks and to understanding the general principles of biological systems. Previous studies have shown that interacting protein pairs can be predicted from their primary structure. Most of these approaches have achieved satisfactory performance on datasets comprising equal numbers of interacting and non-interacting protein pairs. In nature, however, this ratio is highly unbalanced, and these techniques have not been comprehensively evaluated with respect to the effect of the large number of non-interacting pairs in realistic datasets. Moreover, because highly unbalanced distributions usually lead to large datasets, more efficient predictors are needed for such challenging tasks.

Results

This study presents a method for PPI prediction based only on sequence information, which makes three contributions. First, we propose a probability-based mechanism for transforming protein sequences into feature vectors. Second, the proposed predictor is built on an efficient classification algorithm, since efficiency is essential for handling highly unbalanced datasets. Third, the proposed PPI predictor is assessed on several unbalanced datasets with different positive-to-negative ratios (from 1:1 to 1:15). This analysis provides solid evidence that the degree of dataset imbalance matters to PPI predictors.

Conclusions

Dealing with data imbalance is a key issue in PPI prediction, since there are far fewer interacting protein pairs than non-interacting ones. This article provides a comprehensive study of this issue and develops a practical tool that achieves both good prediction performance and efficiency using only protein sequence information.
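
Why the positive-to-negative ratio matters can be made concrete with a small sketch. It is not the authors' procedure: it assumes negatives are drawn uniformly at random from a pool of candidate non-interacting pairs, uses toy pair identifiers, and picks the intermediate ratios 1:3 and 1:7 purely for illustration (the abstract only states the range 1:1 to 1:15). The helper names `sample_unbalanced` and `majority_baseline_accuracy` are hypothetical. The sketch shows two effects: a trivial predictor that labels every pair as non-interacting already reaches 15/16 = 93.75% accuracy at 1:15, and the dataset grows linearly with the ratio, which is why both the evaluation measure and the classifier's efficiency matter.

```python
import random


def sample_unbalanced(positives, negatives, ratio, seed=0):
    """Hypothetical helper: keep every positive pair and draw
    ratio * len(positives) negative pairs without replacement
    from the candidate negative pool."""
    n_neg = ratio * len(positives)
    if n_neg > len(negatives):
        raise ValueError("not enough candidate non-interacting pairs")
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    chosen = rng.sample(negatives, n_neg)
    pairs = list(positives) + chosen
    labels = [1] * len(positives) + [0] * n_neg  # 1 = interacting, 0 = not
    return pairs, labels


def majority_baseline_accuracy(labels):
    """Accuracy of a trivial predictor that calls every pair non-interacting."""
    return labels.count(0) / len(labels)


if __name__ == "__main__":
    # Toy identifiers only: 100 "interacting" pairs, 2,000 candidate negatives.
    positives = [(f"P{i}", f"Q{i}") for i in range(100)]
    negatives = [(f"P{i}", f"R{j}") for i in range(100) for j in range(20)]
    for ratio in (1, 3, 7, 15):
        _, labels = sample_unbalanced(positives, negatives, ratio)
        acc = majority_baseline_accuracy(labels)
        print(f"1:{ratio}  n={len(labels)}  all-negative accuracy={acc:.4f}")
```

Running the script prints one line per ratio: the all-negative baseline scores 0.5000 on 200 pairs at 1:1 and 0.9375 on 1,600 pairs at 1:15, so a high accuracy on an unbalanced set says little by itself.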

Highlights

Elucidating protein-protein interactions (PPIs) is essential to constructing protein interaction networks and facilitating our understanding of the general principles of biological systems.
The analyses included in this study reveal that a) the extent of imbalance of the sampled dataset and b) the efficiency of the employed classification algorithm are important to PPI predictors.
Considerations for real-world PPI data are also discussed.

Summary

Introduction

Elucidating protein-protein interactions (PPIs) is essential to constructing protein interaction networks and facilitating our understanding of the general principles of biological systems. Previous studies have revealed that interacting protein pairs can be predicted from their primary structure. Most of these approaches have achieved satisfactory performance on datasets comprising equal numbers of interacting and non-interacting protein pairs. Various interactions among proteins are essential to diverse biological functions in a living cell. Information about these interactions provides a basis for constructing protein interaction networks and improves our understanding of the general principles of the workings of biological systems [1]. Because experimentally detected interactions represent only a small fraction of the real PPI network [14,15], many computational methods have been developed to provide information complementary to experimental approaches. Shoemaker and Panchenko have provided a comprehensive review of these computational methods [24].

Journal: BMC Bioinformatics
Publication Date: Apr 2, 2010
Citations: 127
License type: CC BY

Similar Papers

KUPS: constructing datasets of interacting and non-interacting protein pairs with associated attributions
X.-W Chen ... J C Jeong
Nucleic Acids Research | VOL. 39 | 15 Oct 2010

Computational prediction of virus-human protein-protein interactions using embedding kernelized heterogeneous data.
Esmaeil Nourani ... Farshad Khunjush
Molecular BioSystems | VOL. 12 | 01 Jan 2015

Predicting Protein-Protein Interactions from Multimodal Biological Data Sources via Nonnegative Matrix Tri-Factorization
Hua Wang ... Feiping Nie
01 Jan 2012

A Simple Approach for Predicting Protein-Protein Interactions
Mamoon Rashid ... Gajendra P.S Raghava
Current Protein & Peptide Science | VOL. 11 | 01 Nov 2010
