Prediction of Protein-Protein Interaction Sites with Machine-Learning-Based Data-Cleaning and Post-Filtering Procedures.

Guang-Hui Liu, Dong-Jun Yu, Hong-Bin Shen

doi:10.1007/s00232-015-9856-z

Abstract

Accurately predicting protein-protein interaction sites (PPIs) is an active research topic because such predictions are useful for understanding disease mechanisms and designing drugs. Machine-learning-based computational approaches have been widely used for PPI prediction. However, traditional machine learning algorithms often assume that the classes are balanced, so applying them directly tends to perform poorly under the severe class imbalance inherent in PPI prediction. In this study, we propose a novel method that improves PPI prediction by relieving class imbalance with a data-cleaning procedure and reducing false positives with a post-filtering procedure. First, a machine-learning-based data-cleaning procedure removes marginal targets from the majority samples; these targets may hinder training a model with a clear classification boundary, and removing them relieves the class imbalance in the original training dataset. Second, a prediction model is trained on the cleaned dataset. Finally, a post-filtering procedure is applied to reduce potential false positive predictions. Stringent cross-validation and independent validation tests on benchmark datasets demonstrated the efficacy of the proposed method, which performs competitively with existing state-of-the-art sequence-based PPIs predictors and should supplement existing PPI prediction methods.
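The three stages named in the abstract (clean the majority class, train on the cleaned data, post-filter the predictions) can be illustrated with a minimal sketch. The abstract does not specify the classifier, the criterion for a "marginal" sample, or the post-filtering rule, so everything below is an assumption made for illustration: a nearest-centroid scorer stands in for the machine-learning model, `drop_fraction` and `window` are hypothetical parameters, and the post-filter simply discards predicted interface residues that have no predicted neighbor along the sequence. It is not the authors' implementation.

```python
from statistics import fmean


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def margin_score(x, pos_c, neg_c):
    """Positive when x is closer to the interface centroid than to the
    non-interface centroid (assumed stand-in for a classifier score)."""
    return sq_dist(x, neg_c) - sq_dist(x, pos_c)


def clean_majority(pos, neg, drop_fraction=0.2):
    """Data cleaning (assumed rule): drop the fraction of majority-class
    samples that look most like the minority class."""
    pos_c, neg_c = centroid(pos), centroid(neg)
    ranked = sorted(neg, key=lambda x: margin_score(x, pos_c, neg_c))
    keep = len(neg) - int(len(neg) * drop_fraction)
    return ranked[:keep]


def train(pos, neg):
    """'Train' the toy model: store one centroid per class."""
    return centroid(pos), centroid(neg)


def predict(model, residues):
    """Label each residue 1 (interface) or 0 (non-interface)."""
    pos_c, neg_c = model
    return [1 if margin_score(x, pos_c, neg_c) > 0 else 0 for x in residues]


def post_filter(labels, window=1):
    """Post-filtering (assumed rule): reset a predicted positive to 0 when
    no other positive lies within `window` sequence positions of it."""
    out = list(labels)
    for i, y in enumerate(labels):
        if y == 1:
            lo, hi = max(0, i - window), min(len(labels), i + window + 1)
            if sum(labels[lo:hi]) == 1:  # only the residue itself is positive
                out[i] = 0
    return out
```

A typical call sequence would be `cleaned = clean_majority(pos, neg)`, then `model = train(pos, cleaned)`, then `post_filter(predict(model, residues))`, where `residues` holds the per-residue feature vectors of one protein in sequence order. Removing borderline majority samples both reduces the imbalance ratio and moves the learned boundary away from the minority class, which is why a separate false-positive filter is a natural companion step.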

Journal: The Journal of Membrane Biology
Publication Date: Nov 12, 2015
Citations: 37

Similar Papers

Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS
Bi-Qing Li ... Yu-Dong Cai
PLoS ONE | VOL. 7
28 Aug 2012

Prediction of Protein-Protein Interaction Sites by Multifeature Fusion and RF with mRMR and IFS.
Junyan Zhang ... Zhiqiang Ma
Disease Markers | VOL. 2022
04 Oct 2022

Combining deep graph convolutional networks and PRSA to enhance protein-protein interaction site prediction
Zhouhan Li ... Jing Peng
09 Oct 2022

Prediction of protein-protein interaction sites by means of ensemble learning and weighted feature descriptor.
Xiuquan Du ... Junfeng Xia
Journal of biological research (Thessalonike, Greece) | VOL. 23
01 May 2016
