Protein-Protein Interaction Sites Prediction Based on an Under-Sampling Strategy and Random Forest Algorithm.

Minjie Li, Jun Zhang, Ziheng Wu, Wenyan Wang, Dan Li, Yuming Zhou, Kun Lu, Zhaoquan Chen, Peng Chen, Bing Wang, Shicheng Zheng

doi:10.1109/tcbb.2021.3123269

Abstract

Computational methods for predicting protein-protein interaction sites avoid the high cost and long duration of traditional experimental approaches. However, the severe class imbalance between interface and non-interface residues in protein sequences limits the prediction performance of these methods. This work therefore proposes a new strategy, NearMiss-based under-sampling for imbalanced datasets combined with Random Forest classification (NM-RF), to predict protein interaction sites. Residues in protein sequences are represented by PSSM-derived features, the hydropathy index (HI), and relative solvent accessibility (RSA). To resolve the class imbalance problem, an under-sampling method based on the NearMiss algorithm removes some non-interface residues, and the random forest algorithm then performs binary classification on the balanced feature datasets. Experiments show that the accuracy of the NM-RF model reaches 87.6% and 84.3% on Dtestset72 and PDBtestset164, respectively, which demonstrates the effectiveness of the proposed NM-RF method in differentiating interface from non-interface residues.
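
The under-sampling step described above can be illustrated with a minimal sketch. The abstract does not state which NearMiss variant or parameters the authors used, so the code below assumes the NearMiss-1 rule (keep the majority samples whose mean distance to their k nearest minority samples is smallest), Euclidean distance, and a hypothetical toy dataset in which label 1 marks interface residues and label 0 marks non-interface residues. The function name `nearmiss1` and its parameters are illustrative, not taken from the paper.

```python
import math


def nearmiss1(X, y, k=3, majority=0, minority=1):
    """Under-sample the majority class with a NearMiss-1 style rule.

    Keeps the majority samples whose mean Euclidean distance to their
    k nearest minority samples is smallest, until the two classes have
    the same number of samples. Original sample order is preserved.
    """
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    if not minority_idx:
        raise ValueError("no minority samples to balance against")
    k = min(k, len(minority_idx))

    def mean_dist_to_nearest_minority(i):
        # Distances from majority sample i to every minority sample.
        dists = sorted(math.dist(X[i], X[j]) for j in minority_idx)
        return sum(dists[:k]) / k

    # Rank majority samples by closeness to the minority class and keep
    # only as many as there are minority samples.
    ranked = sorted(majority_idx, key=mean_dist_to_nearest_minority)
    keep = sorted(ranked[:len(minority_idx)] + minority_idx)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: two feature values per residue,
# 1 = interface residue (minority), 0 = non-interface residue (majority).
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0), (6.0, 6.0)]
y = [1, 1, 0, 0, 0, 0]

X_bal, y_bal = nearmiss1(X, y, k=2)
print(y_bal)  # two interface and two non-interface residues remain
```

In a full pipeline, the balanced features would then be passed to a random forest classifier. Existing libraries provide both pieces, for example the `NearMiss` class in imbalanced-learn and `RandomForestClassifier` in scikit-learn; whether the authors used these particular implementations is not stated in the abstract.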

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication Date: Nov 1, 2022
Citations: 4
License type: publisher-specific-oa


