Abstract

Prediction of protein–protein interaction sites plays an important role in understanding protein interactions and functions. However, in the protein–protein interaction site prediction problem, the number of binding-site residues is usually much smaller than the number of other amino acid residues in a protein chain, which degrades the performance of standard machine learning methods on the minority class, i.e., the binding-site residues. Therefore, to improve the prediction performance on binding-site residues, we propose in this paper a new boosting algorithm (SSTBoost) that combines a stochastic sensitivity measure-based undersampling method with the AdaBoost algorithm. The stochastic sensitivity measure-based undersampling method aims to re-balance the dataset by selecting the samples that are most likely to be incorrectly labeled, and the AdaBoost algorithm aims to improve the performance of the base hypotheses by making them complementary to, and work in conjunction with, each other. Twenty UCI datasets are first used to evaluate the robustness and effectiveness of SSTBoost. After that, SSTBoost is tested on twenty-two practical protein–protein interaction site prediction problems. Experimental results show that SSTBoost significantly outperforms state-of-the-art methods in 57.3%, 88.2%, and 78.2% of 110 cases in terms of Recall, F-score, and G-mean, respectively. This suggests its potential for handling other bioinformatics applications in the near future.
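
For reference, the three evaluation measures named in the abstract can be computed from a binary confusion matrix. The sketch below assumes the standard definitions (the abstract does not spell them out) and treats the binding-site residues as the positive class; the function name `imbalance_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Accuracy, Recall, F-score (F1), and G-mean from binary confusion counts.

    Assumption: the positive class is the minority class
    (here, binding-site residues).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0     # true negative rate
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "f_score": f_score,
        "g_mean": g_mean,
    }


# Hypothetical counts for illustration only: 100 positives among 1000 residues.
metrics = imbalance_metrics(tp=30, fp=20, tn=880, fn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is high even though most positives are missed, which is why minority-class measures such as Recall, F-score, and G-mean are reported for imbalanced problems of this kind.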
