NegStacking: Drug-Target Interaction Prediction Based on Ensemble Learning and Logistic Regression

Jie Yang, Song He, Xiaochen Bo, Zhongnan Zhang

doi:10.1109/tcbb.2020.2968025

Abstract

Identifying drug-target interactions (DTIs) is an important problem in drug research, and many machine-learning methods proposed for predicting potential DTIs treat it as a binary classification problem. However, the number of known interacting drug-target pairs (positive samples) is far smaller than the number of non-interacting pairs (negative samples). Most existing methods make insufficient use of this large pool of negative samples, which limits their predictive performance. To address this problem, we propose a stacking framework named NegStacking. First, it uses sampling to obtain multiple completely different negative sample sets. Then, each weak learner is trained on a different negative sample set together with the same positive sample set, and logistic regression (LR) is used as a meta-learner to adaptively combine these weak learners. In addition, feature subspacing and hyperparameter perturbation are applied during training to increase ensemble diversity. Finally, the trained model can be used to predict new samples. We compared NegStacking with other methods, and the experimental results show that our model outperforms them. NegStacking can improve DTI prediction performance and has broad application prospects for improving the drug discovery process. The source code and datasets are available at https://github.com/Open-ss/NegStacking.
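The training pipeline described in the abstract can be sketched in a few dozen lines of Python. This is a minimal, standard-library-only sketch under stated assumptions, not the authors' implementation (which is in the linked repository). Several choices are hypothetical and made only for illustration: the k-nearest-neighbour weak learner (`knn_score`), the disjoint, equal-sized split of the negatives (`split_negatives`), a feature subspace of half the features, the candidate values of k used for hyperparameter perturbation, and the held-out split used to fit the meta-learner (`meta_fraction`). The meta-learner is a plain logistic regression fit by gradient descent (`fit_logreg`), matching the abstract's use of LR to combine the weak learners.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, lr=0.5, epochs=300, l2=1e-3):
    """Meta-learner: L2-regularized logistic regression fit by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def knn_score(X, y, x, k, cols):
    """Hypothetical weak learner: share of positives among the k nearest
    training points, with distances measured only on the feature subspace `cols`."""
    nearest = sorted(
        (sum((row[c] - x[c]) ** 2 for c in cols), label) for row, label in zip(X, y)
    )[:k]
    return sum(label for _, label in nearest) / k


def split_negatives(negatives, n_sets, rng):
    """Assumed sampling scheme: shuffle the negatives and cut them into
    n_sets disjoint, equal-sized subsets."""
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    size = len(shuffled) // n_sets
    return [shuffled[i * size:(i + 1) * size] for i in range(n_sets)]


class NegStackingSketch:
    def __init__(self, n_learners=5, seed=0):
        self.n_learners = n_learners
        self.rng = random.Random(seed)

    def _meta_features(self, x):
        # One column per weak learner: its positive-class score for x.
        return [knn_score(X, y, x, k, cols) for X, y, k, cols in self.learners]

    def fit(self, pos, neg, meta_fraction=0.3):
        rng, d = self.rng, len(pos[0])
        pos, neg = pos[:], neg[:]
        rng.shuffle(pos)
        rng.shuffle(neg)
        # Hypothetical: hold out part of each class so the meta-learner sees unseen samples.
        n_pm, n_nm = int(len(pos) * meta_fraction), int(len(neg) * meta_fraction)
        meta_pos, pos = pos[:n_pm], pos[n_pm:]
        meta_neg, neg = neg[:n_nm], neg[n_nm:]
        self.learners = []
        for neg_set in split_negatives(neg, self.n_learners, rng):
            cols = rng.sample(range(d), max(1, d // 2))  # feature subspacing
            k = rng.choice([3, 5, 7])  # hyperparameter perturbation
            X = pos + neg_set  # same positives, a different negative set
            y = [1] * len(pos) + [0] * len(neg_set)
            self.learners.append((X, y, k, cols))
        meta_X = [self._meta_features(x) for x in meta_pos + meta_neg]
        meta_y = [1] * len(meta_pos) + [0] * len(meta_neg)
        self.meta = fit_logreg(meta_X, meta_y)
        return self

    def predict_proba(self, x):
        # The LR meta-learner weights each weak learner's score and maps the sum to [0, 1].
        w, b = self.meta
        return sigmoid(sum(wj * fj for wj, fj in zip(w, self._meta_features(x))) + b)


if __name__ == "__main__":
    data_rng = random.Random(1)

    def blob(center, n):
        # Hypothetical toy data: 4-D Gaussian blobs standing in for DTI feature vectors.
        return [[center + data_rng.gauss(0, 0.5) for _ in range(4)] for _ in range(n)]

    model = NegStackingSketch(n_learners=5, seed=0).fit(blob(1.0, 40), blob(-1.0, 400))
    print(model.predict_proba([1.0] * 4), model.predict_proba([-1.0] * 4))
```

On the toy blobs, the script should print a high interaction probability for a point near the positive cluster and a low one for a point near the negative cluster. Fitting the meta-learner on held-out samples keeps each weak learner from scoring its own training points; cross-validated (out-of-fold) predictions are a common alternative, and in practice scikit-learn's `LogisticRegression` could stand in for `fit_logreg`.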
