Abstract

Background: Predicting drug–target interactions (DTIs) is of great significance in drug development, but traditional experimental methods are time-consuming and expensive. Machine learning can reduce the cost of prediction, but it is limited by imbalanced datasets and by the difficulty of selecting essential features. Methods: A prediction method based on an Ensemble model of Multiple Feature Pairs (Ensemble-MFP) is introduced. First, three negative sets are generated according to the Euclidean distance of three feature pairs. Then, the negative samples of the validation set and of the test set are randomly selected from the union of the three negative sets within the validation set and the test set, respectively. The weights of the ensemble model are optimized on the validation set, and the weighted ensemble is then applied to the test set. Results: In the prediction of new drugs, the area under the receiver operating characteristic curve (AUC) exceeded 94.0% on three of the four sub-datasets of the gold standard datasets. The effectiveness of the proposed method is further shown by comparison with state-of-the-art methods and by a demonstration of predicted drug–target pairs. Conclusion: Ensemble-MFP can weigh the existing feature pairs and performs well in general prediction on new drugs.
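
The negative-set step described in the Methods can be illustrated with a minimal sketch. The abstract does not state the exact selection rule, so the rule used here (keep the unlabeled pairs that lie farthest, in Euclidean distance, from their nearest known positive pair) is an assumption, as are the function names `select_negatives` and `negative_pool` and the toy two-dimensional feature vectors. Only two feature-pair views are shown for brevity; the method uses three.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_negatives(positives, unlabeled, n_neg):
    """Return indices of the n_neg unlabeled vectors farthest from any positive.

    Assumed rule (not taken from the paper): an unlabeled drug-target pair is a
    more reliable negative the larger its distance to the nearest positive pair.
    """
    scored = []
    for idx, u in enumerate(unlabeled):
        nearest = min(euclidean(u, p) for p in positives)
        scored.append((nearest, idx))
    scored.sort(reverse=True)  # farthest first
    return sorted(idx for _, idx in scored[:n_neg])


def negative_pool(views, n_neg):
    """Union of the negative sets obtained from each feature-pair view."""
    pool = set()
    for positives, unlabeled in views:
        pool.update(select_negatives(positives, unlabeled, n_neg))
    return pool


# Toy data: each view holds (positive vectors, unlabeled vectors); the
# unlabeled pairs share the same index order across views.
view_a = ([[0.0, 0.0], [1.0, 0.0]],
          [[0.1, 0.1], [5.0, 5.0], [0.9, 0.2], [-4.0, 3.0]])
view_b = ([[0.0, 0.0]],
          [[3.0, 0.0], [0.1, 0.0], [0.2, 0.0], [1.0, 0.0]])

negatives_a = select_negatives(*view_a, n_neg=2)
pool = negative_pool([view_a, view_b], n_neg=2)

# Randomly draw the evaluation negatives from the union, with a fixed seed.
sampled = random.Random(0).sample(sorted(pool), 2)
```

Drawing the evaluation negatives from the union, rather than from a single view, keeps the validation and test sets from favoring the model built on any one feature pair.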

Highlights

  • The prediction of drug–target interactions (DTIs) based on machine learning is very important in pharmacology and drug design [1,2,3]

  • An algorithm based on the Ensemble models of Multiple Feature Pairs (Ensemble-MFP) is proposed for drug–target interaction predictions

  • Three models are obtained through three feature pairs, and the weights of the models are optimized on the validation set and applied on the test set
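
The weighting step in the last highlight can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the source does not say how the weights are optimized, so a coarse grid search over non-negative weights summing to one, scored by validation AUC, is assumed here. The helper names (`auc`, `combine`, `fit_weights`) and all scores are hypothetical toy values.

```python
from itertools import product


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def combine(weights, model_scores):
    """Weighted sum of the per-model scores for every sample."""
    n_samples = len(model_scores[0])
    return [sum(w * s[i] for w, s in zip(weights, model_scores))
            for i in range(n_samples)]


def fit_weights(labels, model_scores, n_steps=10):
    """Grid-search three weights (summing to 1) that maximize validation AUC."""
    best_w, best_auc = None, -1.0
    for i, j in product(range(n_steps + 1), repeat=2):
        if i + j > n_steps:
            continue
        w = (i / n_steps, j / n_steps, (n_steps - i - j) / n_steps)
        score = auc(labels, combine(w, model_scores))
        if score > best_auc:
            best_w, best_auc = w, score
    return best_w, best_auc


# Toy validation data: one score list per feature-pair model.
val_labels = [1, 1, 0, 0]
val_scores = [
    [0.9, 0.4, 0.5, 0.1],  # model built on feature pair 1
    [0.3, 0.8, 0.2, 0.6],  # model built on feature pair 2
    [0.5, 0.5, 0.5, 0.5],  # model built on feature pair 3
]
best_w, best_auc = fit_weights(val_labels, val_scores)

# The weights found on the validation set are reused unchanged on the test set.
test_scores = [
    [0.7, 0.2],
    [0.6, 0.3],
    [0.5, 0.5],
]
test_pred = combine(best_w, test_scores)
```

In this toy case neither of the first two models ranks every positive above every negative on its own, while a weighted combination does, which is the motivation for weighting the models rather than picking a single one.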


Summary

Introduction

The prediction of drug–target interactions (DTIs) based on machine learning is very important in pharmacology and drug design [1,2,3]. It can be considered one direction in chemogenomics, a new interdisciplinary field spanning biology, chemistry and informatics [4,5]. DTI datasets cannot provide enough reliable positive and negative samples for machine learning to obtain stable models [20]. To address this problem, methods for extracting negative samples have been studied.

