Abstract

Determining drug-target interactions experimentally is labor-intensive and expensive, which has motivated researchers to turn to in silico prediction to identify potential interactions. In recent years, researchers have proposed several computational approaches for predicting new drug-target interactions. In this paper, we present CFSBoost, a simple and computationally cheap ensemble boosting classification model for the identification and prediction of drug-target interactions using evolutionary and structural features. CFSBoost uses a simple yet novel feature group selection procedure that keeps the model computationally very cheap while still achieving state-of-the-art performance. The ensemble model uses extra trees as weak learners inside a boosting scheme while retaining the best model at each iteration. We tested our method on four benchmark datasets, which are also referred to as gold standard datasets. Our method achieved a better area under the receiver operating characteristic curve (auROC) on 2 of the 4 datasets. It also achieved a higher area under the precision-recall curve (auPR) on 3 of the 4 datasets. Researchers have argued that auPR is more suitable than auROC for comparing performance on imbalanced datasets such as our benchmark datasets. Our reported results show that, despite its simple design, CFSBoost’s performance is very satisfactory compared with other methods in the literature. We also provide 5 new possible interactions for each dataset based on CFSBoost’s prediction scores.
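
The point about auPR and auROC on imbalanced data can be illustrated with a small sketch. The toy labels and scores below are hypothetical and are not taken from the benchmark datasets; the helper names `au_roc` and `average_precision` are ours, and average precision is used here as a common step-wise estimate of auPR, assuming no tied scores.

```python
def au_roc(labels, scores):
    """auROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct pair
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive sample."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    true_positives = 0
    precision_sum = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_positives += 1
            precision_sum += true_positives / k
    return precision_sum / true_positives


# Hypothetical imbalanced toy set: 100 candidate pairs, only 2 true interactions.
# Scores decrease with the index, so index 0 is the top-ranked prediction.
scores = [(100 - i) / 100 for i in range(100)]
labels = [0] * 100
labels[4] = 1  # first true interaction is ranked 5th
labels[9] = 1  # second true interaction is ranked 10th

print(round(au_roc(labels, scores), 3))             # 0.939
print(round(average_precision(labels, scores), 3))  # 0.2
```

In this toy ranking, auROC looks strong because both positives outrank most of the 98 negatives, while average precision stays low because most of the top-ranked predictions are false positives. This is the sense in which auPR is the more demanding metric when true interactions are rare.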
