Abstract

Computational prediction of the interaction between drugs and targets is a long-standing challenge in drug discovery. Fairly accurate predictions have been reported for various binary drug–target benchmark datasets. However, a notable drawback of a binary representation of interaction data is that missing endpoints for non-interacting drug–target pairs are not differentiated from inactive cases, and that predicted levels of activity depend on pre-defined binarization thresholds. In this paper, we present a method called SimBoost that predicts continuous (non-binary) values of binding affinities of compounds and proteins and thus incorporates the whole interaction spectrum, from true negative to true positive interactions. Additionally, we propose a version of the method called SimBoostQuant, which computes a prediction interval to assess the confidence of the predicted affinity, thus defining the Applicability Domain metrics explicitly. We evaluate SimBoost and SimBoostQuant on two established drug–target interaction benchmark datasets and on one new dataset that we propose as a benchmark for read-across cheminformatics applications. We demonstrate that our methods outperform the previously reported models across the studied datasets.
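SimBoostQuant is described as obtaining its prediction interval through quantile regression. The core of that idea is the quantile (pinball) loss, which penalizes under- and over-prediction asymmetrically so that its minimizer is a quantile rather than a mean. The sketch below is a minimal illustration with a constant model only; the helper names `pinball_loss` and `best_constant_quantile`, the quantile levels 0.25 and 0.75, and the affinity values are hypothetical, and this is not the SimBoostQuant implementation, which fits a full predictive model with such a loss.

```python
from typing import Sequence


def pinball_loss(y_true: Sequence[float], q_pred: float, alpha: float) -> float:
    """Mean quantile (pinball) loss of a constant prediction at level alpha."""
    total = 0.0
    for y in y_true:
        diff = y - q_pred
        # Under-prediction costs alpha per unit, over-prediction costs (1 - alpha).
        total += alpha * diff if diff >= 0 else (alpha - 1.0) * diff
    return total / len(y_true)


def best_constant_quantile(y_true: Sequence[float], alpha: float) -> float:
    """Constant prediction that minimizes the pinball loss.

    The loss is piecewise linear and convex in the prediction, with kinks at the
    observed values, so a minimizer (an empirical alpha-quantile) is always one
    of them and searching the observations is enough.
    """
    return min(y_true, key=lambda c: pinball_loss(y_true, c, alpha))


if __name__ == "__main__":
    # Hypothetical affinity values (pKd-like), for illustration only.
    affinities = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5]
    low = best_constant_quantile(affinities, 0.25)
    high = best_constant_quantile(affinities, 0.75)
    print(low, high)  # 6.0 8.5
```

Fitting a lower and an upper quantile gives the two ends of an interval; under this reading, a wider interval for a drug–target pair indicates a less confident affinity prediction.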

Highlights

  • Finding a compound that selectively binds to a particular protein is a highly challenging and typically expensive procedure in the drug development process, where more than 90% of candidate compounds fail due to cross-reactivity and/or toxicity issues

  • We introduce matrix factorization, as used in the literature for binary drug–target interaction prediction, because it plays an important role in our proposed method

  • We propose a version of SimBoost, called SimBoostQuant, which uses quantile regression to learn a prediction interval for a given drug–target pair as a measure of the confidence of the prediction
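Matrix factorization, as mentioned in the highlights, approximates a partially observed drug-by-target affinity matrix by a global mean plus the inner product of a latent vector per drug and a latent vector per target, fitted only on the known entries. The sketch below is a generic stochastic-gradient version under assumed hyperparameters; the function names, `k`, `lr`, `reg`, `epochs` and the toy matrix are hypothetical. The highlights state only that factorization plays an important role in SimBoost, so treating the fitted latent vectors as inputs to a downstream predictor is an assumption here, not a description of the paper's method.

```python
import math
import random
from typing import List, Optional, Tuple

# Rows are drugs, columns are targets; None marks an unmeasured pair.
Matrix = List[List[Optional[float]]]
Factors = List[List[float]]


def observed_entries(y: Matrix) -> List[Tuple[int, int, float]]:
    """Return (drug index, target index, affinity) for every known entry."""
    return [(i, j, val) for i, row in enumerate(y)
            for j, val in enumerate(row) if val is not None]


def predict(mu: float, u: Factors, v: Factors, i: int, j: int) -> float:
    """Global mean plus the inner product of drug i's and target j's latent vectors."""
    return mu + sum(a * b for a, b in zip(u[i], v[j]))


def factorize(y: Matrix, k: int = 2, lr: float = 0.02, reg: float = 0.01,
              epochs: int = 2000, seed: int = 0) -> Tuple[float, Factors, Factors]:
    """Fit y[i][j] ~ mu + u[i] . v[j] on the observed entries with plain SGD."""
    rng = random.Random(seed)
    entries = observed_entries(y)
    mu = sum(val for _, _, val in entries) / len(entries)
    # Small random initialisation breaks the symmetry between latent dimensions.
    u = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(len(y))]
    v = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(len(y[0]))]
    for _ in range(epochs):
        rng.shuffle(entries)
        for i, j, val in entries:
            err = val - predict(mu, u, v, i, j)
            for f in range(k):
                ui, vj = u[i][f], v[j][f]
                # Gradient step on the squared error with an L2 penalty on both factors.
                u[i][f] += lr * (err * vj - reg * ui)
                v[j][f] += lr * (err * ui - reg * vj)
    return mu, u, v


def rmse(y: Matrix, mu: float, u: Factors, v: Factors) -> float:
    """Root-mean-square error over the observed entries only."""
    entries = observed_entries(y)
    sq = sum((val - predict(mu, u, v, i, j)) ** 2 for i, j, val in entries)
    return math.sqrt(sq / len(entries))


if __name__ == "__main__":
    # Hypothetical toy affinities (pKd-like); None = not measured.
    affinity = [
        [5.5, 6.0, None],
        [6.0, 7.0, 8.0],
        [None, 8.0, 9.5],
    ]
    start = rmse(affinity, *factorize(affinity, epochs=0))
    mu, u, v = factorize(affinity)
    print(f"RMSE before: {start:.3f}, after: {rmse(affinity, mu, u, v):.3f}")
    print(f"Imputed affinity for drug 0, target 2: {predict(mu, u, v, 0, 2):.2f}")
```

Calling `factorize(..., epochs=0)` with the same seed returns the untrained starting point, which serves as a baseline for checking that training lowered the error on the observed entries.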


Summary

Introduction

Finding a compound that selectively binds to a particular protein is a highly challenging and typically expensive procedure in the drug development process, where more than 90% of candidate compounds fail due to cross-reactivity and/or toxicity issues. Gaining knowledge about the interaction of compounds and target proteins through computational methods is an important topic in drug research. Such in silico approaches can speed up experimental wet-lab work by systematically prioritizing the most potent compounds, and they can help predict potential side effects. The datasets commonly used to train and evaluate such machine learning-based prediction methods are the Enzymes, Ion Channels, Nuclear Receptor, and G Protein-Coupled Receptor datasets [3]. These datasets contain binary labels: $Y(i,j) = 1$ if the drug–target pair $(d_i, t_j)$ is known to interact (as shown by wet-lab experiments), and $Y(i,j) = 0$ if $(d_i, t_j)$ is either known not to interact or its interaction is unknown. The datasets tend to be biased towards drugs and targets that are considered to be more important or easier
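The binary labeling scheme of these benchmark datasets can be made concrete with a short sketch. It builds $Y(i,j)$ from a set of known interacting pairs and, separately, binarizes continuous affinities at a chosen threshold, the step the abstract identifies as threshold-dependent. The function names, affinity values and thresholds are illustrative assumptions, not taken from the benchmark datasets.

```python
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[int, int]  # (drug index i, target index j)


def binary_labels(n_drugs: int, n_targets: int,
                  known_interacting: Set[Pair]) -> List[List[int]]:
    """Y(i,j) = 1 for pairs known to interact, 0 for every other pair.

    Known non-interactions and untested pairs both end up as 0, so the
    label matrix cannot tell them apart.
    """
    return [[1 if (i, j) in known_interacting else 0 for j in range(n_targets)]
            for i in range(n_drugs)]


def binarize_affinities(affinity: Dict[Pair, Optional[float]],
                        threshold: float) -> Dict[Pair, int]:
    """Label a pair active (1) if its affinity reaches the threshold.

    A missing measurement (None) is also labeled 0, i.e. treated like an inactive pair.
    """
    return {pair: int(value is not None and value >= threshold)
            for pair, value in affinity.items()}


if __name__ == "__main__":
    print(binary_labels(2, 3, {(0, 1), (1, 2)}))  # [[0, 1, 0], [0, 0, 1]]
    # Hypothetical pKd-like affinities; (1, 1) was never measured.
    affinity = {(0, 0): 8.2, (0, 1): 6.4, (1, 0): 5.0, (1, 1): None}
    for threshold in (7.0, 6.0):
        print(threshold, binarize_affinities(affinity, threshold))
```

In this toy example, pair (0, 1) flips from 0 to 1 when the threshold drops from 7.0 to 6.0, while the unmeasured pair (1, 1) is labeled 0 under both thresholds, exactly like the measured inactive pair (1, 0).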

