Abstract

One of the challenges with predictive modeling is how to quantify the reliability of a model's predictions on new objects. In this work we give an introduction to conformal prediction, a framework that sits on top of traditional machine learning algorithms and outputs valid confidence estimates for predictions from QSAR models in the form of prediction intervals that are specific to each predicted object. For regression, a prediction interval consists of an upper and a lower bound. For classification, a prediction interval is a set that contains none, one, or several of the potential classes. The size of the prediction interval is affected by a user-specified confidence/significance level and by the nonconformity of the predicted object, i.e., its strangeness as defined by a nonconformity function. Conformal prediction provides a rigorous and mathematically proven framework for in silico modeling, with guarantees on error rates as well as a consistent handling of the models' applicability domain that is intrinsically linked to the underlying machine learning model. Apart from introducing the concepts and types of conformal prediction, we also provide an example application for modeling ABC transporters using conformal prediction, as well as a discussion of general implications for drug discovery.
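To make the two kinds of output concrete, the following is a minimal sketch of inductive (split) conformal prediction in plain Python. It is not the implementation used in the article: the function names `conformal_interval` and `conformal_set`, the absolute-residual nonconformity score, the per-class calibration, and the "active"/"inactive" labels are all assumptions chosen for illustration. The underlying model is assumed to have been trained already, with a held-out calibration set supplying the nonconformity scores.

```python
import math


def conformal_interval(calib_residuals, prediction, significance):
    """Regression: return (lower, upper) around a point prediction.

    Assumes the nonconformity score is the absolute residual
    |y - y_hat| measured on a held-out calibration set.
    """
    scores = sorted(calib_residuals)
    n = len(scores)
    # Rank of the calibration score used as the interval half-width.
    k = math.ceil((n + 1) * (1 - significance))
    if k > n:
        # Too few calibration examples to support this confidence level.
        return (-math.inf, math.inf)
    half_width = scores[k - 1]
    return (prediction - half_width, prediction + half_width)


def conformal_set(calib_scores_by_class, test_scores_by_class, significance):
    """Classification: return the set of labels that cannot be rejected.

    Assumes one list of calibration nonconformity scores per class
    (class-conditional calibration) and one nonconformity score for the
    new object under each candidate label.
    """
    prediction_set = set()
    for label, test_score in test_scores_by_class.items():
        calib = calib_scores_by_class[label]
        # p-value: share of calibration scores at least as nonconforming
        # as the new object, counting the new object itself.
        at_least_as_strange = sum(s >= test_score for s in calib)
        p_value = (at_least_as_strange + 1) / (len(calib) + 1)
        if p_value > significance:
            prediction_set.add(label)
    return prediction_set


if __name__ == "__main__":
    # Hypothetical calibration residuals and a point prediction of 50.
    residuals = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(conformal_interval(residuals, 50, significance=0.2))

    # Hypothetical nonconformity scores for a two-class endpoint.
    calib = {"active": [0.1, 0.2, 0.3, 0.4], "inactive": [0.1, 0.2, 0.3, 0.4]}
    new_object = {"active": 0.15, "inactive": 0.35}
    for eps in (0.2, 0.5, 0.9):
        print(eps, conformal_set(calib, new_object, significance=eps))
```

In this sketch a lower significance level (higher confidence) widens the regression interval and enlarges the classification set, while an object that looks strange under every label can end up with an empty set.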

Highlights

  • Prediction of different endpoints based on chemical structure constitutes an important problem in drug discovery projects

  • QSAR is a ligand-based method which often relies on machine learning algorithms for making predictions, and a key challenge when constructing and using these types of models is the concept of confidence in predictions; i.e., how much can you trust the predictions made by this approach on a novel compound that has never been tested or sometimes not even synthesized?

  • In QSAR, the predictions made by the conformal predictor already take into account the strangeness of a new compound compared to training data, delivering an alternative to the concept of applicability domain that is commonly used within this field.[2]

Summary

Introduction

Prediction of different endpoints based on chemical structure constitutes an important problem in drug discovery projects. QSAR predictions can support decision making in drug discovery, such as prioritizing between compounds and experiments.[1] QSAR is a ligand-based method which often relies on machine learning algorithms for making predictions, and a key challenge when constructing and using these types of models is the concept of confidence in predictions; i.e., how much can you trust the predictions made by this approach on a novel compound that has never been tested or sometimes not even synthesized? Conformal prediction adds several benefits to predictive modeling, mainly by assigning a valid measure of the confidence in predictions that is specific to the predicted object. The rest of this article is organised as follows: in (2) we give some background on conformal prediction in QSAR and introduce the concepts of validity and efficiency; in (3) we describe general applications of conformal prediction in drug discovery; in (4) we present a case study on ABC transporters; in (5) we describe different approaches to conformal prediction; and in (6) we conclude by discussing the implications of using conformal prediction in drug discovery.
