Abstract

Machine learning methods are widely used in drug discovery and toxicity prediction. While showing overall good performance in cross-validation studies, their predictive power often drops when the query samples have drifted away from the descriptor space of the training data. Thus, the assumption underlying the application of machine learning algorithms, namely that training and test data stem from the same distribution, might not always be fulfilled. In this work, conformal prediction is used to assess the calibration of the models. Deviations from the expected error rate may indicate that training and test data originate from different distributions. Using the Tox21 datasets, which comprise the chronologically released Tox21Train, Tox21Test and Tox21Score subsets, we observed that, while internally valid models could be trained using cross-validation on Tox21Train, predictions on the external Tox21Score data resulted in higher error rates than expected. To improve predictions on the external set, a strategy that exchanges the calibration set with more recent data, such as Tox21Test, was successfully introduced. We conclude that conformal prediction can be used to diagnose data drifts and other issues related to model calibration. The proposed improvement strategy, which exchanges only the calibration data, is convenient as it does not require retraining of the underlying model.
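
The sketch below (not the study's actual code) illustrates the mechanics described above with a single inductive, Mondrian-style conformal predictor in Python; the study itself used aggregated conformal prediction (ACP), which combines several such predictors. The random forest model, the synthetic data, the 0.2 significance level and all function names are illustrative assumptions. The example shows how calibration is assessed by comparing the observed error rate with the chosen significance level, and how the calibration set can be exchanged for newer data without retraining the underlying model.

```python
# Minimal, self-contained sketch of inductive (Mondrian) conformal prediction
# for binary classification; NOT the code used in the study. The random forest,
# the synthetic data and the 0.2 significance level are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_ext, y_cal, y_ext = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)  # trained once; never retrained below


def nonconformity(model, X, y):
    """Nonconformity score: 1 - predicted probability of the true class."""
    proba = model.predict_proba(X)
    return 1.0 - proba[np.arange(len(y)), y]


def prediction_sets(model, cal_scores, cal_labels, X_new, significance=0.2):
    """Mondrian (class-conditional) conformal prediction sets."""
    proba = model.predict_proba(X_new)
    sets = []
    for probs in proba:
        included = []
        for label in range(proba.shape[1]):
            score = 1.0 - probs[label]
            mask = cal_labels == label
            # p-value: smoothed fraction of same-class calibration scores
            # that are at least as nonconforming as the candidate label
            p_value = (np.sum(cal_scores[mask] >= score) + 1.0) / (np.sum(mask) + 1.0)
            if p_value > significance:
                included.append(label)
        sets.append(included)
    return sets


# Calibrate on held-out data and predict an "external" set.
cal_scores = nonconformity(model, X_cal, y_cal)
sets = prediction_sets(model, cal_scores, y_cal, X_ext, significance=0.2)

# Validity check: for a well-calibrated predictor, the error rate (true label
# not in the prediction set) should not exceed the significance level; a
# clearly higher rate is the kind of deviation that hints at data drift.
error_rate = np.mean([label not in s for label, s in zip(y_ext, sets)])
print(f"observed error rate: {error_rate:.3f} (expected <= 0.20)")

# Calibration-set exchange: swap (X_cal, y_cal) for more recent data
# (Tox21Test in the paper), recompute cal_scores, and predict again;
# the trained model itself is left untouched.
```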

Highlights

  • Machine learning (ML) methods are ubiquitous in drug discovery and toxicity prediction [1, 2]

  • In this work, the potential of conformal prediction (CP) to diagnose data drifts in toxicity datasets was investigated on the Tox21 data

Introduction

Machine learning (ML) methods are ubiquitous in drug discovery and toxicity prediction [1, 2]. With more high-quality standardised data available, the potential impact of ML methods in regulatory toxicology is growing [4]. The collection of available toxicity data is increasing, thanks in part to high-throughput screening programs such as ToxCast [5] and Tox21 [6, 7], and to public-private partnerships such as the eTOX and eTRANSAFE projects, which focus on the sharing of (confidential) toxicity data. Several groups have investigated random versus rational selection of training and test sets, e.g. using cluster- or activity-based splits, with the goal of better reflecting the true predictive power of the established models [10,11,12,13,14]. Martin et al. [11] showed that rational selection of training and test sets, compared to random splits, generated better statistical results on the (internal) test sets. The performance of both types of regression models on the artificially created external evaluation set, however, was comparable.
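
As a purely illustrative aside (not the selection protocols used in the cited studies [10,11,12,13,14]), a simple cluster-based split can be sketched by grouping compounds in descriptor space and holding out entire clusters as the test set, so that test compounds are dissimilar from the training compounds; all names and parameters below are hypothetical.

```python
# Hypothetical sketch of a cluster-based training/test split in descriptor
# space; the cited studies use their own, more elaborate selection schemes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

# Synthetic stand-in for a compound descriptor matrix and activity labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Cluster compounds in descriptor space and hold out whole clusters as the
# test set, so test compounds are dissimilar from the training compounds.
cluster_ids = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)
held_out = [0, 1]  # arbitrary choice of held-out clusters
test_mask = np.isin(cluster_ids, held_out)

X_train, y_train = X[~test_mask], y[~test_mask]
X_test, y_test = X[test_mask], y[test_mask]
print(f"training compounds: {len(y_train)}, test compounds: {len(y_test)}")
```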
