Abstract

Background

Quantitative structure-activity relationship (QSAR) models have enormous potential for reducing drug discovery and development costs as well as the need for animal testing. Great strides have been made in estimating their overall reliability, but to fully realize that potential, researchers and regulators need to know how confident they can be in individual predictions.

Results

Submodels in an ensemble model that have been trained on different subsets of a shared training pool represent multiple samples of the model space, and the degree of agreement among them contains information on the reliability of ensemble predictions. For artificial neural network ensembles (ANNEs) using two different methods for determining the ensemble classification (one tallying votes, the other averaging individual network outputs), we have found that the distribution of predictions across positive vote tallies can be reasonably well modeled as a beta-binomial distribution, as can the distribution of errors. Together, these two distributions can be used to estimate the probability that a given predictive classification will be in error. Large data sets of logP, Ames mutagenicity, and CYP2D6 inhibition data are used to illustrate and validate the method. The distributions of predictions and errors for the training pool accurately predicted those for large external validation sets, even when the numbers of positive and negative examples in the training pool were not balanced. Moreover, the likelihood of a given compound being prospectively misclassified as a function of the degree of consensus between networks in the ensemble could in most cases be estimated accurately from the beta-binomial distributions fitted to the training pool.

Conclusions

Confidence in an individual predictive classification by an ensemble model can be accurately assessed by examining the distributions of predictions and errors as a function of the degree of agreement among the constituent submodels. Further, ensemble uncertainty estimation can often be improved by adjusting the voting or classification threshold based on the parameters of the error distribution. Finally, the profiles for models whose predictive uncertainty estimates are not reliable provide clues to that effect without the need for comparison to an external test set.
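The sketch below illustrates, in Python, how the two fitted distributions described above could be combined into a per-tally error estimate. It is not the authors' implementation: the function names, the maximum-likelihood fit, and the way the two distributions are combined are assumptions made here for illustration only.

# Illustrative sketch of the abstract's approach, NOT the authors' code.
# Assumptions: maximum-likelihood fitting of each beta-binomial and a simple
# ratio-based combination of the two fitted distributions.
import numpy as np
from scipy import optimize, stats


def fit_beta_binomial(tallies, n_networks):
    """Fit a beta-binomial to observed positive-vote tallies by maximum likelihood."""
    def neg_log_lik(log_params):
        alpha, beta = np.exp(log_params)   # log-parameterization keeps alpha, beta > 0
        return -stats.betabinom.logpmf(tallies, n_networks, alpha, beta).sum()

    result = optimize.minimize(neg_log_lik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
    return tuple(np.exp(result.x))         # (alpha, beta)


def error_probability_by_tally(all_tallies, error_tallies, n_networks):
    """Estimate P(misclassification | k positive votes) for k = 0 .. n_networks.

    all_tallies   -- positive-vote counts for every training-pool prediction
    error_tallies -- positive-vote counts for the misclassified compounds only
    """
    a_pred, b_pred = fit_beta_binomial(all_tallies, n_networks)
    a_err, b_err = fit_beta_binomial(error_tallies, n_networks)

    k = np.arange(n_networks + 1)
    p_pred = stats.betabinom.pmf(k, n_networks, a_pred, b_pred)  # modeled fraction of predictions at tally k
    p_err = stats.betabinom.pmf(k, n_networks, a_err, b_err)     # modeled fraction of errors at tally k

    overall_error_rate = len(error_tallies) / len(all_tallies)
    # Expected errors at tally k divided by expected predictions at tally k.
    return overall_error_rate * p_err / np.maximum(p_pred, 1e-12)


if __name__ == "__main__":
    # Synthetic example for a hypothetical 33-network ensemble: most compounds get a
    # strong consensus either way, while errors cluster near the 50% consensus region.
    rng = np.random.default_rng(0)
    n_networks = 33
    all_tallies = np.concatenate([rng.binomial(n_networks, 0.1, 300),
                                  rng.binomial(n_networks, 0.9, 300),
                                  rng.binomial(n_networks, 0.5, 60)])
    error_tallies = rng.binomial(n_networks, 0.5, 40)
    print(np.round(error_probability_by_tally(all_tallies, error_tallies, n_networks), 3))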

Highlights

  • Quantitative structure-activity relationship (QSAR) models have enormous potential for reducing drug discovery and development costs as well as the need for animal testing

  • We investigated how predictive classification error rates relate to the number of positive “votes” obtained from an ensemble model composed of artificial neural networks [16], e.g., classification artificial neural network ensembles (ANNEs) generated by the ADMET Modeler™ module of ADMET Predictor™

  • The mean of the beta-binomial distribution fitted to the training pool for logP3-3a is 25.5, which is 0.772 on a per-network basis (see the worked conversion below this list); we have found that shifting the ensemble classification threshold to that value generally constitutes an overcorrection
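As a quick check on the figures in the last highlight: the per-network value is the fitted mean vote count divided by the ensemble size N, so the quoted numbers imply an ensemble of roughly 33 networks (an inference from the figures given here, not an explicitly stated count):

\[
\frac{\bar{k}}{N} = 0.772 \quad\Rightarrow\quad N = \frac{25.5}{0.772} \approx 33 .
\]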


Summary

Introduction

Quantitative structure-activity relationship (QSAR) models have enormous potential for reducing drug discovery and development costs as well as the need for animal testing. Considerable progress has been made in recent years on assessing the overall predictive reliability of QSAR models, but research and regulatory applications both require good ways to estimate the accuracy of individual predictions. Considerable work has been done on ways to identify compounds for which predictions are unlikely to be reliable, that is, on applicability domains [4,5,6], and on quantitative estimation of uncertainty for regression models [4,7,8,9,10,11,12,13]. The degree of ensemble consensus in terms of votes, however, has not been utilized to make quantitative estimates of predictive classification uncertainty for individual predictions.
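For reference (a standard result, not quoted from the paper), the beta-binomial distribution used to model vote tallies gives the probability of observing k positive votes from an ensemble of n networks as

\[
P(k \mid n, \alpha, \beta) = \binom{n}{k}\,\frac{B(k+\alpha,\; n-k+\beta)}{B(\alpha, \beta)}, \qquad k = 0, 1, \ldots, n,
\]

where B(·,·) is the beta function; its mean, n\alpha/(\alpha+\beta), is the quantity quoted in the Highlights (25.5 votes for logP3-3a).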

