DPRESS: Localizing estimates of predictive uncertainty.

Robert D Clark

doi:10.1186/1758-2946-1-11

Abstract

Background: The need to have a quantitative estimate of the uncertainty of prediction for QSAR models is steadily increasing, in part because such predictions are being widely distributed as tabulated values disconnected from the models used to generate them. Classical statistical theory assumes that the error in the population being modeled is independent and identically distributed (IID), but this is often not actually the case. Such inhomogeneous error (heteroskedasticity) can be addressed by providing an individualized estimate of predictive uncertainty for each particular new object u: the standard error of prediction s_u can be estimated as the non-cross-validated error s_t* for the closest object t* in the training set, adjusted for its separation d from u in the descriptor space relative to the size of the training set. The predictive uncertainty factor γ_t* is obtained by distributing the internal predictive error sum of squares across objects in the training set based on the distances between them, hence the acronym: Distributed PRedictive Error Sum of Squares (DPRESS). Note that s_t* and γ_t* are characteristic of each training set compound contributing to the model of interest.

Results: The method was applied to partial least-squares models built using 2D (molecular hologram) or 3D (molecular field) descriptors applied to mid-sized training sets (N = 75) drawn from a large (N = 304), well-characterized pool of cyclooxygenase inhibitors. The observed variation in predictive error for the external 229-compound test sets was compared with the uncertainty estimates from DPRESS. Good qualitative and quantitative agreement was seen between the distributions of predictive error observed and those predicted using DPRESS. Inclusion of the distance-dependent term was essential to getting good agreement between the estimated uncertainties and the observed distributions of predictive error. The uncertainty estimates derived by DPRESS were conservative even when the training set was biased, but not excessively so.

Conclusion: DPRESS is a straightforward and powerful way to reliably estimate individual predictive uncertainties for compounds outside the training set based on their distance to the training set and the internal predictive uncertainty associated with its nearest neighbor in that set. It represents a sample-based, a posteriori approach to defining applicability domains in terms of localized uncertainty.
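
The nearest-neighbor idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the formula that combines the neighbor's error, the uncertainty factor, and the distance, so the additive rule used here (squared neighbor error plus the factor times squared distance) is a hypothetical placeholder and not the paper's equation. The function name, the Euclidean metric, and the precomputed per-compound inputs `s_train` and `gamma_train` are likewise assumptions, and the dependence on training set size mentioned in the abstract is omitted.

```python
import numpy as np

def nearest_neighbor_uncertainty(X_train, s_train, gamma_train, x_new):
    """Illustrative localized uncertainty for one new object u.

    ASSUMPTION: the rule s_u^2 = s_t*^2 + gamma_t* * d^2 is a hypothetical
    stand-in; the abstract does not give the actual DPRESS formula.
    """
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Distance from u to every training object (Euclidean metric assumed).
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # t* is the closest training object; d is its separation from u.
    t_star = int(np.argmin(dists))
    d = float(dists[t_star])
    # Inflate the neighbor's own error by a distance-dependent term.
    s_u = float(np.sqrt(s_train[t_star] ** 2 + gamma_train[t_star] * d ** 2))
    return t_star, d, s_u

# Toy usage with made-up numbers (not data from the paper).
X = [[0.0, 0.0], [3.0, 4.0]]
s = [0.5, 1.0]
gamma = [0.1, 0.2]
print(nearest_neighbor_uncertainty(X, s, gamma, [3.0, 1.0]))
```

Whatever the exact functional form, the point the sketch captures is that the estimate depends on two local quantities: how well the model fits the nearest training compound, and how far the new compound lies from it.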

Highlights

The need to have a quantitative estimate of the uncertainty of prediction for quantitative structure-activity relationship (QSAR) models is steadily increasing, in part because such predictions are being widely distributed as tabulated values disconnected from the models used to generate them.
The method was applied to partial least-squares models built using 2D or 3D descriptors applied to mid-sized training sets (N = 75) drawn from a large (N = 304), well-characterized pool of cyclooxygenase inhibitors.
The suitability of Distributed PRedictive Error Sum of Squares (DPRESS) or any other quantitative model of predictive uncertainty is best evaluated by applying it to experimental QSAR data sets.
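
One generic way to evaluate a model of predictive uncertainty against experimental data is to count how often the observed prediction errors on an external test set fall within a multiple of the estimated standard errors. The Python sketch below shows that check; it is a common calibration test and not necessarily the comparison used in the paper, and the function name and the default multiplier `k = 2` are assumptions.

```python
import numpy as np

def coverage(errors, s_pred, k=2.0):
    """Fraction of observed errors lying within k estimated standard errors.

    errors: observed minus predicted values for the external test compounds.
    s_pred: estimated standard error of prediction for each compound.
    """
    errors = np.asarray(errors, dtype=float)
    s_pred = np.asarray(s_pred, dtype=float)
    return float(np.mean(np.abs(errors) <= k * s_pred))

# Toy usage with made-up numbers (not data from the paper).
print(coverage([0.1, -0.5, 2.0, 0.0], [0.2, 0.2, 0.5, 1.0]))
```

If the errors were roughly normal and the estimates well calibrated, about 95% of test compounds would fall within two standard errors; a noticeably higher fraction would suggest conservative estimates, and a lower one would suggest overconfident estimates.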

Summary

Introduction

The need to have a quantitative estimate of the uncertainty of prediction for QSAR models is steadily increasing, in part because such predictions are being widely distributed as tabulated values disconnected from the models used to generate them. Classical statistical theory assumes that the error in the population being modeled is independent and identically distributed (IID), but this is often not the case. Such inhomogeneous error (heteroskedasticity) can be addressed by providing an individualized estimate of predictive uncertainty for each particular new object u: the standard error of prediction s_u can be estimated as the non-cross-validated error s_t* for the closest object t* in the training set, adjusted for its separation d from u in the descriptor space relative to the size of the training set.

The focus for pharmaceutical drug discovery subsequently shifted from in vivo testing to in vitro evaluation of interactions between candidate ligands and isolated enzymes or receptors. This change brought with it a shift of descriptors from measurable properties of compounds to computationally estimated properties of molecules, with the calculations in question often being based on (sub)structural descriptors. Questions related to the validity of the model as a whole took center stage as the number of available descriptors proliferated [5,6], followed closely by a strong interest in predictivity and how best to establish applicability domains [7,8,9,10,11,12,13,14,15].

Journal: Journal of cheminformatics	Publication Date: Jul 14, 2009
Citations: 22	License type: CC BY 2.0
