Quantifying model errors using similarity to training data

Rob D Brown, Jd Honeycutt, Sl Aaron

doi:10.1186/1758-2946-2-s1-o7

Abstract

When making a prediction with a statistical model, it is not sufficient to know that the model is “good”, in the sense that it makes accurate predictions on test data. Another relevant question is: How good is the model for the specific sample whose properties we wish to predict? Stated another way: Is the sample within or outside the model's domain of applicability, or, more finely, to what degree is a test compound within that domain? Numerous studies have addressed which measures are appropriate for answering this question [1-4]. Here we focus on a derivative question: Can we identify an applicability domain measure suitable for deriving quantitative error bars, that is, error bars that accurately reflect the expected error of predictions made at specified values of the domain measure? Such a measure could then indicate the confidence in a given prediction (i.e. the likely error of a prediction, given the degree to which the test compound lies within the model's domain of applicability). Ideally, such a measure should be simple to calculate and to understand, should apply to models of all types (including classification and regression models, for both molecular and non-molecular data), and should be free of adjustable parameters. Consistent with recent work by others [5,6], of the measures we have examined, those that best meet these criteria are distances to individual samples in the training data. We describe our attempts to construct a recipe for deriving quantitative error bars from these distances.
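The abstract does not spell out the recipe itself, so the following is only a minimal Python sketch of the general idea under stated assumptions: samples are numeric descriptor vectors, the domain measure is the Euclidean distance to the nearest training sample, and the error bar at a given distance is the root-mean-square (RMS) of held-out prediction errors falling in the same distance bin. The binning scheme, the function names (`nearest_training_distance`, `fit_error_bars`, `error_bar`), the bin edges and the toy data are all hypothetical and are not taken from the authors' work.

```python
import math
from bisect import bisect_right
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two descriptor vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_training_distance(
    x: Vector,
    training: Sequence[Vector],
    distance: Callable[[Vector, Vector], float] = euclidean,
) -> float:
    """Hypothetical domain measure: distance from x to its closest training sample."""
    return min(distance(x, t) for t in training)


def fit_error_bars(
    distances: Sequence[float],
    errors: Sequence[float],
    edges: Sequence[float],
) -> List[Optional[float]]:
    """RMS of held-out prediction errors within each distance bin.

    With k edges there are k + 1 bins: bin 0 holds d < edges[0],
    bin i holds edges[i-1] <= d < edges[i], and the last bin holds
    d >= edges[-1]. Bins without held-out samples get None.
    """
    sums = [0.0] * (len(edges) + 1)
    counts = [0] * (len(edges) + 1)
    for d, e in zip(distances, errors):
        i = bisect_right(edges, d)  # index of the bin containing d
        sums[i] += e * e
        counts[i] += 1
    return [math.sqrt(s / c) if c else None for s, c in zip(sums, counts)]


def error_bar(
    d: float, edges: Sequence[float], bars: Sequence[Optional[float]]
) -> Optional[float]:
    """Look up the calibrated error bar for a new prediction at distance d."""
    return bars[bisect_right(edges, d)]


# Toy usage (made-up data): 2-D descriptors, signed held-out errors.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
held_out = [(0.1, 0.0), (0.0, 0.2), (2.0, 2.0), (3.0, 0.0)]
held_out_errors = [0.1, -0.1, 0.8, -1.2]

dists = [nearest_training_distance(x, training) for x in held_out]
edges = [0.5]  # hypothetical single cut: "near" vs "far" from training data
bars = fit_error_bars(dists, held_out_errors, edges)

new_sample = (0.2, 0.1)
d_new = nearest_training_distance(new_sample, training)
print(f"distance={d_new:.3f}, expected error ~ {error_bar(d_new, edges, bars):.3f}")
```

In this toy run, held-out samples close to the training data have small errors and distant ones have large errors, so a new sample near the training set receives the smaller error bar. Note that the bin edges are an adjustable choice in this sketch, which is exactly the kind of parameter the abstract says an ideal measure should avoid. For binary molecular fingerprints, a different distance function (for example one based on Tanimoto similarity) could be passed as `distance`.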

Journal: Journal of Cheminformatics
Publication Date: May 1, 2010
Citations: 7
License type: CC BY 2.0

Similar Papers

Uncertainty quantification: Can we trust artificial intelligence in drug discovery?
Jie Yu ... Mingyue Zheng
iScience | VOL. 25
21 Jul 2022

Domain modeling for software engineering
...
01 May 1991

Domain modeling for software engineering
N Iscoe ... G Arango
10 Dec 2002

A Measure of Domain of Applicability for QSAR Modelling Based on Intelligent K‐Means Clustering
Robert W Stanforth ... Boris Mirkin
QSAR & Combinatorial Science | VOL. 26
01 Jul 2007
