Setting decision thresholds when operating conditions are uncertain

Cèsar Ferri, José Hernández-Orallo, Peter Flach

doi:10.1007/s10618-019-00613-7

Open Access

https://doi.org/10.1007/s10618-019-00613-7

Abstract

The quality of the decisions made by a machine learning model depends on the data and the operating conditions during deployment. Often, operating conditions such as class distribution and misclassification costs have changed during the time since the model was trained and evaluated. When deploying a binary classifier that outputs scores, once we know the new class distribution and the new cost ratio between false positives and false negatives, there are several methods in the literature to help us choose an appropriate threshold for the classifier’s scores. However, on many occasions, the information that we have about this operating condition is uncertain. Previous work has considered ranges or distributions of operating conditions during deployment, with expected costs being calculated for ranges or intervals, but still the decision for each point is made as if the operating condition were certain. The implications of this assumption have received limited attention: a threshold choice that is best suited without uncertainty may be suboptimal under uncertainty. In this paper we analyse the effect of operating condition uncertainty on the expected loss for different threshold choice methods, both theoretically and experimentally. We model uncertainty as a second conditional distribution over the actual operating condition and study it theoretically in such a way that minimum and maximum uncertainty are both seen as special cases of this general formulation. This is complemented by a thorough experimental analysis investigating how different learning algorithms behave for a range of datasets according to the threshold choice method and the uncertainty level.

Highlights

It is generally recognised in machine learning that optimal decisions depend on an appropriate identification and use of the operating condition surrounding the problem at hand.
Using a model of uncertainty based on the Beta distribution, we provide a theoretical analysis, accompanied by graphical illustrations in terms of cost curves, as well as an extensive empirical evaluation, where several threshold choice methods are analysed for varying degrees of uncertainty.
Previous work has analysed the expected loss for a range of operating conditions. This previous work was done at the theoretical level for three threshold choice methods assuming that the given operating condition c is perfect.
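
The effect of an uncertain operating condition on expected loss can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the text above: c is read as the relative cost of a false positive, scores are read as estimated probabilities of the positive class, the threshold is set equal to the given condition c_hat as if it were certain, and the true condition is drawn from a Beta distribution with parameters c_hat * gamma + 1 and (1 - c_hat) * gamma + 1. Under that assumed parametrisation gamma = 0 gives a uniform distribution (maximum uncertainty) and a large gamma concentrates the distribution around c_hat (minimum uncertainty). The function names are hypothetical.

```python
import numpy as np


def beta_params(c_hat, gamma):
    """Assumed parametrisation: mode at c_hat when gamma > 0, uniform when gamma == 0."""
    return c_hat * gamma + 1.0, (1.0 - c_hat) * gamma + 1.0


def loss(threshold, c, scores, labels):
    """Cost-weighted loss at cost proportion c (assumed: c weights false positives).

    c may be a scalar or a NumPy array of cost proportions.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pred = scores >= threshold
    fp = np.mean(pred & ~labels)   # false positives as a fraction of all examples
    fn = np.mean(~pred & labels)   # false negatives as a fraction of all examples
    return 2.0 * (c * fp + (1.0 - c) * fn)


def expected_loss(c_hat, gamma, scores, labels, n_draws=20000, seed=0):
    """Monte Carlo estimate of the loss when the threshold is chosen from c_hat
    but the true cost proportion is Beta-distributed around it."""
    rng = np.random.default_rng(seed)
    a, b = beta_params(c_hat, gamma)
    true_c = rng.beta(a, b, size=n_draws)
    threshold = c_hat  # decision made as if c_hat were certain
    return float(np.mean(loss(threshold, true_c, scores, labels)))
```

Because the threshold is fixed once c_hat is given, the loss in this sketch is linear in the true c, so the estimate approaches the loss evaluated at the mean of the Beta distribution, a / (a + b). Comparing `expected_loss` for a small and a large `gamma` on the same scores shows how far the expected loss can drift from the value computed under certainty.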

Summary

Introduction

It is generally recognised in machine learning that optimal decisions depend on an appropriate identification and use of the operating condition surrounding the problem at hand. The operating condition is usually represented by the class distribution and the costs of misclassification. For instance, an undetected fault (false negative) in a production line can be far more critical than a false alarm (false positive), depending on the kind of product that is being manufactured. In this case, the kind of product, the deadline of the order and other factors determine the operating condition. While in general this operating condition can present itself in many ways, in important cases it can be integrated in the utility function or cost function. If we predict the class by taking proper account of the operating condition, better decisions can be made.
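
As a rough illustration of folding the operating condition into a cost function, the sketch below uses the standard cost-sensitive decision rule for calibrated probabilities: predict the positive class when the estimated probability is at least cost_fp / (cost_fp + cost_fn). The function names, the example costs and the assumption that scores are calibrated positive-class probabilities are illustrative choices, not details from the paper.

```python
import numpy as np


def cost_threshold(cost_fp, cost_fn):
    """Probability threshold that minimises expected cost for calibrated scores."""
    return cost_fp / (cost_fp + cost_fn)


def decide(p_pos, cost_fp, cost_fn):
    """Predict positive (e.g. 'fault') when the probability reaches the threshold."""
    return np.asarray(p_pos, dtype=float) >= cost_threshold(cost_fp, cost_fn)


def total_cost(y_true, y_pred, cost_fp, cost_fn):
    """Total misclassification cost of a set of decisions."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    n_fp = np.sum(~y_true & y_pred)   # false alarms
    n_fn = np.sum(y_true & ~y_pred)   # undetected faults
    return float(cost_fp * n_fp + cost_fn * n_fn)
```

With hypothetical costs where an undetected fault is nine times as costly as a false alarm, `cost_threshold(1.0, 9.0)` moves the threshold well below 0.5, so more items are flagged as faults than under a cost-blind threshold.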

Journal: Data Mining and Knowledge Discovery
Publication Date: Feb 16, 2019
Citations: 4
License type: open-access
