Abstract

I examine the construction and evaluation of machine learning (ML) binary classification models. These models are increasingly used for societal applications such as classifying patients into two categories according to the presence or absence of a certain disease like cancer or heart disease. I argue that the construction of ML (binary) classification models involves an optimisation process aiming at the minimisation of the inductive risk associated with the intended uses of these models. I also argue that the construction of these models is underdetermined by the available data, and that this makes it necessary for ML modellers to make social value judgments in determining the error costs (associated with misclassifications) used in ML optimisation. I thus suggest that the assessment of the inductive risk with respect to the social values of the intended users is an integral part of the construction and evaluation of ML classification models. I also discuss the implications of this conclusion for the philosophical debate concerning inductive risk.

Highlights

  • The societal need to extract useful information from large and complex data sets, often referred to as big data, has led to the emergence of big data analytics

  • Classification accuracy is not an appropriate metric for evaluating the predictive performance of machine learning (ML) classification models with unequal error costs, as it does not account for the unequal costs assigned to false positives (FP) and false negatives (FN) (Provost et al., 1998)

  • I have argued that the construction of ML classification models illustrates inductive underdetermination of model construction, in the sense that the methodological choices underlying the construction of these models are underdetermined by the training data, which constitutes the sole empirical evidence for ML model construction
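The second highlight's point can be made concrete with a small numerical sketch. The function names, labels, and cost values below are illustrative assumptions, not taken from the article; they simply show how a classifier with higher accuracy can nonetheless incur a higher expected misclassification cost when false negatives are costlier than false positives (as in disease screening).

```python
def expected_cost(y_true, y_pred, cost_fn, cost_fp):
    """Average misclassification cost over a test set (label 1 = positive)."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return (cost_fn * fn + cost_fp * fp) / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of correctly classified instances."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Two hypothetical classifiers on the same labels: A makes two cheap
# false-positive errors; B makes a single, costly false-negative error.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pred_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 2 FP, 0 FN
pred_b = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # 0 FP, 1 FN

print(accuracy(y_true, pred_a), accuracy(y_true, pred_b))  # 0.8 0.9
# Suppose a missed diagnosis (FN) is judged 10x as costly as a false alarm (FP):
print(expected_cost(y_true, pred_a, cost_fn=10, cost_fp=1))  # 0.2
print(expected_cost(y_true, pred_b, cost_fn=10, cost_fp=1))  # 1.0
```

By the accuracy metric B looks better (0.9 vs 0.8), yet under the assumed cost ratio B's expected cost is five times A's, which is the sense in which accuracy fails to track performance under unequal error costs.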


Summary

Introduction

The societal need to extract useful information from large and complex data sets, often referred to as big data, has led to the emergence of big data analytics. This is a new field of study encompassing various computational methods developed to cope with the growing complexity of big data analysis. The accuracy of the predictions of ML models depends on how well these models generalize to new data sets beyond those used to construct and test them. In this regard, the application of ML models to big data is based on inductive generalization, and as a result their predictions about new data sets are always prone to error. In the philosophy of science literature, inductive risk has been discussed in the context of theory (or hypothesis or model) acceptance, whereas its relevance to the context of theory (or hypothesis or model) construction has been neglected. I will discuss the implications of this conclusion for the philosophical debate concerning inductive risk.
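One standard way error costs enter cost-sensitive classification, in line with the article's claim that value-laden error costs shape ML optimisation, is through the decision threshold applied to a model's predicted probability: the cost-minimising threshold depends only on the ratio of the false-positive cost to the total cost of the two error types. This is a generic sketch of that relationship; the cost values are illustrative assumptions, not figures from the article.

```python
def decision_threshold(cost_fp, cost_fn):
    """Probability above which predicting 'positive' minimises expected cost.

    Predicting positive is the cheaper bet whenever
    p * 0 + (1 - p) * cost_fp  <=  p * cost_fn + (1 - p) * 0,
    i.e. whenever p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

def classify(prob_positive, cost_fp, cost_fn):
    """Cost-sensitive decision rule: 1 = positive, 0 = negative."""
    return int(prob_positive >= decision_threshold(cost_fp, cost_fn))

# With symmetric costs the threshold is the familiar 0.5 ...
print(decision_threshold(1, 1))  # 0.5
# ... but if a missed diagnosis (FN) is judged nine times worse than a
# false alarm (FP), the threshold drops and borderline cases are flagged.
print(decision_threshold(1, 9))  # 0.1
print(classify(0.3, 1, 1))  # 0
print(classify(0.3, 1, 9))  # 1
```

The same patient (predicted probability 0.3) is classified differently under the two cost assignments, which illustrates why choosing the error costs is not a purely technical step: it encodes a judgment about the relative social harm of the two kinds of mistake.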

The inductive risk argument and Jeffrey’s counterargument
Inductive risk in the context of model construction
Essential elements and aspects of ML
Supervised ML: an illustrative example
Underdetermination of ML model construction and inductive risk
Cost‐sensitive ML optimisation
Evaluation of ML binary classification models
Algorithmic and epistemic opacity in deep ML
Conclusions