Abstract

This paper argues that machine learning (ML) and epidemiology are on a collision course over causation. The discipline of epidemiology lays great emphasis on causation, while ML research does not. Some epidemiologists have proposed imposing what amounts to a causal constraint on ML in epidemiology, requiring it either to engage in causal inference or restrict itself to mere projection. We whittle down the issues to the question of whether causal knowledge is necessary for underwriting predictions about the outcomes of public health interventions. While there is great plausibility to the idea that it is, conviction that something is impossible does not by itself motivate a constraint to forbid trying. We disambiguate the possible motivations for such a constraint into definitional, metaphysical, epistemological, and pragmatic considerations and argue that “Proceed with caution” (rather than “Stop!”) is the outcome of each. We then argue that there are positive reasons to proceed, albeit cautiously. Causal inference enforces existing classification schema prior to the testing of associational claims (causal or otherwise), but associations and classification schema are more plausibly discovered (rather than tested or justified) in a back-and-forth process of gaining reflective equilibrium. ML instantiates this kind of process, we argue, and thus offers the welcome prospect of uncovering meaningful new concepts in epidemiology and public health, provided it is not causally constrained.

Highlights

  • Most research in medical machine learning (ML) focuses on the clinical context: for example, using deep learning–based computer vision for diagnostic purposes

  • We consider whether and how the central role of causal thinking in epidemiology can be squared with the almost casual approach that ML sometimes appears to adopt towards causation. (When we talk about “ML” in this paper, we especially refer to deep learning techniques.) We argue that causation is the biggest conceptual stumbling block to epidemiological ML, and one of the reasons that its uptake by, or application to, epidemiology has not yet been significant

  • In a 2019 paper in the International Journal of Epidemiology, Tony Blakely and co-authors begin by remarking that “In epidemiology, prediction and causal inference are usually considered as different worlds,” with machine learning located in the prediction world (Blakely et al, 2019, 1)

Summary

Introduction

Most research in medical machine learning (ML) focuses on the clinical context: for example, using deep learning–based computer vision for diagnostic purposes (see Esteva et al, 2021 for review). [...] This would be even more surprising, given that the goal of epidemiological research is to inform public health interventions, and predicting the outcome of an intervention is generally thought to require causal knowledge. [...] It is this third option that we favour in this paper. [...] The former confines ML to supportive tasks in a larger causal inquiry, framed by epidemiological causal inference methodology.

Merely Computational Uses of ML
ML as Investigative Process
Over‐confidence
External Validity
Opacity
The Case for a Causal Constraint
Causal Knowledge and Interventions
Definitional Motivation
Metaphysical Motivation
Epistemological Motivation
Pragmatic Motivation
Epidemiology and Public Health Need New Concepts
Association and Classification
Robo‐epidemiology May Provide New Concepts
Conclusion
Findings