Abstract
This paper argues that machine learning (ML) and epidemiology are on a collision course over causation. The discipline of epidemiology lays great emphasis on causation, while ML research does not. Some epidemiologists have proposed imposing what amounts to a causal constraint on ML in epidemiology, requiring it either to engage in causal inference or restrict itself to mere projection. We whittle down the issues to the question of whether causal knowledge is necessary for underwriting predictions about the outcomes of public health interventions. While there is great plausibility to the idea that it is, conviction that something is impossible does not by itself motivate a constraint to forbid trying. We disambiguate the possible motivations for such a constraint into definitional, metaphysical, epistemological, and pragmatic considerations and argue that “Proceed with caution” (rather than “Stop!”) is the outcome of each. We then argue that there are positive reasons to proceed, albeit cautiously. Causal inference enforces existing classification schema prior to the testing of associational claims (causal or otherwise), but associations and classification schema are more plausibly discovered (rather than tested or justified) in a back-and-forth process of gaining reflective equilibrium. ML instantiates this kind of process, we argue, and thus offers the welcome prospect of uncovering meaningful new concepts in epidemiology and public health, provided it is not causally constrained.
Highlights
Most research in medical machine learning (ML) focuses on the clinical context: for example, using deep learning–based computer vision for diagnostic purposes.
We consider whether and how the central role of causal thinking in epidemiology can be squared with the almost casual approach that ML sometimes appears to adopt towards causation. (When we talk about “ML” in this paper, we especially refer to deep learning techniques.) We argue that causation is the biggest conceptual stumbling block to epidemiological ML, and one of the reasons that its uptake by, or application to, epidemiology has not yet been significant.
In a 2019 paper in the International Journal of Epidemiology, Tony Blakely and co-authors begin by remarking that “In epidemiology, prediction and causal inference are usually considered as different worlds,” with machine learning located in the prediction world (Blakely et al, 2019, 1).
Summary
Most research in medical machine learning (ML) focuses on the clinical context: for example, using deep learning–based computer vision for diagnostic purposes (see Esteva et al, 2021 for review). This would be even more surprising, given that the goal of epidemiological research is to inform public health interventions, and predicting the outcome of an intervention is generally thought to require causal knowledge. It is this third option that we favour in this paper. The former confines ML to supportive tasks in a larger causal inquiry, framed by epidemiological causal inference methodology.