Shortcut learning in medical AI hinders generalization: method for estimating AI model generalization without external data

Cathy Ong Ly, Balagopal Unnikrishnan, Tony Tadic, Tirth Patel, Joe Duhamel, Sonja Kandel, Yasbanoo Moayedi, Michael Brudno, Andrew Hope, Heather Ross, Chris Mcintosh

doi:10.1038/s41746-024-01118-4

Abstract

Healthcare datasets are becoming larger and more complex, necessitating the development of accurate and generalizable AI models for medical applications. Unstructured datasets, including medical imaging, electrocardiograms, and natural language data, are gaining attention with advancements in deep convolutional neural networks and large language models. However, estimating the generalizability of these models to new healthcare settings without extensive validation on external data remains challenging. In experiments across 13 datasets including X-rays, CTs, ECGs, clinical discharge summaries, and lung auscultation data, our results demonstrate that model performance is frequently overestimated by up to 20% on average due to shortcut learning of hidden data acquisition biases (DAB). Shortcut learning refers to a phenomenon in which an AI model learns to solve a task based on spurious correlations present in the data as opposed to features directly related to the task itself. We propose an open-source, bias-corrected external accuracy estimate, PEst, that better estimates external accuracy to within 4% on average by measuring and calibrating for DAB-induced shortcut learning.
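
The definition of shortcut learning above can be illustrated with a toy sketch. It does not implement PEst and uses no data from the study. Every name in it (make_data, fit_stump, the "shortcut" feature, the 0.95 and 0.50 agreement rates) is a hypothetical choice made for illustration. A one-feature classifier is trained on synthetic data in which an acquisition marker agrees with the label 95% of the time internally but only by chance externally, so internal accuracy overstates external accuracy.

```python
import random

def make_data(n, shortcut_agreement, rng):
    """Return toy samples as (signal, shortcut, label) tuples.

    signal   : weak task-related feature (label plus Gaussian noise)
    shortcut : acquisition marker that equals the label with probability
               `shortcut_agreement` (hypothetical stand-in for a DAB)
    """
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        signal = y + rng.gauss(0.0, 1.0)
        shortcut = y if rng.random() < shortcut_agreement else 1 - y
        data.append((signal, float(shortcut), y))
    return data

def accuracy(data, feature, threshold=0.5):
    """Accuracy of the rule: predict 1 when row[feature] > threshold."""
    correct = sum((row[feature] > threshold) == bool(row[2]) for row in data)
    return correct / len(data)

def fit_stump(train):
    """Pick the single feature index (0 or 1) with the best training accuracy."""
    return max((0, 1), key=lambda f: accuracy(train, f))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
train = make_data(2000, 0.95, rng)          # internal site: marker tracks label
internal_test = make_data(2000, 0.95, rng)  # same site, held-out samples
external_test = make_data(2000, 0.50, rng)  # new site: marker is uninformative

chosen = fit_stump(train)  # index 1 means the shortcut feature was selected
internal_acc = accuracy(internal_test, chosen)
external_acc = accuracy(external_test, chosen)
print(f"chosen feature index: {chosen}")
print(f"internal accuracy: {internal_acc:.2f}")
print(f"external accuracy: {external_acc:.2f}")
```

In this toy setting the learner prefers the marker because it is the more predictive feature on internal data, and a held-out split from the same site cannot reveal the problem. Only the external split exposes the gap, which is the situation an estimate such as PEst is intended to anticipate without external data.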

Full Text