Abstract

A plethora of classification models, especially convolutional neural networks (CNNs), have been proposed for the detection of glaucoma from fundus images. Often trained with data from a single glaucoma clinic, they report impressive performance on internal test sets, but (I) offer little objective insight into the decision making process, and (II) rarely generalize well to external sets.

Opening the black‐box: need for explainable AI

Data‐driven, feature‐agnostic CNN models have been shown to outperform classical machine learning approaches that rely on manually curated image features. This comes at the expense of explainability, a concept that describes how much insight is available into the decision process of the algorithm. Explainable AI is particularly relevant in medicine, as regulatory bodies require understandable decisions in order to gain trust from both clinicians and patients. The largest area of research in explainable AI concerns saliency maps or heat maps, i.e. visualizations of the image regions that are deemed the most important to the trained model. However, these techniques have faced increasing scrutiny in recent years, as some studies indicated that most saliency mapping methods perform no better than random baselines.

In our recent work, we used various fundus image cropping policies to investigate the salient regions for glaucoma detection. Our findings provide the first irrefutable evidence that deep learning can detect glaucoma from colour fundus image regions outside the optic nerve head (ONH).

Generalizability and domain shifts

Generalizability, the ability to perform well on unseen data, is of crucial importance in building clinically useful deep learning models. It can be defined on two levels. First, a model is deemed useful when it performs well on unseen data from the same domain as the training data.
More concretely, this means that a model trained on glaucoma data from one glaucoma center works well on a holdout set of images from the same clinic. Second, beyond internal testing, it is critical that models transfer well to imaging data from external sources, such as a glaucoma clinic in a different city. This transition introduces a domain shift, i.e. a change in the data distribution. The shift can be as subtle as a visually imperceptible difference in the image colour histogram caused by different lighting conditions at the external center.

Typically, domain shifts between ophthalmic centers are more marked. The usual suspects include, as a non‐exhaustive list, changes in camera manufacturer, disease prevalence, and demographic characteristics such as ethnicity and age. A tool suited for glaucoma screening should be able to adapt to a much lower glaucoma prevalence than that of the referral center data set on which it was initially trained. We demonstrate that a deep learning approach backed by a regression loss seems more robust to data shifts than a traditional binary classification system for glaucoma screening from fundus images. Excellent results were obtained on 12 different data sets, with close to 150 000 images evaluated.
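
To illustrate why a shift in disease prevalence matters for screening, the following minimal sketch applies Bayes' rule to compute the positive predictive value (PPV) of a classifier whose sensitivity and specificity are held fixed. All numbers (sensitivity, specificity, and both prevalence values) are hypothetical and chosen only for illustration; they are not results from this work.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of positive calls that are true cases, via Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical operating point of a glaucoma classifier (assumed values).
SENSITIVITY = 0.90
SPECIFICITY = 0.90

# Hypothetical prevalences: a referral center versus a screening population.
referral_ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.50)
screening_ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.05)

print(f"PPV at referral center prevalence: {referral_ppv:.3f}")
print(f"PPV at screening prevalence:       {screening_ppv:.3f}")
```

Under these assumed values, the same classifier produces a much larger share of false alarms in the low prevalence setting, which is one reason a model trained on referral center data may need adaptation before it is used for screening.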