In this work, we present a multi-modal machine learning method to automate early glaucoma diagnosis. The proposed methodology introduces two aspects of automated diagnosis not previously explored in the literature: the simultaneous use of ocular fundus images from both eyes and the integration of the patient's additional clinical data. We begin by establishing a baseline, termed monocular mode, which follows the traditional approach of treating the data from each eye as a separate instance. We then explore the binocular mode, investigating how combining information from both eyes of the same patient can improve glaucoma diagnosis accuracy. This exploration employs the PAPILA dataset, which comprises, for both eyes of each patient, clinical data, ocular fundus images, and expert segmentations of these images. Additionally, we compare two image-derived data modalities: direct ocular fundus images and morphological data from manual expert segmentation. Our method integrates Gradient-Boosted Decision Trees (GBDT) and Convolutional Neural Networks (CNN), specifically the MobileNet, VGG16, ResNet-50, and Inception models. SHAP values are used to interpret the GBDT models, while the Deep Explainer method is applied in conjunction with SHAP to analyze the outputs of the CNN-based models. Our findings demonstrate that considering both eyes is viable and improves model performance. The binocular approach, incorporating morphological and clinical data, yielded an AUC of 0.796 (±0.003 at a 95% confidence interval), while the CNN, using the same binocular approach, achieved an AUC of 0.764 (±0.005 at a 95% confidence interval).
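The following is a minimal sketch of the binocular GBDT pipeline described above, including SHAP-based interpretation. The synthetic data, feature counts, and scikit-learn classifier are illustrative assumptions, not the PAPILA dataset or the authors' exact implementation.

```python
# Hypothetical sketch of monocular vs. binocular feature construction and a
# GBDT classifier with SHAP interpretation. All data here is synthetic; the
# feature dimensions (8 morphological features per eye, 4 clinical features)
# are assumptions for illustration only.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
import shap

rng = np.random.default_rng(0)
n_patients = 400

# Morphological features derived from expert segmentation, one block per eye
# (e.g., cup/disc areas and ratios).
left_eye = rng.normal(size=(n_patients, 8))
right_eye = rng.normal(size=(n_patients, 8))

# Per-patient clinical data (e.g., age, intraocular pressure).
clinical = rng.normal(size=(n_patients, 4))
y = rng.integers(0, 2, size=n_patients)  # glaucoma label per patient

# Monocular mode: each eye is a separate instance paired with the clinical data.
X_mono = np.vstack([np.hstack([left_eye, clinical]),
                    np.hstack([right_eye, clinical])])
y_mono = np.concatenate([y, y])

# Binocular mode: both eyes of the same patient form a single instance.
X_bino = np.hstack([left_eye, right_eye, clinical])

X_tr, X_te, y_tr, y_te = train_test_split(
    X_bino, y, test_size=0.25, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print(f"binocular AUC: {roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]):.3f}")

# SHAP interpretation of the GBDT, as mentioned in the abstract; TreeExplainer
# is the standard SHAP explainer for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)  # per-feature attributions per instance
```

For the CNN branch, the analogous interpretation step would use shap.DeepExplainer on the trained network; the abstract's reported confidence intervals suggest the AUC figures were estimated over repeated evaluations rather than a single split.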