To develop a deep learning approach based on a deep residual neural network (ResNet101) for the automated detection of glaucomatous optic neuropathy (GON) using color fundus images, to understand the process by which the model makes predictions, and to explore the effect of integrating fundus images with patients' medical history data. A total of 34,279 fundus images and the corresponding medical history data were retrospectively collected from cohorts of 2371 adult patients. The images were labeled by 8 glaucoma experts, and 26,585 of them (12,618 images of GON-confirmed eyes, 1114 images of GON-suspected eyes, and 12,853 images of NORMAL eyes) were included. We adopted a 10-fold cross-validation strategy to train and optimize our model. The model was then tested on an independent dataset of 3481 images (1524 from NORMAL eyes, 1442 from GON-confirmed eyes, and 515 from GON-suspected eyes) from 249 patients, and the performance of the best model was compared with the results obtained by two experts. Accuracy, sensitivity, specificity, kappa value, and area under the receiver operating characteristic curve (AUC) were calculated. Further, we performed qualitative evaluation of model predictions and occlusion testing. Finally, we assessed the effect of integrating medical history data into the final classification.

In a multiclass comparison between GON-confirmed, GON-suspected, and NORMAL eyes, our model achieved 0.941 (95% confidence interval [CI], 0.936-0.946) accuracy, 0.957 (95% CI, 0.953-0.961) sensitivity, and 0.929 (95% CI, 0.923-0.935) specificity. The AUC for distinguishing referrals (GON-confirmed and GON-suspected eyes) from observation was 0.992 (95% CI, 0.991-0.993). Our best model had a kappa value of 0.927, while the two experts' kappa values were 0.928 and 0.925, respectively.
The two best binary classifiers, distinguishing GON-confirmed and GON-suspected eyes from NORMAL eyes, achieved accuracies of 0.955 and 0.965, sensitivities of 0.977 and 0.998, and specificities of 0.929 and 0.954, with AUCs of 0.992 and 0.999, respectively. Additionally, occlusion testing showed that our model identified the neuroretinal rim region and retinal nerve fiber layer (RNFL) defect areas (superior or inferior) as the most important regions for discriminating GON, indicating that it evaluates fundus images in a way similar to clinicians. Finally, integrating fundus images with medical history data yielded a slight improvement in sensitivity and specificity with similar AUCs.

This approach can discriminate GON with high accuracy, sensitivity, specificity, and AUC using color fundus photographs. It may quickly, efficiently, and at low cost provide specialists with a second opinion on the diagnosis of glaucoma, and assist doctors and the public in large-scale screening for glaucoma.
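The occlusion testing mentioned above can be sketched as follows: slide a masking patch over the image and record how much the model's predicted probability drops at each position, so that large drops mark regions the model relies on. The `predict` callable here is a stand-in for the trained ResNet101; the patch size, stride, fill value, and toy model are illustrative assumptions, not the study's settings.

```python
import numpy as np

def occlusion_map(image, predict, patch=4, stride=4, fill=0.5):
    """Return a heatmap of probability drops; a large drop marks an important region."""
    base = predict(image)                      # unoccluded confidence
    h, w = image.shape[:2]
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill  # mask one patch
            heat[i, j] = base - predict(occluded)      # drop in confidence
    return heat

# Toy example: a "model" whose score is the mean of the top-left quadrant,
# so occluding that quadrant produces the only nonzero drop in the heatmap.
img = np.ones((8, 8))
img[4:, :] = 0.0
score = lambda im: im[:4, :4].mean()
heat = occlusion_map(img, score)
```

In the study's setting, regions such as the neuroretinal rim and RNFL defect areas would correspond to the hot spots of such a map.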