Abstract

Exciting applications of deep learning have been made recently in the field of ophthalmology. In particular, several algorithms have shown remarkable success in classifying common diseases such as age-related macular degeneration and diabetic retinopathy, as well as rare pathologic features.1,2 In the current issue of Ophthalmology, Son et al3 (see page 85) report a deep learning classification model for 12 different features from fundus photographs, with impressive results. A total of 12 separate binary classification convolutional neural networks were trained, validated, and tested using a dataset containing 286 050 annotations from 95 350 fundus images. Lesion heatmaps were used to demonstrate the probable areas of salient features. One important strength of this study is its extensive labeling by multiple human experts: a total of 57 ophthalmologists from various subspecialties annotated fundus images, and each image was labeled by 3 independent ophthalmologists. Ground truth was defined by either the majority rule or the unanimity rule. Another strength was external validation to assess the algorithm's generalizability.
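The majority and unanimity rules described above can be made concrete with a minimal sketch. This is an illustrative reconstruction, not the authors' code; the function name and the exclusion of disagreements under the unanimity rule are assumptions.

```python
from typing import List, Optional

def consensus_label(votes: List[int], rule: str = "majority") -> Optional[int]:
    """Derive a binary ground-truth label from independent annotations.

    votes: binary annotations (1 = feature present), e.g. from 3 graders.
    rule:  "majority"  -> label follows more than half of the graders;
           "unanimity" -> label is set only when all graders agree,
                          otherwise None (hypothetically, the image is
                          excluded for that feature).
    """
    positives = sum(votes)
    if rule == "majority":
        return 1 if positives * 2 > len(votes) else 0
    if rule == "unanimity":
        if positives == len(votes):
            return 1
        if positives == 0:
            return 0
        return None  # disagreement: no unanimous ground truth
    raise ValueError(f"unknown rule: {rule}")

print(consensus_label([1, 1, 0]))               # majority of 3 -> 1
print(consensus_label([1, 1, 0], "unanimity"))  # disagreement -> None
```

With 3 graders per image, the majority rule always yields a label, whereas the unanimity rule trades dataset size for label purity.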
Both in-house datasets and 2 external databases (the Indian Diabetic Retinopathy Image Dataset and e-ophtha) were analyzed using the presented deep learning algorithm. The areas under the receiver operating characteristic curve for the 12 fundus features ranged from 96.2% to 99.9% for the in-house datasets, whereas performance was lower (94.7%–98.0%) in the external datasets, as expected. In addition, 3 retina specialists labeled every image in a third external dataset (Messidor) for all 12 features. Although detection accuracy for most features was comparable with that of the retina specialists, the evaluation of features not related to diabetic retinopathy, in particular, was limited given their low frequencies in this diabetic retinopathy–focused dataset. Although the areas under the receiver operating characteristic curve were excellent, the training and validation datasets in this study were not balanced for each classification feature. Thus, reporting the area under the precision–recall curve would be more informative, because the precision–recall curve has been shown to be a more accurate measure of performance, particularly in unbalanced test sets.4 Testing the model's performance in several external validation sets, as was done in this study, is a critical component of advancing this field. However, it is important to note that the validation sets did not contain any poor-quality images of the kind that would be encountered in real-world deployment, which ultimately may limit the generalizability of the model.
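A quick arithmetic sketch illustrates why receiver operating characteristic metrics can look flattering on unbalanced data: at low prevalence, even a highly specific classifier generates many false positives per true positive, which precision (and hence the precision–recall curve) exposes and the receiver operating characteristic curve does not. The operating point and prevalences below are illustrative, not taken from the study.

```python
def precision_from_rates(sensitivity: float, specificity: float,
                         prevalence: float) -> float:
    """Precision (positive predictive value) implied by a classifier's
    sensitivity and specificity at a given disease prevalence."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# The same operating point (95% sensitivity, 95% specificity) occupies the
# same spot on an ROC curve regardless of prevalence, yet precision falls
# from 0.95 to roughly 0.16 as the positive class becomes rare.
for prevalence in (0.5, 0.1, 0.01):
    ppv = precision_from_rates(0.95, 0.95, prevalence)
    print(f"prevalence {prevalence:>4}: precision = {ppv:.3f}")
```

This is the imbalance effect underlying the recommendation to report the area under the precision–recall curve alongside the receiver operating characteristic curve.4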
Furthermore, although Son et al present the results of 12 independent convolutional neural networks, it would be interesting to compare their performance with that of a single ensembled model integrating all 12 networks. Presumably, similar low-level features are shared across the 12 models, and the current architecture harbors redundancies. Future applications of deep learning using the current study data could include segmentation or localization models, in which each feature would be delineated with bounding boxes or pixelwise masks; however, these would require a large dataset with even more extensive annotation. Finally, the current model can learn only human-derived features, given that training was based on the expert annotations. Future attempts to train models with a purely data-driven method may allow the discovery of novel, machine-derived biomarkers in various ocular conditions. The study by Son et al demonstrates exciting advances in a deep learning classification model in which multiple features can be identified rather than a diagnosis alone, which would allow the exploration of novel feature-related hypotheses.5 Open sourcing the data and models for others to build on would be an important contribution to this field, allowing the creation of more reproducible methods, encouraging researchers to share data, and creating a collaborative culture for deep learning research in ophthalmology.

References

1. Ting DSW, Cheung CY-L, Lim G, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA. 2017;318:2211-2223.
2. Kihara Y, Heeren TFC, Lee CS, et al. Estimating retinal sensitivity using optical coherence tomography with deep-learning algorithms in macular telangiectasia type 2. JAMA Network Open. 2019;2:e188029.
3. Son J, Shin JY, Kim HD, et al. Development and validation of deep learning models for screening multiple abnormal findings in retinal fundus images. Ophthalmology. 2019;126:85-94.
4. Davis J, Goadrich M. The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning (ICML '06); 2006. https://doi.org/10.1145/1143844.1143874. Accessed July 2, 2019.
5. Lee CS, Lee AY, Baughman D, et al. The United Kingdom Diabetic Retinopathy Electronic Medical Record Users Group: report 3: baseline retinopathy and clinical features predict progression of diabetic retinopathy. Am J Ophthalmol. 2017;180:64-71.