This study aimed to create and validate code-free automated deep learning (autoML) models for diabetic retinopathy (DR) classification from handheld retinal images. The design was a prospective development and validation of autoML models for DR image classification. The dataset comprised 17,829 de-identified retinal images from 3,566 eyes of patients with diabetes, acquired using handheld retinal cameras in a community-based DR screening program. AutoML models were generated from previously acquired 5-field (macula-centered, disc-centered, superior, inferior, temporal macula) handheld retinal images. Each image was labeled using the International DR and diabetic macular edema (DME) classification scale by four certified graders at a centralized reading center under the oversight of a senior retina specialist. Images for model development were split 8:1:1 for training, optimization, and testing to detect referable DR (refDR), defined as moderate nonproliferative DR or worse or any level of DME. Internal validation was performed using a published image set from the same patient population (N=450 images from 225 eyes). External validation was performed using a publicly available retinal imaging dataset from the Asia Pacific Tele-Ophthalmology Society (N=3,662 images). The outcome measures were area under the precision-recall curve (AUPRC), sensitivity (SN), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), accuracy, and F1 score. RefDR was present in 17.3%, 39.1%, and 48.0% of the training set, internal validation set, and external validation set, respectively. The model's AUPRC was 0.995, with precision and recall of 97% at a score threshold of 0.5. Internal validation showed SN, SP, PPV, NPV, accuracy, and F1 scores of 0.96 (95% CI: 0.884-0.99), 0.98 (95% CI: 0.937-0.995), 0.96 (95% CI: 0.884-0.99), 0.98 (95% CI: 0.937-0.995), 0.97, and 0.96, respectively.
External validation showed SN, SP, PPV, NPV, accuracy, and F1 scores of 0.94 (95% CI: 0.929-0.951), 0.97 (95% CI: 0.957-0.974), 0.96 (95% CI: 0.952-0.971), 0.95 (95% CI: 0.935-0.956), 0.97, and 0.96, respectively. This study demonstrates the accuracy and feasibility of code-free autoML models for identifying refDR developed using handheld retinal imaging in a community-based screening program. The use of autoML may increase access to machine learning models that can be adapted for specific programs guided by the clinical need to rapidly address disparities in healthcare delivery.
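
The reported screening metrics (SN, SP, PPV, NPV, accuracy, F1) all derive from the four cells of a binary confusion matrix for refDR versus non-refDR. The following is a minimal sketch of those standard definitions, not the authors' pipeline: the `ConfusionCounts` type, the `screening_metrics` function, and the example counts are hypothetical and are not taken from the study data.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConfusionCounts:
    """Cells of a binary confusion matrix (positive class = refDR)."""
    tp: int  # refDR images the model flags as refDR
    fp: int  # non-refDR images the model flags as refDR
    tn: int  # non-refDR images the model clears
    fn: int  # refDR images the model misses


def screening_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute the standard screening metrics from confusion-matrix counts."""
    sn = c.tp / (c.tp + c.fn)    # sensitivity (recall)
    sp = c.tn / (c.tn + c.fp)    # specificity
    ppv = c.tp / (c.tp + c.fp)   # positive predictive value (precision)
    npv = c.tn / (c.tn + c.fn)   # negative predictive value
    acc = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f1 = 2 * ppv * sn / (ppv + sn)  # harmonic mean of PPV and SN
    return {"SN": sn, "SP": sp, "PPV": ppv, "NPV": npv,
            "accuracy": acc, "F1": f1}


# Hypothetical counts for illustration only; NOT the study's results.
example = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
metrics = screening_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on how common refDR is in the set being scored, the same model can show different predictive values on sets with different refDR prevalence, such as the internal and external validation sets here.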