Retinopathy of prematurity (ROP) telemedicine screening programs have been found to be effective. However, they rely on widefield digital fundus imaging (WDFI) cameras, which are expensive and therefore less accessible in low- and middle-income countries. Cheaper smartphone-based fundus imaging (SBFI) systems have been described, but they have a narrower field of view (FOV) and have not been tested in a real-world, operational telemedicine setting.

This study aimed to assess the efficacy of SBFI systems compared with WDFI when used by technicians for ROP screening, with images evaluated by both artificial intelligence (AI) and human graders.

This prospective cross-sectional comparison study took place in a single-center ROP teleophthalmology program in India from January 2021 to April 2022. Premature infants who met standard ROP screening criteria and were enrolled in the teleophthalmology screening program were included; infants who had already been treated for ROP were excluded.

All participants were imaged with WDFI and with 1 of 2 SBFI devices: the Make-In-India (MII) Retcam or the Keeler Monocular Indirect Ophthalmoscope (MIO). Two masked readers evaluated zone, stage, plus disease, and vascular severity score (VSS, from 1-9) in all images. Smartphone images were then stratified by patient into training (70%), validation (10%), and test (20%) data sets and used to train a ResNet18 deep learning architecture for binary classification of normal vs preplus or plus disease. This model was then used for patient-level predictions of referral-warranted ROP (RW-ROP) and treatment-requiring ROP (TR-ROP).

The main outcomes were the sensitivity and specificity of human graders and of the AI system for detecting RW-ROP and TR-ROP, and the area under the receiver operating characteristic curve (AUC) of grader-assigned VSS. Sensitivity and specificity were compared between the 2 SBFI systems using Pearson χ² testing.

A total of 156 infants (312 eyes; mean [SD] gestational age, 33.0 [3.0] weeks; 75 [48%] female) with paired examinations were included.
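The patient-level 70%/10%/20% split of smartphone images described above can be illustrated with a minimal Python sketch. It assumes each image is tagged with a patient identifier and that the fractions refer to patients rather than images; the function name `split_by_patient`, the random shuffle, and the fixed seed are hypothetical choices, not the study's documented procedure.

```python
import random
from collections import defaultdict


def split_by_patient(image_records, fractions=(0.7, 0.1, 0.2), seed=0):
    """Partition images into train/val/test so that no patient spans two sets.

    image_records: iterable of (patient_id, image_id) pairs (assumed format).
    fractions: shares of patients (not images); the test set takes the remainder.
    """
    # Group every image under its patient so a patient's images move together.
    by_patient = defaultdict(list)
    for patient_id, image_id in image_records:
        by_patient[patient_id].append(image_id)

    # Sort, then shuffle with a seeded generator so the split is reproducible.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n = len(patients)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],  # remaining patients (about 20%)
    }
    # Expand each patient group back into its list of images.
    return {name: [img for pid in ids for img in by_patient[pid]]
            for name, ids in groups.items()}
```

Splitting by patient rather than by image keeps all images of one infant (both eyes, repeated views) in a single set; an image-level split could place near-duplicate images in both training and test sets and overstate test performance.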
Sensitivity and specificity did not differ significantly between the 2 SBFI systems. Human graders using SBFI detected TR-ROP with a sensitivity of 100% and a specificity of 83.49%. The AUCs for grader-assigned VSS alone were 0.95 (95% CI, 0.91-0.99) for RW-ROP and 0.96 (95% CI, 0.93-0.99) for TR-ROP. For the AI system, sensitivity and specificity were 100% and 58.6%, respectively, for TR-ROP, and 80.0% and 59.3%, respectively, for RW-ROP.

In this cross-sectional study, 2 different SBFI systems used by technicians in an ROP screening program were highly sensitive for TR-ROP. SBFI systems combined with AI may be a cost-effective way to expand global capacity for ROP screening.
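The outcome measures reported here (sensitivity, specificity, AUC of an ordinal score such as VSS, and a Pearson χ² comparison between devices) are standard computations, sketched below in stdlib Python. The function names are hypothetical, the input format (one 0/1 reference label per eye or infant, with a matching prediction or score) is an assumption, and none of this reproduces the study's actual analysis code or data. The AUC uses the rank (Mann-Whitney) formulation, which handles tied scores such as repeated integer VSS values.

```python
from math import erfc, sqrt


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for 0/1 labels and 0/1 predictions.

    Assumes at least one positive and one negative reference label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


def auc_from_scores(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties (common with an integer score like VSS 1-9) count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]], df = 1.

    Returns (statistic, p value). Raises ValueError if any row or column
    total is zero, because the statistic is undefined in that case.
    """
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("chi-square undefined: a row or column total is zero")
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / (margins[0] * margins[1] * margins[2] * margins[3])
    # With 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))
```

To compare sensitivity between the two SBFI devices, `a` and `b` would be the detected and missed positive cases for one device and `c` and `d` those for the other. If neither device misses a case, the "missed" column total is zero and the χ² statistic is undefined, so the sketch raises an error rather than returning a misleading value.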