This study compared two image grading techniques to assess the utility of longitudinal image-based analysis in retinopathy of prematurity (ROP) screening: 1) time-limited screening without image comparison (a proxy for bedside indirect ophthalmoscopy, termed sBIO) and 2) time-unlimited screening with image comparison (for telemedicine grading, termed TELE). We tested two hypotheses: 1) H1: TELE is superior to sBIO for the detection of change (Tempo), graded as same, better, or worse, and 2) H2: graders integrate granular data on change (e.g., at the image and feature level) to arrive at the Tempo assessment.
The design was a prospective reliability analysis. The gold standard reference (GS) was a published, curated ROP image database containing both Tempo and granular-level changes (image and component) from 40 patients in 2 sets. Graders were divided into 2 cohorts. There were two screening techniques: 1) sBIO, with review limited to 10 minutes per patient and access to prior notes and drawings, and 2) TELE, with unlimited review time and access to prior weeks' images, notes, and schematics. Graders switched techniques and sets after 6 weeks. The H1 outcome was the comparison of graders' weekly Tempo scores to GS-Gestalt; the H2 outcome was the comparison of Tempo scores to GS-View and GS-Component.
For H1, there was no difference between techniques: accuracy of sBIO and TELE compared to the GS was 51.7% and 51.9%, respectively (p=0.95). Agreement was highest when all exams exhibited no change (91.5% sBIO vs. 93.5% TELE, p=0.46) and lowest when exams always demonstrated worsening (46.5% sBIO vs. 47.1% TELE, p=0.93). Both sets of graders did worse in weeks 7-12, irrespective of technique. For H2, Tempo assessment did not correlate with granular data changes in the GS at either the View level or the Component level: overall agreement dropped to 31.4% for Tempo vs. GS-View (31.2% for sBIO, 31.5% for TELE) and 4.6% for Tempo vs. GS-Component (4.9% for sBIO, 4.3% for TELE).
Detection of ROP Tempo by expert pediatric retina graders was independent of screening technique. Both groups did significantly better in the first half of the study, suggesting a fatigue factor. This is the first ROP study to demonstrate that graders integrate image and retinal features in various ways that can contradict their assessment of overall disease progression.
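The accuracy figures reported for H1 compare each grader's weekly Tempo score with the gold standard. The abstract does not specify the exact computation, so the following is only a minimal sketch that assumes accuracy means simple percent agreement. The function name `percent_agreement` and all scores below are hypothetical and are not study data.

```python
def percent_agreement(grader_scores, gold_scores):
    """Percentage of exams where the grader's Tempo score equals the gold standard.

    Assumption: accuracy is plain percent agreement; the study may have
    used a different or more refined measure.
    """
    if len(grader_scores) != len(gold_scores):
        raise ValueError("score lists must have the same length")
    if not gold_scores:
        raise ValueError("at least one exam is required")
    matches = sum(g == r for g, r in zip(grader_scores, gold_scores))
    return 100.0 * matches / len(gold_scores)


# Hypothetical toy data: five weekly exams, each graded same / better / worse.
gold = ["same", "worse", "better", "same", "worse"]
sbio = ["same", "same", "better", "same", "better"]
tele = ["same", "worse", "same", "same", "better"]

print(f"sBIO agreement: {percent_agreement(sbio, gold):.1f}%")
print(f"TELE agreement: {percent_agreement(tele, gold):.1f}%")
```

In this toy example both techniques reach the same agreement rate while erring on different exams, which is one way two techniques can show nearly identical overall accuracy.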