The purpose of this study was to evaluate the clinical usefulness of a quantitative deep learning-derived vascular severity score for retinopathy of prematurity (ROP) by assessing its correlation with clinical ROP diagnosis and by measuring clinician agreement in applying a novel scale. The study was an analysis of an existing database of posterior pole fundus images and corresponding ophthalmoscopic examinations using 2 methods of assigning a quantitative scale to vascular severity. Images were from clinical examinations of patients in the Imaging and Informatics in ROP Consortium. Four ophthalmologists and 1 study coordinator evaluated vascular severity on a scale from 1 to 9. A quantitative vascular severity score (1-9) was applied to each image using a deep learning algorithm. A database of 499 images was developed for assessment of interobserver agreement. The main outcome measures were the distribution of deep learning-derived vascular severity scores against the clinical assessment of zone (I, II, or III), stage (0, 1, 2, or 3), and extent of stage 3 (<3 clock hours, 3-6 clock hours, and >6 clock hours), evaluated using multivariate linear regression, and weighted κ values and Pearson correlation coefficients for interobserver agreement on the 1-to-9 vascular severity scale. For deep learning analysis, a total of 6344 clinical examinations were analyzed. A higher deep learning-derived vascular severity score was associated with more posterior disease, higher disease stage, and greater extent of stage 3 disease (P < 0.001 for all). For a given ROP stage, the vascular severity score was higher in zone I than in zones II or III (P < 0.001). Multivariate regression found that zone, stage, and extent were each independently associated with the severity score (P < 0.001 for all). For interobserver agreement on the 1-to-9 vascular severity scale, the mean ± standard deviation weighted κ value was 0.67 ± 0.06, and the mean ± standard deviation Pearson correlation coefficient was 0.88 ± 0.04.
A vascular severity scale for ROP seems feasible for clinical adoption; corresponds with zone, stage, extent of stage 3, and plus disease; and facilitates the use of objective technology such as deep learning to improve the consistency of ROP diagnosis.
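The interobserver agreement statistics reported above (weighted κ and Pearson correlation on a 1-to-9 scale) can be illustrated with a minimal sketch. The abstract does not state the κ weighting scheme or how the mean ± standard deviation was formed, so the linear weights and the averaging over all rater pairs used here are assumptions, and the function names (`weighted_kappa`, `pearson`, `pairwise_summary`) are hypothetical rather than taken from the study.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def weighted_kappa(a, b, k=9, power=1):
    """Cohen's weighted kappa for two raters scoring the same images on 1..k.

    power=1 gives linear disagreement weights (an assumption; the abstract
    does not say which weighting was used); power=2 gives quadratic weights.
    Undefined (division by zero) if both raters give one identical constant score.
    """
    n = len(a)
    # Observed joint proportion of (rater A score, rater B score).
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[x - 1][y - 1] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight grows with the distance between the two scores.
        return (abs(i - j) / (k - 1)) ** power

    d_obs = sum(weight(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp


def pearson(a, b):
    """Pearson correlation coefficient between two raters' scores."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def pairwise_summary(ratings, k=9):
    """Mean and SD of both statistics over all rater pairs (assumed design)."""
    pairs = list(combinations(ratings.values(), 2))
    kappas = [weighted_kappa(a, b, k) for a, b in pairs]
    corrs = [pearson(a, b) for a, b in pairs]
    return {
        "kappa": (mean(kappas), stdev(kappas)),
        "pearson": (mean(corrs), stdev(corrs)),
    }
```

With five raters, `pairwise_summary` would average over 10 rater pairs. Whether the reported 0.67 ± 0.06 and 0.88 ± 0.04 were computed this way is not stated in the abstract.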