Abstract
Purpose
Segmentations of retinal layers in spectral-domain optical coherence tomography (SD-OCT) images serve as a crucial tool for identifying and analyzing the progression of various retinal diseases, encompassing a broad spectrum of abnormalities associated with age-related macular degeneration (AMD). Training deep-learning algorithms requires well-defined, expert-validated ground-truth labels to delineate boundaries accurately. This resource-intensive process has constrained the widespread application of such algorithms across diverse OCT devices. This work validates deep-learning image segmentation models across multiple OCT devices by testing their robustness in generating clinically relevant metrics.

Design
Prospective, comparative study.

Participants
Adults over 50 years of age with no AMD to advanced AMD, as defined in the Age-Related Eye Disease Study (AREDS), in at least one eye were enrolled. 402 SD-OCT scans were used in this study.

Methods
We evaluated two separate state-of-the-art segmentation algorithms, training on images obtained from one OCT device (Heidelberg Spectralis) and then testing on images acquired from two OCT devices (Heidelberg Spectralis and Zeiss Cirrus). This assessment was performed on a dataset encompassing a range of retinal pathologies, from disease-free eyes to severe forms of AMD, with a focus on evaluating the device independence of the algorithms.

Main Outcome Measures
Performance metrics (mean squared error, mean absolute error, Dice coefficient) for the segmentations of the internal limiting membrane (ILM), retinal pigment epithelium (RPE), and the RPE-to-Bruch's membrane (BM) region, along with en face thickness maps and volumetric estimates (in mm³). Violin plots and Bland-Altman plots comparing predictions against ground truth are also presented.
Results
The UNet and DeepLabv3, trained on Spectralis B-scans, demonstrated clinically useful outcomes when applied to Cirrus test B-scans. Review of the Cirrus test data by two independent annotators showed an aggregated mean absolute error in pixels of 1.82±0.24 (equivalent to 7.0±0.9 μm) for the ILM and 2.46±0.66 (9.5±2.6 μm) for the RPE. The Dice similarity coefficient for the RPE-drusen complex (RPE-DC) region, comparing predictions to ground truth, reached 0.87±0.01.

Conclusion
In the pursuit of task-specific goals such as retinal layer segmentation, a segmentation network can acquire domain-independent features from a large training dataset. This enables the network to execute tasks in domains where ground truth is hard to generate.
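The outcome measures reported above (the Dice similarity coefficient for a segmented region, and boundary mean absolute error in pixels converted to micrometres) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code; the `um_per_pixel` axial scale of 3.87 μm is an assumption inferred from the reported pixel/μm pairs (1.82 px ≈ 7.0 μm), not a value stated in the abstract.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice similarity between two binary segmentation masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    # Convention: two empty masks are treated as a perfect match.
    return 2.0 * intersection / denom if denom else 1.0

def boundary_mae(pred_rows, truth_rows, um_per_pixel=3.87):
    """Mean absolute boundary error between two layer boundaries.

    Each input is a per-A-scan row index of the boundary (e.g. ILM or RPE).
    Returns the error in pixels and in micrometres, using an assumed
    axial scale of `um_per_pixel`.
    """
    pred_rows = np.asarray(pred_rows, dtype=float)
    truth_rows = np.asarray(truth_rows, dtype=float)
    mae_px = np.mean(np.abs(pred_rows - truth_rows))
    return mae_px, mae_px * um_per_pixel
```

For example, a predicted ILM boundary that is off by one pixel on average would yield an MAE of 1.0 px, or about 3.87 μm under the assumed axial scale.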