Abstract

Introduction: Fundal imaging is the most commonly used non-invasive technique for early detection of many retinal diseases such as diabetic retinopathy (DR). An initial step in the automatic processing of fundal images for detecting diseases is to identify and segment the normal landmarks: the optic disc, blood vessels, and macula. In addition to these structures, other features such as exudates that help in pathological evaluations are also visible in fundal images. Segmenting structures like blood vessels poses multiple challenges because of their fine-grained structure, which must be captured at the original resolution, and because they are spread across the entire retina with varying patterns and densities. Exudates appear as white patches of irregular shapes occurring at multiple locations, and they can be confused with the optic disc if features like brightness or color are used for segmentation.

Methods: Segmentation algorithms based solely on image processing involve multiple parameters and thresholds that need to be tuned. Another approach is to use machine learning models that take hand-crafted features as inputs to segment the image. The challenge in this approach is to identify the correct features and then devise algorithms to extract them. End-to-end deep neural networks take raw images with minimal preprocessing, such as resizing and normalization, as inputs, learn a set of feature maps in the intermediate layers, and then perform the segmentation in the last layer. These networks tend to have longer training and prediction times because of their complex architectures, which can involve millions of parameters; this also necessitates a huge number of training images (2000‒10,000). For structures like blood vessels and exudates that are spread across the entire image, one approach used to increase the training data is to generate multiple patches from a single training image, thus increasing the total number of training samples. Patch-based training cannot be applied to structures like the optic disc and fovea that appear only once per image. Prediction time is also longer, because segmenting a full image involves segmenting multiple patches.

Results and Discussion: Most of the existing research has focused on segmenting these structures independently to achieve high performance metrics. In this work, we propose a multi-tasking deep learning architecture for segmenting the optic disc, blood vessels, macula, and exudates simultaneously. Both training and prediction are performed using the whole image. The objective was to improve the prediction results on blood vessels and exudates, which are relatively more challenging, while utilizing segmentation of the optic disc and the macula as auxiliary tasks. Our experimental results on images from publicly available datasets show that simultaneous segmentation of all these structures results in a significant improvement in performance. The proposed approach makes predictions for all four structures in the whole image in a single forward pass. We used a modified U-Net architecture with only convolutional and de-convolutional layers and comparatively few parameters.
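To illustrate the patch-based training strategy mentioned above (a baseline approach, not the proposed method), the following minimal Python sketch crops overlapping patches from one fundus image and its segmentation mask to multiply the number of training samples. The function name, patch size, and stride are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch (not from the paper) of patch-based sample generation for
# structures such as vessels and exudates that cover the whole retina: many
# overlapping patches are cropped from each training image and its mask.
import numpy as np

def extract_patches(image, mask, patch_size=64, stride=32):
    """Return aligned (patch, mask_patch) arrays from one fundus image."""
    patches, mask_patches = [], []
    h, w = image.shape[:2]
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
            mask_patches.append(mask[y:y + patch_size, x:x + patch_size])
    return np.stack(patches), np.stack(mask_patches)

# Example: a 512x512 image with 64x64 patches and stride 32 yields 15*15 = 225 samples.
img = np.zeros((512, 512, 3), dtype=np.float32)
msk = np.zeros((512, 512), dtype=np.uint8)
X, Y = extract_patches(img, msk)
print(X.shape, Y.shape)  # (225, 64, 64, 3) (225, 64, 64)
```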

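The sketch below illustrates the kind of multi-task, U-Net-style encoder-decoder described in the abstract: a network built only from convolutional and de-convolutional (transposed-convolution) layers, with a four-channel output so that the optic disc, blood vessels, macula, and exudates are predicted for the whole image in a single forward pass. It is written in PyTorch for illustration; the layer counts, channel widths, and class names here are assumptions, not the authors' exact architecture.

```python
# Minimal sketch of a multi-task U-Net-style segmenter (not the authors' exact
# network): a shared encoder-decoder with a 4-channel sigmoid head, so the optic
# disc, blood vessels, macula, and exudates are predicted in one forward pass.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, as in a standard U-Net stage.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class MultiTaskUNet(nn.Module):
    def __init__(self, in_ch=3, n_tasks=4, base=16):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.enc3 = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        # One output channel per structure: disc, vessels, macula, exudates.
        self.head = nn.Conv2d(base, n_tasks, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.sigmoid(self.head(d1))  # per-pixel probabilities, 4 maps

# Whole-image prediction in a single forward pass (input size is illustrative).
model = MultiTaskUNet()
probs = model(torch.randn(1, 3, 512, 512))  # -> (1, 4, 512, 512)
```

Training such a network as a multi-task model would typically sum a per-structure loss (e.g., binary cross-entropy or Dice) over the four output channels, so that the easier optic disc and macula tasks act as auxiliaries for vessels and exudates; the specific loss used by the authors is not stated in this abstract.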