Abstract

Experts need large, high-resolution retinal images to detect tiny abnormalities, such as microaneurysms or issues in vascular branches. However, these images often suffer from low quality (e.g., low resolution) due to poor imaging device configuration and operator error. Many prior works have used Convolutional Neural Network (CNN)-based methods for image super-resolution, typically making the models more complex by adding layers and various blocks. This leads to additional computational expense and hinders application in real-life scenarios. This paper therefore proposes a novel, lightweight deep-learning super-resolution method for retinal images, comprising a Vision Transformer (ViT) encoder and a CNN decoder. To the best of our knowledge, this is the first attempt to use a transformer-based network for accurate retinal image super-resolution. A progressively growing super-resolution training technique is applied to increase image resolution by factors of 2, 4, and 8. The main architecture remains constant thanks to an adaptive patch embedding layer, so increasing the up-scaling factor does not incur additional computational expense. This patch embedding layer includes a 2-dimensional convolution whose kernel size and stride depend on the input shape. This strategy removes the need to append additional super-resolution blocks to the model. The proposed method has been evaluated both quantitatively and qualitatively; the qualitative analysis also includes vessel segmentation of the super-resolved and ground-truth images. Experimental results indicate that the proposed method outperforms current state-of-the-art methods.
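
The adaptive patch embedding idea can be illustrated with a little shape arithmetic. The sketch below is not the authors' implementation: the abstract does not give the rule for choosing kernel size and stride, so it assumes the common ViT convention of non-overlapping patches (kernel size equal to stride) and a fixed token grid. The function names and the grid size of 16 are hypothetical, chosen only for illustration. Under those assumptions, the number of tokens passed to the encoder stays the same as the input resolution grows.

```python
from typing import Tuple


def patch_embed_params(input_size: int, grid_size: int) -> Tuple[int, int]:
    """Return (kernel_size, stride) for a non-overlapping patch-embedding conv.

    Assumption (not from the paper): kernel size equals stride, and both are
    chosen so that the output is always a grid_size x grid_size token grid.
    """
    if input_size % grid_size != 0:
        raise ValueError("input_size must be divisible by grid_size")
    patch = input_size // grid_size
    return patch, patch


def conv_output_size(input_size: int, kernel_size: int, stride: int) -> int:
    """Output length along one axis of an unpadded 2D convolution."""
    return (input_size - kernel_size) // stride + 1


def token_count(input_size: int, grid_size: int) -> int:
    """Number of tokens produced by the patch embedding for a square input."""
    kernel, stride = patch_embed_params(input_size, grid_size)
    side = conv_output_size(input_size, kernel, stride)
    return side * side


if __name__ == "__main__":
    # Hypothetical input sizes standing in for successive training stages.
    for size in (64, 128, 256):
        kernel, stride = patch_embed_params(size, grid_size=16)
        print(size, kernel, stride, token_count(size, grid_size=16))
```

Because the token count is fixed by the grid rather than by the image size, the transformer encoder sees the same sequence length at every stage in this sketch, which is one plausible reading of why the main architecture can remain constant.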

