Fundus images are used to assist the diagnosis of ocular diseases, and a high-quality fundus image with rich details makes clinical diagnostic results more reliable. However, the quality of fundus images is often unsatisfactory due to turbidity of the refractive media and imperfect doctor-patient cooperation. To enhance low-quality fundus images, a transformer-based self-supervised network is proposed. During the training phase, an encoder-decoder network is introduced. To overcome the difficulty that convolutional neural networks (CNNs) have in establishing long-range dependencies, an encoder composed of a vision transformer and a CNN is proposed so that both the global and local information of fundus images is fully extracted. On this basis, three reconstruction tasks with self-supervised constraints are designed to enable the network to extract features from differently degraded images. During the testing phase, a low-quality fundus image is decomposed into three feature layers (reverse, illumination, and detail), and the multi-layer features are then fused by the network; a minimal sketch of such a hybrid encoder is given below. To demonstrate the effectiveness of the proposed method, non-uniformly illuminated and blurred fundus images are tested. The average NIQE scores on underexposed, blurred, and overexposed fundus images are 3.03, 2.98, and 2.80, respectively. The average BRISQUE scores on underexposed, blurred, and overexposed fundus images are 40.32, 40.55, and 39.76, respectively. The average score of the subjective evaluation by three ophthalmologists is 61.17%. Compared with existing methods, the proposed method achieves superior performance in both subjective and objective evaluations.
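The abstract describes an encoder that pairs a CNN branch (local detail) with a vision-transformer branch (long-range context). The following PyTorch sketch shows one plausible form of such a hybrid encoder; the module sizes, patch size, and concatenation-based fusion are illustrative assumptions, not the paper's reported configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridEncoder(nn.Module):
    """Illustrative hybrid encoder: a CNN branch extracts local features,
    a ViT-style branch models global context; the two are fused.
    All hyperparameters here are assumptions for demonstration only."""
    def __init__(self, in_ch=3, dim=64, patch=16, depth=2, heads=4):
        super().__init__()
        # CNN branch: local feature extraction at full resolution
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
        )
        # ViT branch: patch embedding followed by a transformer encoder
        self.patch_embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.vit = nn.TransformerEncoder(layer, num_layers=depth)
        # 1x1 conv fuses the concatenated local and global feature maps
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        local = self.cnn(x)                                 # (B, dim, H, W)
        tokens = self.patch_embed(x)                        # (B, dim, H/p, W/p)
        gh, gw = tokens.shape[-2:]
        tokens = tokens.flatten(2).transpose(1, 2)          # (B, N, dim)
        glob = self.vit(tokens)                             # global self-attention
        glob = glob.transpose(1, 2).reshape(b, -1, gh, gw)  # back to a feature map
        glob = F.interpolate(glob, size=(h, w), mode="bilinear",
                             align_corners=False)           # match CNN resolution
        return self.fuse(torch.cat([local, glob], dim=1))   # (B, dim, H, W)

# Usage on a dummy fundus image batch
enc = HybridEncoder()
feats = enc(torch.randn(2, 3, 256, 256))
print(feats.shape)  # torch.Size([2, 64, 256, 256])
```

Concatenation followed by a 1x1 convolution is only one of several plausible fusion choices (addition or cross-attention are common alternatives); the abstract does not specify which the authors use.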