Diabetic retinopathy (DR) is one of the leading causes of blindness in a significant portion of the working population, and its damage on vision is irreversible. Therefore, rapid diagnosis on DR is crucial for saving the patient’s eyesight. Since Transformer shows superior performance in the field of computer vision compared with Convolutional Neural Networks (CNNs), it has been proposed and applied in computer aided diagnosis of DR. However, a large number of images should be used for training due to the lack of inductive bias in Transformers. It has been demonstrated that the retinal vessels follow self-similar fractal scaling law, and the fractal dimension of DR patients shows an evident difference from that of normal people. Based on this, the fractal dimension is introduced as a prior into Transformers to mitigate the adverse influence of lack of inductive bias on model performance. A new Transformer method pretrained with Masked Autoencoders and fractal dimension (MAEFD) is developed and proposed in this paper. The experiments on the APTOS dataset show that the classification performance for DR by the proposed MAEFD can be substantially improved. Additionally, the present model pretrained with 100,000 retinal images outperforms that pretrained with 1 million natural images in terms of DR classification performance.
Read full abstract