Purpose: Glaucoma is one of the leading causes of irreversible blindness worldwide, and early detection and accurate diagnosis are essential for successful treatment. Convolutional Neural Networks (CNNs), a class of deep learning techniques, have shown excellent results in medical image processing.

Methodology: Using a dataset of 1,650 fundus images from the REFUGE, ORIGA, and ACRIMA databases (600 glaucoma-positive and 1,050 glaucoma-negative samples), this study assesses the effectiveness of five state-of-the-art deep learning models for glaucoma classification. The pretrained models ResNet-50, VGG16, GoogLeNet, Vision Transformer (ViT), and Swin Transformer are investigated, with emphasis on their capacity to extract key features such as the optic cup-to-disc ratio, retinal nerve fiber layer thickness, and vascular patterns.

Result: The Swin Transformer outperformed the other models, achieving perfect classification with 100% accuracy, precision, recall, and F1-score. ViT and GoogLeNet also performed strongly, reaching 92.92% and 91.54% accuracy, respectively. ResNet-50 and VGG16, by contrast, achieved lower accuracies of 74.62% and 78.77%, respectively. The remaining models each showed specific weaknesses. ResNet-50 underfit, which led to misclassifications and lower validation accuracy. Although VGG16 is effective in standard image classification tasks, it showed low recall and high validation loss, particularly on glaucoma-positive cases. GoogLeNet overfit, which limited its ability to generalise to new data, whereas ViT required substantial computational resources and initially struggled to classify glaucoma cases correctly. The Swin Transformer, in contrast, used hierarchical feature maps and shifted windows to capture both local and global image information efficiently, such as blood vessel patterns and the structure of the optic nerve head.
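The shifted-window idea can be illustrated with a minimal pure-Python sketch. Swin computes self-attention inside non-overlapping local windows and alternates between a regular and a shifted window partition in consecutive blocks, so information can flow across window boundaries. The sketch below only shows how tokens are regrouped; it assumes a toy 4×4 grid of token indices, a window size of 2, and a shift of half the window (implemented as a cyclic roll). The helper names `cyclic_shift` and `window_partition` are hypothetical, the attention computation itself is omitted, and this is not the study's implementation.

```python
# Hypothetical illustration of Swin-style regular vs. shifted window partitioning.
Grid = list[list[int]]


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and to the left by `shift` cells, wrapping around the edges."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + shift) % h][(j + shift) % w] for j in range(w)] for i in range(h)]


def window_partition(grid: Grid, m: int) -> list[Grid]:
    """Split an H x W grid into non-overlapping m x m windows, in row-major order."""
    h, w = len(grid), len(grid[0])
    if h % m or w % m:
        raise ValueError("grid dimensions must be divisible by the window size")
    return [
        [row[c:c + m] for row in grid[r:r + m]]
        for r in range(0, h, m)
        for c in range(0, w, m)
    ]


if __name__ == "__main__":
    size, window = 4, 2
    shift = window // 2  # assumed shift: half the window size
    # Each cell holds its own token index so the regrouping is easy to see.
    tokens = [[r * size + c for c in range(size)] for r in range(size)]

    regular = window_partition(tokens, window)
    shifted = window_partition(cyclic_shift(tokens, shift), window)
    print("regular windows:", regular)
    print("shifted windows:", shifted)
```

In the shifted layout the first window groups tokens 5, 6, 9 and 10, which sat in four different windows in the regular layout, so attention in the following block can mix information across the earlier window boundaries. The last shifted window, `[[15, 12], [3, 0]]`, combines image corners that are not spatially adjacent because of the wrap-around; Swin handles such windows with an attention mask, which this sketch leaves out.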
A confusion matrix confirmed that the Swin Transformer made no misclassifications, indicating flawless generalisation.

Conclusion: In summary, the study found that the Swin Transformer was the most dependable and robust model for glaucoma diagnosis, although models such as ViT and GoogLeNet also showed promise. These results highlight the potential of transformer-based architectures, and the Swin Transformer in particular, as a state-of-the-art solution for medical image classification in ophthalmology.
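Accuracy, precision, recall, and F1-score are all derived from a binary confusion matrix. The minimal sketch below shows that computation, treating glaucoma as the positive class. `BinaryConfusion`, `safe_div`, and `classification_metrics` are hypothetical helpers, and the example counts are illustrative: only the class totals (600 positive, 1,050 negative) come from the dataset description, and the study's actual test split is not given here. The second example shows why recall and F1 matter on this imbalanced dataset. A classifier that always predicts glaucoma-negative would reach about 63.6% accuracy (1,050/1,650) while detecting no glaucoma cases at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryConfusion:
    """Hypothetical container for a 2x2 confusion matrix (positive class = glaucoma)."""
    tp: int  # glaucoma images predicted as glaucoma
    fp: int  # healthy images predicted as glaucoma
    fn: int  # glaucoma images predicted as healthy
    tn: int  # healthy images predicted as healthy


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero (a common convention)."""
    return num / den if den else 0.0


def classification_metrics(cm: BinaryConfusion) -> dict[str, float]:
    """Compute accuracy, precision, recall (sensitivity) and F1 from confusion counts."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    precision = safe_div(cm.tp, cm.tp + cm.fp)
    recall = safe_div(cm.tp, cm.tp + cm.fn)
    return {
        "accuracy": safe_div(cm.tp + cm.tn, total),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical perfect classifier applied to all 600 positive and 1,050 negative images.
    perfect = classification_metrics(BinaryConfusion(tp=600, fp=0, fn=0, tn=1050))
    # Hypothetical majority-class baseline that labels every image glaucoma-negative.
    baseline = classification_metrics(BinaryConfusion(tp=0, fp=0, fn=600, tn=1050))
    print("perfect: ", perfect)
    print("baseline:", baseline)
```

Returning 0.0 for an undefined ratio (for example, precision when nothing is predicted positive) is one common convention. Libraries differ here: some emit a warning or let the caller choose the value.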