Hypertensive Retinopathy (HR) is a retinal manifestation of persistently elevated blood pressure. Severity grading of HR is essential for patient risk stratification, effective management, progression monitoring, timely intervention, and minimizing the risk of vision impairment. Computer-aided diagnosis and artificial intelligence (AI) systems play vital roles in the diagnosis and grading of HR. However, very little research has addressed HR grading, and no publicly available dataset exists for the task. A further key challenge is severe class imbalance. To address these issues, in this paper we develop the "HRSG: Expert-Annotated Hypertensive Retinopathy Severity Grading" dataset, which grades HR severity into four distinct classes: normal, mild, moderate, and severe. Further, to improve grading performance on limited datasets, this paper introduces a novel hybrid architecture that combines a pretrained ResNet-50, used via transfer learning, with a modified Vision Transformer (ViT) that uses both global self-attention and locality self-attention mechanisms. The locality self-attention addresses the lack of inductive bias that is common in ViT architectures. The resulting architecture captures both local and global contextual information, yielding a robust classification model. To overcome class imbalance, a Decouple Representation and Classifier (DRC)-based training approach is proposed. This method improves the model's ability to learn effective features while preserving the original dataset's distribution, leading to better diagnostic accuracy. Performance evaluation shows that the proposed method grades the severity of HR accurately, achieving an average accuracy of 0.9688, sensitivity of 0.9435, specificity of 0.9766, F1-score of 0.9442, and precision of 0.9474.
The comparative results indicate that the proposed method outperforms existing HR methods, state-of-the-art CNN models, and baseline pretrained ViT models. Additionally, because the available data are limited, we compared our method with a CNNViT model that combines a shallow CNN (three convolution blocks, each consisting of a convolution layer, a batch normalization layer, and a max pooling layer) with a lightweight ViT architecture. The proposed method also outperformed CNNViT. Overall, the experimental results demonstrate the efficacy of the proposed method in accurately grading HR severity.
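
The averaged metrics reported above can be illustrated with a short sketch. It assumes the averages are macro averages of one-vs-rest metrics over the four severity classes, which the abstract does not state explicitly. The function name `one_vs_rest_metrics` and the confusion matrix are hypothetical and chosen only for illustration; the numbers are not results from the paper.

```python
from typing import Dict, List

# Class order assumed for the rows and columns of the confusion matrix.
CLASSES = ["normal", "mild", "moderate", "severe"]


def one_vs_rest_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Macro-average one-vs-rest metrics from a square confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    The matrix is assumed to be non-empty and to contain at least one sample.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    sums = {"accuracy": 0.0, "sensitivity": 0.0, "specificity": 0.0,
            "precision": 0.0, "f1": 0.0}
    for k in range(n):
        tp = cm[k][k]                                  # class k predicted as k
        fn = sum(cm[k]) - tp                           # class k predicted as other
        fp = sum(cm[i][k] for i in range(n)) - tp      # other predicted as k
        tn = total - tp - fn - fp                      # everything else
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        sums["accuracy"] += (tp + tn) / total
        sums["sensitivity"] += sens
        sums["specificity"] += spec
        sums["precision"] += prec
        sums["f1"] += f1
    # Macro average: every class contributes equally, regardless of its size.
    return {name: value / n for name, value in sums.items()}


# Illustrative confusion matrix (made-up counts, NOT data from the paper).
example_cm = [
    [9, 1, 0, 0],
    [1, 8, 1, 0],
    [0, 1, 9, 0],
    [0, 0, 0, 10],
]
example_metrics = one_vs_rest_metrics(example_cm)
for name, value in example_metrics.items():
    print(f"{name}: {value:.4f}")
```

Under this one-vs-rest convention, per-class accuracy counts true negatives, so the averaged accuracy can exceed the averaged sensitivity, as it does in the figures reported above. Macro averaging also weights each severity grade equally, which matters when classes are imbalanced.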