Computed tomography (CT) scans play a key role in diagnosing stroke, a leading cause of morbidity and mortality worldwide. However, interpreting these scans is often challenging, motivating automated solutions for timely and accurate diagnosis. This research proposes a hybrid model that integrates a Vision Transformer (ViT) with a Long Short-Term Memory (LSTM) network to detect and classify stroke characteristics in CT images. The ViT extracts essential features from each CT image, while the LSTM processes the sequence of representations produced by the ViT, capturing the dependencies needed to understand patterns and context in sequential data. The approach also addresses class imbalance in stroke datasets through strategies that improve model robustness. To ensure clinical relevance, Explainable Artificial Intelligence (XAI) methods, including attention maps, SHAP, and LIME, were incorporated to provide reliable, interpretable predictions. The proposed model was evaluated on the primary BrSCTHD-2023 dataset, collected from Rajshahi Medical College Hospital, achieving top accuracies of 73.80%, 91.61%, 93.50%, and 94.55% with the SGD, RMSProp, Adam, and AdamW optimizers, respectively. To further validate the model's generalizability, it was also tested on the Kaggle brain stroke dataset, where it achieved an accuracy of 96.61%. The ViT-LSTM model significantly outperformed traditional CNNs and standalone ViT models, demonstrating superior diagnostic performance and generalizability. This study advances automated stroke diagnosis by combining deep learning innovations, domain expertise, and enhanced interpretability to support clinical decision-making.
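The abstract does not give implementation details, but the described pipeline (ViT-style feature extraction whose token sequence is then fed to an LSTM before classification) can be sketched in PyTorch. Everything below is an illustrative assumption: the class name `ViTLSTM`, the patch size, embedding dimension, layer counts, and two-class output are hypothetical choices, not the authors' configuration.

```python
import torch
import torch.nn as nn

class ViTLSTM(nn.Module):
    """Hypothetical sketch: patch embedding -> transformer encoder (ViT-style)
    -> LSTM over the patch-token sequence -> linear classifier."""

    def __init__(self, img_size=32, patch=8, dim=32, n_classes=2):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Non-overlapping patch embedding, as in a ViT
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # LSTM consumes the ViT token sequence to model dependencies
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens + self.pos)
        _, (h, _) = self.lstm(tokens)  # final hidden state summarizes the sequence
        return self.head(h[-1])

model = ViTLSTM()
logits = model(torch.randn(2, 3, 32, 32))  # batch of 2 RGB images
print(logits.shape)
```

The design choice mirrored here is that the LSTM replaces the usual ViT classification token: instead of pooling patch tokens, the recurrent pass over them yields the representation used for the stroke/no-stroke decision.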