Brain tumor image segmentation is one of the most critical tasks in medical imaging for diagnosis, treatment planning, and prognosis. Traditional methods for brain tumor image segmentation are mostly based on Convolutional Neural Networks (CNNs), which have proved very powerful but remain limited in their ability to capture long-range dependencies and complex spatial hierarchies in MRI images. Variability in the shape, size, and location of tumors can degrade their performance and trap them in suboptimal outcomes. To address these limitations, a new encoder-decoder architecture, the VisionTranscoder, built around the Vision Transformer (ViT), is proposed to enhance brain tumor detection and classification. The proposed VisionTranscoder exploits the transformer's ability to model global context through self-attention, providing a more inclusive interpretation of the intricate patterns in medical images and improving classification by capturing both local and global features. The VisionTranscoder uses a Vision Transformer as its encoder, processing images as sequences of patches to capture global dependencies that often lie outside the receptive field of traditional CNNs. The decoder then reconstructs a high-fidelity segmentation map through upsampling and skip connections that preserve detailed spatial information. The risk of overfitting is greatly reduced by this design together with advanced regularization techniques and extensive data augmentation. The dataset contains 7,023 human brain MRI images spanning four classes: glioma, meningioma, no tumor, and pituitary. Images in the 'no tumor' class, indicating an MRI scan without any detectable tumor, were taken from the Br35H dataset. The results demonstrate the effectiveness of the VisionTranscoder over a wide set of brain MRI scans, achieving an accuracy of 98.5% with a loss of 0.05. This performance underlines its ability to accurately segment and classify brain tumors without overfitting.
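To make the described encoder-decoder design concrete, below is a minimal PyTorch sketch of a ViT-encoder / upsampling-decoder segmentation model in the spirit of the VisionTranscoder. The abstract does not specify the architecture's internals, so every detail here is an illustrative assumption: the class name `VisionTranscoderSketch`, the embedding dimension, depth and head counts, the use of a shallow convolutional stem to supply the skip connections, and the 3-channel input. A classification head could additionally pool the encoder tokens; only the segmentation path is shown.

```python
# Minimal sketch of a ViT-encoder / convolutional-decoder segmentation model.
# All sizes and the skip-connection scheme are assumptions, not the paper's spec.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split the image into non-overlapping patches and embed each one."""
    def __init__(self, img_size=224, patch=16, in_ch=3, dim=256):
        super().__init__()
        self.grid = img_size // patch                        # patches per side
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                                    # (B, C, H, W)
        x = self.proj(x)                                     # (B, dim, g, g)
        return x.flatten(2).transpose(1, 2)                  # (B, N, dim)

class DecoderBlock(nn.Module):
    """Upsample 2x and optionally fuse a skip feature to recover spatial detail."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip=None):
        x = self.up(x)
        if skip is not None:
            x = torch.cat([x, skip], dim=1)                  # skip connection
        return self.conv(x)

class VisionTranscoderSketch(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=256, depth=6,
                 heads=8, n_classes=4):
        super().__init__()
        self.embed = PatchEmbed(img_size, patch, 3, dim)
        n = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n, dim))      # learned positions
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)   # global self-attention
        # Shallow conv stem whose intermediate maps act as skips (assumption).
        self.stem1 = nn.Sequential(nn.Conv2d(3, 64, 3, 2, 1), nn.ReLU())    # H/2
        self.stem2 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())  # H/4
        self.dec1 = DecoderBlock(dim, 0, 256)                # H/16 -> H/8
        self.dec2 = DecoderBlock(256, 128, 128)              # H/8  -> H/4, + stem2
        self.dec3 = DecoderBlock(128, 64, 64)                # H/4  -> H/2, + stem1
        self.dec4 = DecoderBlock(64, 0, 32)                  # H/2  -> H
        self.head = nn.Conv2d(32, n_classes, kernel_size=1)  # per-pixel logits

    def forward(self, x):
        s1 = self.stem1(x)                                   # (B, 64, H/2, W/2)
        s2 = self.stem2(s1)                                  # (B, 128, H/4, W/4)
        tokens = self.encoder(self.embed(x) + self.pos)      # (B, N, dim)
        g = self.embed.grid
        feat = tokens.transpose(1, 2).reshape(x.size(0), -1, g, g)
        feat = self.dec1(feat)
        feat = self.dec2(feat, s2)
        feat = self.dec3(feat, s1)
        feat = self.dec4(feat)
        return self.head(feat)                               # (B, n_classes, H, W)

model = VisionTranscoderSketch()
logits = model(torch.randn(1, 3, 224, 224))                  # -> (1, 4, 224, 224)
```

Under these assumptions, the transformer encoder sees the whole image at once through self-attention over the patch tokens, while the conv-stem skips carry fine spatial detail past the coarse 14x14 token grid, which is the role the abstract assigns to upsampling with skip connections.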