Abstract

Attention-based methods have recently driven notable advances in brain tumor classification. Building on this progress, we propose ConvAttenMixer, a transformer model that combines convolutional layers with two attention mechanisms: self-attention and external attention. The model uses two convolution-mixer blocks to mix information across patches, enhancing its ability to capture spatial and channel-wise dependencies in brain MRI images. The self-attention block lets the model prioritize important regions within an image and establish dependencies by weighting each patch according to its relevance to the task, emphasizing crucial local features, disregarding irrelevant ones, and capturing interactions between patches. The external attention block, in contrast, focuses on significant global features and captures interactions among different images, allowing the model to establish dependencies and correlations across all samples. The classification head is a simple yet effective block that processes the output feature maps with a squeeze-and-excitation mechanism, assigning higher weights to important channels and suppressing less relevant ones. For experimentation, ConvAttenMixer was trained on 5712 MRI scans and tested on 1311 scans classified as glioma, meningioma, pituitary tumor, or no tumor. Several variants of the proposed model were evaluated, and the best-performing architecture was compared against state-of-the-art baselines: self-attention MLP, external attention MLP, attention-based pooling convolutional net, and convolutional mixer net. Extensive experiments showed that ConvAttenMixer outperformed these baselines, which employ either self-attention or external attention, while requiring significantly less computational memory. The proposed model achieved higher precision, recall, and F-measure, reaching the highest accuracy of 0.9794, compared with baseline accuracies ranging from 0.87 to 0.93. ConvAttenMixer thus operates locally at the patch level via self-attention and globally at the sample level via external attention, while prioritizing important information at the spatial and channel levels through the convolution mixers and the squeeze-and-excitation mechanism.
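
The abstract names the main components of ConvAttenMixer but not their exact configuration. The following PyTorch sketch is a minimal, hypothetical rendering of that pipeline: patch embedding, two convolution-mixer blocks, a self-attention block over patches, an external-attention block with shared memory units, and a squeeze-and-excitation classification head for the four classes. All dimensions, kernel sizes, head counts, and memory sizes are illustrative assumptions, not the authors' settings.

```python
# Hedged sketch of the ConvAttenMixer pipeline described in the abstract.
# Layer sizes, patch size, and block ordering are assumptions for illustration only.
import torch
import torch.nn as nn

class ConvMixerBlock(nn.Module):
    """Depthwise + pointwise convolutions that mix spatial and channel information."""
    def __init__(self, dim, kernel_size=9):
        super().__init__()
        self.depthwise = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )
        self.pointwise = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=1),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )

    def forward(self, x):
        x = x + self.depthwise(x)   # residual spatial mixing
        return self.pointwise(x)    # channel mixing

class ExternalAttention(nn.Module):
    """External attention: small learnable memory units shared across all samples."""
    def __init__(self, dim, mem=64):
        super().__init__()
        self.mk = nn.Linear(dim, mem, bias=False)  # key memory
        self.mv = nn.Linear(mem, dim, bias=False)  # value memory

    def forward(self, x):                            # x: (batch, patches, dim)
        attn = torch.softmax(self.mk(x), dim=1)      # normalize over patches
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-9)  # double normalization
        return self.mv(attn)

class SEHead(nn.Module):
    """Squeeze-and-excitation re-weighting followed by a linear classifier."""
    def __init__(self, dim, num_classes=4, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, x):                            # x: (batch, patches, dim)
        squeezed = x.mean(dim=1)                     # squeeze: average over patches
        weighted = squeezed * self.fc(squeezed)      # excite: re-weight channels
        return self.classifier(weighted)

class ConvAttenMixer(nn.Module):
    def __init__(self, dim=256, patch=16, num_classes=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.mixers = nn.Sequential(ConvMixerBlock(dim), ConvMixerBlock(dim))
        self.self_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.external_attn = ExternalAttention(dim)
        self.head = SEHead(dim, num_classes)

    def forward(self, x):                            # x: (batch, 3, H, W) MRI slices
        x = self.mixers(self.patch_embed(x))         # (batch, dim, h, w)
        tokens = x.flatten(2).transpose(1, 2)        # (batch, patches, dim)
        tokens = tokens + self.self_attn(tokens, tokens, tokens)[0]  # local patch deps
        tokens = tokens + self.external_attn(tokens)                 # cross-sample deps
        return self.head(tokens)                     # logits for the 4 tumor classes

if __name__ == "__main__":
    logits = ConvAttenMixer()(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 4])
```

In this sketch the self-attention operates within each image's patch sequence, while the external-attention memories are learned parameters shared by every sample, which is what allows correlations across the dataset to be captured; the squeeze-and-excitation head then re-weights channels before classification.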
