Automated image analysis and classification have advanced considerably in recent decades owing to machine learning and computer vision. In particular, deep learning (DL) architectures have become popular in resource-limited and labor-constrained environments such as the health-care sector. The transformer, a DL architecture built on the self-attention mechanism, excels in natural language processing; however, its application to image-based diagnosis in the health-care sector remains limited. Herein, the feasibility, bottlenecks, and performance of transformers in magnetic resonance imaging (MRI)-based brain tumor classification were investigated. To this end, a vision transformer (ViT) model was trained and tested on the widely used Brain Tumor Segmentation (BraTS) 2015 dataset for glioma classification. Owing to limited data availability, domain adaptation techniques were used to pretrain the ViT model, and the BraTS 2015 dataset was used for fine-tuning. With the model trained for only 100 epochs, the confusion matrix for the two-class problem of tumor versus nontumor classification showed an overall accuracy of 81.8%. In conclusion, although convolutional neural networks are traditionally used for DL-based medical image classification, ViTs can outperform them in MRI-based brain tumor classification owing to their self-attention mechanism and ability to capture long-range dependencies.
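To make the described pipeline concrete, the sketch below shows one way to fine-tune an ImageNet-pretrained ViT for the two-class tumor/nontumor problem. The abstract does not specify the backbone, optimizer, learning rate, or preprocessing; the ViT-B/16 model, AdamW settings, and 3-channel 224x224 input used here are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: fine-tune a pretrained ViT for binary tumor/nontumor
# MRI classification. Backbone, optimizer, and hyperparameters are
# illustrative assumptions, not the paper's reported setup.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

# Domain adaptation via transfer learning: start from ImageNet weights
# rather than training the data-hungry ViT from scratch.
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)

# Replace the classification head for the two-class problem.
model.heads.head = nn.Linear(model.heads.head.in_features, 2)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def train(model, loader, epochs=100):
    """Supervised fine-tuning loop (100 epochs, as in the abstract).

    Assumes `loader` yields (images, labels) where images are MRI slices
    resized to (B, 3, 224, 224) -- e.g., grayscale replicated to 3 channels
    -- and labels are 0 (nontumor) or 1 (tumor).
    """
    model.train()
    for epoch in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```

After training, the reported confusion matrix and 81.8% overall accuracy could be reproduced by running the fine-tuned model over a held-out test split and tallying predictions against labels.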