TSE DeepLab: An efficient visual transformer for medical image segmentation

Jingdong Yang, Jun Tu, Xiaolin Zhang, Shaoqing Yu, Xianyou Zheng

doi:10.1016/j.bspc.2022.104376

Abstract

Medical image segmentation is a key research topic in precision medicine. Existing models often ignore important pixel features and fail to extract global correlation features effectively, which degrades segmentation performance. In this paper, we propose TSE DeepLab, which builds on the DeepLabv3 framework: it retains the original atrous convolution for local feature extraction, converts the feature maps produced by the backbone into visual tokens, and feeds these tokens into a Transformer module to strengthen global feature extraction. Squeeze-and-excitation components are added after the Transformer module to rank the importance of channels, so that the model attends to the important pixel features of each channel. We apply 5-fold cross-validation to clinical sinus cases from Shanghai Tongji Hospital, affiliated with Tongji University, and to patellar fracture cases from the Sixth People's Hospital, affiliated with Shanghai Jiao Tong University. On average, the model achieves an Accuracy of 99.74%, Precision of 93.67%, IoU of 88.10%, Specificity of 99.87%, F1-score of 93.63%, and Sensitivity of 93.82% on the sinus dataset, and an Accuracy of 99.53%, Precision of 85.64%, IoU of 78.47%, Specificity of 99.72%, F1-score of 87.15%, and Sensitivity of 89.95% on the patellar fracture dataset. Compared with various typical segmentation models, the proposed model attains better segmentation accuracy and generalization performance, and offers greater reference value for clinical diagnosis.
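
The six evaluation measures reported above can be illustrated with a minimal sketch. The abstract does not state how the authors compute them, so the code below assumes the standard pixel-wise definitions for binary segmentation, based on counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The function names `confusion_counts` and `segmentation_metrics` are hypothetical and do not come from the paper.

```python
def confusion_counts(pred, target):
    """Count TP, FP, FN, TN over two flat sequences of 0/1 pixel labels."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, target):
    """Return the six measures as fractions in [0, 1].

    Assumption: standard binary-segmentation definitions; the paper's
    exact formulas are not given in the abstract.
    """
    tp, fp, fn, tn = confusion_counts(pred, target)

    def ratio(num, den):
        # Guard against empty denominators (e.g. no foreground pixels).
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),   # also called recall
        "specificity": ratio(tn, tn + fp),
        "iou": ratio(tp, tp + fp + fn),      # intersection over union
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy example: 8 pixels of a predicted mask against its ground truth.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
target = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(pred, target)
```

Under these definitions, Accuracy and Specificity are dominated by background pixels when the target region is small, which may explain why both exceed 99% on the two datasets while IoU and F1-score are noticeably lower.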

Full Text