Abstract

Over the past few years, deep learning has been widely adopted in the medical field owing to its strong performance. Built on Convolutional Neural Networks (CNNs), the U-Net framework has become the de facto standard for medical image segmentation tasks. Nonetheless, this framework cannot fully capture global and long-range semantic information. The Transformer structure has been shown to capture more global information than U-Net, but less local information than a CNN. To improve segmentation and classification performance on medical images while exploiting both global and local features, we integrate O-Net with the Mixed Transformer [1], fusing the strengths of CNNs and Transformers. In the encoder of the proposed O-Net architecture, we combine CNN blocks, the Mixed Transformer, and Local-Global Gaussian-Weighted Self-Attention (LGG-SA) to capture richer global and local context. The decoder likewise combines Mixed Transformer and CNN blocks to produce the final prediction. The segmentation capability of the proposed network is evaluated on the Synapse multi-organ CT dataset. Our experiments demonstrate that the proposed MT-ONet delivers superior segmentation performance relative to state-of-the-art methods, along with improved classification accuracy.
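To illustrate the core idea behind Gaussian-weighted self-attention, the sketch below shows a minimal NumPy implementation in which the usual dot-product attention logits are damped by a Gaussian of the pairwise positional distance, biasing attention toward local neighbours while still allowing global interactions. This is an illustrative simplification under our own assumptions (1-D positions, identity query/key/value projections, a single `sigma` bandwidth), not the paper's exact LGG-SA module.

```python
import numpy as np

def gaussian_weighted_self_attention(x, sigma=2.0):
    """Self-attention whose logits are damped by a Gaussian of the
    distance between token positions (illustrative sketch only;
    not the exact LGG-SA formulation from the paper)."""
    n, d = x.shape
    # Scaled dot-product logits; identity projections for brevity.
    logits = x @ x.T / np.sqrt(d)
    # Gaussian prior over squared 1-D positional distances,
    # applied additively in log-space.
    pos = np.arange(n)
    dist2 = (pos[:, None] - pos[None, :]) ** 2
    logits = logits - dist2 / (2.0 * sigma ** 2)
    # Numerically stable softmax over the key axis.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

# Example: a small sigma concentrates each token's attention locally,
# a large sigma recovers near-global dot-product attention.
tokens = np.random.default_rng(0).normal(size=(6, 4))
out = gaussian_weighted_self_attention(tokens, sigma=1.5)
```

A small `sigma` yields the "local" behaviour a CNN excels at, while increasing `sigma` lets attention range globally, which is the trade-off the encoder design above aims to balance.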
