PCT: Pyramid convolutional transformer for parotid gland tumor segmentation in ultrasound images

Gang Zhang, Chenhong Zheng, Jianfeng He, Sanli Yi

doi:10.1016/j.bspc.2022.104498

Abstract

Preoperative segmentation of parotid gland tumor regions using deep learning is of great significance for treatment decisions. However, two major limitations remain: to the best of our knowledge, no network has been designed specifically for parotid gland tumor segmentation, and neither a convolutional neural network (CNN) nor a Transformer alone can extract both global and local features. To address these issues, we first propose a Pyramid Convolutional Transformer (PCT) architecture for parotid gland tumor segmentation, built on a shrinking pyramid framework and a Fusion Attention Transformer CNN (FTC) block. In this architecture, the shrinking pyramid framework effectively captures dense pixel-level features of parotid gland tumor images by integrating multi-scale dependencies. The FTC block is constructed to handle the complex and variable contours of parotid gland tumors; it uses a dual-branch structure that combines a Transformer with a CNN to better extract both global and local image features. Experimental results show that the proposed PCT achieves an intersection-over-union (IoU) of 0.8434 and a Dice similarity coefficient (Dice) of 0.9151 on the parotid gland tumor segmentation (PGTSeg) dataset, and attains new state-of-the-art performance on multiple challenging benchmarks, with an IoU of 0.8521 on MoNuSeg and an IoU of 0.9080 on ISIC 2018. In addition, common backbones equipped with the FTC block outperform their baseline models. The code and models will be available at: https://github.com/Twoverz/PCT-Pyramid-Convolutional-Transformer.
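The abstract reports segmentation quality as IoU and Dice. As a minimal sketch (not the authors' evaluation code, which the abstract does not show), both metrics can be computed from a pair of binary masks as below. The function names `iou` and `dice`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions for illustration.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # hypothetical: 2-D binary mask as nested lists


def _counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Return (intersection, |pred|, |target|) for two binary masks of equal shape."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    inter = p_sum = t_sum = 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p, t = bool(p), bool(t)
            inter += int(p and t)  # pixel predicted AND in ground truth
            p_sum += int(p)
            t_sum += int(t)
    return inter, p_sum, t_sum


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union: |P & T| / |P | T|."""
    inter, p_sum, t_sum = _counts(pred, target)
    union = p_sum + t_sum - inter
    return 1.0 if union == 0 else inter / union  # assumed convention for empty masks


def dice(pred: Mask, target: Mask) -> float:
    """Dice similarity coefficient: 2|P & T| / (|P| + |T|)."""
    inter, p_sum, t_sum = _counts(pred, target)
    total = p_sum + t_sum
    return 1.0 if total == 0 else 2 * inter / total  # assumed convention for empty masks


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    target = [[1, 0, 0], [0, 1, 1]]
    print(iou(pred, target), dice(pred, target))
```

For a single mask pair the two metrics are linked by Dice = 2·IoU / (1 + IoU), so they rank individual predictions identically; dataset-level scores averaged per image need not satisfy this identity exactly.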

Full Text