Abstract

The Transformer architecture has proven successful in many natural language processing tasks, yet its application to clinical practice remains largely unexplored. In this study, we propose a Robust Cross-Scale Hybrid Transformer (RCSHT) architecture for medical image segmentation that enhances multi-scale feature representations while integrating local features with global dependencies. Specifically, we introduce two new modules based on the self-attention mechanism, PSCM and PCCM, which apply low-rank matrix transformations to cross-scale images in the spatial and channel spaces, respectively, strengthening the discriminative power of long-range dependencies across scales while reducing computational complexity. Meanwhile, we apply spatial positional encoding as a spatial regularizer on both attention modules, giving the model better access to valuable spatial details and further improving its spatial retention ability. Moreover, fusing the outputs of the two modules further improves the robustness and discriminative power of the feature representations. Notably, our hybrid architecture allows the transformers to be initialized as convolutional networks without pre-training. Extensive experiments on our newly proposed dataset demonstrate that RCSHT outperforms state-of-the-art methods by a large margin and holds promise for generalizing to other medical image segmentation tasks.
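
The abstract does not give implementation details for PSCM or PCCM. As a rough illustration only, a minimal sketch of the kind of spatial cross-scale attention with low-rank reduction it describes might look like the following, assuming queries come from a fine-scale feature map, keys and values come from a coarser scale pooled to a low-rank token basis, and a spatial positional encoding is added to the queries; the class name, rank, and pooling choice are illustrative assumptions, not the authors' specification.

```python
# Hypothetical sketch (not the paper's PSCM): spatial cross-scale attention
# with a low-rank reduction of the key/value tokens to cut attention cost.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialCrossScaleAttention(nn.Module):
    def __init__(self, dim, rank=64, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)
        self.rank = rank  # assumed size of the low-rank spatial basis

    def forward(self, x_fine, x_coarse, pos_enc):
        # x_fine:   (B, N, C) fine-scale tokens (queries)
        # x_coarse: (B, M, C) coarser-scale tokens (keys/values)
        # pos_enc:  (B, N, C) spatial positional encoding (regularization)
        B, N, C = x_fine.shape
        q = self.q(x_fine + pos_enc)
        # Pool the coarse tokens to at most `rank` tokens (low-rank reduction).
        k_tokens = min(self.rank, x_coarse.shape[1])
        x_low = F.adaptive_avg_pool1d(
            x_coarse.transpose(1, 2), k_tokens).transpose(1, 2)
        k, v = self.kv(x_low).chunk(2, dim=-1)

        def split_heads(t):
            return t.view(B, -1, self.num_heads, C // self.num_heads).transpose(1, 2)

        q, k, v = map(split_heads, (q, k, v))
        attn = (q @ k.transpose(-2, -1)) * self.scale       # (B, heads, N, rank)
        out = attn.softmax(dim=-1) @ v                      # (B, heads, N, C/heads)
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

A channel-space counterpart (in the spirit of PCCM) would attend over channel rather than spatial tokens, and the two outputs would then be fused; those details are left abstract here because the paper's abstract does not specify them.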
