Integrating transformers and convolutional neural networks (CNNs) is a crucial and cutting-edge approach to medical image segmentation. However, existing hybrid methods fail to fully exploit the strengths of both operators. During patch embedding, patch projection ignores the two-dimensional structure and local spatial information within each patch, and a fixed patch size cannot effectively capture richly representative features. Moreover, the computation of self-attention leads to attention diffusion, which hinders providing precise details to the decoder while maintaining feature consistency. Finally, none of the existing methods establishes an efficient notion of multi-scale modeling. To address these issues, we design the Collaborative Networks of Transformers and Convolutional neural networks (TC-CoNet), a general-purpose network for accurate 3D medical image segmentation. First, we carefully design a precise patch embedding that generates 3D features with accurate spatial position information, laying a solid foundation for subsequent learning. TC-CoNet then constructs the encoder–decoder backbone as an interlaced combination of transformer and convolution blocks, properly incorporating long-range dependencies and hierarchical object concepts at multiple scales. Furthermore, we employ a constricted attention bridge that restricts attention to local features, accurately guiding the recovery of detailed information while maintaining feature consistency. Finally, atrous spatial pyramid pooling is applied to the high-level feature map to establish the concept of multi-scale objects. Extensive experiments on five challenging datasets (Synapse, ACDC, brain tumor segmentation, cardiac left atrium segmentation, and lung tumor segmentation) demonstrate that TC-CoNet outperforms state-of-the-art approaches in accuracy, transferability, and generalization. These results fully demonstrate the efficacy of the proposed combination of transformers and CNNs for medical image segmentation. Our code is freely available at: https://github.com/YongChen-Exact/TC-CoNet.
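For reference, the sketch below illustrates what an atrous spatial pyramid pooling block applied to a 3D bottleneck feature map might look like in PyTorch. The dilation rates, channel counts, and normalization choices are illustrative assumptions for a minimal example, not TC-CoNet's exact configuration.

```python
import torch
import torch.nn as nn

class ASPP3D(nn.Module):
    """Minimal 3D atrous spatial pyramid pooling: parallel dilated convolutions
    over the same high-level feature map, concatenated and fused by a 1x1x1 conv.
    Dilation rates and normalization are assumptions, not the paper's settings."""
    def __init__(self, in_channels, out_channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv3d(in_channels, out_channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.InstanceNorm3d(out_channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv3d(out_channels * len(dilations), out_channels, kernel_size=1)

    def forward(self, x):
        # Each branch sees the same input at a different receptive field,
        # yielding a multi-scale view of the bottleneck features.
        feats = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1))

# Usage on a hypothetical bottleneck map of shape (batch, channels, D, H, W)
x = torch.randn(1, 320, 4, 8, 8)
y = ASPP3D(320, 128)(x)  # -> (1, 128, 4, 8, 8)
```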