Abstract

Medical image segmentation results are an essential reference for disease diagnosis. Recently, with the development and application of convolutional neural networks, medical image processing has advanced significantly. However, automatic segmentation remains challenging because target structures vary widely in position, size, and shape, which leads to poor segmentation performance. In addition, most current methods use an encoder–decoder architecture for feature extraction, focusing on the acquisition of semantic information while ignoring specific targets and global context information. In this work, we propose a hybrid-scale contextual fusion network to capture richer spatial and semantic information. First, a hybrid-scale embedding layer (HEL) is employed before the transformer. By mixing each embedding with multiple patches, object information at different scales can be captured effectively. Further, we apply a standard transformer to model long-range dependencies in the first two skip connections. Meanwhile, a pooling transformer (PTrans) is employed to handle long input sequences in the following two skip connections. By leveraging a global average pooling operation and the corresponding transformer block, the spatial structure information of the target is learned effectively. Finally, a dual-branch channel attention module (DCA) is proposed to focus on crucial channel features and conduct multi-level feature fusion simultaneously. By utilizing this fusion scheme, richer context and fine-grained features are captured and encoded efficiently. Extensive experiments on three public datasets demonstrate that the proposed method outperforms state-of-the-art methods.
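
To make the hybrid-scale embedding idea concrete, below is a minimal PyTorch-style sketch of a multi-scale patch embedding layer. It assumes the HEL mixes convolutional patch projections with several kernel sizes before the transformer; the class name HybridScaleEmbedding, the kernel sizes, and all hyper-parameters are illustrative assumptions, not the paper's implementation.

    # Minimal sketch of a hybrid-scale embedding layer (HEL), assuming that
    # "mixing each embedding with multiple patches" is realised by projecting
    # the input with convolutions of several kernel sizes (scales) that share
    # one stride, then fusing the aligned outputs element-wise.
    import torch
    import torch.nn as nn

    class HybridScaleEmbedding(nn.Module):
        def __init__(self, in_channels=3, embed_dim=96, patch_size=4, scales=(3, 5, 7)):
            super().__init__()
            # One convolutional projection per scale; identical stride keeps the
            # spatial grids aligned so the scales can be summed.
            self.projs = nn.ModuleList(
                nn.Conv2d(in_channels, embed_dim, kernel_size=k,
                          stride=patch_size, padding=k // 2)
                for k in scales
            )
            self.norm = nn.LayerNorm(embed_dim)

        def forward(self, x):
            # x: (B, C, H, W) -> fused multi-scale patch embeddings (B, N, embed_dim)
            feats = [proj(x) for proj in self.projs]
            fused = torch.stack(feats, dim=0).sum(dim=0)   # mix the scales
            fused = fused.flatten(2).transpose(1, 2)        # (B, N, embed_dim)
            return self.norm(fused)

    if __name__ == "__main__":
        layer = HybridScaleEmbedding()
        tokens = layer(torch.randn(1, 3, 224, 224))
        print(tokens.shape)  # torch.Size([1, 3136, 96]) for a stride-4 patch grid

The same token sequence could then feed the standard transformer blocks, while a pooled variant (e.g., average-pooling the tokens before attention, as PTrans suggests) would shorten the sequence in the deeper skip connections.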
