Abstract

To solve the problems of existing hybrid networks based on convolutional neural networks (CNN) and Transformers, we propose a new encoder–decoder network FI‐Net based on CNN‐Transformer for medical image segmentation. In the encoder part, a dual‐stream encoder is used to capture local details and long‐range dependencies. Moreover, the attentional feature fusion module is used to perform interactive feature fusion of dual‐branch features, maximizing the retention of local details and global semantic information in medical images. At the same time, the multi‐scale feature aggregation module is used to aggregate local information and capture multi‐scale context to mine more semantic details. The multi‐level feature bridging module is used in skip connections to bridge multi‐level features and mask information to assist multi‐scale feature interaction. Experimental results on seven public medical image datasets fully demonstrate the effectiveness and advancement of our method. In future work, we plan to extend FI‐Net to support 3D medical image segmentation tasks and combine self‐supervised learning and knowledge distillation to alleviate the overfitting problem of limited data training.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.