In medical image segmentation, U-Net has achieved notable success, yet it still shows inherent weaknesses when dealing with complex anatomical structures and small targets, including inaccurate target localization, blurry edges, and insufficient integration of contextual information. To address these challenges, this study proposes the Attention-Fused Full-Scale CNN-Transformer Unet (AFC-Unet), which aims to overcome the limitations of the traditional U-Net through multi-scale feature fusion, attention mechanisms, and hybrid CNN-Transformer modules. Specifically, we adopt an encoder–decoder architecture that incorporates full-scale feature block fusion and pyramid sampling modules, integrating cross-level multi-scale features to enhance the model’s ability to recognize structures ranging from fine details to the overall anatomy. We propose the Multi-feature Fusion Attention Gates (MFAG) module, which highlights discriminative information about potential lesions and key anatomical boundaries while suppressing irrelevant background interference. We also design the Convolutional Hybrid Attention Transformer (CHAT) module, which integrates a CNN and a Transformer to address the shortcomings of single-architecture models in capturing long-range dependencies and global context. Experimental results on three datasets of different scales show that the model’s segmentation performance on medical images surpasses that of state-of-the-art models and indicate strong generalization ability.