Abstract
Breast mass segmentation plays a crucial role in early breast cancer detection and diagnosis. While Convolutional Neural Networks (CNNs) have been widely used for this task, their reliance on local receptive fields limits their ability to capture long-range dependencies. Vision Transformers (ViTs), on the other hand, excel in this area by leveraging multi-head self-attention mechanisms to generate attention maps that dynamically gather global spatial information, significantly outperforming CNN-based architectures in various tasks. However, traditional transformer-based models come with challenges, including high computational complexity due to the self-attention mechanism and inefficiency in the static MLP fusion process. To overcome these issues, the Hybrid Transformer U-Net (HTU-Net) model is proposed for breast mass segmentation in mammography. Channel- and spatial-enhanced self-attention mechanisms are integrated with convolutional layers in HTU-Net, creating a hybrid architecture that combines the strengths of both CNNs and ViTs. The introduction of a multiscale attention mechanism further improves the model's ability to fuse information from different resolutions, enhancing the decoder's capacity to reconstruct fine details in the segmented output. By using both local texture-based features and global contextual information, HTU-Net excels in capturing essential features, thus improving segmentation performance. Experimental results across multiple datasets, including CBIS-DDSM and INbreast, demonstrate that HTU-Net outperforms several state-of-the-art methods, achieving superior accuracy, Dice similarity coefficient, and intersection over union. This work highlights the potential of hybrid architectures in advancing computer-aided diagnosis systems, particularly in improving segmentation quality and reliability for breast cancer detection.
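The hybrid idea described above — a convolutional path for local texture and a self-attention path for global context, fused into one block — can be illustrated with a minimal NumPy sketch. This is a toy single-head version for intuition only, not the authors' HTU-Net implementation; all function names, shapes, and the additive fusion are assumptions for illustration.

```python
import numpy as np

def conv3x3(x, w):
    """Naive 'same' 3x3 convolution. x: (H, W, C_in), w: (3, 3, C_in, C_out)."""
    H, W, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))       # zero-pad spatial dims
    out = np.zeros((H, W, w.shape[-1]))
    for i in range(H):
        for j in range(W):
            # contract the 3x3xC_in patch against the kernel
            out[i, j] = np.tensordot(xp[i:i + 3, j:j + 3, :], w, axes=3)
    return out

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention. tokens: (N, C)."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (N, N) attention logits
    a = np.exp(scores - scores.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)             # row-wise softmax
    return a @ v                                   # globally mixed features

def hybrid_block(x, conv_w, wq, wk, wv):
    """Toy hybrid block: local CNN branch + global attention branch, fused by addition."""
    H, W, C = x.shape
    local = conv3x3(x, conv_w)                     # local texture features (CNN path)
    tokens = x.reshape(H * W, C)                   # flatten the spatial grid into tokens
    global_ctx = self_attention(tokens, wq, wk, wv).reshape(H, W, -1)
    return local + global_ctx                      # assumed additive fusion for illustration

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 8))                 # tiny 4x4 feature map, 8 channels
conv_w = rng.standard_normal((3, 3, 8, 8)) * 0.1
wq, wk, wv = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
y = hybrid_block(x, conv_w, wq, wk, wv)
print(y.shape)                                     # (4, 4, 8)
```

Note that the attention branch has O(N²) cost in the number of spatial tokens, which is exactly the computational burden the abstract cites as motivation for a hybrid design.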