Abstract

In recent years, UNet and its variants have achieved excellent performance in medical image segmentation. However, because the convolution kernels in UNet attend only to local pixels, these models struggle to capture long-range dependencies. Recently proposed segmentation models based on the transformer architecture address this issue: their internal self-attention mechanisms capture global contextual information and thereby improve segmentation quality. However, without pre-training on a large-scale dataset, their results are often not ideal. We therefore designed U2-MNet, a model that overcomes the limitations of convolutional kernels and achieves high-accuracy segmentation without pre-training. The model adopts a multi-layer framework that effectively describes multi-level channel information by employing a window-based channel MLP (WCM) block, which uses a sliding window together with an MLP to capture fine detail in local features. In addition, a multi-level channel cross mixing (MCCM) block is placed in each skip connection to reduce the noise introduced when low-level and high-level features are aggregated. The proposed model was trained from scratch and evaluated on the Breast UltraSound Images (BUSI) dataset, achieving 82.76% Dice, 73.17% IoU, and 86.24% Precision.
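
To illustrate the window-plus-channel-MLP idea behind the WCM block, the Python sketch below partitions a feature map into fixed-size windows and applies a small residual MLP across channels inside each window. This is a hedged reconstruction under stated assumptions, not the authors' implementation: the class name WindowChannelMLP, the window size, the hidden ratio, the LayerNorm placement, and the residual connection are all assumptions introduced here for illustration.

    # Minimal sketch of a window-based channel MLP (WCM-style) block.
    # Assumed design: non-overlapping windows + per-window channel MLP with a
    # residual connection; this is NOT the paper's exact implementation.
    import torch
    import torch.nn as nn

    class WindowChannelMLP(nn.Module):
        def __init__(self, channels: int, window_size: int = 8, hidden_ratio: int = 4):
            super().__init__()
            self.window_size = window_size
            self.norm = nn.LayerNorm(channels)
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels * hidden_ratio),
                nn.GELU(),
                nn.Linear(channels * hidden_ratio, channels),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (B, C, H, W); H and W are assumed divisible by the window size.
            b, c, h, w = x.shape
            ws = self.window_size
            # Partition the feature map into non-overlapping ws x ws windows.
            win = x.view(b, c, h // ws, ws, w // ws, ws)
            win = win.permute(0, 2, 4, 3, 5, 1).contiguous()  # (B, H/ws, W/ws, ws, ws, C)
            # Mix information across channels at each spatial position of each window.
            win = win + self.mlp(self.norm(win))
            # Reverse the window partition back to (B, C, H, W).
            win = win.permute(0, 5, 1, 3, 2, 4).contiguous()
            return win.view(b, c, h, w)

    # Usage example on a dummy feature map (shape is preserved).
    block = WindowChannelMLP(channels=64)
    out = block(torch.randn(1, 64, 32, 32))

A similar channel-mixing step could, in principle, be applied to the concatenated low-level and high-level features at each skip connection to play the role described for the MCCM block, but the exact mixing scheme is not specified in the abstract.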
