MU-Net: Embedding MixFormer into Unet to Extract Water Bodies from Remote Sensing Images

Yonghong Zhang,Huanyu Lu,Guangyi Ma,Donglin Xie,Kenny Thiam Choy Lim Kam Sian,Sutong Geng,Wei Tian,Huajun Zhao

doi:10.3390/rs15143559

Yonghong Zhang, Huanyu Lu + Show 6 more

Open Access

https://doi.org/10.3390/rs15143559

Copy DOI

Abstract

Water bodies extraction is important in water resource utilization and flood prevention and mitigation. Remote sensing images contain rich information, but due to the complex spatial background features and noise interference, problems such as inaccurate tributary extraction and inaccurate segmentation occur when extracting water bodies. Recently, using a convolutional neural network (CNN) to extract water bodies is gradually becoming popular. However, the local property of CNN limits the extraction of global information, while Transformer, using a self-attention mechanism, has great potential in modeling global information. This paper proposes the MU-Net, a hybrid MixFormer architecture, as a novel method for automatically extracting water bodies. First, the MixFormer block is embedded into Unet. The combination of CNN and MixFormer is used to model the local spatial detail information and global contextual information of the image to improve the ability of the network to capture semantic features of the water body. Then, the features generated by the encoder are refined by the attention mechanism module to suppress the interference of image background noise and non-water body features, which further improves the accuracy of water body extraction. The experiments show that our method has higher segmentation accuracy and robust performance compared with the mainstream CNN- and Transformer-based semantic segmentation networks. The proposed MU-Net achieves 90.25% and 76.52% IoU on the GID and LoveDA datasets, respectively. The experimental results also validate the potential of MixFormer in water extraction studies.

Full Text