Abstract

Image interpretation algorithms based on deep learning are becoming increasingly important in land cover information acquisition. We propose CSSwin-unet, a network designed for the semantic segmentation of remote sensing images. CSSwin-unet is based on Swin-unet: it follows the U-shaped encoder-decoder structure of U-Net but uses Swin Transformer blocks, with their superior global modelling capability, to build the encoder and decoder. In addition, we design a parallel branch in the encoder with a context aggregation module (CAM) to enhance contextual information extraction and alleviate the semantic ambiguity caused by occlusion. To address the mismatch between encoder and decoder semantic information and improve the model's ability to extract spatial information, we construct a space extraction module (SEM) in the skip connections, replacing the direct copying of encoder features used in Swin-unet. To reduce information loss during downsampling and strengthen the segmentation capacity of the network, we design a feature shrinkage module (FSM) in the downsampling stage. We conducted comprehensive ablation experiments on a self-produced dataset and compared the results with other advanced methods. The results show significant improvement: mIoU, mF1, and OA improve by 2.83%, 2.47%, and 2.05%, respectively, over the second-best model, Swin-unet. These results demonstrate the strong performance of CSSwin-unet.
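The abstract's core architectural idea, processing skip connections rather than copying encoder features directly to the decoder, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the pooling, upsampling, and the sigmoid-gate stand-in for the SEM are all simplifying assumptions.

```python
import numpy as np

def downsample(x):
    """2x2 average pooling: one encoder downsampling step (illustrative)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Nearest-neighbour 2x upsampling: one decoder step (illustrative)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def sem_placeholder(feat):
    """Hypothetical stand-in for the paper's space extraction module (SEM):
    re-weights encoder features before they reach the decoder, instead of
    passing a raw copy. A simple sigmoid gate is used here as a placeholder."""
    gate = 1.0 / (1.0 + np.exp(-feat))
    return feat * gate

def u_shaped_forward(x, depth=2):
    """U-shaped encoder-decoder with processed (not copied) skip features."""
    skips = []
    for _ in range(depth):                 # encoder path
        skips.append(sem_placeholder(x))   # transformed skip connection
        x = downsample(x)
    for _ in range(depth):                 # decoder path
        x = upsample(x) + skips.pop()      # fuse with processed skip features
    return x

out = u_shaped_forward(np.random.rand(8, 8))
print(out.shape)  # (8, 8)
```

The point of the sketch is only the data flow: each skip connection is passed through a module before fusion, mirroring how CSSwin-unet replaces Swin-unet's direct feature copying.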
