Abstract

Because remote sensing images have complex backgrounds and objects of widely varying sizes, their semantic segmentation is challenging. This study proposes a multiscale cascaded network (MSCNet) for semantic segmentation. The input remote sensing image is processed at three scales: 1, 1/2, and 1/4 of the original resolution, representing high, medium, and low resolutions. First, three backbone networks extract features at these resolutions. The features are then fused by a multiscale attention network and fed into a dense atrous spatial pyramid pooling network to capture multiscale context. The proposed MSCNet thus introduces multiscale feature extraction and attention modules suited to remote sensing land-cover classification. Experiments are performed on the Deepglobe, Vaihingen, and Potsdam datasets, and the results are compared with those of classical semantic segmentation networks. The findings indicate that on the Deepglobe dataset, the mean intersection over union (mIoU) of MSCNet is 4.73% higher than that of DeepLabv3+. On the Vaihingen dataset, the mIoU of MSCNet is 15.3% and 6.4% higher than those of SegNet and DeepLabv3+, respectively. On the Potsdam dataset, the mIoU of MSCNet exceeds those of a fully convolutional network (FCN), Res-U-Net, SegNet, and DeepLabv3+ by 11.18%, 5.89%, 4.78%, and 3.03%, respectively.
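To make the pipeline concrete, below is a minimal PyTorch sketch of the architecture the abstract outlines: three backbone streams over the 1, 1/2, and 1/4 resolution inputs, attention-weighted fusion, and a dense atrous spatial pyramid pooling (dense ASPP) head. All specifics beyond the abstract are assumptions: the class names (MSCNetSketch, ScaleAttention, DenseASPP), the ResNet-50 backbones, the channel counts and dilation rates, and the squeeze-and-excitation-style attention are illustrative stand-ins, not the authors' exact modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50


class DenseASPP(nn.Module):
    """Dense atrous spatial pyramid pooling: each dilated branch receives
    the concatenation of the input and all previous branch outputs."""
    def __init__(self, in_ch, branch_ch=64, dilations=(3, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList()
        ch = in_ch
        for d in dilations:
            self.branches.append(nn.Sequential(
                nn.Conv2d(ch, branch_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            ))
            ch += branch_ch
        self.out_ch = ch  # input channels plus all branch outputs

    def forward(self, x):
        feats = [x]
        for branch in self.branches:
            feats.append(branch(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)


class ScaleAttention(nn.Module):
    """Channel attention over the fused multi-resolution features
    (squeeze-and-excitation style; an assumed stand-in for the paper's
    multiscale attention network)."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)  # reweight channels, broadcast over H x W


class MSCNetSketch(nn.Module):
    def __init__(self, num_classes, feat_ch=256):
        super().__init__()
        # One backbone per input resolution; ResNet-50 is an assumption,
        # the abstract does not name the backbones.
        def backbone():
            m = resnet50(weights=None)
            return nn.Sequential(*list(m.children())[:-2])  # drop avgpool/fc
        self.backbones = nn.ModuleList(backbone() for _ in range(3))
        self.reduce = nn.Conv2d(2048 * 3, feat_ch, 1)
        self.attn = ScaleAttention(feat_ch)
        self.daspp = DenseASPP(feat_ch)
        self.head = nn.Conv2d(self.daspp.out_ch, num_classes, 1)

    def forward(self, x):
        scales = [1.0, 0.5, 0.25]  # high, medium, low resolution inputs
        feats, target = [], None
        for bb, s in zip(self.backbones, scales):
            xi = x if s == 1.0 else F.interpolate(
                x, scale_factor=s, mode="bilinear", align_corners=False)
            f = bb(xi)
            if target is None:
                target = f.shape[-2:]  # spatial size of the full-res stream
            feats.append(F.interpolate(
                f, size=target, mode="bilinear", align_corners=False))
        fused = self.attn(self.reduce(torch.cat(feats, dim=1)))
        out = self.head(self.daspp(fused))
        return F.interpolate(out, size=x.shape[-2:],
                             mode="bilinear", align_corners=False)


if __name__ == "__main__":
    net = MSCNetSketch(num_classes=6)       # e.g. six land-cover classes
    out = net(torch.randn(1, 3, 256, 256))
    print(out.shape)                        # torch.Size([1, 6, 256, 256])
```

Features from the lower-resolution streams are upsampled to the full-resolution stream's spatial size before concatenation, so the fusion, attention, and dense ASPP stages all operate on a single aligned feature map; this alignment step is one reasonable reading of the cascaded fusion described in the abstract.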
