Abstract

Recently, deep learning has been widely used in segmentation tasks for remote sensing images. However, existing deep learning methods mostly focus on local contextual information and have a limited receptive field, which makes it difficult to capture the long-range contextual features of large-scale objects from very-high-resolution (VHR) images. In this paper, we present a novel Local–global Framework consisting of a dual-source fusion network and local–global transformer modules, which efficiently utilizes features extracted from multiple sources and fully captures features of local and global regions. The dual-source fusion network is an encoder designed to extract features from multiple sources such as spectra, synthetic aperture radar, and elevation; it selectively fuses features from these sources and reduces the interference of redundant features. The local–global transformer module is proposed to capture fine-grained local features and coarse-grained global features, which enables the framework to recognize multi-scale objects from local and global regions. Moreover, we propose a pixelwise contrastive loss, which encourages the prediction to be pulled closer to the ground truth. The Local–global Framework achieves state-of-the-art performance, with a 90.45% mean F1 score on the ISPRS Vaihingen dataset and a 93.20% mean F1 score on the ISPRS Potsdam dataset.
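
The abstract does not give the exact formulation of the pixelwise contrastive loss, so the sketch below shows only one common way such a loss is realized: a supervised, InfoNCE-style contrast over sampled pixel embeddings, where pixels sharing a ground-truth class are pulled together and others pushed apart. The function name, the `temperature` and `num_samples` parameters, and the PyTorch framing are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(embeddings, labels, temperature=0.1, num_samples=512):
    """Hypothetical supervised pixel-wise contrastive (InfoNCE-style) loss.

    embeddings: (B, C, H, W) decoder feature map
    labels:     (B, H, W)    ground-truth class indices
    """
    b, c, h, w = embeddings.shape
    # Flatten to (B*H*W, C) and L2-normalize each pixel embedding.
    feats = F.normalize(embeddings.permute(0, 2, 3, 1).reshape(-1, c), dim=1)
    labs = labels.reshape(-1)

    # Randomly subsample pixels so the pairwise similarity matrix stays small.
    idx = torch.randperm(feats.size(0), device=feats.device)[:num_samples]
    feats, labs = feats[idx], labs[idx]

    # Pairwise cosine similarities scaled by the temperature.
    sim = feats @ feats.t() / temperature
    self_mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    pos_mask = (labs.unsqueeze(0) == labs.unsqueeze(1)) & ~self_mask

    # log p(j | i) over all other sampled pixels j (exclude self-similarity).
    sim = sim.masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Average log-likelihood of positives for anchors that have at least one positive.
    pos_count = pos_mask.sum(dim=1)
    valid = pos_count > 0
    if valid.sum() == 0:
        return feats.new_zeros(())
    masked_log_prob = log_prob.masked_fill(~pos_mask, 0.0)
    loss = -masked_log_prob.sum(dim=1)[valid] / pos_count[valid]
    return loss.mean()
```

In this kind of formulation the contrastive term is typically added to the standard cross-entropy segmentation loss with a small weighting factor, which matches the stated goal of pulling predictions closer to the ground truth without replacing the primary objective.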
