Abstract

Scene parsing of high spatial resolution (HSR) remote sensing images has achieved notable progress in recent years through the adoption of convolutional neural networks. However, for scene parsing of multimodal remote sensing images, effectively integrating complementary information remains challenging. For instance, the progressive decrease in feature-map resolution through a neural network causes a loss of spatial information, likely leading to blurred object boundaries and misclassification of small objects. In addition, object scales in a remote sensing image vary substantially, which undermines parsing performance. To address these problems, we propose an end-to-end common extraction and gate fusion network (CEGFNet) that captures both high-level semantic features and low-level spatial details for scene parsing of remote sensing images. Specifically, we introduce a gate fusion module to extract complementary features from spectral data and digital surface model (DSM) data: a gate mechanism suppresses redundant features in each data stream and retains complementary ones, improving multimodal feature fusion. In addition, a global context module and a multilayer aggregation decoder handle scale variation between objects and the loss of spatial detail caused by downsampling, respectively. The proposed CEGFNet was quantitatively evaluated on benchmark scene parsing datasets of HSR remote sensing images, where it achieved state-of-the-art performance.
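
To make the gated fusion idea concrete, the sketch below shows one common way such a gate can be realized in PyTorch: a sigmoid gate computed from the concatenated spectral and DSM feature maps blends the two streams per location. The abstract does not give the exact formulation, so the module name `GateFusion`, the 1x1 convolution, and the blending rule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GateFusion(nn.Module):
    """Minimal sketch of gated fusion of two modality feature maps.

    `spec` and `dsm` are feature maps of identical shape (B, C, H, W)
    from the spectral and DSM encoder branches. A sigmoid gate decides,
    per location and channel, how much of each modality to keep, so
    redundant responses are suppressed and complementary ones retained.
    This is one plausible reading of the paper's gate fusion module.
    """

    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 convolution on the concatenated streams produces the gate.
        self.gate_conv = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, spec: torch.Tensor, dsm: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.gate_conv(torch.cat([spec, dsm], dim=1)))
        # Where the gate is high, spectral features dominate; where it is
        # low, DSM features fill in, yielding a fused complementary map.
        return gate * spec + (1.0 - gate) * dsm


if __name__ == "__main__":
    fuse = GateFusion(channels=64)
    spec = torch.randn(2, 64, 32, 32)  # spectral-branch features
    dsm = torch.randn(2, 64, 32, 32)   # DSM-branch features
    fused = fuse(spec, dsm)
    print(fused.shape)  # torch.Size([2, 64, 32, 32])
```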
