Abstract

Scene parsing of high spatial resolution (HSR) remote sensing images has achieved notable progress in recent years through the adoption of convolutional neural networks. However, for scene parsing of multimodal remote sensing images, effectively integrating complementary information remains challenging. For instance, the decrease in feature map resolution through a neural network causes loss of spatial information, likely leading to blurred object boundaries and misclassification of small objects. In addition, object scales in a remote sensing image vary substantially, undermining parsing performance. To solve these problems, we propose an end-to-end common extraction and gate fusion network (CEGFNet) to capture both high-level semantic features and low-level spatial details for scene parsing of remote sensing images. Specifically, we introduce a gate fusion module to extract complementary features from spectral data and digital surface model (DSM) data. A gate mechanism removes redundant features in the data stream and extracts complementary features, improving multimodal feature fusion. In addition, a global context module and a multilayer aggregation decoder handle scale variations between objects and the loss of spatial details due to downsampling, respectively. The proposed CEGFNet was quantitatively evaluated on benchmark scene parsing datasets containing HSR remote sensing images, and it achieved state-of-the-art performance.
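To make the gated multimodal fusion idea concrete, the following is a minimal PyTorch sketch of one way such a module could be written: a per-pixel, per-channel sigmoid gate computed from the concatenated spectral and DSM feature maps decides how much each modality contributes to the fused representation. The class name `GateFusion` and all design details here are illustrative assumptions for exposition, not the exact module used in CEGFNet.

```python
import torch
import torch.nn as nn


class GateFusion(nn.Module):
    """Illustrative gated fusion of spectral and DSM feature maps (assumed design)."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution + sigmoid produces a per-pixel, per-channel gate
        # from the concatenated two-modality features.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, spectral: torch.Tensor, dsm: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([spectral, dsm], dim=1))
        # The gate weights the two streams, suppressing redundant responses
        # and keeping the complementary parts of each modality.
        return g * spectral + (1.0 - g) * dsm


# Example usage: fuse 64-channel feature maps from the two encoder streams.
fusion = GateFusion(channels=64)
spectral_feat = torch.randn(1, 64, 32, 32)  # spectral (e.g., IRRG) features
dsm_feat = torch.randn(1, 64, 32, 32)       # normalized DSM features
fused = fusion(spectral_feat, dsm_feat)      # shape: (1, 64, 32, 32)
```

A convex combination keeps the fused features on the same scale as the inputs; other gating schemes (e.g., independent gates per modality) are equally plausible under this sketch's assumptions.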
