Abstract

This paper addresses the problem of semi-supervised transfer learning with limited cross-modality data in remote sensing. Large amounts of multi-modal earth observation images, such as multispectral imagery (MSI) or synthetic aperture radar (SAR) data, are openly available on a global scale, enabling the parsing of global urban scenes through remote sensing imagery. However, their ability to identify materials (pixel-wise classification) remains limited, owing to noisy collection environments, poor discriminative information, and the limited number of well-annotated training images. To this end, we propose a novel cross-modal deep-learning framework, called X-ModalNet, with three well-designed modules: a self-adversarial module, an interactive learning module, and a label propagation module. X-ModalNet learns to transfer more discriminative information from a small-scale hyperspectral image (HSI) to the classification task on large-scale MSI or SAR data. Significantly, X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed from high-level features at the top of the network, yielding semi-supervised cross-modality learning; a sketch of such label propagation follows below. We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement over several state-of-the-art methods.
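The following is a minimal sketch of graph-based label propagation over high-level network features, in the spirit of X-ModalNet's label propagation module. The k-NN graph construction, Gaussian affinities, and the iterative update rule are assumptions based on standard label propagation; the paper's exact formulation may differ, and in X-ModalNet the graph is rebuilt as the features update during training.

```python
import numpy as np

def propagate_labels(features, labels, mask, k=10, alpha=0.99, n_iter=50):
    """Propagate labels over a feature graph (illustrative sketch).

    features: (n, d) high-level features taken from the top of the network.
    labels:   (n, c) one-hot labels; rows for unlabeled samples are zero.
    mask:     (n,) boolean, True where a sample is labeled.
    """
    # Pairwise distances; O(n^2) memory, so this is for illustration only.
    # A real implementation would use an approximate k-NN search.
    dist = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    sigma = np.median(dist) + 1e-8
    W = np.exp(-dist**2 / (2 * sigma**2))

    # Keep only the k nearest neighbours per node, then symmetrize.
    far = np.argsort(dist, axis=1)[:, k + 1:]
    np.put_along_axis(W, far, 0.0, axis=1)
    W = np.maximum(W, W.T)
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(1), 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Iterative propagation F <- alpha * S F + (1 - alpha) * Y,
    # clamping labeled rows to their ground truth at every step.
    F = labels.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1 - alpha) * labels
        F[mask] = labels[mask]
    return F.argmax(1)  # pseudo-labels for all samples
```

Used this way, the propagated pseudo-labels on unlabeled pixels can supervise further training, which is what makes the scheme semi-supervised.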

Highlights

  • Operational radar (e.g., Sentinel-1) and optical broadband satellites (e.g., Sentinel-2 and Landsat-8) make synthetic aperture radar (SAR) (Kang et al., 2020) and multispectral image (MSI) (Zhang et al., 2019a) data openly available on a global scale

  • Results on the heterogeneous dataset: similar to the former dataset, we quantitatively evaluate performance on the heterogeneous hyperspectral image (HSI)-SAR scene

  • This might explain why the multimodal deep learning (MDL)-CW noticeably outperforms most compared methods that lack such an interactive module, e.g., the bimodal deep autoencoder (DAE), its semi-supervised variant (bimodal stacked denoising autoencoder, SDAE), and CorrNet; a sketch of an interactive module follows this list
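Below is a hedged sketch of a cross-modal interactive learning step: two network streams (e.g., an HSI branch and an MSI/SAR branch) exchange mid-level features so that each branch can exploit the other's representation. The layer sizes and the concatenation-based fusion are illustrative assumptions, not the exact design of X-ModalNet or MDL-CW.

```python
import torch
import torch.nn as nn

class InteractiveBlock(nn.Module):
    """Illustrative two-stream block with cross-modal feature exchange."""

    def __init__(self, dim=128):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.enc_b = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        # Each branch fuses its own features with the partner branch's.
        self.mix_a = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.mix_b = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, x_a, x_b):
        h_a, h_b = self.enc_a(x_a), self.enc_b(x_b)
        # Interaction: concatenate the partner stream's features before mixing.
        z_a = self.mix_a(torch.cat([h_a, h_b], dim=-1))
        z_b = self.mix_b(torch.cat([h_b, h_a], dim=-1))
        return z_a, z_b
```

The intuition matches the highlight above: methods without such an exchange (bimodal DAE/SDAE, CorrNet) only align modalities at a shared bottleneck, whereas an interactive module lets information flow between streams at intermediate layers.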


Introduction

Operational radar (e.g., Sentinel-1) and optical broadband (multispectral) satellites (e.g., Sentinel-2 and Landsat-8) make synthetic aperture radar (SAR) (Kang et al., 2020) and multispectral image (MSI) (Zhang et al., 2019a) data openly available on a global scale. RS data classification is a fundamental but still challenging problem across the computer vision and RS fields. It aims to assign a semantic category to each pixel in a studied urban scene. Enormous efforts have been made to develop deep learning (DL)-based approaches (LeCun et al., 2015), such as deep neural networks (DNNs) and convolutional neural networks (CNNs), for parsing urban scenes from street view images, but the problem is far less investigated at the level of satellite-borne or aerial images. Bridging advanced learning-based techniques or vision algorithms with RS imagery therefore remains an open and important problem.
