Abstract

Although deep learning-based approaches have made significant progress in remote sensing image classification, the supervised learning paradigm falls short when only a limited number of labeled samples is available, which greatly restricts classification performance. In this article, we investigate an effective self-supervised feature representation (SSFR) architecture for few-shot land cover classification of multimodal remote sensing images. Specifically, we exploit a multiview learning strategy to construct multiple complementary views of the same observed scenes from hyperspectral images or from different modalities of remote sensing data. We then build a deep feature extractor that learns high-level feature representations from each view via contrastive learning, which aggregates samples of the same scene while separating samples of different scenes in the latent space, without requiring any labeled information. Moreover, to learn more robust features from the different views, we use a multitask learning strategy to train the feature extraction network. Finally, a lightweight machine learning method classifies the learned features using only a few annotated samples. To further demonstrate the self-supervised feature learning capability of the proposed model, we train the feature representation network on multiple source datasets. Comprehensive feature learning and classification experiments confirm the effectiveness and superiority of the proposed method.
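The abstract does not specify the exact contrastive objective, but a common choice for this kind of multiview pretraining is an InfoNCE/NT-Xent-style loss between pairs of views, optionally summed over view pairs as a multitask objective. The PyTorch sketch below illustrates that idea under those assumptions; the function names (`info_nce_loss`, `multiview_contrastive_loss`), the temperature value, and the symmetric formulation are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive loss between two views of the same batch of scenes.

    z1, z2: (N, D) embeddings; row i of z1 and row i of z2 come from the
    same scene (positive pair), while all other rows act as negatives.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature              # (N, N) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    # Symmetric cross-entropy: the positive for sample i sits at index i.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

def multiview_contrastive_loss(view_embeddings, temperature=0.1):
    """Multitask-style objective: average the pairwise loss over all view pairs."""
    total, n_pairs = 0.0, 0
    for i in range(len(view_embeddings)):
        for j in range(i + 1, len(view_embeddings)):
            total = total + info_nce_loss(view_embeddings[i],
                                          view_embeddings[j], temperature)
            n_pairs += 1
    return total / max(n_pairs, 1)
```

After pretraining, the frozen extractor's features for the few labeled samples could be fed to a lightweight classifier (for example, scikit-learn's LogisticRegression or an SVM), matching the few-shot classification step described above.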
