Abstract

We present a new method for multi-source semi-supervised domain adaptation in remote sensing (RS) scene classification. The method uses a pre-trained convolutional neural network (CNN), namely EfficientNet-B3, to extract highly discriminative features, followed by a classification module that learns a feature prototype for each class. The classification module computes the cosine distance between the feature vector of each target sample and the feature prototypes, and a Softmax activation function then converts these distances into class probabilities. The feature prototypes are also divided by a temperature parameter to normalize and control the classification module. The whole model is trained on both the unlabeled and labeled target samples. It learns to predict the correct classes using the standard cross-entropy loss computed over the labeled source and target samples. At the same time, it learns domain-invariant features using a second loss function based on the entropy computed over the unlabeled target samples. Unlike the standard cross-entropy loss, this entropy loss is computed on the model’s predicted probabilities and does not need the true labels. The entropy loss, called the minimax loss, is maximized with respect to the classification module to learn domain-invariant features (hence removing the data shift) and, at the same time, minimized with respect to the CNN feature extractor to learn discriminative features that are clustered around the class prototypes (in other words, reducing intra-class variance). To carry out the maximization and minimization at the same time, we use an adversarial training approach in which we alternate between the two processes. The model combines the standard cross-entropy loss and the new minimax entropy loss and optimizes them jointly.
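
The classification module and the two losses described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the temperature value of 0.05, and the weighting factor `lam` are assumptions made for illustration, and the sketch scores classes by cosine similarity (larger means closer), which is one reading of the cosine-based comparison in the abstract. It only evaluates the two objectives; a real model would use an automatic-differentiation framework and alternate gradient updates between them.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feature, prototypes, temperature=0.05):
    """Class probabilities from cosine similarity to each class prototype.

    Dividing the similarity by the temperature is equivalent to dividing the
    normalized prototypes by it. The value 0.05 is an assumed default.
    """
    f = l2_normalize(feature)
    logits = []
    for proto in prototypes:
        w = l2_normalize(proto)
        cosine = sum(a * b for a, b in zip(f, w))
        logits.append(cosine / temperature)
    return softmax(logits)

def cross_entropy(probs, label):
    """Supervised loss: needs the true label."""
    return -math.log(probs[label])

def entropy(probs):
    """Entropy of the predicted distribution: needs no label."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def objectives(labeled, unlabeled, prototypes, lam=0.1, temperature=0.05):
    """Return the quantity each part of the model would minimize.

    labeled:   list of (feature, label) pairs
    unlabeled: list of features
    lam:       assumed weight of the entropy term
    """
    ce = sum(cross_entropy(classify(f, prototypes, temperature), y)
             for f, y in labeled) / len(labeled)
    h = sum(entropy(classify(f, prototypes, temperature))
            for f in unlabeled) / len(unlabeled)
    return {
        "classifier": ce - lam * h,  # minimizing this maximizes the entropy
        "extractor": ce + lam * h,   # minimizing this minimizes the entropy
    }

# Toy data: two classes in a two-dimensional feature space.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
probs = classify([1.0, 0.0], prototypes)
losses = objectives([([1.0, 0.0], 0)], [[1.0, 1.0]], prototypes)
```

In the toy data, the unlabeled feature lies halfway between the two prototypes, so its predicted distribution is uniform and its entropy is at the maximum for two classes; this is the kind of sample the entropy term acts on.
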
The proposed method is tested on four RS scene datasets, namely UC Merced, AID, RESISC45, and PatternNet, using two-source and three-source domain adaptation scenarios. The experimental results show that the proposed method achieves strong performance despite using only a few (six in our case) labeled target samples per class. It outperforms several state-of-the-art methods, including RevGrad, ADDA, Siamese-GAN, and MSCN.

Highlights

  • In remote sensing (RS), new images of the earth are acquired at an ever-increasing rate [1]

  • Our proposed semi-supervised domain adaptation (SSDAN) method achieved an overall accuracy of 97.15%, 91.86%, and 91.65% when the targets were the UC Merced, RESISC45, and Aerial Image Dataset (AID) datasets, respectively, compared to 84.01%, 79.55%, and 91.59% for the multi-source compensation network (MSCN) method

  • The SSDAN method achieved an overall accuracy (OA) of 93.55%, roughly 8 percentage points higher than the state-of-the-art MSCN method
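
The 93.55% figure matches the mean of the three per-target accuracies quoted in the highlights. Reading it as a mean is an assumption, since the highlights do not say how it was computed. The short check below uses only the numbers given above; the variable names are illustrative.

```python
# Overall accuracies (%) quoted in the highlights, keyed by target dataset.
ssdan = {"UC Merced": 97.15, "RESISC45": 91.86, "AID": 91.65}
mscn = {"UC Merced": 84.01, "RESISC45": 79.55, "AID": 91.59}

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

ssdan_avg = mean(ssdan.values())
mscn_avg = mean(mscn.values())
gap = ssdan_avg - mscn_avg  # difference in percentage points

print(f"SSDAN mean OA: {ssdan_avg:.2f}%")
print(f"MSCN mean OA:  {mscn_avg:.2f}%")
print(f"Gap:           {gap:.2f} percentage points")
```

Under this reading, the gap is a difference in percentage points rather than a relative improvement.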

Introduction

In remote sensing (RS), new images of the earth are acquired at an ever-increasing rate [1]. All these new images need to be processed to perform useful tasks. Scene classification is an important processing step in many real-world applications of remote sensing [2,3]. In particular, convolutional neural networks (CNNs) are the state-of-the-art tools for scene classification [3,4,5]. CNNs require large amounts of labeled data to be trained well. One would think that we have enough labeled data to train a universal CNN model for RS scene classification.
