Abstract

Recently, scene classification methods based on deep learning have become increasingly mature in remote sensing. However, training an excellent deep learning model for remote sensing scene classification requires a large number of labeled samples, so scene classification with insufficient scene images becomes a challenge. The DeepEMD network is currently the most popular model for such tasks. Although DeepEMD obtains impressive results on common few-shot baseline datasets, it does not sufficiently capture discriminative scene features from global and local perspectives. For this reason, this paper proposes an efficient few-shot scene classification scheme for remote sensing that integrates multiple attention mechanisms and an attention-reference mechanism into the DeepEMD network. First, scene features are extracted by a backbone that incorporates a global attention module and a local attention module, which enables the backbone to capture discriminative information at both the global and the local level. Second, the attention-reference mechanism generates the weights of elements in the earth mover’s distance (EMD) formulation, which effectively alleviates the effects of complex backgrounds and intra-class morphological differences. Experimental results on three popular remote sensing benchmark datasets, Aerial Image Dataset (AID), OPTIMAL-31, and UC Merced, show that the proposed scheme obtains state-of-the-art results in few-shot remote sensing scene classification.
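
To make the role of the EMD weights concrete, the following is a minimal sketch (not the authors' code) of matching two sets of local embeddings with an EMD whose per-element weights follow a DeepEMD-style cross-reference rule, i.e., each local feature is weighted by its clipped similarity to the mean feature of the other image. The function names (reference_weights, emd_distance) and the use of SciPy's linear-programming solver are illustrative assumptions; the paper's attention-reference mechanism may compute the weights differently.

# Minimal sketch of an EMD-based distance between two sets of local embeddings,
# with per-element weights derived from a cross-reference rule (illustrative).
import numpy as np
from scipy.optimize import linprog


def reference_weights(u, v):
    """Weight each local feature in `u` by its clipped dot product with
    the mean feature of the other image `v`, then normalize to sum to 1."""
    w = np.maximum(u @ v.mean(axis=0), 0.0)        # (m,)
    return w / (w.sum() + 1e-8)


def emd_distance(u, v):
    """u: (m, d), v: (n, d) L2-normalized local embeddings of two images."""
    s = reference_weights(u, v)                    # supply weights, (m,)
    d = reference_weights(v, u)                    # demand weights, (n,)
    cost = 1.0 - u @ v.T                           # cosine cost matrix, (m, n)

    m, n = cost.shape
    # Transportation problem: minimize <flow, cost> subject to row/column sums.
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0           # flow out of node i sums to s[i]
    for j in range(n):
        A_eq[m + j, j::n] = 1.0                    # flow into node j sums to d[j]
    b_eq = np.concatenate([s, d])

    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun                                  # minimal transport cost


# Example: a 5x5 spatial grid of 64-dim local features per image.
rng = np.random.default_rng(0)
u = rng.normal(size=(25, 64)); u /= np.linalg.norm(u, axis=1, keepdims=True)
v = rng.normal(size=(25, 64)); v /= np.linalg.norm(v, axis=1, keepdims=True)
print(emd_distance(u, v))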

Highlights

  • Scene classification is an increasingly popular research topic in the field of remote sensing image recognition

  • Compared with common scene classification, few-shot scene classification in remote sensing is conducted with very few available labeled samples, e.g., in one-shot and five-shot settings [5], [6], and aims to ease the dilemma faced by data-driven deep learning models (an episode-sampling sketch follows this list)

  • Most existing models based on traditional handcrafted features, such as the vector of locally aggregated descriptors (VLAD), locality-constrained linear coding (LLC) and spatial pyramid matching (SPM) [8]–[10], extract features that are passed to a classifier; the features are refined to have translation invariance, scale invariance and sparsity in order to obtain a more robust model
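
As referenced above, the following is a minimal sketch of how N-way K-shot episodes (e.g., 5-way 1-shot or 5-way 5-shot) are commonly sampled when evaluating few-shot scene classifiers; the function and variable names are illustrative and not taken from the paper.

# Minimal sketch of N-way K-shot episode sampling for few-shot evaluation.
import random
from collections import defaultdict


def sample_episode(labels, n_way=5, k_shot=1, q_queries=15, rng=random):
    """labels: list of class labels, one per image index.
    Returns support/query index lists for one N-way K-shot episode."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    classes = rng.sample(sorted(by_class), n_way)   # pick N scene classes
    support, query = [], []
    for c in classes:
        picks = rng.sample(by_class[c], k_shot + q_queries)
        support += picks[:k_shot]                   # K labeled shots per class
        query += picks[k_shot:]                     # unlabeled queries to classify
    return support, query, classes


# Example: a toy label list with 10 classes and 30 images per class.
labels = [c for c in range(10) for _ in range(30)]
s, q, cls = sample_episode(labels, n_way=5, k_shot=1)
print(len(s), len(q), cls)   # 5 support images, 75 query images, 5 sampled classes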


Summary

INTRODUCTION

Scene classification is an increasingly popular research topic in the field of remote sensing image recognition. If a new remote sensing scene task has only a few labeled samples and lacks similar datasets, deep neural network models with a large number of parameters will overfit. How to measure the degree of similarity among tasks is another critical challenge, that is, the challenge of selecting tasks with large inter-class differences and small intra-class differences for training. Aiming at these challenges, the DeepEMD network [42], an attention mechanism and a reference mechanism are introduced into our proposed scheme for scene classification in remote sensing. Our proposed scheme is evaluated on three public datasets, and the experimental results show that our proposed multi-attention DeepEMD network (MAEMDNet) outperforms existing methods and obtains state-of-the-art performance in few-shot scene classification.
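
As a concrete illustration of pairing a global attention module with a local attention module on backbone feature maps, the sketch below uses a common channel/spatial attention pairing in the spirit of CBAM, written in PyTorch; this is an assumption for illustration, and the paper's actual attention modules may differ in detail.

# Hedged sketch: "global" (channel) and "local" (spatial) attention on feature maps.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Global attention: re-weights channels using pooled global statistics."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):                       # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))      # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))       # global max pooling branch
        w = torch.sigmoid(avg + mx)             # per-channel weights, (B, C)
        return x * w[:, :, None, None]


class SpatialAttention(nn.Module):
    """Local attention: re-weights spatial positions with a small convolution."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                       # x: (B, C, H, W)
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)   # (B, 2, H, W)
        return x * torch.sigmoid(self.conv(stats))


# Example: attend over a ResNet-style feature map before the EMD matching step.
feat = torch.randn(4, 512, 7, 7)
feat = SpatialAttention()(ChannelAttention(512)(feat))
print(feat.shape)   # torch.Size([4, 512, 7, 7])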

THE PROPOSED SCHEME
COMPARISON WITH REFERENCE MECHANISM
Findings
CONCLUSION

