Abstract

As a growing number of instruments generate ever more airborne and satellite images, the bottleneck in remote sensing (RS) scene classification has shifted from a shortage of data to a shortage of ground-truth samples. Many challenges remain when facing unknown environments, especially those with insufficient training data. Few-shot classification, under the umbrella of meta-learning, offers a different picture: extracting rich knowledge from only a few samples is possible. In this work, we propose a method named RS-SSKD for few-shot RS scene classification, approached from the perspective of generating a powerful representation for the downstream meta-learner. First, we propose a novel two-branch network that takes three pairs of original-transformed images as inputs and incorporates Class Activation Maps (CAMs) to drive the network to mine the most relevant category-specific region. This strategy ensures that the network generates discriminative embeddings. Second, we apply a round of self-knowledge distillation to prevent overfitting and boost performance. Our experiments show that the proposed method surpasses current state-of-the-art approaches on two challenging RS scene datasets: NWPU-RESISC45 and RSD46-WHU. Finally, we conduct ablation experiments to investigate the effect of each component of the proposed method, and we compare its training time with that of state-of-the-art methods.
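The abstract does not state the exact form of the self-knowledge distillation objective. As a rough illustration only, the sketch below assumes the common temperature-softened KL divergence between a teacher's and a student's class logits, where in self-distillation both networks share the same architecture. The function names and the default temperature are hypothetical and are not taken from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: this is the generic distillation loss, not necessarily
    the objective used by RS-SSKD.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return (temperature ** 2) * kl


# Hypothetical logits for a 3-class episode.
teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.7, -0.5]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge. In practice it is usually combined with a standard cross-entropy term on the ground-truth labels.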

Highlights

  • Scene classification is one of the most fundamental tasks in the remote sensing community; it plays a vital role in the semantic understanding of remote sensing (RS) scenes

  • We propose a self-supervised knowledge distillation (SSKD) module that strives to learn a powerful embedding for the downstream meta-learner

  • We verify the effectiveness of the proposed RS-SSKD method on two datasets, NWPU-RESISC45 [22] and RSD46-WHU [34]

Summary

Introduction

Scene classification is one of the most fundamental tasks in the remote sensing community; it plays a vital role in the semantic understanding of remote sensing (RS) scenes. It provides significant support for various important applications and societal needs, including urban planning [1], land-cover analysis [2], environmental monitoring [3], deforestation mapping [4], air pollution prediction [5], etc. Methods based on handcrafted features [10,11,12,13,14,15,16,17] were the leading approach in earlier years, but they require hand-designed features and lack adaptability. This family of methods performs poorly on complex scenes or massive data and has been replaced by deep learning methods.
