Abstract

Significant progress has been achieved in remote sensing image scene classification (RSISC) with the development of convolutional neural networks (CNNs) and vision transformers (ViTs). However, high intra-class diversity and inter-class similarity remain major challenges for RSISC. Metric learning can effectively improve the discriminative ability of deep representations by constraining the distances between features. However, previous metric learning methods only optimize the feature space representation through a metric function, ignoring the information interaction between samples. For complex scene images, similarity and discriminative knowledge need to be summarized from multiple positive and negative pairs. We propose a novel efficient multi-sample contrastive network (EMSCNet) to integrate knowledge from multiple samples. Specifically, we construct a dynamic dictionary with momentum updates to mine positive and negative pairs from the entire dataset. Then, the similarity and discriminative knowledge between samples is summarized by introducing a contrastive module. Finally, the knowledge of the contrastive module is transferred to the backbone classifier through knowledge distillation. The proposed contrastive module can be easily embedded into the training process of CNNs or ViTs and removed during inference. Experimental results on three datasets demonstrate the effectiveness of the proposed approach.
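To make the three ingredients of the abstract concrete, the sketch below illustrates, under stated assumptions, a momentum-updated feature dictionary, a multi-pair contrastive loss over that dictionary, and a distillation term that passes the contrastive knowledge back to the backbone classifier. This is not the authors' EMSCNet code; all names and hyperparameters (feature_dim, queue_size, tau, m, T) are illustrative assumptions.

```python
# Illustrative sketch only (not the authors' implementation).
import torch
import torch.nn.functional as F


class MomentumQueue:
    """Dynamic dictionary of past features, filled by a momentum (key) encoder."""

    def __init__(self, feature_dim=128, queue_size=4096):
        self.queue = F.normalize(torch.randn(queue_size, feature_dim), dim=1)
        self.labels = torch.full((queue_size,), -1, dtype=torch.long)
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, feats, labels):
        # Overwrite the oldest entries in a ring-buffer fashion.
        n = feats.size(0)
        idx = torch.arange(self.ptr, self.ptr + n) % self.queue.size(0)
        self.queue[idx] = F.normalize(feats, dim=1)
        self.labels[idx] = labels
        self.ptr = (self.ptr + n) % self.queue.size(0)


@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    """Key encoder parameters track the query encoder as an exponential moving average."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)


def multi_pair_contrastive_loss(feats, labels, queue, tau=0.07):
    """Supervised contrastive loss over the dictionary: same-class entries are
    positives, different-class entries are negatives, averaged over all positives."""
    feats = F.normalize(feats, dim=1)
    logits = feats @ queue.queue.t() / tau                       # (B, K) similarities
    pos_mask = labels.unsqueeze(1).eq(queue.labels.unsqueeze(0)).float()
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = pos_mask.sum(1).clamp(min=1)
    return -(pos_mask * log_prob).sum(1).div(pos_count).mean()


def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between softened teacher and student class distributions."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
```

In such a setup, the contrastive module and its queue are only needed during training; at inference time the backbone classifier is used alone, consistent with the abstract's claim that the module can be removed after training.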
