Abstract

Scene classification is an active research topic in the remote sensing community, and the complex spatial layouts and diverse object types of remote sensing images pose major challenges to classification. Convolutional neural network (CNN)-based methods explore global features by gradually expanding the receptive field, but they ignore long-range contextual information. The vision transformer (ViT) can extract contextual features, but its ability to learn local information is limited and its computational complexity is high. In this article, an end-to-end method that employs a ViT as an excellent teacher for guiding small networks (ET-GSNet) is proposed for remote sensing image scene classification. In ET-GSNet, ResNet18 is selected as the student model, which integrates the strengths of the two models via knowledge distillation (KD) without increasing computational complexity. In the KD process, the ViT and ResNet18 are optimized jointly without independent pretraining: the learning rate of the teacher model gradually decreases to zero while the weight coefficient of the KD loss term is doubled, so that dark knowledge from the teacher model is transferred to the student model more smoothly. Experimental results on four public remote sensing datasets demonstrate that the proposed ET-GSNet achieves superior classification performance compared to several state-of-the-art (SOTA) methods. In addition, we evaluate ET-GSNet on a fine-grained ship recognition dataset, and the results show that our method generalizes well to different tasks.
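
The abstract describes a joint KD scheme in which the teacher's learning rate is annealed to zero while the KD loss weight grows. The following PyTorch sketch illustrates one way such a scheme could look; the temperature T, the initial KD weight, the linear teacher-LR decay, and the use of Hinton-style softened KL divergence are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, kd_weight, T=4.0):
    """Hard-label cross-entropy plus temperature-softened KL divergence,
    which transfers the teacher's dark knowledge to the student."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return ce + kd_weight * kl

def train(teacher, student, loader, epochs, base_lr=1e-4, kd_weight0=0.5):
    # Separate parameter groups so the teacher's learning rate can be
    # annealed to zero while the student keeps its own learning rate.
    opt = torch.optim.AdamW([
        {"params": teacher.parameters(), "lr": base_lr},
        {"params": student.parameters(), "lr": base_lr},
    ])
    for epoch in range(epochs):
        progress = epoch / max(epochs - 1, 1)
        # Teacher LR decays linearly to zero over training (assumed schedule).
        opt.param_groups[0]["lr"] = base_lr * (1.0 - progress)
        # KD weight doubles from its initial value by the final epoch.
        kd_weight = kd_weight0 * (1.0 + progress)
        for images, labels in loader:
            t_logits = teacher(images)
            s_logits = student(images)
            # Teacher is trained on hard labels (no independent pretraining);
            # its logits are detached so the KD term only updates the student.
            loss = (F.cross_entropy(t_logits, labels)
                    + kd_loss(s_logits, t_logits.detach(), labels, kd_weight))
            opt.zero_grad()
            loss.backward()
            opt.step()
```

At inference time only the ResNet18 student would be kept, which is why the method's computational complexity does not increase over the plain student network.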
