Abstract

Remote sensing (RS) scene classification plays an essential role in the RS community and has attracted increasing attention due to its wide applications. Recently, benefiting from the powerful feature learning capabilities of convolutional neural networks (CNNs), the accuracy of RS scene classification has been significantly improved. Although the existing CNN-based methods achieve excellent results, there is still room for improvement. First, CNN-based methods are adept at capturing global information from RS scenes, but the context relationships hidden in RS scenes cannot be thoroughly mined. Second, owing to their specific structure, normal CNNs easily exploit the heterogenous information in RS scenes, whereas the homogenous information, which is also crucial for comprehensively understanding the complex contents of RS scenes, does not receive the attention it deserves. Third, most CNNs focus on establishing relationships between RS scenes and semantic labels, while the similarities between scenes, which help distinguish intra-/interclass samples, are not considered deeply. To overcome these limitations, we propose a homo–heterogenous transformer learning (HHTL) framework for RS scene classification in this article. First, a patch generation module is designed to generate homogenous and heterogenous patches. Then, a dual-branch feature learning module (FLM) is proposed to mine homogenous and heterogenous information within RS scenes simultaneously. In the FLM, based on the vision transformer, not only the global information but also the local areas and their context information can be captured. Finally, we design a classification module, which consists of a fusion submodule and a metric-learning submodule. It can integrate homo–heterogenous information and compact/separate samples from the same/different RS scene categories. Extensive experiments are conducted on four public RS scene datasets.
The encouraging results demonstrate that our HHTL framework can outperform many state-of-the-art methods. Our source codes are available online.
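The metric-learning idea in the classification module (compacting samples from the same scene category while separating samples from different categories) can be illustrated with a simple contrastive-style objective. The numpy sketch below is a generic illustration of that idea only; the function name, margin value, and exact loss form are our own assumptions, not the paper's implementation.

```python
import numpy as np

def contrastive_metric_loss(features, labels, margin=1.0):
    """Illustrative metric-learning objective: pull feature pairs from the
    same scene class together, and push pairs from different classes apart
    until they are at least `margin` away. (A sketch of the general idea,
    not the HHTL paper's exact loss.)"""
    n = len(features)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(features[i] - features[j])
            if labels[i] == labels[j]:
                total += d ** 2                     # intra-class: minimize distance
            else:
                total += max(0.0, margin - d) ** 2  # inter-class: enforce margin
            pairs += 1
    return total / pairs

# Well-separated clusters incur zero loss; scattered same-class features do not.
tight = np.array([[0., 0.], [0., 0.], [5., 0.], [5., 0.]])
print(contrastive_metric_loss(tight, [0, 0, 1, 1]))  # -> 0.0
```

Minimizing such an objective during training drives intra-class features together and keeps inter-class features at least `margin` apart, which is the compaction/separation behavior the abstract describes.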

Highlights

  • Compared with other methods, when 10% of scenes are used for training, the enhancements in overall accuracy (OA) obtained by our homo–heterogenous transformer learning (HHTL) framework are 15.88%, 15.6%, 6.99%, 2.77%, 4%, 5.95%, 0.98%, 2.11%, 2.18%, 2.06%, 1.69%, and 1.24% (over MG-…); see Table II for the overall accuracies and standard deviations (%) of the proposed HHTL framework and the comparison methods

  • When 20% of scenes are used for training, the enhancements in OA obtained by our HHTL framework are 15.73%, 14.42%, 5.03%, 2.11%, 3.4%, 6.22%, 1.79%, 4.92%, 1.32%, 1.66%, 1.76%, 0.69%, and 1.26% (over MG-CAP(Sqrt-E)), 4.78%, 4.3%, and 2.82%

  • The improvements in OA obtained by our HHTL framework are 3.93% (over EFPN-DSE-TDFF), 2.89% (GBNet + global feature), 2.02% (ACNet), 3.95% (T2T-ViT-12, where ViT denotes the vision transformer), 3.6% (PiT-S), and 2.42% (PVT-Medium)


Introduction

With the rapid development of remote sensing (RS) data acquisition technologies, a large number of RS images are produced every day. They contain a great deal of useful information, and how to exploit such vast data to study our planet is a pressing question for researchers. Hand-crafted visual features have been popular because they are easy to implement and stable in performance. These feature descriptors can be divided into low- and mid-level representations. Popular low-level features include the histogram of oriented gradients (HOG) [13], the scale-invariant feature transform (SIFT) [14], and the local binary pattern (LBP) [15]. They are good at capturing key points, texture, and shape information from RS scenes. Although combinations of low-/mid-level features and classical classifiers complete RS scene classification tasks reasonably well, they still fall short of expectations because of the complex contents within RS scenes, especially as the spatial resolution increases.
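Among these descriptors, the basic LBP [15] is simple to state: each pixel is encoded by thresholding its 3x3 neighborhood against the center value and packing the eight comparison bits into one code. The following numpy sketch illustrates that textbook form; the function name and the bit ordering are illustrative choices, not a specific library's API.

```python
import numpy as np

def lbp_3x3(img):
    """Minimal 3x3 local binary pattern: each interior pixel is encoded by
    thresholding its 8 neighbors against the center value, yielding a code
    in [0, 255]. (A sketch of the textbook LBP descriptor, not an
    optimized RS implementation.)"""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    # Neighbor offsets in clockwise order starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros((h - 2, w - 2), dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neigh >= center).astype(int) << bit
    return codes

# On a flat region every neighbor ties with the center, so all bits are set.
print(lbp_3x3(np.ones((4, 4))))  # -> [[255 255], [255 255]]
```

A histogram of these codes over an image patch is the texture feature that classical classifiers then consume, which is why LBP captures texture well but cannot, on its own, model the complex semantics of high-resolution RS scenes.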

