Abstract

Owing to the complexity of remote sensing scenes, it is hard to describe an image with a single label, so multilabel classification is a more general and practical choice for high-resolution remote sensing (HRS) images. Modeling the relations between categories is a vital problem in multilabel classification. In recent years, some researchers have used recurrent neural networks (RNNs) or long short-term memory (LSTM) to exploit label relations. However, the RNN or LSTM models such category dependence only in a chain-propagation manner, so its performance may suffer when a specific category along the chain is improperly inferred. To address this, we propose a novel multilabel classification network for HRS images, the transformer-driven semantic relation inference network. The network comprises two modules: a semantic sensitive module (SSM) and a semantic relation-building module (SRBM). The SSM locates semantic attentional regions in the features extracted by a deep convolutional neural network and generates a discriminative content-aware category representation (CACR). The SRBM reasons about label relations from the outputs of the SSM to predict the final results. The distinguishing characteristic of the proposed method is that it can extract semantic attentional regions relevant to each category, generate a discriminative CACR, and perform natural, interpretable reasoning about label relations. Experiments were performed on the public UCM multilabel and MLRSNet datasets. Quantitative and qualitative comparisons against state-of-the-art multilabel benchmarks show that the proposed method effectively locates semantic regions and builds relationships between categories with better robustness.
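To give a rough intuition for how a content-aware category representation can be formed, the sketch below attention-weights spatial feature vectors by their similarity to a category embedding and pools them into one vector per category. This is a minimal pure-Python illustration under stated assumptions: the function name, toy dimensions, and dot-product scoring are our own simplifications, not the paper's exact SSM formulation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def content_aware_category_representation(features, category_embedding):
    """For one category: score each spatial feature vector against the
    category embedding (dot product), turn the scores into attention
    weights, and pool the features into a single category-specific vector.

    features: list of spatial feature vectors (each a list of floats)
    category_embedding: learnable category vector of the same dimension
    """
    scores = [sum(f * c for f, c in zip(feat, category_embedding))
              for feat in features]
    weights = softmax(scores)  # semantic attention over spatial positions
    dim = len(category_embedding)
    return [sum(w * feat[d] for w, feat in zip(weights, features))
            for d in range(dim)]
```

With a toy 2-D example, positions whose features align with the category embedding dominate the pooled representation, which is the intuition behind locating category-relevant attentional regions.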

Highlights

  • Owing to the complexity of remote sensing scenes, it is hard to describe an image with a single label

  • The main contribution of this paper is a novel transformer-driven relation-reasoning scheme over content-aware category representations (CACRs) for multilabel high-resolution remote sensing (HRS) image classification

  • The proposed method yields the largest improvement on the VGG backbone: VGG is shallower than the residual network (ResNet) and DenseNet, so the features it extracts carry less high-level semantic information, leaving more room for the proposed modules to help



Introduction

Owing to the complexity of remote sensing scenes, it is hard to describe an image with a single label. Multilabel image classification for high-resolution remote sensing (HRS) images is therefore more general and practical than single-label classification. HRS scenes comprise various categories, with both correlations and differences between them.

K. Wang is with the School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China. J. Zhu is with National Bio Energy Co., Ltd., 16 Luomashi St., Xicheng District, Beijing 100052, China.
