Remote Sensing Scene Classification Research Articles

Remote sensing image classification (RSIC) is a classical and fundamental task in the intelligent interpretation of remote sensing imagery, assigning a unique label to each acquired remote sensing image. Thanks to the strong global context extraction ability of the multi-head self-attention (MSA) mechanism, vision transformer (ViT)-based architectures have shown excellent capability in natural scene image classification. However, capturing global spatial information alone is insufficient to achieve strong RSIC performance. Specifically, for fine-grained target recognition tasks with high inter-class similarity, discriminative and effective local feature representations are key to correct classification. In addition, because ViT lacks inductive biases, its powerful global spatial context representation requires lengthy training and large-scale pre-training data. To solve these problems, a hybrid architecture combining a convolutional neural network (CNN) and ViT, called P2FEViT, is proposed to improve RSIC; it integrates plug-and-play CNN features with ViT. In this paper, the feature representation capabilities of CNN and ViT as applied to RSIC are first analyzed. Second, to combine the advantages of CNN and ViT, a novel approach that embeds CNN features into the ViT architecture is proposed, which lets the model simultaneously capture and fuse global context and local multimodal information to further improve the classification capability of ViT. Third, based on the hybrid structure, only a simple cross-entropy loss is employed for model training, and the model converges quickly and smoothly with less training data than the original ViT. Finally, extensive experiments are conducted on the public and challenging remote sensing scene classification dataset NWPU-RESISC45 (NWPU-R45) and the self-built fine-grained target classification dataset BIT-AFGR50. The experimental results demonstrate that the proposed P2FEViT can effectively improve the feature description capability and obtain outstanding image classification performance, while significantly reducing the dependence of ViT on large-scale pre-training data and accelerating convergence. The code and self-built dataset will be released on our webpages.
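The abstract above states that training uses only a simple cross-entropy loss. As a minimal illustration of that loss (not the authors' code), the following sketch computes softmax cross-entropy for a single image in plain Python. The function names and the three example class scores are assumptions made for illustration only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability assigned to the true class index `target`."""
    return -math.log(softmax(logits)[target])

# Hypothetical scores for three scene classes; the true class is index 0.
loss = cross_entropy([2.0, 0.5, -1.0], 0)
```

The loss shrinks as the score of the true class rises relative to the others, which is the only training signal the abstract describes.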


Deep neural networks have achieved promising progress in remote sensing (RS) image classification, but their training requires abundant samples for each class. It is time-consuming and unrealistic to annotate labels for every RS category, given that the RS target database grows dynamically. Zero-shot learning (ZSL) identifies novel classes that are not seen during training, which provides a promising solution to this problem. However, previous ZSL models mainly depend on manually labeled attributes or word embeddings extracted from language models to transfer knowledge from seen classes to novel classes. Those class embeddings may not be visually detectable, and the annotation process is time-consuming and labor-intensive. In addition, pioneering ZSL models use convolutional neural networks pre-trained on ImageNet, which focus on the main objects appearing in each image and neglect the background context that also matters in RS scene classification. To address these problems, we propose to collect visually detectable attributes automatically. We predict attributes for each class by measuring the semantic-visual similarity between attributes and images. In this way, attribute annotation is carried out by machine rather than by humans as in other methods. Moreover, we propose a Deep Semantic-Visual Alignment (DSVA) model that takes advantage of the self-attention mechanism in the transformer to associate local image regions with one another, integrating background context information for prediction. The DSVA model further uses attribute attention maps to focus on the informative image regions that are essential for knowledge transfer in ZSL, and maps visual images into attribute space to perform ZSL classification. With extensive experiments, we show that our model outperforms other state-of-the-art models by a large margin on a challenging large-scale RS scene classification benchmark. Moreover, we qualitatively verify that the attributes annotated by our network are both class-discriminative and semantically related, which benefits zero-shot knowledge transfer.
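The idea of mapping images into attribute space to classify unseen classes can be sketched generically as follows. This is a toy illustration of attribute-space zero-shot classification, not the DSVA model: the linear projection, the two-dimensional features, the class names, and the attribute vectors are all hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def predict_attributes(feature, projection):
    """Map a visual feature to attribute scores (one projection row per attribute)."""
    return [dot(row, feature) for row in projection]

def classify_zero_shot(feature, projection, class_attributes):
    """Pick the class whose attribute vector best matches the predicted attributes."""
    attr_scores = predict_attributes(feature, projection)
    scores = {name: dot(attr_scores, attrs)
              for name, attrs in class_attributes.items()}
    return max(scores, key=scores.get)

# Assumed toy setup: 2 visual features, 2 attributes ("water", "trees").
projection = [[1.0, 0.0], [0.0, 1.0]]
# Attribute vectors for classes that were never seen during training.
class_attributes = {"harbor": [1.0, 0.0], "forest": [0.0, 1.0]}
label = classify_zero_shot([0.9, 0.1], projection, class_attributes)
```

Because a class is described only by its attribute vector, a new class can be added to `class_attributes` without any training images, which is the property that zero-shot learning relies on.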


Articles published on Remote Sensing Scene Classification

Self-supervised embedding for generalized zero-shot learning in remote sensing scene classification

Faster and Better: A Lightweight Transformer Network for Remote Sensing Scene Classification

Real-time scene classification of unmanned aerial vehicles remote sensing image based on Modified GhostNet

Ebola optimization with modified DarkNet‐53 model for scene classification and security on Internet of Things in smart cities

A Lightweight Dual-Branch Swin Transformer for Remote Sensing Scene Classification

A lightweight and stochastic depth residual attention network for remote sensing scene classification

MGML: Multigranularity Multilevel Feature Ensemble Network for Remote Sensing Scene Classification

Few-shot remote sensing image scene classification based on multiscale covariance metric network (MCMNet)

Dual Wavelet Attention Networks for Image Classification

P2FEViT: Plug-and-Play CNN Feature Embedded Hybrid Vision Transformer for Remote Sensing Image Classification

Improving remote sensing scene classification using quality-based data augmentation

Deep Semantic-Visual Alignment for zero-shot remote sensing image scene classification

Credible Remote Sensing Scene Classification Using Evidential Fusion on Aerial-Ground Dual-View Images

Subspace prototype learning for few-Shot remote sensing scene classification

A coupled multi-task feature boosting method for remote sensing scene classification

Adversarial Remote Sensing Scene Classification Based on Lie Group Feature Learning

HCFPN: Hierarchical Contextual Feature-Preserved Network for Remote Sensing Scene Classification

Dictionary Learning for Few-Shot Remote Sensing Scene Classification

BiShuffleNeXt: A lightweight bi-path network for remote sensing scene classification

Adaptive Discriminative Regions Learning Network for Remote Sensing Scene Classification
