Abstract

Compared with natural scenes, aerial scenes are usually composed of numerous objects densely distributed across the aerial view, so more key local semantic features are needed to describe them. However, when existing CNNs are used for remote sensing image classification, they typically focus on the global semantic features of the image, and in deep models in particular, shallow and intermediate features are easily lost. This article proposes a channel–spatial attention and depthwise separable convolution (CSDS) network for aerial scene classification to address these challenges. First, we construct a depthwise separable convolution (DS-Conv) and pyramid residual connection architecture: DS-Conv filters each channel separately and then merges the results, substantially reducing the required computation, while the pyramid residual connections link features from multiple layers and create associations among them. Then, a channel–spatial attention algorithm lets the model extract more effective features in both the channel and spatial domains. Finally, an improved cross-entropy loss function reduces the impact of similar categories on backpropagation. Comparative experiments on three public datasets show that the CSDS network achieves results comparable to those of other state-of-the-art methods. In addition, visualizations of the extracted features produced by the Grad-CAM algorithm, together with ablation experiments on each module, reflect the strong feature learning and representation capabilities of the proposed CSDS network.
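As a concrete illustration of the building blocks named in the abstract, the sketch below pairs a depthwise separable convolution (per-channel depthwise filtering followed by a 1x1 pointwise merge) with a CBAM-style channel–spatial attention module in PyTorch. This is a minimal sketch of the general techniques only, not the paper's exact CSDS modules; the class names, the reduction ratio of 16, and the 7x7 spatial kernel are our assumptions.

```python
import torch
import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise separable convolution: per-channel (depthwise) filtering
    followed by a 1x1 (pointwise) convolution that merges the channels."""
    def __init__(self, in_ch, out_ch, k=3, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, stride,
                                   padding=k // 2, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class ChannelSpatialAttention(nn.Module):
    """Channel attention (shared MLP over pooled channel descriptors)
    followed by spatial attention (7x7 conv over pooled channel maps)."""
    def __init__(self, ch, reduction=16):  # reduction ratio is an assumption
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: weight channels by avg- and max-pooled statistics.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: weight locations via channel-wise avg/max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

# Usage: feats = ChannelSpatialAttention(64)(DSConv(3, 64)(torch.randn(2, 3, 224, 224)))
```

The abstract likewise describes the improved cross-entropy loss only at a high level. A common technique with the same stated goal, reducing the influence of similar categories on backpropagation, is label smoothing, shown here purely as a stand-in:

```python
import torch.nn.functional as F

def smoothed_cross_entropy(logits, target, eps=0.1):
    """Cross-entropy with a label-smoothing term: softening the one-hot
    target damps over-confident gradients from confusable classes."""
    logp = F.log_softmax(logits, dim=-1)
    nll = -logp.gather(1, target.unsqueeze(1)).squeeze(1)  # standard CE term
    uniform = -logp.mean(dim=-1)                           # smoothing term
    return ((1 - eps) * nll + eps * uniform).mean()
```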

Highlights

  • Remote sensing and earth observation, together called earth vision, are important branches and applications of computer vision and image understanding [1]–[3]

  • In remote sensing image classification tasks, the UC Merced (UCM) dataset contains 2100 labeled samples, only half of which may be used for training, and is characterized by an uneven sample distribution

  • The overall accuracy (OA), average accuracy (AA), Kappa coefficient (Kappa), F1 score (F1), and confusion matrix (CM) are used in the experiments to describe the performance of the proposed CSDS network (a computation sketch for these indices follows this list)

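A minimal sketch of how these five indices can be computed with scikit-learn, assuming integer-encoded true and predicted labels (the function name and the macro averaging for F1 are our assumptions):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             cohen_kappa_score, confusion_matrix, f1_score)

def evaluate(y_true, y_pred):
    """Return OA, AA, Kappa, F1, and CM for integer class labels."""
    return {
        "OA": accuracy_score(y_true, y_pred),             # overall accuracy
        "AA": balanced_accuracy_score(y_true, y_pred),    # mean per-class recall
        "Kappa": cohen_kappa_score(y_true, y_pred),       # chance-corrected agreement
        "F1": f1_score(y_true, y_pred, average="macro"),  # macro-averaged F1
        "CM": confusion_matrix(y_true, y_pred),
    }

print(evaluate(np.array([0, 1, 2, 2]), np.array([0, 1, 2, 1])))
```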

Summary

INTRODUCTION

Remote sensing and earth observation, together called earth vision, are important branches and applications of computer vision and image understanding [1]–[3]. 1) Useless Background Information: The key object in a sample usually determines the label of a remote sensing image. To highlight key objects and suppress redundant background information, local key features must be extracted to enhance the semantic representation of the aerial image. Moreover, the main direction angle of a key object in an aerial scene image can vary greatly (see Fig. 1(b)), and because of the large shooting height and angle of aerial scenes, the distribution of key objects differs from the central distribution observed in natural scene images (see Fig. 1). These characteristics increase the difficulty of understanding remote sensing images. Traditional CNNs tend to focus on global semantics, making it difficult to extract the key features of aerial scenes, which may weaken the representation of the scene and prevent accurate classification [11]

Motivation and Objectives
Aerial Scene Classification
Depthwise Separable Convolution
Attention Mechanisms
Feature Extraction Backbone
Dataset Description
Experimental Details
Accuracy Evaluation Indices
Experimental Results
METHODS
CSDS ablation experiment
Findings
Attention Maps on CSDS
CONCLUSION

