Abstract

Remote sensing scene image classification is of great value in civil and military applications. Deep learning models, especially convolutional neural networks (CNNs), have achieved great success in this task, but they face two challenges. First, the objects that define a category usually differ in size, yet a conventional CNN extracts features with fixed-size convolutional kernels, which can prevent it from learning multi-scale features. Second, some image regions are not useful for feature learning, so guiding the network to select and focus on the most relevant regions is vital for remote sensing scene image classification. To address these two challenges, we propose a multi-scale attention network (MSA-Network), which integrates a multi-scale (MS) module and a channel and position attention (CPA) module to improve remote sensing scene classification performance. The MS module learns multi-scale features by applying sliding windows of various sizes to layers of different depths and receptive fields. The CPA module consists of two parts: a channel attention (CA) module and a position attention (PA) module. The CA module learns global attention features at the channel level, and the PA module extracts local attention features at the pixel level. By fusing these two kinds of attention features, the network can automatically focus on the more critical and salient regions. Extensive experiments on the UC Merced, AID, and NWPU-RESISC45 datasets demonstrate that the proposed MSA-Network outperforms several state-of-the-art methods.
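
The idea of sliding windows of several sizes can be illustrated with a minimal sketch. This is not the MS module itself, whose layers are not specified in this excerpt; it is a plain-Python stand-in under stated assumptions. The function names `sliding_mean` and `multi_scale_descriptor`, the window sizes (1, 2, 3), and the toy feature map are all hypothetical. Each window size averages the map at a different scale, and the strongest response per scale is collected into one descriptor.

```python
def sliding_mean(fmap, k):
    """Mean of every k x k window of a 2D map (stride 1, no padding)."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            window = [fmap[i + di][j + dj] for di in range(k) for dj in range(k)]
            row.append(sum(window) / (k * k))
        out.append(row)
    return out


def multi_scale_descriptor(fmap, window_sizes=(1, 2, 3)):
    """Collect the strongest averaged response at each window size."""
    return [max(max(row) for row in sliding_mean(fmap, k)) for k in window_sizes]


# Toy 4 x 4 feature map (hypothetical) with a 2 x 2 "object" in the centre.
fmap = [
    [0, 0, 0, 0],
    [0, 4, 4, 0],
    [0, 4, 4, 0],
    [0, 0, 0, 0],
]
descriptor = multi_scale_descriptor(fmap)
```

In this toy case the response stays at its peak while the window is no larger than the object and drops once the window grows past it, which is the kind of size sensitivity that a single fixed window cannot capture.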

Highlights

  • The explosion of high-resolution remote sensing imaging technology has unleashed a veritable data deluge in investigating the land-use and land-cover scenes [9], [16], [19], [18]

  • To extract more multi-scale and discriminative features, we propose a multi-scale (MS) module and a channel and position attention (CPA) module

  • We suggest that adding the channel attention (CA) module can guide the model to focus on more global regions, while adding the position attention (PA) module tends to guide it toward more subtle regions; fusing the CA and PA modules can efficiently improve the model's ability to learn more crucial and salient features
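
The contrast between channel-level and pixel-level attention can be sketched in a few lines. This is an illustration under stated assumptions, not the CPA module from the paper: the real modules would use learned layers, whereas this sketch derives the weights directly from averages. The function names, the sigmoid gating, and fusion by element-wise sum are all hypothetical choices. Features are stored as nested lists indexed `[channel][row][col]`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feats):
    """Scale each channel by one weight derived from its global average."""
    out = []
    for channel in feats:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values))
        out.append([[v * weight for v in row] for row in channel])
    return out


def position_attention(feats):
    """Scale each pixel by a weight derived from its average across channels."""
    c, h, w = len(feats), len(feats[0]), len(feats[0][0])
    weights = [
        [sigmoid(sum(feats[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[feats[k][i][j] * weights[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


def fuse(a, b):
    """Element-wise sum of two feature tensors of identical shape."""
    return [
        [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]


# Toy input (hypothetical): 2 channels, each 2 x 2.
feats = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 3.0]],
]
fused = fuse(channel_attention(feats), position_attention(feats))
```

The channel weight is shared by every pixel in a channel, so it acts globally, while the position weight differs from pixel to pixel, so it acts locally; the fused output combines both.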

Summary

Introduction

The rapid growth of high-resolution remote sensing imaging technology has produced a flood of data for investigating land-use and land-cover scenes [9], [16], [19], [18]. The recognition and classification of remote sensing scene images is of great value to civil and military fields because these images carry rich spatial and semantic information. This classification task uses pixel-based [3] or other levels of features to identify and label images according to their contents.

Guokai Zhang, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, China.
