Abstract

Scene classification of high-resolution Remote Sensing Images (RSI) is one of the basic challenges in RSI interpretation. Existing scene classification methods based on deep learning have achieved impressive performance. However, since RSI commonly contain various types of ground objects and complex backgrounds, most methods cannot focus on the salient features of a scene, which limits classification performance. To address this issue, we propose a novel Saliency Dual Attention Residual Network (SDAResNet) to extract both cross-channel and spatial saliency information for scene classification of RSI. More specifically, the proposed SDAResNet consists of spatial attention and channel attention: spatial attention is embedded in low-level features to emphasize salient location information and suppress background information, and channel attention is integrated into high-level features to extract salient, meaningful information. Additionally, several image classification tricks are used to further improve classification accuracy. Finally, extensive experiments on two challenging benchmark RSI datasets demonstrate that our methods significantly outperform most state-of-the-art approaches.
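
The two attention types named in the abstract can be illustrated with a minimal, parameter-free sketch. This is not the SDAResNet implementation: the abstract does not specify the layers, so the functions below (`spatial_attention`, `channel_attention`) are hypothetical, contain no learned weights, and use plain average pooling followed by a sigmoid as an assumed gating scheme. A feature map is represented as a list of C channels, each an H x W list of lists.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention(feat):
    """Reweight every location by a mask shared across channels.

    Assumption: the mask is the sigmoid of the cross-channel mean at each
    location. A real network would learn this mapping (e.g. with a convolution).
    """
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    mask = [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[feat[k][i][j] * mask[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


def channel_attention(feat):
    """Reweight every channel by a single scalar gate.

    Assumption: the gate is the sigmoid of the channel's global average.
    A real network would pass the pooled vector through learned layers first.
    """
    h, w = len(feat[0]), len(feat[0][0])
    gates = [sigmoid(sum(sum(row) for row in ch) / (h * w)) for ch in feat]
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feat, gates)]


# Toy feature map: 2 channels of size 2 x 2 (illustrative values only).
features = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
low_level = spatial_attention(features)    # location-wise reweighting
high_level = channel_attention(features)   # channel-wise reweighting
```

In this sketch the spatial mask varies over locations and is shared by all channels, while the channel gate varies over channels and is shared by all locations. That contrast is the only point the example is meant to convey; where each module sits in the residual network is as described in the abstract, not in this code.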

Highlights

  • With the rapid development of remote sensing technology and satellite sensors, a great number of high-resolution Remote Sensing Images (RSI) have become readily available [1]–[3]

  • SDAResNet without any tricks outperforms the other methods under a training ratio of 20%, and its result under a training ratio of 50% is comparable to that of the global-local attention network (GLANet) (SVM)

  • The confusion matrices generated by the best combination of the proposed SDAResNet and effective tricks under training ratios of 20% and 50% on the PatternNet dataset are shown in Figures 7 and 8, respectively


Summary

Introduction

With the rapid development of remote sensing technology and satellite sensors, a great number of high-resolution RSI have become readily available [1]–[3]. Scene classification of RSI, i.e., automatically extracting valuable information from each scene image and categorizing the images into different classes based on their semantic information, has become a research hotspot in RSI interpretation [1], [4], [5]. High-resolution RSI are quite different from natural images due to their unique imaging perspective and capture mode, which results in images with various types of ground. With the fast development of Convolutional Neural Networks (CNN), a variety of CNN-based methods have come to dominate the field of scene classification, mainly due to their capacity to learn hierarchical representations that describe image scenes [5], [8], [9]. RSI commonly contain various types of ground objects and complex backgrounds, but not all objects are useful for scene
