Abstract

With the development of computer vision, attention mechanisms have been widely studied. Although introducing an attention module into a network model can improve classification performance on remote sensing scene images, doing so directly increases the number of model parameters and the amount of computation, slowing the model down. To address this problem, we carried out the following work. First, a channel attention module and a spatial attention module were constructed: the input features were enhanced through channel attention and spatial attention separately, and the features recalibrated by the two modules were fused to obtain features with hybrid attention. Then, to limit the parameter increase caused by the attention mechanism, a group-wise hybrid attention module was constructed. This module divides the input features into four groups along the channel dimension, enhances each group in the channel and spatial dimensions with the hybrid attention mechanism, and then fuses the four groups along the channel dimension. The group-wise design greatly reduces the number of parameters and the computational burden of the network and shortens its running time. Finally, a lightweight convolutional neural network based on group-wise hybrid attention (LCNN-GWHA) was constructed for remote sensing scene image classification. Experiments on four open and challenging remote sensing scene datasets demonstrate that the proposed method offers great advantages in classification accuracy, even with a very small number of parameters.
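The group-wise hybrid attention described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the learned layers (the MLP in the channel branch and the convolution in the spatial branch) are omitted, and fusing the two attention branches by addition is an assumption, since the abstract only says the recalibrated features are "fused".

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    # x: (C, H, W). Squeeze: global average pool per channel -> (C,),
    # then gate each channel. (A real module passes the pooled vector
    # through a small MLP before the sigmoid.)
    w = sigmoid(x.mean(axis=(1, 2)))
    return x * w[:, None, None]

def spatial_attention(x):
    # Pool across channels to a single (H, W) map and gate each spatial
    # position. (A real module applies a convolution before the sigmoid.)
    m = sigmoid(x.mean(axis=0))
    return x * m[None, :, :]

def group_wise_hybrid_attention(x, groups=4):
    # Split channels into groups, apply hybrid (channel + spatial)
    # attention to each group independently, then concatenate the groups
    # back along the channel dimension.
    outs = []
    for g in np.split(x, groups, axis=0):
        outs.append(channel_attention(g) + spatial_attention(g))  # assumed fusion by addition
    return np.concatenate(outs, axis=0)

x = np.random.randn(16, 8, 8)   # 16 channels, 8x8 feature map
y = group_wise_hybrid_attention(x, groups=4)
print(y.shape)  # (16, 8, 8): shape is preserved
```

Because each group contains only C/4 channels, any learned channel-mixing layers inside the attention branches operate on smaller tensors, which is where the parameter and computation savings come from.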

Highlights

  • In recent years, convolutional neural networks (CNNs) have achieved excellent performance in many fields [1,2,3,4,5,6,7]

  • Based on the SE module, we propose a channel attention module which is more suitable for remote sensing scene image classification

  • When the training ratio was 20%, our method achieved the best performance with the fewest parameters; its classification accuracy reached 93.85%, which is 0.58% higher than that of Inception V3 [55] and 1.46% higher than that of ResNet50 [55]

Summary

Introduction

Convolutional neural networks (CNNs) have achieved excellent performance in many fields [1,2,3,4,5,6,7]. Improving the quality of spatial coding across the feature levels of a convolutional neural network, and thereby its representation ability, is an effective way to improve network performance. It has been shown, with VGGNet [12], that increasing the depth of the network can significantly improve its performance. ResNet [13] addressed the problem of performance degradation caused by network deepening, expanding the network depth to over 150 layers.
