Abstract

In recent years, convolutional neural networks (CNNs) have achieved great success in scene classification within computer vision. Although these CNNs can reach excellent classification accuracy, the discriminative ability of the feature representations they extract is still limited when distinguishing more complex remote sensing images. Therefore, in this paper we propose a unified feature fusion framework based on an attention mechanism, called Deep Discriminative Representation Learning with Attention Map (DDRL-AM). Firstly, attention maps associated with the predicted results are generated by the Gradient-weighted Class Activation Mapping (Grad-CAM) algorithm, so that the CNN focuses on the most salient parts of the image. Secondly, a spatial feature transformer (SFT) is designed to extract discriminative features from the attention maps. An innovative two-channel CNN architecture is then proposed by fusing the features extracted from the attention maps with those of the RGB (red-green-blue) stream. A new objective function that considers both center loss and cross-entropy loss is optimized to reduce within-class variance while increasing inter-class dispersion. To show its effectiveness in classifying remote sensing images, the proposed DDRL-AM method is evaluated on four public benchmark datasets. The experimental results demonstrate the competitive scene classification performance of the DDRL-AM approach. Moreover, visualization of the features extracted by the proposed method confirms that their discriminative ability has been increased.
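For readers who want a concrete picture of the pipeline, the sketch below shows, in PyTorch, how an attention-map stream and an RGB stream could be fused and trained with a joint cross-entropy plus center loss. It is a minimal illustration of the idea described above, not the authors' implementation: the backbone choice (ResNet-18), the layer sizes of the attention branch, and the loss weight lambda_center are all assumptions.

```python
# Minimal sketch of a two-channel fusion network with a joint
# cross-entropy + center loss, in the spirit of the abstract above.
# Layer sizes, the attention-branch design, and lambda_center are
# illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class TwoStreamNet(nn.Module):
    def __init__(self, num_classes, feat_dim=512):
        super().__init__()
        # RGB stream: a standard CNN backbone (ResNet-18 assumed here),
        # with the final classification layer removed.
        rgb = models.resnet18(weights=None)
        self.rgb_stream = nn.Sequential(*list(rgb.children())[:-1])  # -> (B, 512, 1, 1)
        # Attention-map stream: a small CNN over 1-channel Grad-CAM maps,
        # standing in for the spatial feature transformer (SFT) branch.
        self.att_stream = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc_feat = nn.Linear(512 + 64, feat_dim)  # fused feature
        self.fc_cls = nn.Linear(feat_dim, num_classes)

    def forward(self, rgb, att_map):
        f_rgb = self.rgb_stream(rgb).flatten(1)
        f_att = self.att_stream(att_map).flatten(1)
        feat = F.relu(self.fc_feat(torch.cat([f_rgb, f_att], dim=1)))
        return feat, self.fc_cls(feat)


class CenterLoss(nn.Module):
    """Penalizes the distance between each feature and its class center."""
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feat, labels):
        return ((feat - self.centers[labels]) ** 2).sum(dim=1).mean()


# Joint objective on a dummy batch: cross-entropy + weighted center loss.
model = TwoStreamNet(num_classes=30)
center_loss = CenterLoss(num_classes=30, feat_dim=512)
lambda_center = 0.01  # assumed weight

rgb = torch.randn(4, 3, 224, 224)       # RGB images
att = torch.rand(4, 1, 224, 224)        # precomputed Grad-CAM attention maps
labels = torch.randint(0, 30, (4,))
feat, logits = model(rgb, att)
loss = F.cross_entropy(logits, labels) + lambda_center * center_loss(feat, labels)
loss.backward()
```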

Highlights

  • Classification problems have been a research hotspot in the remote sensing community for decades

  • We have proposed a novel method called Deep Discriminative Representation Learning with Attention Map (DDRL-AM) for remote sensing scene classification

  • We addressed the problem of class ambiguity by learning more discriminative features



Introduction

Classification problems have been a research hotspot in the remote sensing community for decades. A majority of the methods are based on per-pixel classification because of the relatively low spatial resolution of remote sensing images. The object-level approach [2] has come a long way for the task of remote sensing image interpretation. This type of method first segments a scene image into meaningful, geographically based objects or superpixels that share relatively homogeneous spectral or texture information.
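As a hedged illustration of this object-based idea (not taken from the paper), the short sketch below segments an image into superpixels with scikit-image's SLIC and computes a simple per-object feature; the sample image, the number of segments, and the mean-colour feature are arbitrary examples.

```python
# Illustrative object-based preprocessing: group spectrally homogeneous
# pixels into superpixels, then describe each object instead of each pixel.
import numpy as np
from skimage.segmentation import slic
from skimage.data import astronaut  # stand-in RGB image; a remote sensing tile would be used in practice

image = astronaut()  # (H, W, 3) uint8 RGB array
segments = slic(image, n_segments=300, compactness=10.0, start_label=1)

# Each label identifies one superpixel; a per-object feature such as the
# mean colour can then be computed and classified in place of raw pixels.
mean_color = np.array([image[segments == s].mean(axis=0)
                       for s in np.unique(segments)])
print(segments.shape, mean_color.shape)
```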

