Abstract
Scene classification of high-resolution remote sensing images (HRRSI) is one of the most important means of land-cover classification. Deep learning techniques, especially the convolutional neural network (CNN), have been widely applied to the scene classification of HRRSI owing to advances in graphics processing units (GPUs). However, CNNs tend to extract features from whole images rather than from discriminative regions. A visual attention mechanism can force the CNN to focus on discriminative regions, but it may still be affected by intra-class diversity and repeated texture. Motivated by these problems, we propose an attention-based deep feature fusion (ADFF) framework that consists of three parts: attention maps generated by Gradient-weighted Class Activation Mapping (Grad-CAM), a multiplicative fusion of deep features, and a center-based cross-entropy loss function. First, we use the attention maps generated by Grad-CAM as an explicit input to force the network to concentrate on discriminative regions. Then, deep features derived from the original images and from the attention maps are fused multiplicatively, so that the fused representation captures both the salient regions and an improved ability to distinguish scenes with repeated texture. Finally, the center-based cross-entropy loss function, which combines the cross-entropy loss with the center loss, is applied to the fused features during backpropagation to reduce the effect of intra-class diversity on the feature representations. The proposed ADFF architecture is tested on three benchmark datasets to evaluate its scene classification performance. The experiments confirm that the proposed method outperforms most competitive scene classification methods, with an average overall accuracy of 94% under different training ratios.
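The first two ADFF components can be illustrated with a small, framework-free sketch. It assumes the feature maps of the last convolutional layer and the gradients of the target class score with respect to them are already available (in practice a deep learning framework computes these). Following the standard Grad-CAM recipe, each channel is weighted by the spatial mean of its gradient, the weighted maps are summed and passed through a ReLU; normalizing to [0, 1] by the peak value is an assumption of this sketch, and the upsampling to image resolution is omitted. Multiplicative fusion is shown as an element-wise product of two equal-length feature vectors, which is one plausible reading of the abstract rather than the authors' exact implementation; the function names are hypothetical.

```python
# Minimal sketch (hypothetical helpers): Grad-CAM style attention map and
# element-wise multiplicative fusion of two feature vectors, in pure Python.

def grad_cam(feature_maps, gradients):
    """Compute a Grad-CAM map from K feature maps and their gradients.

    feature_maps, gradients: lists of K maps, each an H x W list of lists;
    gradients[k] holds d(target class score) / d(feature_maps[k]).
    Returns an H x W map scaled to [0, 1] (scaling is an assumption here).
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel weight alpha_k: global average of the gradient over the map.
    alphas = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    # Weighted sum of the feature maps followed by a ReLU.
    cam = [[max(0.0, sum(a * fm[i][j] for a, fm in zip(alphas, feature_maps)))
            for j in range(width)] for i in range(height)]
    # Normalize by the peak value so the map can be fed in like an image.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


def multiplicative_fusion(image_features, attention_features):
    """Fuse two equal-length feature vectors by element-wise product."""
    if len(image_features) != len(attention_features):
        raise ValueError("feature vectors must have the same length")
    return [a * b for a, b in zip(image_features, attention_features)]


# Toy example: two 2 x 2 feature maps and their gradients.
fmaps = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 1.0], [1.0, 0.0]]]
grads = [[[0.5, 0.5], [0.5, 0.5]], [[-0.25, -0.25], [-0.25, -0.25]]]
print(grad_cam(fmaps, grads))  # [[0.5, 0.0], [0.0, 1.0]]
print(multiplicative_fusion([1.0, 2.0, 3.0], [0.5, 0.0, 2.0]))  # [0.5, 0.0, 6.0]
```

The product form means a feature dimension survives fusion only where both branches respond, which is one way to read the claim that fusion emphasizes salient regions.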
Highlights
It is increasingly important to use high-resolution remote sensing images (HRRSI) in geospatial object detection [1,2] or land-cover classification tasks owing to advances in remote sensing instruments
As shown in Figure 4, the proposed attention-based deep feature fusion (ADFF) approach consists of three novel components: the network that generates attention maps with Grad-CAM, a multiplicative fusion of deep features, and the center-based cross-entropy loss function
To evaluate the performance of the ADFF framework, three datasets are used: UC Merced, the Aerial Image Dataset (AID) and NWPU-RESISC45
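The center-based cross-entropy loss named in the highlights combines a softmax cross-entropy term with a center loss that pulls each fused feature toward a learned center for its class. The sketch below assumes the common formulation L = L_CE + lambda * L_C, with L_C = 1/2 * ||x_i - c_{y_i}||^2, averages over the mini-batch, and uses the center update rule from the original center loss work; the weight `lam`, the rate `alpha` and all function names are hypothetical choices, not values reported for ADFF.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample given raw class scores (logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def center_based_cross_entropy(batch, centers, lam=0.01):
    """Batch mean of cross-entropy + lam * center loss (lam is hypothetical).

    batch: list of (logits, fused_feature, label) tuples.
    centers: dict mapping label -> center vector of the fused features.
    """
    total = 0.0
    for logits, feature, label in batch:
        total += softmax_cross_entropy(logits, label)
        total += lam * center_loss(feature, centers[label])
    return total / len(batch)


def update_centers(batch, centers, alpha=0.5):
    """Move each class center toward the mean of its features in the batch.

    delta_j = sum_i (c_j - x_i) / (1 + n_j) over samples with label j;
    c_j <- c_j - alpha * delta_j. Classes absent from the batch are unchanged.
    """
    new_centers = {}
    for label, center in centers.items():
        feats = [f for _, f, y in batch if y == label]
        delta = [sum(c - f[d] for f in feats) / (1 + len(feats))
                 for d, c in enumerate(center)]
        new_centers[label] = [c - alpha * dc for c, dc in zip(center, delta)]
    return new_centers


# Toy example: one sample of class 0 with a 2-D fused feature.
batch = [([2.0, 0.0], [1.0, 2.0], 0)]
centers = {0: [0.0, 0.0], 1: [5.0, 5.0]}
print(center_based_cross_entropy(batch, centers, lam=0.1))
print(update_centers(batch, centers))  # {0: [0.25, 0.5], 1: [5.0, 5.0]}
```

The cross-entropy term keeps classes separable, while the center term shrinks the spread of features within each class, which is how the loss targets intra-class diversity.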
Summary
It is increasingly important to use high-resolution remote sensing images (HRRSI) in geospatial object detection [1,2] or land-cover classification tasks owing to advances in remote sensing instruments. Scene images are difficult to classify effectively because of the variety of land-cover objects and high intra-class diversity [7,8]. Following [9], the features used to describe scene images fall mainly into three types: handcrafted features, unsupervised learning-based features and deep features. Deep feature-based methods have achieved great success in scene classification; however, they assume that every object contributes to the feature representation of a scene image [10,11,12].