A Deep-Local-Global Feature Fusion Framework for High Spatial Resolution Imagery Scene Classification

Qiqi Zhu, Deren Li, Yanfei Zhong, Liangpei Zhang, Yanfei Liu

doi:10.3390/rs10040568

Abstract

High spatial resolution (HSR) imagery scene classification has recently attracted increased attention. The bag-of-visual-words (BoVW) model is an effective method for scene classification. However, it can only extract handcrafted features, and it disregards the spatial layout information. Deep learning, in contrast, can automatically mine the intrinsic features and preserve the spatial location, but it may lose the characteristic information of the HSR images. Although previous methods based on the combination of BoVW and deep learning have achieved comparatively high classification accuracies, they have not explored the combination of handcrafted and deep features; they used the BoVW model only as a feature coding method to encode the deep features. This means that the intrinsic characteristics of these models were not combined in the previous works. In this paper, to discover more discriminative semantics for HSR imagery, the deep-local-global feature fusion (DLGFF) framework is proposed for HSR imagery scene classification. Differing from the conventional scene classification methods, which utilize only handcrafted features or deep features, DLGFF establishes a framework integrating multi-level semantics from the global texture feature-based method, the BoVW model, and a pre-trained convolutional neural network (CNN). In DLGFF, two different approaches are proposed to exploit the multi-level semantics for complex scenes: the local and global features fused with the pooling-stretched convolutional features (LGCF), and the local and global features fused with the fully connected features (LGFF). The experimental results obtained with three HSR image classification datasets confirm the effectiveness of the proposed DLGFF framework.
Compared with the published results of the previous scene classification methods, the classification accuracies of the DLGFF framework on the 21-class UC Merced dataset and 12-class Google dataset of SIRI-WHU can reach 99.76%, which is superior to the current state-of-the-art methods. The classification accuracy of the DLGFF framework on the 45-class NWPU-RESISC45 dataset, 96.37 ± 0.05%, is an increase of about 6% when compared with the current state-of-the-art methods. This indicates that the fusion of the global low-level feature, the local mid-level feature, and the deep high-level feature can provide a representative description for HSR imagery.

Highlights

The technology of satellite sensors has provided us with a large amount of high spatial resolution (HSR) images with abundant spectral and spatial information for precise land-cover/land-use (LULC) investigation
The deep-local-global feature fusion (DLGFF) framework has been proposed for high spatial resolution (HSR) remote sensing imagery scene classification
In DLGFF, two effective feature fusion approaches, i.e., the local and global features fused with the pooling-stretched convolutional features (LGCF) and the local and global features fused with the fully connected features (LGFF), are employed for modeling the images

Summary

Introduction

The technology of satellite sensors has provided us with a large amount of high spatial resolution (HSR) images with abundant spectral and spatial information for precise land-cover/land-use (LULC) investigation. Diverse object classes, e.g., buildings, trees, and roads, with different spatial distributions can usually be found in HSR images. This makes it challenging to obtain the semantic information of the whole image scene, e.g., a residential scene or an industrial scene, and leads to the so-called semantic gap [4]. Scene classification based on BoVW extracts the local features of a scene and maps these local low-level features to the corresponding parameter space to obtain the mid-level features. These mid-level features are called the “bags of visual words”.
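
The mapping from local low-level features to a mid-level representation can be illustrated with a minimal sketch. It assumes the common BoVW formulation in which each local descriptor is assigned to its nearest entry in a codebook of visual words and the image is described by the normalized word histogram; the text above does not specify the exact coding scheme. The function names `nearest_word` and `bovw_histogram`, the toy 2-D descriptors, and the three-word codebook are all hypothetical and chosen for illustration. In practice the codebook would typically be learned from training descriptors, e.g., by k-means clustering.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))


def bovw_histogram(descriptors, codebook):
    """Encode a set of local descriptors as a normalized visual-word histogram."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    # An image with no descriptors yields an all-zero histogram.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy 2-D descriptors and a 3-word codebook (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
hist = bovw_histogram(descriptors, codebook)
print(hist)
```

The resulting histogram has one bin per visual word, so its length is fixed by the codebook size regardless of how many local descriptors the image contains. Note that the histogram discards where in the image each descriptor was found.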

Methods

Findings

Discussion

Conclusion