Abstract

Scene classification is a foundation of automatic remote sensing image interpretation. Recently, deep convolutional neural networks have demonstrated promising performance in high-resolution remote sensing scene classification. In general, most researchers directly use the raw deep features extracted from a convolutional network to classify scenes. However, this strategy considers only single-scale features, which cannot describe both the local and global characteristics of an image. In fact, dissimilarity among scene targets of the same category may prevent convolutional features from grouping them into that category, and similarity of global appearance across different categories may likewise cause fully connected layer features to fail to distinguish them. To address these issues, we propose a scene classification method based on multi-scale deep feature representation (MDFR), which makes two main contributions: (1) region-based feature selection and representation; and (2) multi-scale feature fusion. The proposed method first filters the multi-scale deep features extracted from pre-trained convolutional networks, and then fuses them via two efficient fusion methods. Our method exploits the complementarity between local and global features by effectively combining features of different scales and discarding redundant information. Experimental results on three benchmark high-resolution remote sensing image datasets indicate that the proposed method is comparable to state-of-the-art algorithms.
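To make the pipeline concrete, below is a minimal sketch of extracting and fusing multi-scale deep features from a pre-trained network. The abstract does not specify the backbone, the region-based selection rule, or the two fusion methods, so the VGG-16 backbone, global average pooling of the convolutional maps, and L2-normalised concatenation used here are illustrative assumptions, not the paper's actual design.

```python
# Illustrative sketch: multi-scale deep features from a pre-trained CNN.
# Backbone, pooling, and fusion choices are assumptions; the paper's
# region-based selection and its two fusion methods are not shown here.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pre-trained backbone used as a fixed feature extractor.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_multiscale(image: Image.Image) -> torch.Tensor:
    x = preprocess(image).unsqueeze(0)             # (1, 3, 224, 224)
    conv = vgg.features(x)                         # local conv maps: (1, 512, 7, 7)
    # Global-average-pool the convolutional maps into a local descriptor.
    local_feat = conv.mean(dim=(2, 3)).squeeze(0)  # (512,)
    # Activations of the second fully connected layer as the global descriptor.
    flat = vgg.avgpool(conv).flatten(1)            # (1, 25088)
    global_feat = vgg.classifier[:4](flat).squeeze(0)  # (4096,)
    # Simple fusion: L2-normalise each scale, then concatenate.
    local_feat = local_feat / local_feat.norm()
    global_feat = global_feat / global_feat.norm()
    return torch.cat([local_feat, global_feat])    # fused multi-scale feature
```

In practice, such a fused vector would then be fed to a simple classifier such as a linear SVM, as is common when pre-trained CNN features are used off the shelf.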


Introduction

A large number of high-resolution remote sensing images have been applied to urban functional analysis, geographic image retrieval, environment monitoring, and so on [1]. To obtain discriminative feature representations, most classification methods have focused on various hand-crafted feature descriptors, such as histograms of oriented gradients (HOG) [3], scale-invariant feature transform (SIFT) [4], local binary patterns (LBP) [5], and GIST [6]. These descriptors have been used extensively in computer vision classification tasks. Lazebnik et al. [9] proposed the spatial pyramid model, which divides an image into a pyramid of progressively finer grids and cascades the bag-of-visual-words (BoVW) histogram of each level to form the final feature representation; a sketch of this representation is given below. However, these methods consider only the absolute spatial relationships between visual words. The works in [18,19] proposed a multi-feature topic model, which builds topic models for different features and fuses them at the visual topic level.
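As a concrete illustration of the spatial pyramid representation of Lazebnik et al. [9] described above, the sketch below cascades per-cell BoVW histograms over a sequence of grids. It is a simplification: the original model's level weighting is omitted, and the grid levels, normalisation, and all names are illustrative assumptions.

```python
# Illustrative sketch of a spatial pyramid BoVW representation in the spirit
# of Lazebnik et al. [9]. Level weighting is omitted; names are hypothetical.
import numpy as np

def spatial_pyramid_bovw(keypoints_xy, word_ids, img_w, img_h,
                         vocab_size, levels=(1, 2, 4)):
    """keypoints_xy: (N, 2) array of descriptor locations;
    word_ids: (N,) int array giving the visual-word index of each
    descriptor after vector quantisation against a learned codebook."""
    hists = []
    for g in levels:                       # pyramid level with a g x g grid
        cell_w, cell_h = img_w / g, img_h / g
        for i in range(g):
            for j in range(g):
                # Select the descriptors falling inside grid cell (i, j).
                in_cell = ((keypoints_xy[:, 0] // cell_w == i) &
                           (keypoints_xy[:, 1] // cell_h == j))
                h = np.bincount(word_ids[in_cell], minlength=vocab_size)
                hists.append(h / max(h.sum(), 1))  # per-cell normalisation
    # Cascade every cell's histogram into the final representation.
    return np.concatenate(hists)
```

Because each histogram is tied to a fixed grid cell, the representation encodes where visual words occur in absolute image coordinates, which is exactly the limitation noted above.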
