Abstract

Remote sensing image scene classification, which consists of labeling remote sensing images with a set of categories based on their content, has received remarkable attention for many applications such as land use mapping. Standard approaches are based on the multi-layer representation of first-order convolutional neural network (CNN) features. However, second-order CNNs have recently been shown to outperform traditional first-order CNNs on many computer vision tasks. Hence, the aim of this paper is to demonstrate the use of second-order statistics of CNN features for remote sensing scene classification. These statistics take the form of covariance matrices computed locally or globally on the output of a CNN. Such matrices do not lie in a Euclidean space but on a Riemannian manifold, so Euclidean tools are not suited to manipulate them. Other metrics should be considered, such as the log-Euclidean one, which consists of projecting the set of covariance matrices onto a tangent space defined at a reference point. In this tangent plane, which is a vector space, conventional machine learning algorithms can be applied, such as Fisher vector encoding or the SVM classifier. Based on this log-Euclidean framework, we propose a novel transfer learning approach composed of two hybrid architectures based on covariance pooling of CNN features: the first is local and the second is global. Both rely on features extracted from models pre-trained on the ImageNet dataset, which are then processed with machine learning algorithms. The first hybrid architecture is an ensemble learning approach using the log-Euclidean Fisher vector encoding of region covariance matrices computed locally on the first layers of a CNN. The second is an ensemble learning approach based on the covariance pooling of CNN features extracted globally from the deepest layers. These two ensemble learning approaches are then combined according to the strategy of the most diverse ensembles. For validation and comparison purposes, the proposed approach is tested on several challenging remote sensing datasets. Experimental results exhibit a significant gain of approximately 2% in overall accuracy for the proposed approach compared to a similar state-of-the-art method based on covariance pooling of CNN features on the UC Merced dataset.
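To make the second-order pipeline concrete, the sketch below illustrates global covariance pooling of a CNN feature map followed by the log-Euclidean mapping (matrix logarithm, then vectorization of the tangent-space matrix). It is a minimal NumPy illustration: the feature-map shape, the regularization constant and the function names are assumptions made for the example, not details taken from the paper.

```python
import numpy as np

def covariance_pooling(features, eps=1e-5):
    """Covariance pooling: `features` has shape (n, d), e.g. an activation
    map reshaped to (H*W, C). Returns a d x d symmetric positive-definite
    (SPD) matrix, regularized so the matrix logarithm is well defined."""
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (features.shape[0] - 1)
    return cov + eps * np.eye(cov.shape[0])

def log_euclidean_map(cov):
    """Log-Euclidean mapping: project the SPD matrix onto the tangent space
    at the identity via the matrix logarithm, then keep its upper triangle
    as a Euclidean vector usable by FV encoding or an SVM classifier."""
    eigval, eigvec = np.linalg.eigh(cov)
    log_cov = eigvec @ np.diag(np.log(eigval)) @ eigvec.T
    rows, cols = np.triu_indices(log_cov.shape[0])
    return log_cov[rows, cols]

# Hypothetical 7x7x512 deep feature map taken from a pre-trained CNN.
feature_map = np.random.rand(7 * 7, 512)
descriptor = log_euclidean_map(covariance_pooling(feature_map))
```

The resulting descriptor lives in a vector space, so it can be fed directly to conventional machine learning tools, in line with the log-Euclidean framework described above.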

Highlights

  • The aim of a supervised classification algorithm is to label an image with the corresponding class according to its content

  • In order to benefit from pre-trained deep neural networks and second-order representations, this work proposes a novel ensemble learning approach based on covariance pooling of convolutional neural network (CNN) features for remote sensing scene classification

  • We introduced in [44] a hybrid approach named ELCP, an ensemble learning approach based on covariance pooling of CNN features

Summary

Introduction

The aim of a supervised classification algorithm is to label an image with the corresponding class according to its content. Conventional approaches are based on encoding handcrafted features with, for example, the bag of words (BoW) model [1], the vector of locally aggregated descriptors (VLAD) [2,3] or Fisher vectors (FV) [4,5,6]. These strategies have proved successful in a wide range of applications such as image classification [4,7,8], text retrieval [9], action and face recognition [10], etc. A CNN is built from various hidden layers performing different kinds of transformations, such as convolutions, pooling, and activation functions.
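As an illustration of the transfer-learning setting discussed in the abstract, the hedged sketch below extracts both shallow and deep convolutional features from a model pre-trained on ImageNet. The VGG-16 backbone, the layer indices and the file name are assumptions chosen for the example, not choices stated in the text.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# A CNN pre-trained on ImageNet; VGG-16 is an assumed backbone here.
cnn = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = preprocess(Image.open("scene.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    # Early convolutional blocks: local features, candidate inputs for
    # region covariance matrices computed on the first layers.
    shallow = cnn.features[:10](image)   # (1, 128, 56, 56)
    # Full convolutional stack: deepest features, candidate input for
    # global covariance pooling.
    deep = cnn.features(image)           # (1, 512, 7, 7)
```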
