Abstract

Learning efficient image representations is at the core of remote sensing image classification. Existing methods, which either apply feature coding to features extracted from convolutional neural networks (CNNs) or train new CNNs, can only generate image features with limited representational ability, which prevents them from achieving better performance. In this paper, we investigate how to transfer features from successfully pre-trained CNNs for classification. We propose a scheme for generating image features by cascading features extracted from different CNNs. First, pre-trained CNNs such as CaffeNet, VGG-S and VGG-F are used as feature extractors, since their different structures help extract richer information from images. Then, the fully connected layers of the pre-trained CNNs are fine-tuned on the UC Merced land use dataset. Finally, the image features generated by cascading the outputs of the three networks are fed into a multi-class Optimal Margin Distribution Machine (mcODM) to obtain the final classification results. Extensive experiments on a public land use classification dataset demonstrate that the image features obtained by the proposed scheme achieve remarkable performance and improve on the state of the art by a significant margin. The results show that features from pre-trained CNNs generalize well to the land use dataset and are more expressive than features from a single CNN.
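
As a minimal illustration of the cascading step described above, the following sketch concatenates the outputs of several feature extractors into one vector and classifies it. It is a sketch under stated assumptions, not the authors' implementation: the three toy extractors are hypothetical stand-ins for the fully connected outputs of CaffeNet, VGG-S and VGG-F, the class labels and data are made up, and a simple nearest-centroid classifier is substituted for mcODM, which is not implemented here.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Extractor = Callable[[Sequence[float]], Vector]


def cascade_features(image: Sequence[float], extractors: Sequence[Extractor]) -> Vector:
    """Concatenate the feature vectors produced by each extractor, in order."""
    cascaded: Vector = []
    for extract in extractors:
        cascaded.extend(extract(image))
    return cascaded


# Hypothetical toy extractors standing in for three pre-trained CNNs.
def toy_net_a(image: Sequence[float]) -> Vector:
    return [sum(image) / len(image)]  # mean intensity


def toy_net_b(image: Sequence[float]) -> Vector:
    return [max(image), min(image)]  # intensity range


def toy_net_c(image: Sequence[float]) -> Vector:
    return [image[0] - image[-1]]  # crude first-to-last difference


EXTRACTORS = [toy_net_a, toy_net_b, toy_net_c]


def fit_centroids(features: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Nearest-centroid training (a stand-in for mcODM): average the vectors per class."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: Dict[str, Vector], vec: Vector) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))


# Made-up "images" (flat intensity lists) with hypothetical labels.
train_images = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2], [0.9, 0.8, 0.9], [0.8, 0.9, 0.7]]
train_labels = ["forest", "forest", "buildings", "buildings"]

train_features = [cascade_features(img, EXTRACTORS) for img in train_images]
centroids = fit_centroids(train_features, train_labels)
prediction = predict(centroids, cascade_features([0.85, 0.9, 0.8], EXTRACTORS))
print(prediction)
```

The cascaded vector has one block per network, so its length is the sum of the individual feature lengths (here 1 + 2 + 1 = 4). Any multi-class classifier that accepts fixed-length vectors could replace the nearest-centroid stand-in.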

Highlights

  • Land use types have complex natural and social attributes, which makes classifying land use to meet users' needs a pressing and difficult problem in land resource management

  • This paper proposes multi-structure convolutional neural network features cascading (MCNNFC), a method for land use classification

  • Because the network architectures differ, the size of the increase differs across networks: CaffeNet has the highest increase (5.71%), VGG-F increases by 5.1%, and VGG-S has the lowest increase (2.72%). MCNNFC still has the highest classification accuracy after fine-tuning, reaching 97.55%


Summary

Introduction

Land use types have complex natural and social attributes, which makes classifying land use to meet users' needs a pressing and difficult problem in land resource management. Low-level features, such as the scale-invariant feature transform (SIFT) (Lowe et al., 2004), are based on visual attributes (texture, structure, spatial information, etc.). They can achieve good results on general classification tasks, but their poor generalization ability is exposed on classification tasks with many scene types and high complexity. Some researchers combine pre-trained CNNs as feature extractors with traditional coding methods, such as bag of visual words (BOVW) and improved Fisher vector (IFK), which improves classification accuracy to a certain extent. (2) Fine-tuning pre-trained CNNs. This training method can effectively avoid overfitting, but in practice it has drawbacks such as a long training cycle and a large demand for training data.
