Abstract
High spatial resolution (HSR) image scene classification aims to bridge the semantic gap between low-level features and high-level semantic concepts, which is a challenging task due to the complex distribution of ground objects in HSR images. Scene classification based on the bag-of-visual-words (BOVW) model is one of the most successful ways to acquire high-level semantic concepts. However, the BOVW model assigns local low-level features to their closest visual words in the "visual vocabulary" (the codebook obtained by k-means clustering), which discards too many useful details of the low-level features in HSR images. In this paper, a feature coding method under the Fisher kernel (FK) coding framework is introduced to extend the BOVW model by characterizing the low-level features with a gradient vector instead of the count statistics used in the BOVW model, which results in a significant decrease in the codebook size and an acceleration of the codebook learning process. By considering the differences in the distributions of the ground objects in different regions of the images, local FK (LFK) is proposed for HSR image scene classification. The experimental results show that the proposed scene classification methods under the FK coding framework can greatly reduce the computational cost, and can obtain a better scene classification accuracy than methods based on the traditional BOVW model.
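As a rough illustration of the gradient-vector idea described in the abstract, the following Python sketch encodes a set of local descriptors against a small diagonal-covariance Gaussian mixture model (GMM) codebook. The GMM codebook, the mean-only gradient, and the normalization steps are assumptions borrowed from the common Fisher vector formulation, not the paper's exact FK/LFK implementation, and the function and variable names are illustrative.

```python
# Minimal sketch of Fisher-kernel-style feature coding with a small
# diagonal-covariance GMM codebook. Illustrative only; it does not
# reproduce the FK/LFK method proposed in the paper.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_encode(local_feats, gmm):
    """Encode local descriptors as the gradient of the GMM log-likelihood
    with respect to the component means (a simplified Fisher vector)."""
    T, _ = local_feats.shape
    post = gmm.predict_proba(local_feats)                      # (T, K) soft assignments
    diffs = (local_feats[:, None, :] - gmm.means_[None]) / np.sqrt(gmm.covariances_)[None]
    grad_mu = (post[:, :, None] * diffs).sum(axis=0)           # (K, D) mean gradients
    grad_mu /= T * np.sqrt(gmm.weights_)[:, None]
    fv = grad_mu.ravel()
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                     # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)                   # L2 normalization

# usage with random stand-in local descriptors
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 16))                             # 500 local descriptors
gmm = GaussianMixture(n_components=16, covariance_type="diag",
                      random_state=0).fit(feats)               # small codebook
print(fisher_encode(feats, gmm).shape)                         # (256,) image signature
```

Because the encoding carries first-order statistics for every component, a much smaller codebook (here 16 components) can yield a rich signature, which is consistent with the reduced codebook size and faster codebook learning reported in the abstract.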
Highlights
A large number of high spatial resolution (HSR) images are available for precise land-use/land-cover investigation
The scene classification methods developed under the Fisher kernel (FK) coding framework, both with and without the incorporation of the spatial information, are called FK-S and FK-O, respectively
Compared with the spatial pyramid matching (SPM)-MeanStd method, FK-O and FK-S increased the classification accuracy by about 6%, 4%, and 2% for the UC Merced (UCM)
Summary
A large number of high spatial resolution (HSR) images are available for precise land-use/land-cover investigation. The improvement in the spatial resolution of remote sensing images (to less than 1 m) enables the analysis of the structure of ground objects. Much research has been undertaken on the accurate recognition of ground objects (e.g., trees, buildings, roads) in HSR images [1,2,3,4,5,6,7,8]. To bridge the semantic gap, scene classification methods based on the bag-of-visual-words (BOVW) model have been proposed. In scene classification based on the BOVW model, the low-level features are extracted from the image by a local feature extraction method, e.g., mean/standard deviation statistics [9], the gray-level co-occurrence matrix [24], or the scale-invariant feature transform [25], and the low-level features are assigned to their closest visual words in a "visual vocabulary", i.e., the codebook obtained by k-means clustering.
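The BOVW pipeline described above can be sketched in a few lines of Python. The patch size, codebook size, and the meanstd_patch_features/bovw_histogram helpers below are hypothetical choices for illustration, not the settings used in the paper.

```python
# Hedged sketch of the BOVW pipeline: mean/standard deviation statistics
# over image patches as the low-level features, followed by hard
# assignment to the nearest visual word in a k-means codebook.
import numpy as np
from sklearn.cluster import KMeans

def meanstd_patch_features(image, patch=16):
    """Per-band mean and standard deviation for non-overlapping patches."""
    h, w, bands = image.shape
    feats = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            block = image[i:i + patch, j:j + patch, :].reshape(-1, bands)
            feats.append(np.concatenate([block.mean(axis=0), block.std(axis=0)]))
    return np.asarray(feats)                        # (n_patches, 2 * bands)

def bovw_histogram(feats, vocabulary):
    """Count how many patch features fall into each visual word."""
    words = vocabulary.predict(feats)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)              # normalized word counts

# usage with a random stand-in for an HSR scene tile
rng = np.random.default_rng(1)
scene = rng.random((256, 256, 3))
feats = meanstd_patch_features(scene)
vocabulary = KMeans(n_clusters=64, n_init=4, random_state=0).fit(feats)
print(bovw_histogram(feats, vocabulary).shape)      # (64,) scene representation
```

The hard assignment in the final step keeps only the word counts, which is exactly the loss of low-level detail that the FK coding framework in this paper is designed to avoid.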