Abstract

Land-use classification from remote sensing images has become an important but challenging task. This paper proposes Hierarchical Coding Vectors (HCV), a novel representation based on hierarchical coding structures, for scene level land-use classification. We stack multiple Bag of Visual Words (BOVW) coding layers and one Fisher coding layer to form a hierarchical feature learning structure. In the BOVW coding layers, we extract local descriptors from a geographical image at densely sampled interest points and encode them using soft assignment (SA). The Fisher coding layer encodes these semi-local features with Fisher vectors (FV) and aggregates them into a final global representation. The semantic information is progressively refined by feeding the output of each layer into the next. Through this hierarchical coding structure, HCV describes geographical images with a high-level representation carrying richer semantic information. Experimental results on the 21-Class Land Use (LU) and RSSCN7 image databases demonstrate the effectiveness of the proposed HCV. Combined with the standard FV, our method (FV + HCV) achieves superior performance compared to state-of-the-art methods on the two databases, with average classification accuracies of 91.5% on the LU database and 86.4% on the RSSCN7 database.

Highlights

  • Scene level land-use classification aims to assign a semantic label to a remote sensing image according to its content

  • Inspired by the success of deep neural networks (DNNs) in computer vision applications and of encoding methods in remote sensing applications, we propose Hierarchical Coding Vectors (HCV), a new representation based on hierarchical coding structures, for scene level land-use classification

  • We evaluate the effectiveness of the proposed HCV framework and traditional Fisher vectors (FV) for remote sensing land-use scene classification using two standard public databases, the 21-class Land Use (LU) and RSSCN7 databases

Introduction

Scene level land-use classification aims to assign a semantic label (e.g., building or river) to a remote sensing image according to its content. Effective and efficient scene classification methods are necessary to annotate the massive volume of remote sensing images. The Bag of Visual Words (BOVW) [1,2] framework and its variants [3,4] based on spatial relations have become promising remote sensing image representations for land-use classification. In BOVW, we typically extract local features from the geographical images, learn a codebook on the training set with K-means or a Gaussian mixture model (GMM), encode the local features and pool them into a vector, and normalize this vector as the final global representation. The representation is then fed into a pre-trained classifier to obtain the annotation result for the remote sensing image.
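The BOVW pipeline above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the descriptors are synthetic stand-ins for dense local features, the codebook size and the Gaussian-kernel soft-assignment form (with a bandwidth parameter `sigma`) are illustrative choices, and a plain K-means replaces a production clustering routine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for local descriptors (e.g., dense SIFT); shapes are illustrative.
descriptors = rng.normal(size=(200, 16))   # 200 descriptors from one image, 16-D each
train_pool = rng.normal(size=(2000, 16))   # descriptors pooled from a training set


def kmeans(X, k, iters=10):
    """Plain K-means: learn a codebook of k visual words from training descriptors."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center, then update the centers.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers


def bovw_soft_assign(X, codebook, sigma=1.0):
    """Encode descriptors by soft assignment, sum-pool, and L2-normalize."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian-kernel soft assignment
    weights /= weights.sum(axis=1, keepdims=True)       # each descriptor's weights sum to 1
    hist = weights.sum(axis=0)                          # pool over all descriptors
    return hist / (np.linalg.norm(hist) + 1e-12)        # normalize the global vector


codebook = kmeans(train_pool, k=32)                 # learn codebook on training data
image_vec = bovw_soft_assign(descriptors, codebook) # encode, pool, normalize one image
print(image_vec.shape)                              # (32,) -> one entry per visual word
```

The resulting fixed-length vector is what would be fed to a pre-trained classifier (e.g., an SVM); in the paper's HCV, such coding layers are stacked, with a Fisher coding layer on top.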

