Abstract
Scene classification plays an important role in the remote sensing field. Traditional approaches use high-resolution remote sensing images as the data source from which to extract powerful features. Although these methods are common, model performance is severely affected by the image quality of the dataset, and relying on a single modality (source) of images tends to lose some scene semantic information, which eventually degrades classification accuracy. With the development of remote sensing technology, multi-modal remote sensing data have become easy to obtain, and how to carry out scene classification on cross-modal data has become an interesting topic in the field. To address these problems, this paper proposes using feature fusion for cross-modal scene classification of remote sensing images, i.e., aerial and ground street view images, so that the strengths of aerial images and ground street view data complement each other. Our cross-modal model is based on a Siamese network. Specifically, we first train the cross-modal model on pairs that each combine an aerial image with ground data. Then, the trained model is used to extract the deep features of each aerial and ground image pair, and the features of the two perspectives are fused to train an SVM classifier for scene classification. We evaluate the approach on two public benchmark datasets, AiRound and CV-BrCT. The preliminary results show that the proposed method achieves state-of-the-art performance compared with traditional methods, indicating that information from ground data can contribute to aerial image classification.
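The fusion step described above can be illustrated with a minimal sketch. The abstract does not state how the two feature vectors are combined, so the example assumes the simplest option: L2-normalize each view and concatenate them. The function names `l2_normalize` and `fuse_features` and the toy feature values are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit Euclidean length.

    A zero vector is returned unchanged to avoid division by zero.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [float(x) for x in vec]
    return [x / norm for x in vec]


def fuse_features(aerial: Sequence[float], ground: Sequence[float]) -> List[float]:
    """Fuse one aerial/ground feature pair.

    Assumption: fusion is plain concatenation of the normalized vectors;
    the paper may use a different fusion scheme.
    """
    return l2_normalize(aerial) + l2_normalize(ground)


# Toy stand-ins for the deep features of one aerial/ground image pair.
aerial_feat = [3.0, 4.0]
ground_feat = [0.0, 2.0, 0.0]

fused = fuse_features(aerial_feat, ground_feat)
print(fused)  # [0.6, 0.8, 0.0, 1.0, 0.0]
```

Under this assumption, each fused vector would serve as one training sample for the SVM classifier, for example scikit-learn's `sklearn.svm.SVC`, with the scene category as its label.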
Highlights
Scene classification is a hot topic in the remote sensing field. It aims to assign a semantic category to an image according to its content and is the most intuitive way of understanding a remote sensing image
To address the limitations of single-modality classification, this paper proposes a method based on fused features from a cross-modal model, which combines the aerial and ground perspectives so that aerial images and ground street view data complement each other
The cross-modal model is based on a Siamese network, in which two neural networks together form the Siamese structure
Summary
Scene classification is a hot topic in the remote sensing field. It aims to assign a semantic category to an image according to its content and is the most intuitive way of understanding a remote sensing image. Scene classification focuses only on the semantic features of the whole image and on the overall cognition of the image scene. It attends to global, macro-level information and generally classifies a region as a whole according to its scene semantic information. Global cognition and semantic information are therefore the two most important parts of scene classification. High-resolution remote sensing image scene classification is widely used in applications such as urban functional zoning planning (Huang, 2018), vehicle detection (Schilling, 2018), and ship detection (Wang, 2019).