Abstract

The performance of remote sensing image retrieval (RSIR) systems depends on the capability of the extracted features to characterize the semantic content of images. Existing RSIR systems describe images by visual descriptors that model the primitives (such as different land-cover classes) present in the images. However, visual descriptors may not be sufficient to describe the high-level complex content of RS images (e.g., the attributes of and relationships among different land-cover classes). To address this issue, in this article, we present an RSIR system that generates and exploits textual descriptions, in the form of captions (i.e., sentences), to accurately describe the objects present in RS images, their attributes, and the relationships between them. To this end, the proposed retrieval system consists of three main steps. The first step encodes the image visual features and translates the encoded features into a textual description that summarizes the content of the image as a caption. This is achieved by combining a convolutional neural network with a recurrent neural network. The second step converts the generated textual descriptions into semantically meaningful feature vectors by using recent word embedding techniques. Finally, the last step estimates the similarity between the textual-description vectors of the query image and those of the archive images, and then retrieves the images most similar to the query. Experimental results obtained on two different datasets show that describing image content with captions in the framework of RSIR leads to accurate retrieval performance.
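The three steps above can be sketched end to end. In the sketch below, `generate_caption` is a stub standing in for the trained CNN-RNN captioning model, and the word embeddings are made-up illustrative vectors rather than trained word2vec/GloVe embeddings; only the overall pipeline (caption → averaged embedding vector → cosine-similarity ranking) follows the description.

```python
import numpy as np

# Illustrative word embeddings; a real system would use trained
# word2vec/GloVe/fastText vectors rather than these made-up values.
EMBEDDINGS = {
    "dense":       np.array([0.9, 0.1, 0.0]),
    "residential": np.array([0.8, 0.2, 0.1]),
    "buildings":   np.array([0.7, 0.2, 0.1]),
    "river":       np.array([0.0, 0.9, 0.3]),
    "through":     np.array([0.1, 0.8, 0.2]),
    "farmland":    np.array([0.1, 0.3, 0.9]),
}

def generate_caption(image):
    # Step 1 stub: in the paper, a CNN encodes the image and an RNN decodes
    # the features into a sentence; a simple lookup stands in for that model.
    return image["caption"]

def encode_sentence(caption):
    # Step 2: average the word embeddings of the caption into one
    # semantically meaningful feature vector.
    vecs = [EMBEDDINGS[w] for w in caption.lower().split() if w in EMBEDDINGS]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query, archive, top_k=1):
    # Step 3: rank archive images by the similarity of their caption
    # vectors to the query's caption vector.
    q = encode_sentence(generate_caption(query))
    ranked = sorted(
        archive,
        key=lambda im: cosine(q, encode_sentence(generate_caption(im))),
        reverse=True,
    )
    return ranked[:top_k]

query = {"caption": "dense residential buildings"}
archive = [
    {"caption": "river through farmland"},
    {"caption": "residential buildings"},
]
print(retrieve(query, archive)[0]["caption"])  # most similar archive image
```

Averaging word vectors is only one simple sentence-encoding choice; any encoder that maps a caption to a fixed-length vector could replace `encode_sentence` without changing the rest of the pipeline.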

Highlights

  • Recent advances in satellite technology result in an explosive growth of remote sensing (RS) image archives

  • In the RS community, great attention is devoted to content-based image retrieval, which aims to search for and retrieve the images most similar to a query image

Manuscript received January 23, 2020; revised April 19, 2020, June 18, 2020, and July 22, 2020; accepted July 27, 2020

  • The proposed methodology consists of three main steps, which are as follows: 1) image caption generation; 2) sentence encoding; and 3) image retrieval based on the encoded sentences of images

Introduction

Recent advances in satellite technology have resulted in an explosive growth of remote sensing (RS) image archives. One of the important research topics is the development of accurate RS image retrieval (RSIR) systems to retrieve the images most relevant to a query image from such massive archives. Traditional content-based RSIR systems rely on hand-crafted features to describe the semantic content of images. To this end, several visual descriptors have been presented in RS. Unsupervised methods compute the similarity between the visual features of the query image and those of the archive images and retrieve the images most similar to the query. To this end, one can use the k-nearest neighbor algorithm. In [9], a sparse reconstruction-based method that generalizes the standard sparse classifier to the case of multilabel RS image retrieval problems is introduced
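The k-nearest-neighbor retrieval mentioned above can be sketched as follows; the 4-D visual descriptors here are made-up illustrative values, and Euclidean distance is one common similarity choice, not necessarily the one used in any particular system.

```python
import numpy as np

def knn_retrieve(query_feat, archive_feats, k=3):
    # Rank archive images by Euclidean distance to the query in feature
    # space and return the indices of the k nearest ones.
    dists = np.linalg.norm(archive_feats - query_feat, axis=1)
    return np.argsort(dists)[:k].tolist()

# Made-up 4-D visual descriptors for five archive images (illustrative only).
archive = np.array([
    [0.90, 0.10, 0.00, 0.20],
    [0.10, 0.80, 0.30, 0.00],
    [0.85, 0.15, 0.05, 0.25],
    [0.20, 0.20, 0.90, 0.10],
    [0.88, 0.12, 0.02, 0.22],
])
query = np.array([0.90, 0.10, 0.00, 0.20])
print(knn_retrieve(query, archive))  # indices of the 3 most similar images
```

The same ranking step applies unchanged whether the feature vectors come from hand-crafted visual descriptors or from the caption-based sentence encodings proposed in this article.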
