Abstract

Remote sensing (RS) images are widely used in civilian and military fields. With the rapid growth of image data, fast and efficient RS image retrieval has become a challenging issue. However, existing image retrieval methods, whether text-based or content-based, remain limited in practice; for example, text input is inefficient, and a sample image for the query is often unavailable. Speech, by contrast, is a natural and convenient means of communication. Therefore, a novel speech-image cross-modal retrieval approach, named deep visual-audio network (DVAN), is presented in this article, which establishes a direct relationship between image and speech from paired image-audio data. The model has three main parts: 1) image feature extraction, which extracts effective features of RS images; 2) audio feature learning, which recognizes key information from raw audio data, where AudioNet, proposed as part of DVAN, is used to obtain more discriminative features; and 3) multimodal embedding, which learns the direct correlations between the two modalities. Experimental results on an RS image-audio dataset demonstrate that the proposed method is effective and that speech-image retrieval is feasible, providing a new way toward faster and more convenient RS image retrieval.
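The retrieval pattern the abstract describes, encoding both modalities into a shared space and ranking images by similarity to a spoken query, can be sketched as follows. This is a minimal illustration only: the linear projections `W_img` and `W_aud` stand in for the paper's image branch and AudioNet, and all feature vectors here are random placeholders, not the actual DVAN method or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical placeholder encoders: in DVAN these would be the image
# feature extractor and AudioNet; here, simple linear projections map
# each modality's features into a shared 64-d embedding space.
D_IMG, D_AUD, D_EMB = 512, 128, 64
W_img = rng.standard_normal((D_IMG, D_EMB))
W_aud = rng.standard_normal((D_AUD, D_EMB))

def embed(x, W):
    """Project features into the shared space and L2-normalize."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Toy gallery of 5 RS image feature vectors and one spoken query.
images = rng.standard_normal((5, D_IMG))
query_audio = rng.standard_normal(D_AUD)

img_emb = embed(images, W_img)       # shape (5, 64)
aud_emb = embed(query_audio, W_aud)  # shape (64,)

# Cosine similarity is the dot product of unit vectors; rank the
# gallery images by similarity to the spoken query, best first.
scores = img_emb @ aud_emb
ranking = np.argsort(-scores)
print(ranking)
```

In the actual model, the two encoders would be trained jointly on paired image-audio data so that matching pairs score higher than mismatched ones; the ranking step itself is the same.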
