Abstract

Cross-modal retrieval aims to enable flexible retrieval across different modalities; its central challenge is measuring semantic similarity among multimodal data. Although many methods have been proposed for cross-modal retrieval, they rarely consider preserving content information across multimodal data. In this paper, we present a three-stage cross-modal retrieval method named MMCA-CMR. To reduce the discrepancy among multimodal data, we first embed the multimodal data into a common representation space. We then combine the feature vectors with content information to form semantic-aware feature vectors. Finally, we obtain feature-aware and content-aware projections via multimodal semantic autoencoders. With semantic deep autoencoders, MMCA-CMR enables more reliable cross-modal retrieval by jointly learning feature vectors from different modalities and content information. Extensive experiments demonstrate that the proposed method is effective for cross-modal retrieval, significantly outperforming state-of-the-art methods on four widely used benchmark datasets.
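To make the three-stage idea concrete, the snippet below is a minimal sketch (not the paper's implementation) of a semantic autoencoder that encodes one modality's features into a shared latent space and decodes both the original feature vector and a content (label) vector. The class name, layer sizes, and loss weighting are illustrative assumptions.

```python
# Minimal sketch of a semantic autoencoder for one modality:
# encode features into a shared space, then reconstruct both the
# features and a content/label vector (hypothetical dimensions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticAutoencoder(nn.Module):
    def __init__(self, feat_dim, latent_dim, content_dim):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, latent_dim)            # feature -> shared space
        self.feat_decoder = nn.Linear(latent_dim, feat_dim)       # reconstruct features
        self.content_decoder = nn.Linear(latent_dim, content_dim) # predict content information

    def forward(self, x):
        z = torch.relu(self.encoder(x))
        return z, self.feat_decoder(z), self.content_decoder(z)

def loss_fn(x, content, feat_rec, content_rec, alpha=0.5):
    # Balance feature reconstruction against content preservation;
    # alpha is an assumed trade-off weight.
    return F.mse_loss(feat_rec, x) + alpha * F.mse_loss(content_rec, content)
```

One such autoencoder per modality, trained with a shared latent space, would give the feature-aware and content-aware projections the abstract refers to; the exact architecture and objectives are described in the paper itself.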
