Abstract

Recent years have witnessed the rapid development of multimedia devices and technologies. Retrieving interesting and highly relevant information from massive multimedia data has therefore become an urgent and challenging problem. To obtain more accurate retrieval results, researchers naturally turn to more fine-grained features for evaluating the similarity among multimedia samples. In this paper, we propose a Deep Attentional Fine-grained Similarity Network (DAFSN) for cross-modal retrieval, which is optimized in an adversarial learning manner. The DAFSN model consists of two subnetworks: an attentional fine-grained similarity network for aligned representation learning and a modal discriminative network. The former adopts a Bidirectional Long Short-Term Memory (Bi-LSTM) network and a pre-trained Inception-v3 model to extract text and image features. In aligned representation learning, we consider not only the sentence-level pair-matching constraint but also the fine-grained similarity between the word-level features of a text description and the sub-regional features of an image. The modal discriminative network aims to minimize the "heterogeneity gap" between text and image features in an adversarial manner. We conduct experiments on several widely used datasets to verify the performance of the proposed DAFSN. The experimental results show that DAFSN achieves better retrieval results in terms of the MAP metric. In addition, result analyses and visual comparisons are presented in the experimental section.
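To make the word-to-region matching idea concrete, the following is a minimal PyTorch sketch of how an attentional fine-grained similarity between word-level text features and sub-regional image features might be computed. This is not the authors' implementation: the function name, the temperature parameter, and the feature dimensions are illustrative assumptions, and the attention scheme follows a common stacked cross-attention pattern rather than DAFSN's exact formulation.

```python
import torch
import torch.nn.functional as F

def fine_grained_similarity(words, regions, temperature=4.0):
    """Sketch of a word-to-region attentional similarity score.

    words:   (T, d) word-level text features (e.g., Bi-LSTM outputs)
    regions: (R, d) sub-regional image features (e.g., an Inception-v3 grid)
    Returns a scalar similarity between the sentence and the image.
    """
    words = F.normalize(words, dim=-1)      # unit-length word vectors
    regions = F.normalize(regions, dim=-1)  # unit-length region vectors

    # Cosine similarity between every word and every region: (T, R)
    sim = words @ regions.t()

    # Attend over regions for each word; the temperature sharpens the weights
    attn = F.softmax(temperature * sim, dim=1)  # (T, R)
    context = attn @ regions                    # (T, d) attended region context per word

    # Relevance of each word to its attended context, averaged over words
    word_scores = F.cosine_similarity(words, context, dim=-1)  # (T,)
    return word_scores.mean()

# Illustrative usage with random features (dimensions are assumptions)
words = torch.randn(12, 256)    # 12 words, 256-dim features
regions = torch.randn(64, 256)  # e.g., an 8x8 feature grid flattened to 64 regions
score = fine_grained_similarity(words, regions)
```

In such sketches, a sentence-image score of this form is typically combined with a sentence-level matching loss (e.g., a triplet or ranking loss) so that both the coarse pair-matching constraint and the fine-grained alignment described above contribute to training.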
