Abstract

With the emergence of massive multimedia data on the Internet, the demand for cross-media retrieval applications is growing. The capability to represent modal features is crucial for cross-media retrieval. This paper proposes an improved feature correlation model for the image and text modalities based on adversarial networks, called FCMAN, which uses adversarial training to establish feature correlations between the image and text modalities. First, different features of the image modality are fused to strengthen its feature representation. Second, building on feature correlation modeling with a single adversarial network, two additional adversarial networks are introduced to model the real labels and the predicted labels of the image and text modalities, respectively. Experimental results show that the proposed approach establishes feature correlations between the image and text modalities effectively.
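
The abstract describes the architecture only at a high level. The following is a minimal PyTorch sketch of the core idea: fused image features and text features are projected into a common space, where a modality discriminator is trained adversarially against the projectors. All module names, layer sizes, and the choice of two image feature sources are illustrative assumptions, not the authors' implementation; the paper's two additional label-level adversarial networks (over real and predicted labels) are noted only in a comment.

# Minimal sketch of the adversarial feature-correlation idea (assumptions, not the authors' code).
import torch
import torch.nn as nn

class ImageFeatureFusion(nn.Module):
    # Fuse two image feature vectors (e.g. from two different CNN backbones) into one.
    def __init__(self, dim_a, dim_b, dim_out):
        super().__init__()
        self.fc = nn.Linear(dim_a + dim_b, dim_out)

    def forward(self, feat_a, feat_b):
        return torch.relu(self.fc(torch.cat([feat_a, feat_b], dim=1)))

class Projector(nn.Module):
    # Map a modality-specific feature into the common embedding space.
    def __init__(self, dim_in, dim_common):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, dim_common), nn.ReLU(),
            nn.Linear(dim_common, dim_common))

    def forward(self, x):
        return self.net(x)

class ModalityDiscriminator(nn.Module):
    # Adversary that tries to tell image embeddings from text embeddings.
    def __init__(self, dim_common):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_common, dim_common // 2), nn.ReLU(),
            nn.Linear(dim_common // 2, 1))

    def forward(self, z):
        return self.net(z)  # logit: 1 = image, 0 = text

def adversarial_step(img_z, txt_z, disc):
    # One adversarial step: the discriminator learns to separate the modalities,
    # while the projectors learn to make them indistinguishable in the common space.
    # FCMAN additionally trains two such adversaries on real and predicted labels;
    # they follow the same pattern and are omitted here for brevity.
    bce = nn.BCEWithLogitsLoss()
    ones = torch.ones(img_z.size(0), 1)
    zeros = torch.zeros(txt_z.size(0), 1)
    # Discriminator loss: classify embeddings by their true modality
    # (detach so this loss does not update the projectors).
    d_loss = bce(disc(img_z.detach()), ones) + bce(disc(txt_z.detach()), zeros)
    # Projector loss: fool the discriminator by swapping the labels.
    g_loss = bce(disc(img_z), zeros) + bce(disc(txt_z), ones)
    return d_loss, g_loss

# Usage with random stand-in features (dimensions are arbitrary assumptions):
fusion = ImageFeatureFusion(512, 256, 512)
img_proj, txt_proj = Projector(512, 128), Projector(300, 128)
disc = ModalityDiscriminator(128)
img_z = img_proj(fusion(torch.randn(8, 512), torch.randn(8, 256)))
txt_z = txt_proj(torch.randn(8, 300))
d_loss, g_loss = adversarial_step(img_z, txt_z, disc)

In practice the two losses are optimized alternately with separate optimizers, alongside a supervised loss (e.g. on the label predictions) that keeps the common space semantically discriminative.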
