Abstract

Cross-modal retrieval aims to retrieve semantically similar instances from other modalities given a query from one modality. Recently, generative adversarial networks (GANs) have been used to model the joint distribution over data from different modalities and to learn common representations for cross-modal retrieval. However, most existing GAN-based methods simply project the original representations of different modalities into a common representation space, ignoring the fact that the modalities share common characteristics while each modality also retains its own individual characteristics. To address this problem, in this paper we propose a novel cross-modal retrieval method, called representation separation adversarial networks, which explicitly separates the original representations into common latent representations and private representations. Specifically, we minimize the correlation between the common and private representations to ensure their independence. We then reconstruct the original representations by exchanging the common representations across modalities, which encourages information swap. Finally, labels are used to enhance the discriminability of the common representations. Comprehensive experimental results on two widely used datasets show that the proposed method achieves better performance than many existing GAN-based methods, and demonstrate that explicitly modeling a private representation for each modality helps the model extract better common latent representations.
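The sketch below is a minimal, illustrative rendering of the three objectives named in the abstract (correlation minimization between common and private codes, cross-reconstruction by swapping common codes, and label supervision on the common codes). It is not the authors' implementation: the network shapes, the specific correlation penalty, the loss weighting, and the two-modality (image/text) setup are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' code) of the representation-separation
# losses described in the abstract, assuming a two-modality image/text setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityNet(nn.Module):
    """Encodes one modality into a common code and a private code,
    and reconstructs the original feature from their concatenation."""
    def __init__(self, in_dim, code_dim):
        super().__init__()
        self.common_enc = nn.Sequential(nn.Linear(in_dim, code_dim), nn.Tanh())
        self.private_enc = nn.Sequential(nn.Linear(in_dim, code_dim), nn.Tanh())
        self.decoder = nn.Linear(2 * code_dim, in_dim)

    def forward(self, x):
        return self.common_enc(x), self.private_enc(x)

def correlation_loss(c, p):
    """Penalize cross-correlation between common and private codes
    so the two parts carry (approximately) independent information."""
    c = c - c.mean(dim=0, keepdim=True)
    p = p - p.mean(dim=0, keepdim=True)
    cov = c.t() @ p / c.size(0)          # batch cross-covariance matrix
    return (cov ** 2).sum()

def separation_losses(img_net, txt_net, classifier, img, txt, labels):
    c_i, p_i = img_net(img)
    c_t, p_t = txt_net(txt)
    # 1) Independence between common and private representations.
    l_corr = correlation_loss(c_i, p_i) + correlation_loss(c_t, p_t)
    # 2) Cross-reconstruction: exchange the common codes across modalities.
    rec_img = img_net.decoder(torch.cat([c_t, p_i], dim=1))
    rec_txt = txt_net.decoder(torch.cat([c_i, p_t], dim=1))
    l_rec = F.mse_loss(rec_img, img) + F.mse_loss(rec_txt, txt)
    # 3) Label supervision to make the common codes discriminative.
    l_cls = F.cross_entropy(classifier(c_i), labels) + \
            F.cross_entropy(classifier(c_t), labels)
    return l_corr + l_rec + l_cls        # equal weights, purely illustrative
```

In this sketch the adversarial component of the method is omitted; only the separation-related terms summarized in the abstract are shown, and in practice they would be combined with the GAN objectives and tuned weights.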
