Abstract

User-provided annotations in existing multimodal datasets are sometimes inappropriate for model learning and can hinder cross-modal retrieval. To address this issue, we propose a discriminative and noise-robust cross-modal retrieval method, called FLPCL, which consists of deep feature learning and partial correlation learning. Deep feature learning uses label supervision to guide the training of a deep neural network for each modality, with the aim of finding modality-specific deep feature representations that preserve the similarity and discrimination information among multimodal data. Building on deep feature learning, partial correlation learning infers the direct association between different modalities by removing the effect of the common underlying semantics from each modality. This is achieved by maximizing the canonical correlation of the feature representations of different modalities conditioned on the label modality. Unlike existing works that build an indirect association between modalities by incorporating semantic labels, our FLPCL method learns more effective and robust multimodal latent representations by explicitly preserving both intra-modal and inter-modal relationships among multimodal data. Extensive experiments on three cross-modal datasets show that our method outperforms state-of-the-art methods on cross-modal retrieval tasks.
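
As a rough sketch of the partial correlation learning step described above, the snippet below computes a linear partial CCA between two already-extracted feature views conditioned on a one-hot label matrix: the label component is regressed out of each view, and standard CCA is then applied to the residuals. The function name `partial_cca` and the parameters `reg` and `k` are illustrative assumptions, not the paper's implementation; in FLPCL this objective would guide the training of the modality-specific deep networks rather than operate on fixed features.

```python
import numpy as np

def partial_cca(X, Y, L, k=10, reg=1e-6):
    """Partial canonical correlation between views X and Y, conditioned
    on the label modality L (e.g. one-hot labels). Illustrative sketch only."""
    # Center all three views.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    L = L - L.mean(axis=0)

    # Regress the label modality out of each view, so only the direct
    # X-Y association (not the shared semantics) remains in the residuals.
    Bx, *_ = np.linalg.lstsq(L, X, rcond=None)
    By, *_ = np.linalg.lstsq(L, Y, rcond=None)
    Xr, Yr = X - L @ Bx, Y - L @ By

    # Standard CCA on the residuals via a whitened cross-covariance SVD.
    n = X.shape[0]
    Cxx = Xr.T @ Xr / n + reg * np.eye(Xr.shape[1])
    Cyy = Yr.T @ Yr / n + reg * np.eye(Yr.shape[1])
    Cxy = Xr.T @ Yr / n

    def inv_sqrt(C):
        # Symmetric inverse square root via eigendecomposition.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    T = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(T)
    Wx = inv_sqrt(Cxx) @ U[:, :k]   # projection for view X
    Wy = inv_sqrt(Cyy) @ Vt[:k].T   # projection for view Y
    return Wx, Wy, s[:k]            # s[:k]: partial canonical correlations

# Hypothetical usage: 200 paired samples, 64-d and 48-d features, 10 classes.
rng = np.random.default_rng(0)
labels = np.eye(10)[rng.integers(0, 10, 200)]
X = rng.normal(size=(200, 64))
Y = rng.normal(size=(200, 48))
Wx, Wy, corrs = partial_cca(X, Y, labels, k=5)
```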
