Abstract
Semi-supervised cross-modal retrieval is an eclectic paradigm that learns common representations by exploiting the underlying semantic information in both labeled and unlabeled data. Most existing methods ignore the rich semantic information in text data and therefore cannot fully utilize it in common representation learning. Moreover, they consider only the correlation between data with the same semantic label and ignore the correlation between data with different semantic labels. In this paper, we propose a novel semi-supervised cross-modal retrieval method, called Graph-based Semantic Alignment Network (GSAN), which learns common representations by aligning the features of different modalities with the semantic embeddings of text data. First, we design a Deep Supervised Semantic Encoding (DSSE) module to train a semantic projector and a label predictor, which extract semantic embeddings and predicted labels from the unlabeled data of the text modality. Then, a GAN-based Bidirectional Fusion (GBF) module is designed to learn the mapping networks of the two modalities (image and text). To make the mapping networks generate semantically discriminative and modality-invariant representations, we use the underlying semantic information exploited by DSSE to construct a Graph-based Triplet Constraint (GTC), which pulls the feature embeddings of semantically matched image-text pairs closer together and pushes mismatched ones apart. By making full use of semantic information, our approach achieves the performance of state-of-the-art methods while using less labeled data. In addition, since only the mapping networks trained in the GBF module are needed to generate common representations at the inference stage, our approach is efficient and time-saving in real-world applications. Extensive experiments on four widely used datasets show the effectiveness of GSAN.
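
The pull-together, push-apart behaviour described for GTC can be illustrated with a generic triplet margin loss. The sketch below is not the GSAN formulation: the abstract does not give the loss, so the cosine similarity measure, the `margin` value, and the function names are all assumptions made for illustration, and the graph-based selection of matched and mismatched pairs is not modelled.

```python
import math

def cosine_similarity(u, v):
    # Assumes both vectors are non-zero and of equal length.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hypothetical margin; the loss is zero once the matched pair is
    # more similar than the mismatched pair by at least `margin`.
    sim_pos = cosine_similarity(anchor, positive)
    sim_neg = cosine_similarity(anchor, negative)
    return max(0.0, margin - sim_pos + sim_neg)

# Toy 2-D embeddings: an image anchor, a matched text, a mismatched text.
image_anchor = [1.0, 0.0]
matched_text = [1.0, 0.0]
mismatched_text = [0.0, 1.0]

# Correct ordering: the constraint is already satisfied.
loss_good = triplet_loss(image_anchor, matched_text, mismatched_text)
# Swapped roles: the constraint is violated and the loss is positive.
loss_bad = triplet_loss(image_anchor, mismatched_text, matched_text)
print(loss_good, loss_bad)
```

In a training loop, a loss of this shape would be minimized over the parameters of the mapping networks, so that gradients move matched image and text embeddings together and mismatched ones apart.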