Abstract
Most existing cross-modal retrieval methods consider only global or local semantic embeddings and fail to capture fine-grained dependencies between objects. They also tend to overlook the fact that mutual transformation between modalities can improve the embedding of each modality. To address these problems, we propose BiKA (Bidirectional Knowledge-assisted embedding and Attention-based generation). The model uses a bidirectional graph convolutional neural network to establish dependencies between objects, and a bidirectional attention-based generative network to perform the mutual transformation between modalities. Specifically, the knowledge graph is used for local matching to constrain the local expression of the modalities, while the generative network is used for mutual transformation to constrain their global expression. We also propose a new position relation embedding network to embed position relation information between objects. Experiments on two public datasets show that our method substantially outperforms many state-of-the-art models.