Abstract

Multimodal models have been shown to outperform text-only models at learning semantic word representations. According to psycholinguistic theory, there is a graph-structured relationship among the modalities of language, and in recent years the graph convolutional network (GCN) has shown substantial advantages in extracting non-Euclidean spatial features. This inspires us to propose a new multimodal word representation model, GCNW, which uses a graph convolutional network to incorporate phonetic and syntactic information into the word representation. We use a greedy strategy to update the modality-relation matrix in the GCN, and we train the model through unsupervised learning. We evaluate the proposed model on multiple downstream NLP tasks, and the experimental results demonstrate that GCNW outperforms strong unimodal baselines and state-of-the-art multimodal models. We make the source code of both models available to encourage reproducible research.
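
For readers unfamiliar with graph convolution, the following is a minimal sketch of one standard GCN propagation step, ReLU(D^(-1/2) (A + I) D^(-1/2) X W), written with NumPy. It is not the GCNW implementation: the function name `gcn_layer`, the three-node toy graph, and the random features and weights are all assumptions made for illustration, and the abstract does not specify how the modality-relation matrix is built or how the greedy update works.

```python
import numpy as np


def gcn_layer(adj, features, weight):
    """One generic graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj      : (n, n) non-negative adjacency (relation) matrix
    features : (n, d_in) node feature matrix X
    weight   : (d_in, d_out) weight matrix W
    """
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops
    deg = a_hat.sum(axis=1)                  # degrees, at least 1 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: scale rows and columns by D^-1/2.
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)  # ReLU


# Hypothetical toy graph: three nodes standing in for modalities of one word.
adj = np.array([[0.0, 1.0, 1.0],
                [1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0]])
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))   # assumed 4-dimensional input features
weight = rng.normal(size=(4, 2))     # assumed 2-dimensional output
out = gcn_layer(adj, features, weight)
print(out.shape)
```

Each output row mixes a node's own features with those of its neighbors, weighted by the normalized relation matrix; this is the sense in which a GCN can fuse information across related modalities.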
