Abstract

Word embeddings (i.e., word representations) transform words into computable mathematical objects, usually vectors, based on their semantics. Compared with human semantic representation, these purely text-based models are severely deficient because they lack the perceptual information grounded in the physical world. This observation motivates the development of multimodal word representation models. Multimodal models have been shown to outperform text-based models at learning semantic word representations, yet almost all previous multimodal models focus solely on introducing perceptual information. However, syntactic information can also effectively improve the performance of multimodal models on downstream tasks. Therefore, this article proposes an effective multimodal word representation model that uses two gate mechanisms to explicitly embed syntactic and phonetic information into multimodal representations, and trains the model with supervised learning. We take Chinese and English as example languages and evaluate the model on several downstream tasks. The results show that our approach outperforms existing models. We have made the source code of the model available to encourage reproducible research.

Highlights

  • Word embedding is often used in natural language processing (NLP) tasks such as machine translation [59], text classification [1], and dialogue systems [50]

  • Compared to human semantic representation, these purely text-based models are severely deficient because they lack perceptual information attached to the physical world

  • Combining the results of the other intrinsic evaluation tasks, it can be concluded that the word representations generated by the MSP model contain more semantic and syntactic information, and that such information can be used in relevant downstream tasks.


Summary

INTRODUCTION

Word embedding is often used in natural language processing (NLP) tasks such as machine translation [59], text classification [1], and dialogue systems [50]. However, syntactic information is difficult to obtain through the distributional hypothesis alone, and purely text-based embeddings lack perceptual grounding. These factors inspire us to build a multimodal word representation model, called MSP, that can embed syntactic and perceptual information effectively. Compared with existing word embedding models, MSP explicitly embeds syntactic and phonetic information, simulates multimodal information fusion through two gate mechanisms, and is trained with supervision to obtain high-performing multimodal word representations. On various NLP tasks, we compare performance against multiple word representation models and pre-trained language models as baselines, and use MSP- (a variant with no processing of syntactic information) as a control.
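The paper's actual MSP architecture is not reproduced on this page, but a minimal sketch of the kind of gated fusion it describes may help: a learned sigmoid gate decides, per dimension, how much of an auxiliary modality (syntactic or phonetic features) to mix into the textual word vector. The class name `GatedFusion`, the dimensions, and the stacking order below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of gated multimodal fusion (not the paper's MSP code).
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Fuses a textual word vector with an auxiliary modality
    (e.g., syntactic or phonetic features) through a learned sigmoid gate."""

    def __init__(self, text_dim: int, aux_dim: int, out_dim: int):
        super().__init__()
        self.aux_proj = nn.Linear(aux_dim, out_dim)
        self.text_proj = nn.Linear(text_dim, out_dim)
        # The gate decides, per dimension, how much auxiliary information to admit.
        self.gate = nn.Linear(text_dim + aux_dim, out_dim)

    def forward(self, text_vec: torch.Tensor, aux_vec: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([text_vec, aux_vec], dim=-1)))
        return g * self.aux_proj(aux_vec) + (1.0 - g) * self.text_proj(text_vec)


if __name__ == "__main__":
    # Stacking two gates (one for syntactic, one for phonetic features) yields a
    # multimodal representation; a supervised loss can then be applied on top.
    syn_gate = GatedFusion(text_dim=300, aux_dim=50, out_dim=300)
    pho_gate = GatedFusion(text_dim=300, aux_dim=50, out_dim=300)
    text = torch.randn(4, 300)       # batch of textual word vectors
    syntax = torch.randn(4, 50)      # e.g., POS / dependency features
    phonetics = torch.randn(4, 50)   # e.g., pinyin or phoneme embeddings
    fused = pho_gate(syn_gate(text, syntax), phonetics)
    print(fused.shape)               # torch.Size([4, 300])
```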

RELATED WORKS
PROPOSED METHOD
TASK EVALUATION
CONCEPT CATEGORIZATION TASK
RESULTS AND DISCUSSION
WORD SIMILARITY TASK
WORD ANALOGY TASK
PART-OF-SPEECH TAGGING TASK
TEXT CLASSIFICATION TASK
TEXT SIMILARITY TASK
Conclusions
