Abstract

Word embeddings have a significant impact on natural language processing. In morpheme writing systems, most Chinese word embedding models take the word as the basic unit or directly use the internal structure of words. However, these models still neglect the rich derivative meanings carried by the internal structure of Chinese characters. Based on our observations, the relevant derivative meanings of the main components of Chinese characters are very helpful for learning Chinese word embeddings. In this paper, we focus on employing these derivative meanings to train and enhance Chinese word embeddings. To this end, we propose two main-component enhanced word embedding models, named MCWE-SA and MCWE-HA, which incorporate the relevant derivative meanings of the main components during training via an attention mechanism. Our models enhance the precision of word embeddings at a fine-grained level without generating additional vectors. Experiments on word similarity and syntactic analogy tasks are conducted to validate the feasibility of our models. The results show that our models improve over most baselines on the similarity task and achieve nearly a 3% improvement on a Chinese analogical reasoning dataset compared with the state-of-the-art model.
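The abstract does not spell out the attention formulation, but the core idea, blending attention-weighted component meanings into a word vector so that no extra output vectors are produced, can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the paper's actual MCWE-SA/MCWE-HA equations; the function name attend_components, the dot-product relevance scoring, and the averaging fusion step are all assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_components(word_vec, comp_vecs):
    """Attention-weighted summary of main-component vectors.

    word_vec:  (d,)   embedding of the word
    comp_vecs: (k, d) embeddings of the word's main components
                      (hypothetical inputs for illustration)
    """
    scores = comp_vecs @ word_vec      # (k,) relevance of each component to the word
    weights = softmax(scores)          # soft attention distribution over components
    return weights @ comp_vecs         # (d,) blended derivative meaning

# Toy example: one word vector plus three main-component vectors.
d = 8
rng = np.random.default_rng(0)
word = rng.normal(size=d)
comps = rng.normal(size=(3, d))

comp_summary = attend_components(word, comps)
enhanced = 0.5 * (word + comp_summary)  # fuse in place; no additional vector is emitted
```

In an actual skip-gram or CBOW setup, the fused vector would presumably stand in for the plain word vector in the context-prediction objective, which is one way to enhance embeddings "without generating additional vectors" as the abstract claims.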
