Abstract

Word embeddings have a significant impact on natural language processing. In morpheme writing systems, most Chinese word embedding models take the word as the basic unit or directly use the internal structure of words. However, these models still neglect the rich derivative meanings carried by the internal structure of Chinese characters. Based on our observations, the relevant derivative meanings of the main components of Chinese characters are very helpful for learning Chinese word embeddings. In this paper, we focus on employing these derivative meanings to train and enhance Chinese word embeddings. To this end, we propose two main-component enhanced word embedding models, named MCWE-SA and MCWE-HA, which incorporate the relevant derivative meanings of the main components during training via an attention mechanism. Our models enhance the precision of word embeddings at a fine-grained level without generating additional vectors. Experiments on word similarity and syntactic analogy tasks are conducted to validate the feasibility of our models. The results show that our models improve over most baselines on the similarity task and achieve nearly a 3% improvement on a Chinese analogical reasoning dataset compared with the state-of-the-art model.
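The abstract does not spell out the attention formulation, but the core idea, blending attention-weighted component meanings into a word vector so that no extra output vectors are produced, can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the paper's actual MCWE-SA/MCWE-HA equations; the function name attend_components, the dot-product relevance scoring, and the averaging fusion step are all assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_components(word_vec, comp_vecs):
    """Attention-weighted summary of main-component vectors.

    word_vec:  (d,)   embedding of the word
    comp_vecs: (k, d) embeddings of the word's main components
                      (hypothetical inputs for illustration)
    """
    scores = comp_vecs @ word_vec      # (k,) relevance of each component to the word
    weights = softmax(scores)          # soft attention distribution over components
    return weights @ comp_vecs         # (d,) blended derivative meaning

# Toy example: one word vector plus three main-component vectors.
d = 8
rng = np.random.default_rng(0)
word = rng.normal(size=d)
comps = rng.normal(size=(3, d))

comp_summary = attend_components(word, comps)
enhanced = 0.5 * (word + comp_summary)  # fuse in place; no additional vector is emitted
```

In an actual skip-gram or CBOW setup, the fused vector would presumably stand in for the plain word vector in the context-prediction objective, which is one way to enhance embeddings "without generating additional vectors" as the abstract claims.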
