Abstract

Distributed word representations have demonstrated their advantages in many natural language processing tasks, such as named entity recognition, entity relation extraction, and text classification. Traditional one-hot representation encodes a word as a high-dimensional, sparse vector. In contrast, a distributed representation encodes a word as a low-dimensional, dense vector, which is more suitable as input to deep neural networks. Furthermore, distributed representations can capture semantic relatedness and syntactic regularities between words. Word embedding is a distributed word representation technique that is popular and useful in many natural language processing tasks. Recently, a growing body of research has focused on learning word embeddings from internal morphological knowledge, such as characters, subwords, and other kinds of morphological information. For example, Chinese characters carry rich semantic information related to the words they compose, so characters can help improve the representation of those words. In this paper, we present a character-enhanced Chinese word embedding model (CCWE), in which character and word embeddings are trained simultaneously in two parallel tasks within a framework based on Skip-Gram. We evaluate CCWE on word similarity, analogical reasoning, text classification, and named entity recognition. The results demonstrate that our model learns better Chinese word and character embeddings than the baseline models.
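The abstract names only the framework, so the following is a minimal sketch of what two parallel Skip-Gram tasks could look like, assuming standard Skip-Gram with negative sampling for both the word-level and character-level tasks and a loss that is simply summed. The vocabulary sizes, dimensionality, learning rate, and the `SkipGramNS` and `train_step` names are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of joint word/character Skip-Gram training, assuming the
# two parallel tasks share an optimizer and their negative-sampling losses
# are summed. All hyperparameters below are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipGramNS(nn.Module):
    """Skip-Gram with negative sampling over one vocabulary (words or characters)."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.in_emb = nn.Embedding(vocab_size, dim)   # target embeddings
        self.out_emb = nn.Embedding(vocab_size, dim)  # context embeddings

    def forward(self, target, context, negatives):
        v = self.in_emb(target)                        # (B, d) target vectors
        u_pos = self.out_emb(context)                  # (B, d) true context
        u_neg = self.out_emb(negatives)                # (B, k, d) sampled noise
        pos = F.logsigmoid((v * u_pos).sum(-1))        # log sigma(u_o . v_t)
        neg = F.logsigmoid(-(u_neg @ v.unsqueeze(-1)).squeeze(-1)).sum(-1)
        return -(pos + neg).mean()                     # negative log-likelihood

word_task = SkipGramNS(vocab_size=50_000, dim=100)     # word-level task
char_task = SkipGramNS(vocab_size=8_000, dim=100)      # character-level task
opt = torch.optim.SGD(
    list(word_task.parameters()) + list(char_task.parameters()), lr=0.025
)

def train_step(word_batch, char_batch):
    # Each batch is a (target, context, negatives) index triple; the two
    # parallel tasks are optimized jointly by summing their losses.
    loss = word_task(*word_batch) + char_task(*char_batch)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```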
