Abstract

The CBOW (Continuous Bag-of-Words) model is a three-layer feedforward neural network that predicts the center word from the words in a fixed-size context window. Context plays an important role in understanding the meaning of a word, but a fixed-size window captures only part of it and cannot represent the whole context. Moreover, Chinese words are often polysemous: the same word may carry different meanings in different contexts, and the traditional CBOW method ignores this polysemy. To address these problems, this paper proposes a context-based extension structure for word vectors. The main contributions are as follows: (1) The concept of a context vector is introduced, in which the entire sentence containing the target word is represented by a single vector. (2) A storage method for polysemous words is constructed: a contextual list is attached to each word vector so that the multiple meanings of a word can be effectively distinguished. (3) Based on this word vector extension structure, a new character vector generation model is proposed and compared with the traditional CBOW model on a news headline similarity ranking task. The experimental results show that the new character vector generation model based on the word vector extension structure obtains better results.
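The idea of a sentence-level context vector plus a per-word contextual list can be illustrated with a small sketch. The abstract does not say how the context vector is computed or how the list is queried, so everything below is an assumption made for illustration: the sentence vector is taken as the mean of its word vectors, senses are matched by cosine similarity, and the class name `ExtendedWordVectors`, its methods, the sense labels, and the toy two-dimensional English vectors are all hypothetical rather than the paper's method.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ExtendedWordVectors:
    """Hypothetical store: base word vectors plus a contextual list per word."""

    def __init__(self, base_vectors):
        self.base = base_vectors            # word -> vector
        self.contexts = defaultdict(list)   # word -> [(sense_label, context_vector)]

    def context_vector(self, sentence):
        # Assumption: the whole sentence is summarized by the mean word vector.
        known = [self.base[w] for w in sentence if w in self.base]
        if not known:
            raise ValueError("sentence contains no known words")
        return mean_vector(known)

    def add_context(self, word, sentence, sense_label):
        # Append one (label, sentence vector) entry to the word's contextual list.
        self.contexts[word].append((sense_label, self.context_vector(sentence)))

    def closest_sense(self, word, sentence):
        # Pick the stored entry whose context vector is most similar to the query.
        query = self.context_vector(sentence)
        label, _ = max(self.contexts[word], key=lambda entry: cosine(query, entry[1]))
        return label


# Toy data, invented for this example only.
model = ExtendedWordVectors({
    "bank": [0.5, 0.5],
    "river": [1.0, 0.0],
    "water": [0.9, 0.1],
    "money": [0.0, 1.0],
    "loan": [0.1, 0.9],
})
model.add_context("bank", ["river", "bank"], "riverside")
model.add_context("bank", ["money", "bank"], "finance")

print(model.closest_sense("bank", ["bank", "loan"]))   # finance
print(model.closest_sense("bank", ["water", "bank"]))  # riverside
```

Keeping the contextual list separate from the base vector means a word keeps one shared embedding while each observed usage adds an entry that can later disambiguate it. A trained model would presumably learn these vectors rather than average them.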
