Abstract

Chinese word embedding models capture Chinese semantics from the characters that compose words and from the internal features of those characters, such as radicals, components, strokes, structure, and pinyin. However, some of these features overlap, and most methods do not model their relevance to one another. Moreover, these methods represent words as point vectors, which cannot adequately capture the different aspects of a Chinese word's meaning. In this paper, we propose a Feature Subsequence based Probability Representation Model (FSPRM) for learning Chinese word embeddings. We first integrate the morphological and phonetic features of Chinese characters (stroke, structure, and pinyin) and learn their relevance by designing a feature subsequence, which captures relatively comprehensive semantics of Chinese words. We then propose a feature probability distribution, built on these three internal features, to capture the different aspects of a word's meaning, estimating its mean as the sum of the feature subsequence embeddings. Since Chinese words with similar features tend to have similar semantics, we map Chinese words to feature probability distributions and design a similarity-based objective that learns their semantics by predicting the contextual words of each target word. Extensive experiments on word analogy, word similarity, text classification, and named entity recognition tasks demonstrate that the proposed method outperforms most state-of-the-art approaches.
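As a hedged sketch of the representation described above (the notation here is ours, not taken from the paper): let $S(w)$ denote the feature subsequence of a word $w$, built from its stroke, structure, and pinyin features, and let $\mathbf{e}_s$ be the learned embedding of a subsequence element $s$. The probability representation can then be read as a distribution whose mean is estimated as the sum of the feature subsequence embeddings, for example a Gaussian

$$\mu_w = \sum_{s \in S(w)} \mathbf{e}_s, \qquad w \sim \mathcal{N}(\mu_w, \Sigma_w),$$

where $\Sigma_w$ is a covariance intended to capture the different aspects of the word's meaning. Under this reading, the similarity-based objective would score a target word $w$ against a context word $c$ with a similarity between their distributions, e.g. $\mathrm{sim}(w, c) = \mu_w^{\top}\mu_c$ or a kernel between $\mathcal{N}(\mu_w, \Sigma_w)$ and $\mathcal{N}(\mu_c, \Sigma_c)$, maximized over observed (target, context) pairs.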
