Abstract

Word embedding models map each word into a low-dimensional space using the distributional information of unlabeled words in a corpus, which improves the generalization ability of lexical features. However, their performance is limited by out-of-vocabulary (OOV) words, because the relevant information about an OOV word cannot be fully used to generate an accurate embedding for it. To process OOV words effectively, both morphological structure information and context information should be considered. In view of the characteristics of Chinese, we propose a Fusion Multi-feature Encoder Based on Attention (FMEBA) for processing Chinese OOV words; it exploits the radical features of characters and uses a character-level Transformer encoder to process character sequence information and context information. To evaluate our model, we conducted experiments on a professional dataset from the Chinese power domain. The experimental results show that, compared with other models, ours achieves the best performance. We conclude that our method is well suited to processing Chinese OOV words.
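To make the described architecture concrete, the following is a minimal PyTorch sketch of the kind of model the abstract outlines: per-character radical embeddings fused with the output of a character-level Transformer encoder via an attention layer, pooled into a single vector for the OOV word. This is an illustrative assumption, not the authors' implementation; the class name `FMEBASketch`, all dimensions, the choice of multi-head attention for fusion, and the mean-pooling readout are ours.

```python
import torch
import torch.nn as nn


class FMEBASketch(nn.Module):
    """Hypothetical sketch of an attention-based multi-feature encoder
    for Chinese OOV words. Character embeddings pass through a
    character-level Transformer encoder; radical embeddings are fused
    in via attention. All names and sizes are illustrative only."""

    def __init__(self, n_chars, n_radicals, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d_model)
        self.radical_emb = nn.Embedding(n_radicals, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.char_encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Attention that lets character states attend to radical features.
        self.fusion_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, char_ids, radical_ids):
        # char_ids, radical_ids: (batch, seq_len), aligned per character.
        h_char = self.char_encoder(self.char_emb(char_ids))
        h_rad = self.radical_emb(radical_ids)
        fused, _ = self.fusion_attn(h_char, h_rad, h_rad)
        # Mean-pool over the character sequence to get one word vector.
        return fused.mean(dim=1)


# Usage: embed a two-character OOV word (toy vocabulary sizes).
model = FMEBASketch(n_chars=5000, n_radicals=300)
vec = model(torch.tensor([[10, 42]]), torch.tensor([[3, 7]]))
print(vec.shape)  # torch.Size([1, 128])
```

The design point the sketch captures is that radicals carry morphological cues a pure character model misses, so the fusion attention gives each character state access to its radical feature before pooling; how the paper actually combines these signals may differ.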
