Abstract

Word similarity (WS) plays an important role in natural language processing. Existing approaches to WS rely mainly on word embeddings trained on massive, high-quality corpora; they struggle when the corpus for a specific domain is insufficient, and they ignore prior knowledge that could provide useful semantic information for computing the similarity of word pairs. In this paper, we propose a hybrid word representation method that combines multiple sources of prior knowledge with contextual semantic information to address the WS task. First, the core of our method is the construction, for each word, of a related word set comprising word concepts, character concepts, and synonyms extracted from existing knowledge bases, which enriches semantic knowledge when the corpus is small. Then, we encode the related word set with a pre-trained word embedding model and aggregate these vectors, using semantic weights, into a single related vector that captures the prior knowledge of the related word set. Finally, we incorporate the related vector into the word's context vector to train a specific WS task. Experiments on similarity evaluation datasets show that our hybrid model outperforms baseline models on the WS task.
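The aggregation step described above can be illustrated with a minimal sketch. The embeddings, the softmax form of the semantic weights, and the linear interpolation with the context vector are all assumptions for illustration; the paper's actual weighting scheme and fusion may differ.

```python
import numpy as np

# Toy pre-trained embeddings (hypothetical; in practice these would be
# loaded from a pre-trained model such as word2vec or GloVe).
EMB = {
    "bank":    np.array([0.8, 0.1, 0.2]),
    "finance": np.array([0.7, 0.2, 0.1]),
    "money":   np.array([0.6, 0.3, 0.1]),
}

def hybrid_vector(word, related_words, alpha=0.5):
    """Aggregate the related word set into a 'related vector' using
    semantic weights, then fuse it with the word's own vector.

    The weights here are a softmax over cosine similarities to the
    target word -- an assumed instantiation of the paper's
    'semantic weights'."""
    w = EMB[word]
    vecs = [EMB[r] for r in related_words]
    sims = [
        float(np.dot(w, v) / (np.linalg.norm(w) * np.linalg.norm(v)))
        for v in vecs
    ]
    weights = np.exp(sims) / np.sum(np.exp(sims))  # semantic weights
    related = np.sum([wt * v for wt, v in zip(weights, vecs)], axis=0)
    # Fuse prior knowledge (related vector) with the word's own vector;
    # a learned combination would replace this fixed interpolation.
    return alpha * w + (1 - alpha) * related

h = hybrid_vector("bank", ["finance", "money"])
```

Words whose related set is closer in the embedding space contribute more to the related vector, so domain knowledge from the knowledge bases can compensate for a small training corpus.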
