Abstract

Word similarity (WS) plays an important role in natural language processing. Existing approaches to WS rely mainly on word embeddings trained on massive, high-quality corpora; they struggle when the corpus for a specific domain is insufficient, and they ignore prior knowledge that could provide useful semantic information for computing the similarity of word pairs. In this paper, we propose a hybrid word representation method that combines multiple sources of prior knowledge with contextual semantic information to address the WS task. First, the core of our method is the construction, for each word, of a related word set comprising word concepts, character concepts, and synonyms extracted from existing knowledge bases, which enriches semantic knowledge when the corpus is small. Then, we encode the related word set with a pre-trained word embedding model and aggregate these vectors, using semantic weights, into a single related vector that captures the prior knowledge of the related word set. Finally, we incorporate the related vector into the word's context vector to train a specific WS task. Experiments on similarity evaluation datasets show that our hybrid model outperforms baseline models on the WS task.
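The aggregation step described above can be illustrated with a minimal sketch. The embeddings, the softmax form of the semantic weights, and the linear interpolation with the context vector are all assumptions for illustration; the paper's actual weighting scheme and fusion may differ.

```python
import numpy as np

# Toy pre-trained embeddings (hypothetical; in practice these would be
# loaded from a pre-trained model such as word2vec or GloVe).
EMB = {
    "bank":    np.array([0.8, 0.1, 0.2]),
    "finance": np.array([0.7, 0.2, 0.1]),
    "money":   np.array([0.6, 0.3, 0.1]),
}

def hybrid_vector(word, related_words, alpha=0.5):
    """Aggregate the related word set into a 'related vector' using
    semantic weights, then fuse it with the word's own vector.

    The weights here are a softmax over cosine similarities to the
    target word -- an assumed instantiation of the paper's
    'semantic weights'."""
    w = EMB[word]
    vecs = [EMB[r] for r in related_words]
    sims = [
        float(np.dot(w, v) / (np.linalg.norm(w) * np.linalg.norm(v)))
        for v in vecs
    ]
    weights = np.exp(sims) / np.sum(np.exp(sims))  # semantic weights
    related = np.sum([wt * v for wt, v in zip(weights, vecs)], axis=0)
    # Fuse prior knowledge (related vector) with the word's own vector;
    # a learned combination would replace this fixed interpolation.
    return alpha * w + (1 - alpha) * related

h = hybrid_vector("bank", ["finance", "money"])
```

Words whose related set is closer in the embedding space contribute more to the related vector, so domain knowledge from the knowledge bases can compensate for a small training corpus.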
