Improving semantic similarity retrieval with word embeddings

Fengqi Yan, Mingming Lu, Qiaoqing Fan

doi:10.1002/cpe.4489

Abstract

Word similarity matchmaking is one of the core research areas of information retrieval. Existing methods based on a synonym dictionary suffer from a semantic gap, which arises when synonyms are missing from the dictionary. To address this problem, we improve semantic similarity retrieval by incorporating word embeddings. Specifically, we train word embeddings with Word2Vec and then use them to measure the semantic similarity between words. Experiments are conducted on two datasets: a public long-text dataset (Reuters-21578) and a short-text dataset (120ask) collected from a healthcare community. The experimental results on both datasets show that the proposed method further improves the accuracy of similarity retrieval.
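
As a rough illustration of how embeddings can stand in for a synonym dictionary, the sketch below scores word pairs by the cosine similarity of their vectors. The abstract does not state which similarity measure is used, so cosine similarity is an assumption here. The three-dimensional vectors are hypothetical toy values, not output of a trained Word2Vec model, and the names `toy_embeddings`, `cosine_similarity`, and `word_similarity` are illustrative only.

```python
import math

# Hypothetical toy vectors for illustration only. Real Word2Vec embeddings
# are learned from a corpus and usually have far more dimensions.
toy_embeddings = {
    "doctor": [0.90, 0.10, 0.30],
    "physician": [0.85, 0.15, 0.35],
    "banana": [0.10, 0.90, 0.20],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_u * norm_v)


def word_similarity(word_a, word_b, embeddings):
    """Score two words by embedding similarity; None if either word is unknown."""
    if word_a not in embeddings or word_b not in embeddings:
        return None
    return cosine_similarity(embeddings[word_a], embeddings[word_b])


if __name__ == "__main__":
    for pair in [("doctor", "physician"), ("doctor", "banana")]:
        print(pair, word_similarity(*pair, toy_embeddings))
```

With these toy values, "doctor" and "physician" score much higher than "doctor" and "banana", even though no dictionary entry links the first pair. That is the kind of gap a dictionary-only method would miss.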

Full Text