Evaluating semantic similarity and relatedness between concepts by combining taxonomic and non-taxonomic semantic features of WordNet and Wikipedia

Muhammad Jawad Hussain, Heming Bai, Shahbaz Hassan Wasti, Guangjian Huang, Yuncheng Jiang

doi:10.1016/j.ins.2023.01.007

Abstract

Many applications in cognitive science and artificial intelligence use semantic similarity and relatedness to solve difficult tasks such as information retrieval, word sense disambiguation, and text classification. Several approaches for evaluating concept similarity and relatedness based on WordNet or Wikipedia have been proposed. WordNet-based methods rely on highly precise knowledge but have limited lexical coverage. In contrast, Wikipedia-based models achieve broader coverage at the cost of knowledge quality. Therefore, in this paper, we focus on developing a comprehensive semantic similarity and relatedness method based on both WordNet and Wikipedia. To improve on the accuracy of existing measures, we describe concepts by combining various taxonomic and non-taxonomic features of WordNet, including gloss, lemmas, examples, sister-terms, derivations, holonyms/meronyms, and hypernyms/hyponyms, with Wikipedia gloss and hyperlinks. We present a novel technique for extracting ‘is-a’ and ‘part-whole’ relationships between concepts using the Wikipedia link structure. The proposed technique identifies taxonomic and non-taxonomic relationships between concepts and offers dense vector representations of concepts. To fully exploit the semantic attributes of WordNet and Wikipedia, the proposed method integrates their semantic knowledge at the feature level, combining semantic similarity and relatedness into a single comprehensive measure. The experimental results demonstrate the effectiveness of the proposed method over state-of-the-art measures on various gold standard benchmarks.
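
As a rough illustration of what feature-level integration can mean, the following Python sketch pools per-source feature lists for each concept into a single bag and compares concepts by cosine similarity. It is only a toy under stated assumptions and not the method of this paper: the feature lists are hand-written rather than extracted from WordNet or Wikipedia, the names `merge_features` and `cosine` are hypothetical, and the abstract does not specify the weighting scheme or how the dense vectors are built.

```python
from collections import Counter
from math import sqrt


def merge_features(*sources):
    """Pool the feature lists from all sources into one bag of counts."""
    merged = Counter()
    for features in sources:
        merged.update(features)
    return merged


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy, hand-written features (assumption: stand-ins for gloss terms,
# hypernyms, hyperlinks, and so on; not real WordNet or Wikipedia data).
car = merge_features(
    ["motor", "vehicle", "wheel", "engine"],       # "WordNet-like" source
    ["vehicle", "road", "transport", "engine"],    # "Wikipedia-like" source
)
bus = merge_features(
    ["vehicle", "passenger", "wheel"],
    ["transport", "road", "passenger", "vehicle"],
)
tree = merge_features(
    ["plant", "trunk", "branch"],
    ["plant", "forest", "wood"],
)

print("car vs bus :", round(cosine(car, bus), 3))
print("car vs tree:", round(cosine(car, tree), 3))
```

In this toy setup, a feature that appears in both sources is counted twice, so agreement between sources raises its weight. That is one simple design choice among many; a weighted or learned combination would be equally plausible.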

Full Text