Abstract

Traditional feature-based semantic similarity (SS) approaches treat Wikipedia features as sets and evaluate the similarity of two concepts from the commonalities among their feature sets. However, these approaches weight all features equally. They therefore ignore the underlying statistics of the features and lose essential semantic detail about them. One solution is to assign each feature a specific weight derived from its statistics, so that the weight reflects the relative importance of the feature in similarity evaluation. In this paper, we propose hybrid semantic similarity measures based on two statistical models, i.e., information content and TF-IDF. Firstly, we propose new methods, called weighting functions, to compute the weights of features and feature sets in Wikipedia. Secondly, based on the weighting functions, we propose new weighted feature-based SS approaches for Wikipedia concepts. Thirdly, we evaluate the proposed methods on well-known benchmarks for English, German, French, and Spanish. Finally, we compare the performance of our methods with the traditional feature-based approaches and some state-of-the-art SS approaches. The experimental evaluation shows that our weighted methods perform better in similarity evaluation than the traditional feature-based approaches and some state-of-the-art approaches.
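To illustrate the difference between unweighted and weighted feature-set comparison, the following is a minimal sketch only. The abstract does not give the paper's weighting functions, so the IDF-style weight and the weighted Jaccard ratio used here are stand-in assumptions, and the concepts, features, and function names are hypothetical.

```python
import math

def idf_weights(feature_sets):
    """Assumed IDF-style weight: features shared by fewer concepts weigh more."""
    n = len(feature_sets)
    df = {}
    for features in feature_sets.values():
        for f in features:
            df[f] = df.get(f, 0) + 1
    return {f: math.log(1 + n / count) for f, count in df.items()}

def plain_jaccard(a, b):
    """Unweighted baseline: every feature counts equally."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def weighted_jaccard(a, b, weights):
    """Weighted variant: shared weight divided by total weight."""
    shared = sum(weights[f] for f in a & b)
    total = sum(weights[f] for f in a | b)
    return shared / total if total else 0.0

# Hypothetical toy feature sets (for example, category labels of each concept).
concepts = {
    "car": {"vehicle", "transport", "engine", "object"},
    "truck": {"vehicle", "transport", "cargo", "object"},
    "banana": {"fruit", "food", "object"},
}
weights = idf_weights(concepts)

for x, y in [("car", "truck"), ("car", "banana")]:
    print(x, y,
          round(plain_jaccard(concepts[x], concepts[y]), 3),
          round(weighted_jaccard(concepts[x], concepts[y], weights), 3))
```

In this toy data the only feature shared by "car" and "banana" is the very common "object", so the weighted score discounts that overlap relative to the unweighted one.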
