This paper proposes IC+SP, a hybrid model for improving Information Content (IC) related metrics of semantic similarity between words. It rests on the hypothesis that IC and the shortest path are two relatively independent sources of semantic evidence with approximately equal influence on the semantic similarity metric. IC+SP linearly combines an IC-related metric with the shortest path. A transformation from concept-level to word-level semantic similarity is also presented, obtained by maximizing every component of IC+SP. Thirteen improved IC-related metrics based on IC+SP are formed and implemented on the experimental platform HESML (Lastra-Díaz, Inf Syst 66:97–118, 2017). To evaluate IC+SP, Pearson’s and Spearman’s correlation coefficients of the improved metrics on well-accepted benchmarks are compared with those of the original metrics. I introduce the Wilcoxon Signed-Rank Test, which requires no normal-distribution assumption, whereas the T-Test requires this assumption for small samples. Both the T-Test and the Wilcoxon Signed-Rank Test are conducted on the differences between the correlation coefficients of the improved and original metrics. The improved IC-related metrics are expected to significantly outperform their corresponding original ones, and the experimental results, including comparisons of the mean and maximum correlation coefficients as well as the p-values and confidence intervals of both tests, confirm this expectation in the vast majority of cases.
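As a rough illustration of the IC+SP paradigm described above, the following Python sketch combines an IC-based similarity with a shortest-path term and lifts the result from concepts to words by maximizing each component separately over all sense pairs. It is only one possible reading of the abstract, not the HESML implementation. The equal weight of 0.5, the mapping 1 / (1 + length) from path length to similarity, the function names, and the toy sense labels and values are all assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Iterable

Concept = str


def path_similarity(length: int) -> float:
    """Map a shortest-path length (edge count) into (0, 1].

    Assumed form: the abstract does not say how the path length is normalized.
    """
    return 1.0 / (1.0 + length)


def ic_sp_word_similarity(
    senses_a: Iterable[Concept],
    senses_b: Iterable[Concept],
    ic_sim: Callable[[Concept, Concept], float],
    path_len: Callable[[Concept, Concept], int],
    weight: float = 0.5,  # assumed: "approximately equal influence"
) -> float:
    """Word-level IC+SP score: maximize each component, then combine linearly."""
    pairs = list(product(senses_a, senses_b))
    if not pairs:
        return 0.0
    # Each component is maximized independently over all concept pairs,
    # so the two maxima may come from different sense pairs.
    best_ic = max(ic_sim(a, b) for a, b in pairs)
    best_sp = max(path_similarity(path_len(a, b)) for a, b in pairs)
    return weight * best_ic + (1.0 - weight) * best_sp


# Hypothetical toy values standing in for a real taxonomy and IC model.
TOY_IC = {("bank#1", "shore#1"): 0.8, ("bank#2", "shore#1"): 0.1}
TOY_PATH = {("bank#1", "shore#1"): 3, ("bank#2", "shore#1"): 1}

score = ic_sp_word_similarity(
    ["bank#1", "bank#2"],
    ["shore#1"],
    ic_sim=lambda a, b: TOY_IC[(a, b)],
    path_len=lambda a, b: TOY_PATH[(a, b)],
)
```

In this toy case the best IC value comes from one sense pair and the shortest path from another, which is what maximizing every component separately allows under this reading.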