Abstract
Despite ever-increasing interest in text similarity, the development of adequate text similarity methods is lagging. Some methods perform well on entailment, while others are reasonable at estimating the degree to which two texts are similar. Very often, these methods are compared using Pearson’s correlation; however, Pearson’s correlation is sensitive to outliers, which can distort the final correlation coefficient. As a result, Pearson’s correlation is inadequate for determining which text similarity method is better when data items are very similar or unrelated. This paper borrows the scaled Pearson correlation from the finance domain and builds a metric that can evaluate the performance of similarity methods over cross-sectional datasets. Results showed that the new metric is fine-grained across the range of benchmark dataset scores, making it a promising alternative to Pearson’s correlation. Moreover, extrinsic results from applying the System Usability Scale (SUS) questionnaire to the scaled Pearson correlation revealed that the proposed metric is attracting attention from scholars, which implies its potential for use in academia.
Highlights
Semantic Textual Similarity (STS) determines the degree to which two texts are similar
The Pearson correlation finds the degree of association between the STS system and human scores, which is a value in the range of -1 to +1
This paper proposes a new similarity performance evaluation metric, scaled Pearson correlation, which was borrowed from the finance domain
Summary
Semantic Textual Similarity (STS) determines the degree to which two texts are similar. It is an active research area, part of which takes place in the SemEval workshop series. The relationship between human rating scores and STS system scores is used as the foundation for assessing STS systems, often using the Pearson correlation (e.g., Šarić, Glavaš, Karan, Šnajder, & Bašić, 2012). The Pearson correlation finds the degree of association between the STS system and human scores, which is a value in the range of -1 to +1. When the magnitude of the value is close to 1, it implies a high correlation with the human rating, and the similarity method is considered promising.
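To make the comparison between human and system scores concrete, the following is a minimal sketch of the standard Pearson correlation, not the scaled variant this paper proposes. The function name `pearson` and the two score lists are hypothetical and chosen only for illustration; they are not taken from any benchmark dataset.

```python
from math import sqrt


def pearson(x, y):
    """Standard Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: human ratings on a 0-5 scale and system scores on a 0-1 scale.
human_scores = [0.0, 1.0, 2.5, 4.0, 5.0]
system_scores = [0.1, 0.3, 0.5, 0.8, 0.9]

r = pearson(human_scores, system_scores)
print(f"Pearson r = {r:.3f}")
```

Because the coefficient depends only on deviations from the means, the two score lists may use different scales. The same dependence on deviations is why a single extreme pair can shift the coefficient noticeably.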