Abstract

We evaluate the hypothesis that automatic evaluation metrics developed for Machine Translation (MT) systems have a significant impact on predicting semantic similarity scores in the Semantic Textual Similarity (STS) task, in light of their use for paraphrase identification. We show that different metrics may behave differently, and carry different significance, along the semantic scale [0-5] of the STS task. In addition, we compare several classification algorithms that use a combination of different MT metrics to build an STS system; we show that although this approach obtains remarkable results in the paraphrase identification task, it is insufficient to achieve the same results in STS. We show that this problem is due to an excessive adaptation of some algorithms to the dataset domain, and we conclude by presenting a way to mitigate or avoid this issue.
