Abstract
Semantic textual analysis is a natural language processing task that has attracted many research contributions aimed at solving diverse real-life problems. Vector comparison is a core subtask of semantic textual similarity analysis. Many solutions, including recent state-of-the-art transformer-based pre-trained language models for transfer learning, use only cosine similarity to evaluate embeddings in downstream tasks and ignore other vector comparison methods. To investigate the relative performance of some of these neglected measures, this work proposes novel adaptations of the soft cosine and extended cosine vector measures. For the first time in the literature, we compare their performance under the same conditions against the conventional cosine measure, distance-weighted cosine, the vector similarity measure, and negative Manhattan and Euclidean distances on downstream semantic textual similarity tasks. Using the transformer-based Universal Sentence Encoder, SBERT, SRoBERTa, SimCSE, and ST5 for text encoding, we evaluate the adapted measures on diverse real-world datasets with Pearson correlation, Spearman correlation, accuracy, and F1. The results show that the adapted measures significantly surpass previously reported state-of-the-art cosine similarity-based correlations in several of the test cases considered.
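To make the compared measures concrete, the sketch below gives textbook definitions of cosine similarity, negative Manhattan and negative Euclidean distance, and the standard soft cosine measure, together with the Pearson and Spearman correlations used to score predicted similarities against gold labels. It is a minimal illustration in plain Python under stated assumptions, not the paper's adapted soft cosine or extended cosine formulations, which the abstract does not specify. The function names and the feature-similarity matrix `s` are hypothetical; distances are negated so that a larger value means "more similar", which lets them be correlated with gold similarity scores in the same direction as cosine.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Vector, v: Vector) -> float:
    """Conventional cosine similarity."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def neg_manhattan(u: Vector, v: Vector) -> float:
    """Negated L1 distance: higher means more similar."""
    return -sum(abs(a - b) for a, b in zip(u, v))


def neg_euclidean(u: Vector, v: Vector) -> float:
    """Negated L2 distance: higher means more similar."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def soft_cosine(u: Vector, v: Vector, s: Sequence[Sequence[float]]) -> float:
    """Standard soft cosine: cosine under the bilinear form defined by s.

    Hypothetical s[i][j] is the similarity between features i and j
    (assumed symmetric with s[i][i] == 1). With s as the identity matrix
    this reduces to the conventional cosine similarity.
    """
    def bilinear(x: Vector, y: Vector) -> float:
        return sum(x[i] * s[i][j] * y[j]
                   for i in range(len(x)) for j in range(len(y)))

    return bilinear(u, v) / (math.sqrt(bilinear(u, u)) * math.sqrt(bilinear(v, v)))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and gold scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy vectors for illustration only, not real sentence embeddings.
    print(cosine([1.0, 0.0], [0.0, 1.0]))         # 0.0
    print(neg_euclidean([0.0, 0.0], [3.0, 4.0]))  # -5.0
    print(soft_cosine([1.0, 0.0], [0.0, 1.0], [[1.0, 0.5], [0.5, 1.0]]))  # 0.5
```

The soft cosine example shows the key difference from plain cosine: two orthogonal vectors still score 0.5 because the assumed matrix `s` declares their features partially similar. In an actual STS evaluation, one such score would be computed per sentence pair from the encoder's embeddings, and the resulting list would be passed to `pearson` or `spearman` alongside the gold scores.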