Sentence similarity using weighted path and similarity matrices

Reza Javadzadeh, Marzea Rahimi, Morteza Zahedi

doi:10.3906/elk-1901-91

Abstract

Sentence similarity is the task of assessing how similar two snippets of text are. Similarity techniques are used extensively in clustering, summarization, classification, plagiarism detection, and related tasks. Because a sentence contains only a small set of words, sentence similarity is considered a difficult problem in natural language processing. Solving it raises two issues: (1) which similarity technique to use for word pairs and (2) how to generalize word-pair similarity to sentence pairs. We used the weighted path, a WordNet-based similarity measure, together with the paraphrase database to obtain word-pair similarity values. We then extracted the maximum values from the pairwise similarity matrix and computed a similarity value for each sentence pair. We also incorporated a vector space model technique to form a robust similarity measure. Our method outperformed state-of-the-art methods on the STSS65 test dataset, achieving a Pearson correlation of 0.87 with human similarity scores. Moreover, our approach performed on par with other methods on the STSS131 test data under the same evaluation. It also outperforms all the other WordNet-based methods compared on both datasets.
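The sentence-level step described in the abstract (take the maxima of a pairwise word-similarity matrix, then compare system scores with human scores using Pearson's correlation) can be sketched as follows. This is a minimal sketch under stated assumptions: `TOY_SIM` and `toy_word_sim` are hypothetical stand-ins for weighted-path and paraphrase-database scores, and averaging the row and column maxima is one common aggregation, not necessarily the exact formula used in the paper.

```python
import math
from typing import Callable, Sequence


def matrix_sentence_similarity(
    words1: Sequence[str],
    words2: Sequence[str],
    word_sim: Callable[[str, str], float],
) -> float:
    """Score a sentence pair from its pairwise word-similarity matrix.

    Assumed aggregation: each word keeps its best match in the other
    sentence, and the two directional means are averaged.
    """
    if not words1 or not words2:
        return 0.0
    # |words1| x |words2| matrix of word-pair similarities.
    matrix = [[word_sim(a, b) for b in words2] for a in words1]
    # Row maxima: best match in sentence 2 for each word of sentence 1.
    row_max = [max(row) for row in matrix]
    # Column maxima: best match in sentence 1 for each word of sentence 2.
    col_max = [max(col) for col in zip(*matrix)]
    return (sum(row_max) / len(row_max) + sum(col_max) / len(col_max)) / 2


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation between system scores and human scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical toy lexicon standing in for weighted-path / PPDB word scores.
TOY_SIM = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("fast", "quick")): 0.8,
}


def toy_word_sim(a: str, b: str) -> float:
    """Hypothetical word similarity: 1.0 for identical words, else lexicon lookup."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)
```

For "a fast car" versus "a quick automobile", every word finds a strong counterpart in the other sentence, so the score is about 0.9. Taking maxima in both directions keeps the measure symmetric when the sentences differ in length.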

Highlights

Similar sentences may discuss the same idea, or they may be on a similar topic
Results on the STSS131 dataset demonstrate that our work, latent semantic analysis (LSA), and semantic text similarity (STS) come closest to human similarity scores: these three approaches have the smallest average difference from human scores
This paper argues that recent approaches have not been thorough in extracting features from similarity matrices, and that most studies neglected the importance of the information content value
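The role of information content (IC) in a weighted-path measure can be illustrated with a small function. This sketch assumes the wpath formulation from the knowledge-graph similarity literature, sim = 1 / (1 + length · k^IC(lcs)), where `length` is the shortest taxonomy path between two concepts and `lcs` is their least common subsumer; the paper summary does not state its exact variant, and the default `k = 0.8` here is a hypothetical choice.

```python
import math


def information_content(concept_prob: float) -> float:
    """IC(c) = -log p(c), where p(c) is the corpus probability of concept c."""
    if not 0.0 < concept_prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(concept_prob)


def wpath_similarity(path_length: int, lcs_ic: float, k: float = 0.8) -> float:
    """Assumed weighted-path similarity: 1 / (1 + length * k**IC(lcs)).

    path_length: shortest path (in edges) between the two concepts.
    lcs_ic: information content of their least common subsumer.
    k: hypothetical weighting factor in (0, 1]; k = 1 gives plain path similarity.
    """
    if not 0.0 < k <= 1.0:
        raise ValueError("k must be in (0, 1]")
    if path_length < 0:
        raise ValueError("path_length must be non-negative")
    return 1.0 / (1.0 + path_length * k ** lcs_ic)
```

With the same path length of 2, a pair whose shared ancestor is specific (IC = 5.0) scores about 0.60, while a pair under a generic ancestor (IC = 1.0) scores about 0.38. This is the kind of distinction a pure path-length measure cannot make.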

Summary

Introduction

Similar sentences may discuss the same idea, or they may be on a similar topic. Similar sentence pairs usually contain common words, link to common concepts, and have many co-occurring words. Latent semantic analysis (LSA) is a popular approach used extensively for NLP tasks [7]. It is a statistical method that uses the frequency values of words in both sentences to compute similarity. Semantic text similarity (STS) [9] combines three metrics to compute similarity: (1) string matching, which measures the characters shared by word pairs, (2) the second-order co-occurrence pointwise mutual information (SOC-PMI) approach, and (3) word order information. Two methods, namely sentence vector similarity and similarity matrix values, were combined to form a robust measure, which led to a better correlation than either method achieved individually.
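Combining a sentence-vector score with a similarity-matrix score can be sketched as a weighted blend. This is a minimal sketch assuming a plain term-frequency vector space model and a linear combination; the weight `alpha` and the function names are hypothetical, since the summary does not give the paper's actual vector model or weighting.

```python
import math
from collections import Counter
from typing import Sequence


def cosine_bow(words1: Sequence[str], words2: Sequence[str]) -> float:
    """Cosine similarity of raw term-frequency vectors (a basic vector space model)."""
    v1, v2 = Counter(words1), Counter(words2)
    # Only shared terms contribute to the dot product.
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def combined_similarity(vector_sim: float, matrix_sim: float, alpha: float = 0.5) -> float:
    """Hypothetical convex combination of a vector score and a matrix score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * vector_sim + (1.0 - alpha) * matrix_sim
```

For "a fast car" versus "a quick automobile", the term-frequency cosine is only 1/3 because just "a" is shared, whereas a matrix score built on word-level similarity can credit the synonym pairs. Blending the two lets lexical overlap and semantic matching compensate for each other's blind spots.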

The proposed approach

Paraphrase database

Strongly agree

Experimental results

R2 Statistics

Conclusion

Journal: TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES
Publication Date: Sep 18, 2019
License type: CC BY

Similar Papers

Sublinear Time Approximation of Text Similarity Matrices
Archan Ray ... Andrew McCallum
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 36 | 28 Jun 2022

The 2019 n2c2/OHNLP Track on Clinical Semantic Textual Similarity: Overview.
Yanshan Wang ... Sam Henry
JMIR medical informatics | VOL. 8 | 27 Nov 2020

Measuring Semantic Similarity of Vietnamese Sentences Based on Lexical and Distribution Similarity
Van-Tan Bui ... Phuong-Thai Nguyen
08 Dec 2021

Sentence similarity measuring by vector space model
U. L. D. N. Gunasinghe ... W. A. D. Sashika
01 Dec 2014