Measuring text similarity based on structure and word embedding

Mamdouh Farouk

doi:10.1016/j.cogsys.2020.04.002

Abstract

Finding the similarity between natural language sentences is crucial for many applications in Natural Language Processing (NLP), and these applications need that similarity to be calculated accurately. Many approaches depend on word-to-word similarity to measure sentence similarity. This paper proposes a new approach to improve the accuracy of the sentence similarity calculation. The proposed approach combines different similarity measures. In addition to the traditional word-to-word similarity measure, it exploits sentence semantic structure: a discourse representation structure (DRS), which is a semantic representation of natural language sentences, is generated and used to calculate structural similarity. Furthermore, word order similarity is measured to take the order of words in the sentences into account. Experiments show that exploiting structural information achieves good results. Moreover, the proposed method outperforms current approaches on a standard benchmark dataset, achieving a Pearson correlation of 0.8813 with human similarity judgments.
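The abstract does not give the formulas, so the following is only a minimal sketch of how three such measures might be combined, under several assumptions. The toy vectors, the best-match averaging used for word-to-word similarity, the order-vector formula for word order similarity, and the weights are all illustrative choices rather than the paper's method. The DRS-based structural score is passed in as a plain number (`structure_sim`, a hypothetical parameter), because producing a DRS requires a semantic parser that is outside the scope of this sketch.

```python
import math

# Toy 2-d "embeddings", invented for illustration only.
# A real system would load pretrained word vectors instead.
TOY_VECTORS = {
    "dog": (1.0, 0.0),
    "cat": (0.9, 0.1),
    "chases": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def word_similarity(s1, s2):
    """Assumed word-to-word measure: average best cosine match, symmetrized."""
    def directed(a, b):
        best = [max(cosine(TOY_VECTORS[w], TOY_VECTORS[x]) for x in b) for w in a]
        return sum(best) / len(best)
    return (directed(s1, s2) + directed(s2, s1)) / 2


def order_similarity(s1, s2):
    """Assumed word order measure: 1 - |r1 - r2| / |r1 + r2| over position vectors."""
    joint = list(dict.fromkeys(s1 + s2))  # unique words, first-seen order
    r1 = [s1.index(w) + 1 if w in s1 else 0 for w in joint]
    r2 = [s2.index(w) + 1 if w in s2 else 0 for w in joint]
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - diff / total if total else 1.0


def combined_similarity(s1, s2, structure_sim,
                        w_word=0.5, w_struct=0.3, w_order=0.2):
    """Weighted sum of the three scores; the weights are hypothetical."""
    return (w_word * word_similarity(s1, s2)
            + w_struct * structure_sim
            + w_order * order_similarity(s1, s2))


if __name__ == "__main__":
    a = ["dog", "chases", "cat"]
    b = ["cat", "chases", "dog"]
    # structure_sim=0.5 is a placeholder for a DRS-based score.
    print(combined_similarity(a, b, structure_sim=0.5))
```

In this sketch the two example sentences contain the same words, so the word-to-word score is close to 1 while the word order score is lower. That is the kind of case in which an order-sensitive or structure-sensitive measure adds information that word-level matching alone would miss.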

Journal: Cognitive Systems Research
Publication Date: May 6, 2020
Citations: 34

