A hybrid approach for paraphrase identification based on knowledge-enriched semantic heuristics

Muhidin Mohamed, Mourad Oussalah

doi:10.1007/s10579-019-09466-4

Open Access

https://doi.org/10.1007/s10579-019-09466-4

Abstract

In this paper, we propose a hybrid approach for sentence paraphrase identification. The proposal addresses the problem of evaluating sentence-to-sentence semantic similarity when the sentences contain a set of named entities. The essence of the proposal is to compute the semantic similarity of named-entity tokens separately from that of the rest of the sentence text. More specifically, the approach integrates word semantic similarity derived from WordNet taxonomic relations with named-entity semantic relatedness inferred from Wikipedia entity co-occurrences and underpinned by Normalized Google Distance. In addition, the WordNet similarity measure is enriched with word part-of-speech (PoS) conversion aided by a Categorial Variation database (CatVar), which enhances the lexico-semantics of words. We validated our hybrid approach using two different datasets: the Microsoft Research Paraphrase Corpus (MSRPC) and TREC-9 Question Variants. In our empirical evaluation, we showed that our system outperforms baselines and most of the related state-of-the-art systems for paraphrase detection. We also conducted a misidentification analysis to disclose the primary sources of our system's errors.
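As an illustration of the named-entity relatedness component mentioned in the abstract, the following is a minimal sketch of the standard Normalized Google Distance formula applied to document counts. The abstract does not give the exact formulation the authors used, so the function names, the hypothetical article counts, and the exponential mapping from distance to relatedness are assumptions made here for illustration only.

```python
import math


def normalized_google_distance(fx: int, fy: int, fxy: int, n: int) -> float:
    """Standard NGD from occurrence counts.

    fx, fy : number of documents (e.g. Wikipedia articles) mentioning x, y
    fxy    : number of documents mentioning both x and y
    n      : total number of documents in the collection
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Entities that never co-occur are treated as maximally distant.
        return float("inf")
    if n <= max(fx, fy):
        raise ValueError("n must exceed the individual occurrence counts")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


def ngd_relatedness(fx: int, fy: int, fxy: int, n: int) -> float:
    """Map the distance into (0, 1]; exp(-d) is an assumed choice,
    not necessarily the mapping used in the paper."""
    return math.exp(-normalized_google_distance(fx, fy, fxy, n))


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(ngd_relatedness(fx=1000, fy=100, fxy=10, n=10000))
```

A distance of 0 means the two entities always appear together, so the relatedness is 1; entities that never co-occur get a relatedness of 0.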

Summary

Introduction

Paraphrases are sentences conveying the same meaning using alternative language expressions (Dias et al. 2010). The identification of paraphrases is explicitly related to the quantification of the amount of semantic overlap between two textual expressions. Paraphrase Identification (PI) is a useful task for many other important NLP applications, including Text Summarization, Plagiarism Detection, Intelligent Tutoring Systems, Question Answering, and Machine Translation. Paraphrases can be used to substantiate the correctness of answers produced by a question answering application. Plagiarism detection is another task that can benefit from PI by identifying texts that have been restated using alternative language. In the case of Intelligent Tutoring Systems, one can use paraphrase identification to assess whether students' submissions or answers are semantically equivalent to reference answers.

Journal: Language Resources and Evaluation
Publication Date: Apr 16, 2019
Citations: 28
License type: open-access
