Modeling Paraphrase Identification Using Supervised Learning Methods Against Various Datasets and Features

Rutal S. Mahajan, Mukesh A. Zaveri

doi:10.1109/iccic.2017.8524379

Abstract

Paraphrase identification is the task of determining whether two text segments written in natural language are similar in meaning. It is a core task for natural language understanding. Past work on paraphrase identification has focused primarily on machine learning approaches, each evaluated on only a single type of dataset. In this work, paraphrase identification is modeled as a binary classification task and addressed with several different classifiers trained in a supervised manner. The performance of the proposed supervised paraphrase identification models is evaluated on two different datasets, namely the Twitter Paraphrase Corpus and the Microsoft Research Paraphrase Corpus. Evaluation uses standard evaluation measures across different experimental setups with lexical, syntactic, and semantic features. The proposed paraphrase identification approach achieves results competitive with those of other state-of-the-art machine learning approaches.
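The abstract does not list the exact features or classifiers, so the following is only a minimal Python sketch of the general setup it describes: each sentence pair is mapped to a feature vector, and a binary classifier labels it paraphrase (1) or non-paraphrase (0). It assumes three hypothetical lexical features (unigram Jaccard overlap, bigram Jaccard overlap, and relative length difference), a hand-written logistic regression standing in for the paper's classifiers, and accuracy plus F1 as the evaluation measures, since these are commonly reported on the Microsoft Research Paraphrase Corpus and the Twitter Paraphrase Corpus. Syntactic and semantic features are omitted, and the toy sentence pairs are invented for illustration, not taken from either corpus.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a: set, b: set) -> float:
    # Size of the intersection divided by the size of the union (0.0 if both empty).
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lexical_features(s1: str, s2: str) -> list[float]:
    """Hypothetical lexical feature vector for one sentence pair."""
    t1, t2 = tokenize(s1), tokenize(s2)
    unigram_overlap = jaccard(set(t1), set(t2))
    # Bigram overlap adds a rough word-order signal.
    bigram_overlap = jaccard(set(zip(t1, t1[1:])), set(zip(t2, t2[1:])))
    # Relative length difference: 0.0 means both sentences have equal length.
    len_diff = abs(len(t1) - len(t2)) / max(len(t1), len(t2), 1)
    return [unigram_overlap, bigram_overlap, len_diff]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=2000):
    # Plain stochastic gradient descent on the logistic (cross-entropy) loss.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, x, threshold=0.5) -> int:
    # 1 = paraphrase, 0 = not a paraphrase.
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


def accuracy_f1(gold: list[int], pred: list[int]) -> tuple[float, float]:
    # Accuracy over all pairs and F1 for the positive (paraphrase) class.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    acc = sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1


# Invented toy pairs for illustration only (not from MSRP or the Twitter corpus).
pairs = [
    ("The cat sat on the mat.", "The cat was sitting on the mat.", 1),
    ("He bought a new car yesterday.", "Yesterday he bought a new car.", 1),
    ("Stocks fell sharply on Monday.", "The weather was sunny on Monday.", 0),
    ("I love this movie.", "The train leaves at noon.", 0),
]
X = [lexical_features(s1, s2) for s1, s2, _ in pairs]
y = [label for _, _, label in pairs]
w, b = train_logreg(X, y)

# Scoring the training pairs only checks the pipeline end to end;
# a real evaluation would use a held-out test split.
acc, f1 = accuracy_f1(y, [predict(w, b, x) for x in X])
```

Further feature families, such as syntactic or semantic similarity scores, would be appended to each pair's feature vector; the classifier and the evaluation code stay unchanged, which is what makes comparing feature setups and datasets straightforward in this kind of design.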
