Abstract

In this paper, we address the task of sentence paraphrase detection, which is to decide whether two sentences stand in a paraphrase relationship. We describe a supervised learning strategy in which each sentence pair is classified to decide its paraphrase relationship, using only lexical features computed at the n-gram level as classification features. Gradient Boosting, K-Nearest Neighbor, Decision Tree, and Support Vector Machine are chosen as the classifiers. We compare the performance of these classifiers and analyze the features to determine which are most important for paraphrase detection. Evaluation is performed on the corpus of the 2016 Detecting Paraphrases in Indian Languages task proposed by the Forum for Information Retrieval Evaluation (DPIL-FIRE2016). The experimental results show that Gradient Boosting achieves the highest Overall Score. Using the learned classifier, we obtained the highest F1 measure for both Task1 and Task2 on Malayalam and Tamil, and the highest F1 measure for Task2 on Punjabi in DPIL-FIRE2016.
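
To make the idea of n-gram lexical features concrete, the following is a minimal sketch and not the feature set used in the paper, which the abstract does not specify. It assumes whitespace tokenization, word-level n-grams for n = 1 to 3, and Jaccard overlap as the similarity measure; the function names `ngrams`, `jaccard`, and `lexical_features` are hypothetical.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a token list (empty if too short)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lexical_features(s1: str, s2: str, max_n: int = 3) -> List[float]:
    """One overlap score per n-gram order (assumed features, for illustration)."""
    # Assumption: lowercasing and whitespace splitting stand in for tokenization.
    t1, t2 = s1.lower().split(), s2.lower().split()
    return [jaccard(ngrams(t1, n), ngrams(t2, n)) for n in range(1, max_n + 1)]
```

Each sentence pair yields one fixed-length vector, which could then be passed to any of the classifiers named above. Identical sentences score 1.0 at every order, while pairs that share words but not word order score lower as n grows.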
