Detecting paraphrases in the Marathi language

Sharvari Govilkar, Shruti Srivastava

doi:10.54646/bijscit.2020.03

Abstract

Paraphrasing refers to writing that differs from a source in its wording or in the arrangement of its words but conveys the same meaning. Identifying paraphrases is important in many real-life applications, such as Information Retrieval, Plagiarism Detection, Text Summarization, and Question Answering. A large amount of work on Paraphrase Detection has been done for English and many Indian languages. However, there is no existing system to identify paraphrases in Marathi; this is the first such endeavor for the Marathi language. A paraphrase has differently structured sentences, and since Marathi is a semantically strong language, this system is designed to check both the statistical and the semantic similarity of Marathi sentences. The statistical similarity measure does not need any prior knowledge, as it is based only on the factual data of the sentences. The factual data is calculated from the degree of closeness between the word-set, word-order, word-vector, and word-distance of the two sentences. Universal Networking Language (UNL) represents the semantic content of a sentence without any syntactic details. Hence, the semantic similarity calculated from the UNL graphs generated for two Marathi sentences indicates their semantic equivalence. The total paraphrase score is calculated by combining the statistical and semantic similarity scores, and it yields a judgment on whether the Marathi sentences in question are paraphrases or non-paraphrases.
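As a rough illustration of the statistical measures named in the abstract, the sketch below computes word-set, word-order, and word-vector similarity for two tokenized sentences and then combines a statistical score with a semantic score. The abstract does not give the formulas, so everything here is an assumption: Jaccard overlap stands in for word-set similarity, a position-vector comparison for word-order similarity, cosine similarity over word counts for word-vector similarity, and a weighted average for the final score. Word-distance and the UNL-based semantic score are not implemented; the semantic score is simply passed in as a number. All function names and the `weight` parameter are hypothetical.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: plain whitespace tokenization, no stemming or stop-word removal.
    return sentence.split()


def word_set_similarity(a, b):
    # Jaccard overlap of the two word sets (an assumed stand-in formula).
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def word_order_similarity(a, b):
    # Compare first-occurrence positions of each word in the joint vocabulary.
    # A word missing from a sentence gets position 0 (assumed convention).
    joint = list(dict.fromkeys(a + b))
    r1 = [a.index(w) + 1 if w in a else 0 for w in joint]
    r2 = [b.index(w) + 1 if w in b else 0 for w in joint]
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(r1, r2)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(r1, r2)))
    return 1.0 if total == 0 else 1.0 - diff / total


def word_vector_similarity(a, b):
    # Cosine similarity between word-count vectors.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def statistical_similarity(a, b):
    # Assumption: an unweighted mean of the three measures above.
    scores = (
        word_set_similarity(a, b),
        word_order_similarity(a, b),
        word_vector_similarity(a, b),
    )
    return sum(scores) / len(scores)


def paraphrase_score(statistical, semantic, weight=0.5):
    # Hypothetical combination: weighted average of the two scores.
    return weight * statistical + (1.0 - weight) * semantic


s1 = tokenize("राम शाळेत जातो")
s2 = tokenize("शाळेत राम जातो")
stat = statistical_similarity(s1, s2)
# The semantic score would come from comparing UNL graphs; 0.9 is a placeholder.
print(paraphrase_score(stat, semantic=0.9))
```

In this example the two sentences share the same words in a different order, so the word-set and word-vector measures agree fully and only the word-order measure lowers the statistical score. Any decision threshold for labelling a pair as paraphrase or non-paraphrase would need to be chosen separately.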

Full Text