Cross-lingual text alignment for fine-grained plagiarism detection

Nava Ehsan, Azadeh Shakery, Frank Wm Tompa

doi:10.1177/0165551518787696

Abstract

Fast and easy access to a wide range of documents in various languages, together with the wide availability of translation and editing tools, has created a need for effective tools for detecting cross-lingual plagiarism. Given a suspicious document, cross-lingual plagiarism detection comprises two main subtasks: retrieving documents that are candidate sources for that document, and analysing those candidates one by one to determine their similarity to the suspicious document. In this article, we examine the second subtask, also called the detailed analysis subtask, where the goal is to align plagiarised fragments from source and suspicious documents in different languages. Our proposed approach has two main steps: the first finds candidate plagiarised fragments with a focus on high recall; the second applies a more precise similarity analysis based on dynamic text alignment, which filters the results by finding alignments between the identified fragments. With these two steps, the proximity of the terms is considered at different levels of granularity. In both steps, our approach uses a dictionary to obtain translations of individual terms instead of using a machine translation system to convert longer passages from one language to another. We use a weighting scheme to distinguish among multiple translations of a term. Experimental results show that our method outperforms the methods used by the systems that achieved the best results in the PAN-2012 and PAN-2014 competitions.
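
The abstract does not give the algorithmic details, so the following is only a minimal sketch of the general idea: terms are matched across languages through a weighted dictionary, and a dynamic-programming alignment then scores how well two fragments line up. The toy dictionary, its weights, the gap and mismatch penalties, and the choice of a Smith-Waterman-style local alignment are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: source term -> {candidate translation: weight}.
# Entries and weights are illustrative only, not values from the paper.
TOY_DICT: Dict[str, Dict[str, float]] = {
    "casa": {"house": 0.7, "home": 0.3},
    "grande": {"big": 0.6, "large": 0.4},
}


def term_similarity(src_term: str, susp_term: str,
                    dictionary: Dict[str, Dict[str, float]]) -> float:
    """Weight of susp_term among the translations of src_term, or 0.0 if absent."""
    return dictionary.get(src_term, {}).get(susp_term, 0.0)


def local_alignment(src: List[str], susp: List[str],
                    dictionary: Dict[str, Dict[str, float]],
                    gap: float = 0.5,
                    mismatch: float = 0.5) -> Tuple[float, Tuple[int, int]]:
    """Smith-Waterman-style local alignment over two term sequences.

    Returns the best local score and the 1-based (src, susp) position
    where the best-scoring alignment ends.
    """
    rows, cols = len(src) + 1, len(susp) + 1
    score = [[0.0] * cols for _ in range(rows)]
    best, best_end = 0.0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sim = term_similarity(src[i - 1], susp[j - 1], dictionary)
            # Reward dictionary matches by their weight; penalise non-matches.
            diag = score[i - 1][j - 1] + (sim if sim > 0 else -mismatch)
            score[i][j] = max(0.0, diag,
                              score[i - 1][j] - gap,
                              score[i][j - 1] - gap)
            if score[i][j] > best:
                best, best_end = score[i][j], (i, j)
    return best, best_end


if __name__ == "__main__":
    source_terms = ["una", "casa", "grande"]
    suspicious_terms = ["a", "home", "large", "today"]
    print(local_alignment(source_terms, suspicious_terms, TOY_DICT))
```

In this sketch, a less likely translation (such as "home" for "casa") still contributes to the alignment, but with a smaller weight than the preferred one, which is one simple way a weighting scheme could distinguish among multiple translations.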

Journal: Journal of Information Science
Publication Date: Aug 13, 2018
Citations: 6

Similar Papers

Using a Dictionary and n-gram Alignment to Improve Fine-grained Cross-Language Plagiarism Detection
Nava Ehsan ... Azadeh Shakery
13 Sep 2016

Applications and use Cases of Multilevel Granularity for Network Traffic Classification
Faiz Zaki ... Nor Badrul Anuar
01 Feb 2020

Multiresolution texture analysis for human oocyte cytoplasm description
Laura Caponetti ... Gianluca Sforza
01 May 2009

Sometimes “Tomorrow” is “Sometime”
José Luiz Fiadeiro ... Tom Maibaum
01 Jan 1993
